
Journal of University of Chinese Academy of Sciences ›› 2025, Vol. 42 ›› Issue (2): 268-275. DOI: 10.7523/j.ucas.2023.071

• Electronic Information and Computer Science •

Solutions of cross-entropy loss with spectral decoupling regularization

HU Yinhan, GUO Tiande, HAN Congying

  1. School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2022-12-29 Revised: 2023-09-01 Published: 2023-09-01
  • Corresponding author: HAN Congying, E-mail: hancy@ucas.ac.cn
  • Supported by:
    Key Program of the National Natural Science Foundation of China (11991022, U19B2040) and the Fundamental Research Funds for the Central Universities

Abstract: This paper studies how spectral decoupling regularization of different strengths affects over-parameterized linear models. In the absence of weight decay, we prove that the models obtained with different strengths of spectral decoupling are equivalent. When a small weight decay is present, we use the second-order Taylor expansion of the objective function to obtain an approximate solution. Analysis of this approximate solution shows that reducing the strength of spectral decoupling has the effect of strengthening the weight decay, and that in binary classification it is directly equivalent to increasing the weight decay coefficient. Finally, we verify these conclusions through experiments.

Key words: cross-entropy loss, spectral decoupling regularization, weight decay, gradient starvation, neural network
