[1] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. DOI: 10.1145/3065386.
[2] Hendrycks D, Mu N, Cubuk E D, et al. AugMix: A simple data processing method to improve robustness and uncertainty[EB/OL]. 2019: arXiv: 1912.02781. (2020-02-17)[2023-07-15]. https://arxiv.org/abs/1912.02781.
[3] Madry A, Makelov A, Schmidt L, et al. Towards deep learning models resistant to adversarial attacks[EB/OL]. 2017: arXiv: 1706.06083. (2019-09-04)[2023-07-15]. https://arxiv.org/abs/1706.06083.
[4] Jacot A, Gabriel F, Hongler C. Neural tangent kernel: Convergence and generalization in neural networks[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. December 3-8, 2018, Montréal, Canada. New York: ACM, 2018: 8580-8589. DOI: 10.5555/3327757.3327948.
[5] Soudry D, Hoffer E, Nacson M S, et al. The implicit bias of gradient descent on separable data[J]. Journal of Machine Learning Research, 2018, 19(1): 2822-2878. DOI: 10.5555/3291125.3309632.
[6] Pezeshki M, Kaba S O, Bengio Y, et al. Gradient starvation: A learning proclivity in neural networks[EB/OL]. 2020: arXiv: 2011.09468. (2021-11-24)[2023-07-15]. https://arxiv.org/abs/2011.09468.
[7] Lewkowycz A, Gur-Ari G. On the training dynamics of deep networks with L2 regularization[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. December 6-12, 2020, Vancouver, BC, Canada. New York: ACM, 2020: 4790-4799. DOI: 10.5555/3495724.3496126.
[8] Geirhos R, Jacobsen J H, Michaelis C, et al. Shortcut learning in deep neural networks[J]. Nature Machine Intelligence, 2020, 2(11): 665-673. DOI: 10.1038/s42256-020-00257-z.
[9] Shah H, Tamuly K, Raghunathan A, et al. The pitfalls of simplicity bias in neural networks[EB/OL]. 2020: arXiv: 2006.07710. (2020-10-28)[2023-07-15]. https://arxiv.org/abs/2006.07710.
[10] Xu Z Q J, Zhang Y Y, Luo T, et al. Frequency principle: Fourier analysis sheds light on deep neural networks[EB/OL]. 2019: arXiv: 1901.06523. (2019-09-20)[2023-07-15]. https://arxiv.org/abs/1901.06523.
[11] Bartlett P L, Long P M, Lugosi G, et al. Benign overfitting in linear regression[J]. Proceedings of the National Academy of Sciences of the United States of America, 2020, 117(48): 30063-30070. DOI: 10.1073/pnas.1907378117.
[12] Hastie T, Montanari A, Rosset S, et al. Surprises in high-dimensional ridgeless least squares interpolation[J]. Annals of Statistics, 2022, 50(2): 949-986. DOI: 10.1214/21-AOS2133.
[13] Shamir O. The implicit bias of benign overfitting[EB/OL]. 2022: arXiv: 2201.11489. (2022-05-29)[2023-07-15]. https://arxiv.org/abs/2201.11489.
[14] Hsu D, Muthukumar V, Xu J. On the proliferation of support vectors in high dimensions[J]. Journal of Statistical Mechanics: Theory and Experiment, 2022, 2022(11): 114011. DOI: 10.1088/1742-5468/ac98a9.
[15] Muthukumar V, Narang A, Subramanian V, et al. Classification vs regression in overparameterized regimes: Does the loss function matter?[J]. Journal of Machine Learning Research, 2021, 22(1): 10104-10172. DOI: 10.5555/3546258.3546480.