SA-YOLO: self-adaptive loss object detection method under imbalance samples

doi:10.7523/j.ucas.2023.013

Abstract

Sample imbalance refers to a dataset that contains an excessive number of easy background samples but too few hard foreground samples, so that the data suffer from both inter-class imbalance and hard-easy imbalance. Most existing object detection methods are either two-stage detectors based on region proposals or one-stage detectors based on regression. When applied to imbalanced samples, the predicted bounding boxes generated during training inevitably depend too heavily on the large number of negative samples, which leads to overfitting, low detection accuracy, and poor generalization. To achieve efficient and accurate object detection under imbalanced samples, this paper proposes SA-YOLO, a new self-adaptive loss object detection method. 1) To address the sample imbalance problem, we propose the SA-Focal Loss function, which adjusts the loss adaptively for different datasets and training stages to balance inter-class samples and hard-easy samples. 2) We construct the CSPDarknet53-SP network architecture based on a multi-scale feature prediction mechanism, which enhances the extraction of global features from hard small-object samples and improves the detection accuracy on hard samples. To verify the performance of SA-YOLO, extensive simulation experiments are conducted on a sample-imbalanced dataset and on the COCO dataset. The results show that, compared with the best metrics of the YOLO series methods, SA-YOLO reaches an mAP of 91.46% on the imbalanced dataset, an improvement of 10.87%, and AP₅₀ improves by more than 2% for every object class, showing excellent specialization; on the COCO dataset, mAP₅₀ improves by 1.58% and no metric falls below the best baseline value, showing good effectiveness.
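
The abstract does not give the form of SA-Focal Loss, so the sketch below only illustrates the standard binary focal loss of Lin et al., which the name suggests as a starting point. It shows how a modulating factor shrinks the loss of easy background samples relative to hard foreground ones. The function name `focal_loss` and the fixed values `alpha=0.25` and `gamma=2.0` are assumptions made for illustration; the self-adaptive rule that SA-YOLO uses to adjust the loss across datasets and training stages is not reproduced here.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Standard binary focal loss for a single prediction (illustrative only).

    p: predicted foreground probability in [0, 1]
    y: ground-truth label, 1 for foreground, 0 for background
    alpha, gamma: fixed hyperparameters here; an adaptive scheme would
                  change them during training (not shown, assumption).
    """
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)               # guard against log(0)
    # (1 - p_t) ** gamma down-weights samples the model already gets right
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy background sample (confidently rejected) versus
# a hard foreground sample (confidently missed).
easy_background = focal_loss(p=0.05, y=0)
hard_foreground = focal_loss(p=0.05, y=1)
```

With these settings the hard foreground sample contributes a far larger loss than the easy background sample, which is the hard-easy rebalancing effect the abstract refers to. Setting `gamma=0.0` reduces the expression to an alpha-weighted cross-entropy.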

Key words: imbalanced sample, self-adaptive loss, SA-YOLO algorithm, SA-Focal Loss function, CSPDarknet53-SP network architecture

CLC Number: TP391

SU Yapeng, CHEN Gaoshu, ZHAO Tong. SA-YOLO: self-adaptive loss object detection method under imbalance samples[J]. Journal of University of Chinese Academy of Sciences, 2024, 41(3): 411-426.
