[1] Barron J L, Fleet D J, Beauchemin S S. Performance of optical flow techniques[J]. International Journal of Computer Vision, 1994, 12(1): 43-77. DOI: 10.1007/BF01420984. [2] 刘鑫, 刘辉, 强振平, 等. 混合高斯模型和帧间差分相融合的自适应背景模型[J]. 中国图象图形学报, 2008, 13(4): 729-734. DOI: 10.11834/jig.20080422. [3] Moeslund T B, Granum E. A survey of computer vision-based human motion capture[J]. Computer Vision and Image Understanding, 2001, 81(3): 231-268. DOI: 10.1006/cviu.2000.0897. [4] Barnich O, Droogenbroeck M V. ViBE: a powerful random technique to estimate the background in video sequences[C]//2009 IEEE International Conference on Acoustics, Speech and Signal Processing. April 19-24, 2009. Taipei, China. IEEE, 2009. DOI: 10.1109/ICASSP.2009.4959741. [5] 袁益琴, 何国金, 王桂周, 等. 背景差分与帧间差分相融合的遥感卫星视频运动车辆检测方法[J]. 中国科学院大学学报, 2018, 35(1): 50-58. DOI: 10.7523/j.issn.2095-6134.2018.01.007. [6] 黄萍萍, 王峰, 向俞明, 等. 基于V-CSK视频遥感卫星运动目标检测跟踪方法[J]. 中国科学院大学学报, 2021, 38(3): 392-401. DOI: 10.7523/j.issn.2095-6134.2021.03.013. [7] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. DOI: 10.1109/TPAMI.2016.2577031. [8] He K M, Gkioxari G, Dollar P, et al. Mask R-CNN[C]//2017 IEEE International Conference on Computer Vision (ICCV). October 22-29, 2017. Venice. IEEE, 2017: 2961-2969. DOI: 10.1109/iccv.2017.322. [9] 王凤随, 王启胜, 陈金刚, 等. 基于注意力机制和Soft-NMS的改进Faster R-CNN目标检测算法[J]. 激光与光电子学进展, 2021, 58(24): 405-416. DOI: 10.3788/LOP202158.2420001. [10] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 27-30, 2016. Las Vegas, NV, USA. IEEE, 2016: 779-788. DOI: 10.1109/cvpr.2016.91. [11] Redmon J, Farhadi A. YOLOv3: an incremental improvement[EB/OL]. arXiv: 1804.02767. (2018-04-08) [2022-10-07]. https://arxiv.org/abs/1804.02767. [12] Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. arXiv: 2004.10934. (2020-04-23)[2022-10-07]. https://arxiv.org/abs/2004.10934. [13] 刘峰, 郭猛, 王向军. 基于跨尺度融合的卷积神经网络小目标检测[J]. 激光与光电子学进展, 2021, 58(6): 213-221. DOI: 10.3788/LOP202158.0610012. [14] Liu W, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector[M]//Computer Vision-ECCV 2016. Cham: Springer International Publishing, 2016: 21-37. DOI: 10.1007/978-3-319-46448-0_2. [15] Fu C Y, Liu W, Ranga A, et al. DSSD: Deconvolutional single shot detector[EB/OL]. arXiv: 1701.06659. (2017-01-23)[2022-10-07]. https:arxiv.org/abs/1701.06659. [16] 耿鹏志, 杨智雄, 张家钧, 等. 基于SSD的行人鞋子检测算法[J]. 激光与光电子学进展, 2021, 58(6): 184-191. DOI: 10.3788/LOP202158.0610009. [17] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). July 21-26, 2017, Honolulu, HI, USA. IEEE, 2017: 936-944. DOI: 10.1109/CVPR.2017.106. [18] Liu S, Qi L, Qin H F, et al. Path aggregation network for instance segmentation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. June 18-23, 2018, Salt Lake City, UT, USA. IEEE, 2018: 8759-8768. DOI: 10.1109/CVPR.2018.00913. [19] 汪亚妮, 汪西莉. 基于注意力和特征融合的遥感图像目标检测模型[J]. 激光与光电子学进展, 2021, 58(2): 363-371. DOI: 10.3788/LOP202158.0228003. [20] Zhu X K, Lyu S C, Wang X, et al. TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios[C]//2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). October 11-17, 2021, Montreal, BC, Canada. IEEE, 2021: 2778-2788. DOI: 10.1109/ICCVW54120.2021.00312. [21] Woo S, Park J, Lee J Y, et al. CBAM: convolutional block attention module[M]//Computer Vision-ECCV 2018. Cham: Springer International Publishing, 2018: 3-19. DOI: 10.1007/978-3-030-01234-2_1. [22] Hou Q B, Zhou D Q, Feng J S. Coordinate attention for efficient mobile network design[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 20-25, 2021, Nashville, TN, USA. IEEE, 2021: 13708-13717. DOI: 10.1109/CVPR46437.2021.01350. [23] Liu Z, Lin Y T, Cao Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). October 10-17, 2021, Montreal, QC, Canada. IEEE, 2022: 9992-10002. DOI: 10.1109/ICCV48922.2021.00986. [24] Long X, Deng K P, Wang G Z, et al. PP-YOLO: an effective and efficient implementation of object detector[EB/OL]. arXiv: 2007.12099. (2020-08-03)[2022-10-07]. https://arxiv.org/abs/2007.12099v3. [25] Wang C Y, Yeh I H, Liao H Y M. You only learn one representation: unified network for multiple tasks[EB/OL]. arXiv: 2105.04206. (2021-05-10)[2022-10-07]. https://arxiv.org/abs/2105.04206v1. [26] Wang C Y, Bochkovskiy A, Liao H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[EB/OL]. arXiv: 2207.02696. (2022-07-06)[2022-10-07]. https://arxiv.org/abs/2207.02696. |