P2P-Loc: point to point tiny person location

doi:10.7523/j.ucas.2023.072

Abstract

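Summary: Bounding boxes are the most common annotation method in visual object localization. However, because bounding-box annotation depends on a large number of precisely annotated boxes, it is difficult to apply in some practical scenarios. To address this, a new point-annotation-based framework is proposed for localizing persons: each person is annotated as a coarse point (CoarsePoint), which can be any point within the object extent, rather than as a precise bounding box, which simplifies the annotation process. Although this greatly reduces the complexity and cost of data annotation, CoarsePoint annotation inevitably lowers label reliability and confuses the network during training. A point self-refinement method is therefore proposed that iteratively updates the point annotations in a self-adjusting manner. Experimental results show that the proposed method effectively alleviates label uncertainty and progressively improves localization performance, achieving object localization while saving up to 80% of the annotation cost.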

Key words: tiny person localization, point supervision, detection accuracy, point to point, bounding-box annotation

Abstract: Bounding-box annotation is the most frequently used annotation method for visual object localization. However, it relies on a large number of precisely annotated bounding boxes, which are expensive and laborious to produce. This makes it hard to apply in some practical scenarios, and it is even redundant for applications such as tiny person localization, where object size does not matter. Therefore, we propose a novel point-based framework for person localization that annotates each person with a coarse point (CoarsePoint), which can be any point within the object extent, instead of an accurate bounding box. The network then predicts each person's location as a 2D coordinate in the image. Although this greatly simplifies the data annotation pipeline, CoarsePoint annotation inevitably decreases label reliability (label uncertainty) and causes network confusion during training. To address this, we propose a point self-refinement approach that iteratively updates the point annotations in a self-paced way. The proposed refinement alleviates label uncertainty and progressively improves localization performance. Experimental results show that our approach achieves comparable object localization performance while saving up to 80% of the annotation cost.
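The abstract does not spell out the update rule, so the following is only a minimal sketch of what an iterative, self-paced refinement of point labels could look like. Everything in it is an assumption made for illustration: the function names `refine_points` and `self_paced_refinement`, the nearest-prediction matching, the distance limit `max_dist`, the step size `step`, and the linearly relaxed confidence threshold are hypothetical and are not taken from the paper. The callback `predict_fn` stands in for retraining the localizer on the current labels and running inference.

```python
import math


def refine_points(annotations, predictions, max_dist, min_score, step=0.5):
    """Move each annotated point toward its nearest trusted prediction.

    annotations: list of (x, y) coarse point labels.
    predictions: list of (x, y, score) points predicted by a model.
    A prediction is trusted if score >= min_score and it lies within
    max_dist of the annotation (both rules are illustrative assumptions).
    """
    refined = []
    for ax, ay in annotations:
        best, best_d = None, max_dist
        for px, py, score in predictions:
            if score < min_score:
                continue  # skip low-confidence predictions
            d = math.hypot(px - ax, py - ay)
            if d <= best_d:
                best, best_d = (px, py), d
        if best is None:
            refined.append((ax, ay))  # no trusted prediction: keep the label
        else:
            # Partial step toward the prediction (hypothetical update rule).
            refined.append((ax + step * (best[0] - ax),
                            ay + step * (best[1] - ay)))
    return refined


def self_paced_refinement(annotations, predict_fn, rounds=3, max_dist=10.0,
                          start_score=0.9, end_score=0.5):
    """Refine labels over several rounds, easy (confident) cases first.

    The confidence threshold relaxes linearly from start_score to
    end_score, so harder cases are only updated in later rounds.
    """
    points = list(annotations)
    for r in range(rounds):
        t = r / max(rounds - 1, 1)
        min_score = start_score + t * (end_score - start_score)
        predictions = predict_fn(points)  # stand-in for train + inference
        points = refine_points(points, predictions, max_dist, min_score)
    return points


# Toy run: a fixed "model" that always predicts two object centers.
def toy_model(_points):
    return [(10.0, 10.0, 0.95), (50.0, 50.0, 0.6)]


coarse = [(14.0, 10.0), (50.0, 56.0)]
print(self_paced_refinement(coarse, toy_model))
```

In this toy run the first label is pulled toward its confident prediction in every round, while the second label is only updated in the final round, once the threshold has relaxed enough to accept its lower-confidence prediction.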

Key words: tiny person localization, point-based supervision, detection accuracy, point to point, bounding-box annotation

CLC number: TP391.4

YANG Yi, YU Xuehui, WANG Kuiran, YU Wenwen, WANG Zipeng, ZOU Jialing, HAN Zhenjun, JIAO Jianbin. P2P-Loc: point to point tiny person location[J]. Journal of University of Chinese Academy of Sciences, 2025, 42(4): 554-564.
