Hierarchical Gaussian loss based object detection in open world

doi:10.7523/j.issn.2095-6134.2021.04.014

Abstract: Object detection involves both object localization and categorization. Most existing object detection methods are category-dependent and cannot handle detection in an open world that contains unknown categories. Localization and categorization differ in how difficult they are to transfer from known categories to unknown ones, and localization is more universal than categorization. Motivated by this observation, and inspired by how humans categorize unknown objects, we propose a Gaussian hierarchical loss model that applies hierarchical modeling of object categories to object detection. Each class is learned as a multidimensional Gaussian distribution, and KL divergence is used to describe the hierarchical relationships among categories, which enhances transfer from known classes to unknown classes. The proposed method can therefore extend existing object detection methods to unknown categories in the open world. Experimental results show that the proposed method improves the detection of unknown categories without losing performance on known categories.
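
The abstract does not give the loss itself, so the following is only a minimal sketch of the quantity it builds on: the closed-form KL divergence between two Gaussians, here assumed to have diagonal covariance. The function name `kl_diag_gaussians` and the two example class distributions are hypothetical and are not taken from the paper. The sketch illustrates one property that may make KL divergence suitable for expressing parent-child relations: it is asymmetric, so a narrow class lying inside a broad class yields a small divergence in one direction and a large one in the other.

```python
import math


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """Return KL(P || Q) for two Gaussians with diagonal covariance.

    Each argument is a sequence of per-dimension means or variances.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        # Per-dimension closed form of the Gaussian KL divergence.
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


# Hypothetical 2-D class distributions (illustrative values only).
parent = ([0.0, 0.0], [4.0, 4.0])    # broad, coarse-level class
child = ([0.5, -0.5], [0.5, 0.5])    # narrow, fine-level class

kl_child_parent = kl_diag_gaussians(*child, *parent)
kl_parent_child = kl_diag_gaussians(*parent, *child)

print(f"KL(child || parent) = {kl_child_parent:.4f}")
print(f"KL(parent || child) = {kl_parent_child:.4f}")
```

In this toy setting KL(child || parent) is smaller than KL(parent || child), which is the kind of directional signal a hierarchy-aware loss could exploit. How the paper actually combines such terms into its training objective is not stated in the abstract.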

Key words: object detection, open world, hierarchical category relationship, Gaussian hierarchical loss

CLC Number: TP311

WANG Lin, CHEN Xilin. Hierarchical Gaussian loss based object detection in open world[J]. Journal of University of Chinese Academy of Sciences, 2021, 38(4): 538-548.
