Visual object tracking based on multiple experts and MDNet

doi:10.7523/j.ucas.2021.0002

Abstract

Deep-learning-based visual object tracking algorithms have achieved considerable success in recent years. However, the background, illumination, and target appearance in a video change continuously, and occlusion often occurs, which makes tracking difficult. Most traditional methods address this by updating the tracker online. Because video content is complex and variable, a single tracker that is updated and maintained online struggles to cope with the data in subsequent frames, and errors tend to accumulate. To solve this problem, we propose a tracking method based on multiple expert trackers, built on the existing tracker MDNet. First, MDNet learns the features shared by targets across all videos, so that the learned features describe the target well. Then, during tracking, multiple expert trackers are constructed dynamically from the tracking results to increase robustness. Finally, the best expert tracker is selected according to each expert's evaluation function and is used to track the target in the current frame. Experiments show that the proposed method achieves effective tracking results on 25 videos with abrupt changes and significantly improves tracking performance compared with MDNet.

Key words: visual object tracking, multiple experts, multi-decision fusion, MDNet
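
The expert-construction and expert-selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Expert` class, the `add_expert` and `select_expert` helpers, the pool-size limit, and the toy scoring function are all hypothetical, and the evaluation function is assumed here to be simply each expert's highest candidate score, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Expert:
    """A frozen snapshot of the tracker (hypothetical stand-in for MDNet weights)."""
    created_at: int                 # frame index at which the snapshot was taken
    score: Callable[[Box], float]   # confidence that a candidate box holds the target


def add_expert(experts: List[Expert], new: Expert, max_experts: int = 4) -> None:
    """Append a snapshot and drop the oldest one when the pool is full.

    Assumption: a fixed-size pool with oldest-first eviction.
    """
    experts.append(new)
    if len(experts) > max_experts:
        experts.pop(0)


def select_expert(experts: Sequence[Expert],
                  candidates: Sequence[Box]) -> Tuple[Expert, Box, float]:
    """Let each expert pick its best candidate, then keep the most confident expert.

    Assumption: the evaluation function is each expert's top candidate score.
    """
    best = None
    for expert in experts:
        box = max(candidates, key=expert.score)
        value = expert.score(box)
        if best is None or value > best[2]:
            best = (expert, box, value)
    if best is None:
        raise ValueError("at least one expert is required")
    return best


def toy_expert(frame: int, preferred_x: float) -> Expert:
    # Toy scorer: confidence falls off with distance from a preferred x position.
    return Expert(frame, lambda b: 1.0 / (1.0 + abs(b[0] - preferred_x)))


pool: List[Expert] = []
add_expert(pool, toy_expert(0, 10.0))    # snapshot taken at frame 0
add_expert(pool, toy_expert(10, 50.0))   # snapshot taken at frame 10
candidates = [(12.0, 0.0, 20.0, 20.0), (49.0, 0.0, 20.0, 20.0), (30.0, 0.0, 20.0, 20.0)]
expert, box, value = select_expert(pool, candidates)
```

In this toy run the later snapshot is selected because its best candidate receives the higher score. A real system would replace the toy scorer with the network's classification output and would likely use a more elaborate evaluation function than a single maximum score.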

CLC number: TP181

ZHANG Zhiming, LI Guorong, HUANG Qingming. Visual object tracking based on multiple experts and MDNet[J]. Journal of University of Chinese Academy of Sciences, 2022, 39(6): 836-844.
