[1] Yue-Hei Ng J, Hausknecht M, Vijayanarasimhan S, et al. Beyond short snippets:deep networks for video classification[C]//Computer Vision & Pattern Recognition. IEEE, 2015:4694-4702.
[2] Lin D, Lu C, Liao R, et al. Learning important spatial pooling regions for scene classification[C]//Computer Vision & Pattern Recognition. IEEE, 2014:3726-3733.
[3] Smolyanskiy N, Kamenev A, Birchfield S. On the importance of stereo for accurate depth estimation:an efficient semi-supervised deep neural network approach[C]//Computer Vision & Pattern Recognition Workshops. IEEE, 2018:1007-1015.
[4] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems. Curran Associates, 2012:1097-1105.
[5] Ren S, He K, Girshick R, et al. Faster R-CNN:towards real-time object detection with region proposal networks[C]//Advances in Neural Information Processing Systems. Curran Associates, 2015:91-99.
[6] Chen L C, Papandreou G, Kokkinos I, et al. DeepLab:semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2018, 40(4):834-848.
[7] Bourdev L, Malik J. Poselets:body part detectors trained using 3D human pose annotations[C]//International Conference on Computer Vision. IEEE, 2009:1365-1372.
[8] Bo Y, Fowlkes C C. Shape-based pedestrian parsing[C]//Computer Vision & Pattern Recognition. IEEE, 2011:2265-2272.
[9] Yamaguchi K, Kiapour M H, Ortiz L E, et al. Parsing clothing in fashion photographs[C]//Computer Vision & Pattern Recognition. IEEE, 2012:3570-3577.
[10] Rauschert I, Collins R T. A generative model for simultaneous estimation of human body shape and pixel-level segmentation[C]//European Conference on Computer Vision. Springer, 2012:704-717.
[11] Dong J, Chen Q, Xia W, et al. A deformable mixture parsing model with parselets[C]//International Conference on Computer Vision. IEEE, 2013:3408-3415.
[12] Liang X, Gong K, Shen X, et al. Look into person:joint body parsing & pose estimation network and a new benchmark[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2019, 41(4):871-885.
[13] Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(4):640-651.
[14] Chen L C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation[EB/OL]. arXiv preprint arXiv:1706.05587, 2017.
[15] Chen L C, Yang Y, Wang J, et al. Attention to scale:scale-aware semantic image segmentation[C]//Computer Vision & Pattern Recognition. IEEE, 2016:3640-3649.
[16] Nie X, Feng J, Yan S. Mutual learning to adapt for joint human parsing and pose estimation[C]//European Conference on Computer Vision. Springer, 2018:502-517.
[17] Luo Y, Zheng Z, Zheng L, et al. Macro-micro adversarial network for human parsing[C]//European Conference on Computer Vision. Springer, 2018:418-434.
[18] Zhao J, Li J, Nie X, et al. Self-supervised neural aggregation networks for human parsing[C]//Computer Vision & Pattern Recognition Workshops. IEEE, 2017:7-15.
[19] Liu T, Ruan T, Huang Z, et al. Devil in the details:towards accurate single and multiple human parsing[EB/OL]. arXiv preprint arXiv:1809.05996, 2018.
[20] Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network[C]//Computer Vision & Pattern Recognition. IEEE, 2017:2881-2890.