[1] Gao J Y, Yang Z H, Nevatia R. RED: reinforced encoder-decoder networks for action anticipation[C]//Proceedings of the British Machine Vision Conference 2017 (BMVC). September 4-7, 2017, London, UK. British Machine Vision Association, 2017: 92.1-92.11. DOI:10.5244/c.31.92. [2] Xu M Z, Gao M F, Chen Y T, et al. Temporal recurrent networks for online action detection[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). October 27-November 2, 2019, Seoul, Korea (South). IEEE, 2019: 5531-5540. DOI:10.1109/ICCV.2019.00563. [3] Eun H, Moon J, Park J, et al. Learning to discriminate information for online action detection[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 13-19, 2020, Seattle, WA, USA. IEEE, 2020: 806-815. DOI:10.1109/CVPR42600.2020.00089. [4] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. DOI:10.1162/neco.1997.9.8.1735. [5] Tran D, Nourdev L D, Fergus R, et al. C3D: generic features for video analysis[EB/OL]. arXiv: 1412.0767v1. (2014-12-02) [2021-05-10].https://doi.org/10.48550/arXiv.1412.0767. [6] Jiang G Y, Liu J, Zamir Roshan A, et al. THUMOS challenge: action recognition with a large number of classes[EB/OL]. (2014-08-20) [2021-05-16]. http://crcv.ucf.edu/THUMOS14. 2014. [7] de Geest R, Gavves E, Ghodrati A, et al. Online action detection[M]//Computer Vision-ECCV 2016. Cham: Springer International Publishing, 2016: 269-284. DOI:10.1007/978-3-319-46454-1_17. [8] De Geest R, Tuytelaars T. Modeling temporal structure with LSTM for online action detection[C]//2018 IEEE Winter Conference on Applications of Computer Vision (WACV). March 12-15, 2018, Lake Tahoe, NV, USA. IEEE, 2018:1549-1557. DOI:10.1109/WACV.2018.00173. [9] Cho K, van Merrienboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar. Stroudsburg, PA, USA: Association for Computational Linguistics, 2014: 1724-1734. DOI:10.3115/v1/d14-1179. [10] Lee H Y, Huang J B, Singh M, et al. Unsupervised representation learning by sorting sequences[C]//2017 IEEE International Conference on Computer Vision (ICCV). October 22-29, 2017, Venice, Italy. IEEE, 2017: 667-676. DOI:10.1109/ICCV.2017.79. [11] Luo D Z, Liu C, Zhou Y, et al. Video cloze procedure for self-supervised spatio-temporal learning[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 11701-11708. DOI:10.1609/aaai.v34i07.6840. [12] Xu D J, Xiao J, Zhao Z, et al. Self-supervised spatiotemporal learning via video clip order prediction[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 15-20, 2019, Long Beach, CA, USA. IEEE, 2019: 10326-10335. DOI:10.1109/CVPR.2019.01058. [13] Kim D, Cho D, Kweon I S. Self-supervised video representation learning with space-time cubic puzzles[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33: 8545-8552. DOI:10.1609/aaai.v33i01. 33018545. [14] Jayaraman D, Grauman K. Slow and steady feature analysis: higher order temporal coherence in video[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 27-30, 2016, Las Vegas, NV, USA. IEEE, 2016: 3852-3861. DOI:10.1109/CVPR.2016.418. [15] Wang L M, Xiong Y J, Wang Z, et al. Temporal segment networks: towards good practices for deep action recognition[EB/OL]. arXiv.1608.00859. (2016-08-02) [2021-05-16]. https://doi.org/10.48550/arXiv.1608.00859. [16] Belkin M, Niyogi P. Laplacian eigenmaps and spectral techniques for embedding and clustering[M]//Advances in Neural Information Processing Systems 14: Proceedings of the 2001 Conference.The MIT Press, 2002:585-591. DOI:10.7551/mitpress/1120.003.0080. [17] Heilbron F C, Escorcia V, Ghanem B, et al. ActivityNet: a large-scale video benchmark for human activity understanding[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 7-12, 2015, Boston, MA, USA. IEEE, 2015: 961-970. DOI:10.1109/CVPR.2015.7298698. [18] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 27-30, 2016, Las Vegas, NV, USA. IEEE, 2016: 770-778. DOI:10.1109/CVPR.2016.90. [19] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal Covariate Shift[C]//2015 International Conference on Machine Learning (ICML). July 6-11, 2015, Lille, France. PMLR, 2015: 448-456. |