
Journal of University of Chinese Academy of Sciences, 2022, Vol. 39, Issue (6): 827-835. DOI: 10.7523/j.ucas.2021.0049

• Electronic Information and Computer Science •

Two-stream LSTM based on self-supervised learning for online action detection

ZHU Jiatong, QING Laiyun, HUANG Qingming

  1. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2021-03-22 Revised: 2021-05-31 Published: 2022-11-11
  • Corresponding author: QING Laiyun, E-mail: lyqing@ucas.ac.cn
  • Funding: Supported by the National Natural Science Foundation of China (61872333)


Abstract: Online action detection plays a very important role in applications such as security and human-computer interaction. The task requires the system to detect an action as soon as it starts, rather than waiting for the entire action to end. In online action detection the model can only make judgments from the observed part of the video. Compared with tasks such as action recognition and action detection, it therefore has to mine more information from the history to support the decision for the current frame. This paper builds on the long short-term memory (LSTM) model commonly used for online action detection and constructs a two-stream LSTM model called 2S-LSTM. It also introduces self-supervised learning, an idea widely used in the image domain, into the online action detection problem. First, the two-stream 2S-LSTM model uses separate LSTMs to model the temporal information of the RGB stream and the optical flow stream. In addition, based on the idea of self-supervised learning, we construct two new loss functions for training: a temporal similarity loss and an optical flow stability loss. Experiments show that, compared with previous online action detection methods such as RED, TRN, and IDN, our model achieves better results on both the TVSeries and THUMOS’14 datasets.

Key words: self-supervised learning, two-stream LSTM network (2S-LSTM), online action detection, temporal similarity loss, optical flow stability loss
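The abstract states that 2S-LSTM models the RGB stream and the optical flow stream with separate LSTMs. It also states that predictions are made online, from the observed frames only. Below is a minimal pure-Python sketch of that idea. It is written under our own assumptions and is not the authors' implementation.

Several details are hypothetical:

* the per-frame feature vectors, hidden size, and random weights;
* the class names `LSTMCell` and `TwoStreamLSTM`;
* fusion by concatenating the two hidden states before a linear classifier.

The temporal similarity loss and optical flow stability loss are not defined in the abstract, so they are not sketched, and no training is shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


class LSTMCell:
    """Standard LSTM cell with small random weights (illustrative, untrained)."""

    def __init__(self, input_size, hidden_size, rng):
        cols = input_size + hidden_size
        # One weight matrix and bias per gate: input (i), forget (f),
        # cell candidate (g), output (o).
        self.w = {k: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_size)] for k in "ifgo"}
        self.b = {k: [0.0] * hidden_size for k in "ifgo"}

    def step(self, x, h, c):
        z = x + h  # concatenate current input with previous hidden state
        pre = {k: [a + b for a, b in zip(matvec(self.w[k], z), self.b[k])]
               for k in "ifgo"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        g = [math.tanh(v) for v in pre["g"]]
        o = [sigmoid(v) for v in pre["o"]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


class TwoStreamLSTM:
    """Hypothetical two-stream model: one LSTM per stream, late fusion."""

    def __init__(self, rgb_dim, flow_dim, hidden_size, num_classes, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.rgb_lstm = LSTMCell(rgb_dim, hidden_size, rng)
        self.flow_lstm = LSTMCell(flow_dim, hidden_size, rng)
        self.w_out = [[rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_size)]
                      for _ in range(num_classes)]

    def run_online(self, rgb_frames, flow_frames):
        """Yield class probabilities per frame, using only frames seen so far."""
        zeros = [0.0] * self.hidden_size
        h_r, c_r, h_f, c_f = zeros, zeros, zeros, zeros
        for x_rgb, x_flow in zip(rgb_frames, flow_frames):
            h_r, c_r = self.rgb_lstm.step(x_rgb, h_r, c_r)
            h_f, c_f = self.flow_lstm.step(x_flow, h_f, c_f)
            # Assumed fusion: concatenate both hidden states, then classify.
            yield softmax(matvec(self.w_out, h_r + h_f))


if __name__ == "__main__":
    model = TwoStreamLSTM(rgb_dim=4, flow_dim=3, hidden_size=5, num_classes=3)
    data_rng = random.Random(1)
    rgb = [[data_rng.random() for _ in range(4)] for _ in range(6)]
    flow = [[data_rng.random() for _ in range(3)] for _ in range(6)]
    for t, probs in enumerate(model.run_online(rgb, flow)):
        print(t, [round(p, 3) for p in probs])
```

Each prediction depends only on the frames already processed, so changing future frames never alters earlier outputs. This causal behavior is the property that online action detection requires.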
