
Journal of University of Chinese Academy of Sciences


Skeleton-based video anomaly detection with memory enhancement and diffusion model*

ZHANG Qi1, LI Yuan1†, ZHOU Yanzhao1, JIAO Jianbin2

  1. School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China;
  2. School of Emergency Management Science and Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2024-11-14 Revised: 2025-04-02
  • Corresponding author: E-mail: liyuan23@ucas.ac.cn
  • Funding:
    *Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, Category A (XDA27000000)

Abstract: Video anomaly detection (VAD) aims to identify abnormal events in videos and has broad applications in public safety and video content understanding. Traditional VAD methods rely mainly on pixel features extracted from whole video frames or from specific regions. To reduce the impact of unstructured noise, VAD methods based on human skeleton data have attracted considerable attention. In open-set scenarios, however, these methods face two major challenges: abnormal behaviors are reconstructed too well, and generalization to diverse normal behaviors is insufficient. To address these two problems, this paper proposes a skeleton-based video anomaly detection framework with memory enhancement and a diffusion model (MEDM-SVAD). The framework introduces a memory enhancement module that expands the model's memory of normal samples, so that abnormal behaviors are not missed merely because their reconstruction error is small. In addition, a diffusion model is used to improve generalization to out-of-domain normal behaviors. Experiments on three public datasets, HR-STC, HR-Avenue, and HR-UBnormal, show that MEDM-SVAD achieves AUC scores of 78.5, 90.1, and 69.7, respectively. Each result improves, to a varying degree, on the current state-of-the-art MoCoDAD algorithm, which supports the effectiveness of the method across multiple scenarios.

Key words: video anomaly detection, skeleton-based features, diffusion model, memory enhancement
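The role of a memory module in reconstruction-based anomaly detection can be illustrated with a minimal sketch. The abstract does not describe the internals of MEDM-SVAD, so everything below is an assumption made for illustration: the function names `memory_read` and `anomaly_score`, the cosine-similarity addressing, the `temperature` value, and the toy two-dimensional features are all hypothetical and follow only the general idea of memory-augmented reconstruction. A query feature is rebuilt as a weighted mixture of stored normal prototypes, so a feature far from every prototype is reconstructed poorly and receives a high score.

```python
import numpy as np

def memory_read(query, memory, temperature=0.1, eps=1e-12):
    """Rebuild `query` as a softmax-weighted mixture of the rows of `memory`.

    Weights come from cosine similarity between the query and each memory row.
    This addressing scheme is an illustrative assumption, not the paper's module.
    """
    q = query / (np.linalg.norm(query) + eps)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + eps)
    logits = (m @ q) / temperature
    logits = logits - logits.max()      # subtract the max for numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum()   # weights are non-negative and sum to 1
    return weights @ memory, weights

def anomaly_score(query, memory):
    """Squared reconstruction error of `query` after the memory read."""
    recon, _ = memory_read(query, memory)
    return float(np.sum((query - recon) ** 2))

# Toy data (hypothetical): two stored prototypes of normal behavior.
memory = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
normal = np.array([0.98, 0.05])     # close to the first prototype
abnormal = np.array([-1.0, -1.0])   # far from both prototypes

print("normal score:  ", anomaly_score(normal, memory))
print("abnormal score:", anomaly_score(abnormal, memory))
```

In this toy setting the abnormal feature scores higher than the normal one, because the reconstruction can only be assembled from normal prototypes. In practice the memory would be learned from features of normal training data rather than written by hand.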
