Journal of University of Chinese Academy of Sciences ›› 2024, Vol. 41 ›› Issue (6): 810-820. DOI: 10.7523/j.ucas.2023.008

• Research Articles •

Field dynamic small object detection network based on double frame fusion

ZHAO Xiaohan1,2, ZHANG Zebin1, LI Baoqing1   

  1. Key Laboratory of Microsystem Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 201800, China;
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2022-11-02 Revised: 2023-02-08 Online: 2024-11-15

Abstract: Detecting dynamic small objects in complex field environments remains a challenging problem for defense and military applications, because field surveillance sensing systems suffer from heavy background interference, small targets occupy few pixels, and relevant open datasets are lacking. To address this problem, a YOLOv5-based object detection network with double-frame feature fusion (YOLO-DFNet) is proposed. First, a double-frame feature fusion module (D-F fusion) is introduced to process the features of adjacent frames from the backbone network, computing attention successively along the channel, temporal, and spatial dimensions to extract motion features. Second, a temporal trapezoidal fusion network based on an attention mechanism (TTFN_AM) is placed between the neck network and the detection head to focus on dynamic objects within receptive fields of different sizes, thereby improving the detection of small objects with large displacement. Experimental results on the field motion small object dataset (FMSOD) show that the mean average precision (mAP) of the proposed YOLO-DFNet over different IoU thresholds is 3.9 percentage points higher than that of YOLOv5, and that it also outperforms other object detection models such as TPH-YOLOv5 and YOLOv7.
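
As an illustration of why targets with few pixels are hard to score under IoU-based metrics, the following minimal Python sketch compares how the same 2-pixel localization error affects a 10×10 box and a 100×100 box. The box sizes, the helper names (`iou`, `shifted`, `thresholds_passed`), and the COCO-style threshold grid from 0.50 to 0.95 in steps of 0.05 are assumptions made for illustration only; the abstract does not state which IoU thresholds were used, and this is not the evaluation code of YOLO-DFNet.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so that disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted(box, d):
    """Return the box translated by d pixels along both axes."""
    return (box[0] + d, box[1] + d, box[2] + d, box[3] + d)


# Assumption: COCO-style grid 0.50, 0.55, ..., 0.95 (not stated in the abstract).
THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]


def thresholds_passed(gt_box, pred_box):
    """Count the IoU thresholds at which the prediction would count as a match."""
    score = iou(gt_box, pred_box)
    return sum(score >= t for t in THRESHOLDS)


small = (0, 0, 10, 10)      # hypothetical small target, 10x10 pixels
large = (0, 0, 100, 100)    # hypothetical large target, 100x100 pixels

for name, box in (("small", small), ("large", large)):
    pred = shifted(box, 2)  # same 2-pixel localization error for both targets
    print(name, round(iou(box, pred), 3), thresholds_passed(box, pred))
```

In this sketch the small box reaches an IoU of about 0.47 and matches at none of the ten thresholds, whereas the large box reaches about 0.92 and matches at nine of them, which is one reason metrics evaluated over several IoU thresholds are demanding for small objects.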

Key words: object detection, field monitoring sensor network, dynamic small object, double-frame feature fusion, spatial-temporal attention
