Journal of University of Chinese Academy of Sciences ›› 2025, Vol. 42 ›› Issue (2): 236-247. DOI: 10.7523/j.ucas.2024.025

• Research Articles •

Cross-modal retrieval method based on MFF-SFE for remote sensing image-text

ZHONG Jinyan1,2, CHEN Jun1,3,4, LI Yu1, WU Yewei1, GE Xiaoqing1   

    1. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China;
    2. School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China;
    3. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China;
    4. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2024-01-18 Revised: 2024-04-17

Abstract: Remote sensing image-text cross-modal retrieval makes it possible to quickly extract valuable information from massive volumes of remote sensing data. However, existing remote sensing image-text retrieval methods make limited use of the multi-scale information within remote sensing images, and their weak recognition of target information leads to relatively low retrieval accuracy. To address these issues, this paper proposes a new method for remote sensing image-text cross-modal retrieval. The method mainly comprises a multi-scale feature fusion (MFF) module and a salient feature enhancement (SFE) module, which are designed to integrate the multi-scale information of remote sensing images and to enhance the expression of target information in them, thereby improving retrieval precision. The method was validated experimentally on two publicly available remote sensing image-text datasets. The results show that it outperforms the compared methods on most evaluation metrics in the remote sensing image-text cross-modal retrieval task and achieves the best overall retrieval performance.

Key words: cross-modal retrieval, remote sensing images, deep learning, multi-scale feature
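
For readers new to the task, the retrieval and evaluation step that the abstract refers to can be illustrated with a minimal sketch. It does not reproduce the paper's MFF or SFE modules. It assumes that images and captions have already been encoded into vectors in a shared embedding space, that candidates are ranked by cosine similarity, and that accuracy is reported as Recall@K. None of these choices is stated in the abstract, and the function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed scoring function)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(query, candidates):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def recall_at_k(queries, candidates, ground_truth, k):
    """Fraction of queries whose correct candidate appears among the top k results."""
    hits = 0
    for query, target in zip(queries, ground_truth):
        if target in rank_candidates(query, candidates)[:k]:
            hits += 1
    return hits / len(queries)


# Hypothetical toy embeddings: three images and three captions in a 2-D shared space.
image_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
text_embeddings = [[0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]
ground_truth = [0, 1, 2]  # caption i describes image i

# Text-to-image retrieval: each caption queries the image set.
r_at_1 = recall_at_k(text_embeddings, image_embeddings, ground_truth, k=1)
r_at_2 = recall_at_k(text_embeddings, image_embeddings, ground_truth, k=2)
```

Image-to-text retrieval follows by swapping the roles of the two embedding lists. In this framing, better image features would show up as higher Recall@K for the same ranking procedure.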

CLC Number: