
Journal of University of Chinese Academy of Sciences ›› 2025, Vol. 42 ›› Issue (2): 236-247. DOI: 10.7523/j.ucas.2024.025

• Electronic Information and Computer Science •

  • Corresponding author: LI Yu, E-mail: liyu202615@aircas.ac.cn
  • Funding: Supported by the Youth Innovation Promotion Association of the Chinese Academy of Sciences (E0331804)

Cross-modal retrieval method based on MFF-SFE for remote sensing image-text

ZHONG Jinyan1,2, CHEN Jun1,3,4, LI Yu1, WU Yewei1, GE Xiaoqing1   

  1. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China;
    2. School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China;
    3. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China;
    4. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2024-01-18  Revised: 2024-04-17  Published: 2024-05-22


Abstract: Remote sensing image-text cross-modal retrieval can quickly extract valuable information from massive remote sensing data. However, existing retrieval methods make insufficient use of the multi-scale information in remote sensing images and recognize target information poorly, which leads to relatively low retrieval accuracy. To address these issues, this paper proposes a new remote sensing image-text cross-modal retrieval method. The method comprises a multi-scale feature fusion module and a salient feature enhancement module, which fuse the multi-scale information of remote sensing images and strengthen the representation of target information, respectively, thereby improving retrieval accuracy. Experiments on two publicly available remote sensing image-text datasets show that the proposed method outperforms other methods on most evaluation metrics and achieves the best overall retrieval performance.
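This page does not give the implementation details of the two modules. As a rough illustration only, the two underlying ideas can be sketched with generic stand-ins; the function names, the pooling-based fusion, and the softmax reweighting below are assumptions for illustration, not the paper's actual MFF-SFE design:

```python
import numpy as np

def multi_scale_fusion(feature_maps):
    """Fuse feature maps from different encoder scales by pooling each
    (C, H, W) map to a channel vector and concatenating the results
    (one generic way to combine multi-scale information)."""
    pooled = [fm.mean(axis=(1, 2)) for fm in feature_maps]  # each -> (C,)
    return np.concatenate(pooled)

def salient_feature_enhancement(features):
    """Emphasize salient channels: compute softmax weights over channel
    magnitudes and apply them as a residual-style reweighting."""
    scores = np.abs(features)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return features * (1.0 + weights)  # salient channels are amplified more

# Toy example: three scales from a hypothetical remote sensing image encoder
rng = np.random.default_rng(0)
maps = [rng.normal(size=(8, s, s)) for s in (32, 16, 8)]  # (C, H, W) per scale
fused = multi_scale_fusion(maps)                # shape (24,)
enhanced = salient_feature_enhancement(fused)   # shape (24,)
print(fused.shape, enhanced.shape)
```

In a retrieval pipeline, a vector like `enhanced` would serve as the image embedding to be matched against text embeddings; the concrete fusion and enhancement operators in the paper itself may differ substantially from this sketch.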

Key words: cross-modal retrieval, remote sensing images, deep learning, multi-scale feature

CLC number: