[1] Chi M M, Plaza A, Benediktsson J A, et al. Big data for remote sensing: challenges and opportunities[J]. Proceedings of the IEEE, 2016, 104(11): 2207-2219. DOI: 10.1109/JPROC.2016.2598228.
[2] Kaur P, Pannu H S, Malhi A K. Comparative analysis on cross-modal information retrieval: a review[J]. Computer Science Review, 2021, 39: 100336. DOI: 10.1016/j.cosrev.2020.100336.
[3] Chen C, Zou H X, Shao N Y, et al. Deep semantic hashing retrieval of remote sensing images[C]//IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium. Valencia, Spain. IEEE, 2018: 1124-1127. DOI: 10.1109/IGARSS.2018.8519276.
[4] Ye F M, Luo W, Dong M, et al. SAR image retrieval based on unsupervised domain adaptation and clustering[J]. IEEE Geoscience and Remote Sensing Letters, 2019, 16(9): 1482-1486. DOI: 10.1109/LGRS.2019.2896948.
[5] Guo M, Zhou C H, Liu J H. Jointly learning of visual and auditory: a new approach for RS image and audio cross-modal retrieval[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019, 12(11): 4644-4654. DOI: 10.1109/JSTARS.2019.2949220.
[6] Shi Z W, Zou Z X. Can a machine generate humanlike language descriptions for a remote sensing image?[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(6): 3623-3634. DOI: 10.1109/TGRS.2017.2677464.
[7] Wang G A, Hu Q H, Cheng J, et al. Semi-supervised generative adversarial hashing for image retrieval[C]//European Conference on Computer Vision. Cham: Springer, 2018: 491-507. DOI: 10.1007/978-3-030-01267-0_29.
[8] Lu J S, Batra D, Parikh D, et al. ViLBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks[EB/OL]. 2019. arXiv: 1908.02265. http://arxiv.org/abs/1908.02265.
[9] Radford A, Kim J W, Hallacy C, et al. Learning transferable visual models from natural language supervision[EB/OL]. 2021. arXiv: 2103.00020. (2021-02-26)[2024-04-01]. http://arxiv.org/abs/2103.00020.
[10] Abdullah T, Bazi Y, Al Rahhal M M, et al. TextRS: deep bidirectional triplet network for matching text to remote sensing images[J]. Remote Sensing, 2020, 12(3): 405. DOI: 10.3390/rs12030405.
[11] Lv Y F, Xiong W, Zhang X H, et al. Fusion-based correlation learning model for cross-modal remote sensing image retrieval[J]. IEEE Geoscience and Remote Sensing Letters, 2021, 19: 6503205. DOI: 10.1109/LGRS.2021.3131592.
[12] Mikriukov G, Ravanbakhsh M, Demir B. Deep unsupervised contrastive hashing for large-scale cross-modal text-image retrieval in remote sensing[EB/OL]. 2022. arXiv: 2201.08125. http://arxiv.org/abs/2201.08125.
[13] Yuan Z Q, Zhang W K, Tian C Y, et al. Remote sensing cross-modal text-image retrieval based on global and local information[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5620616. DOI: 10.1109/TGRS.2022.3163706.
[14] Cheng Q M, Zhou Y Z, Fu P, et al. A deep semantic alignment network for the cross-modal image-text retrieval in remote sensing[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021, 14: 4284-4297. DOI: 10.1109/JSTARS.2021.3070872.
[15] Zheng F Z, Li W P, Wang X, et al. A cross-attention mechanism based on regional-level semantic features of images for cross-modal text-image retrieval in remote sensing[J]. Applied Sciences, 2022, 12(23): 12221. DOI: 10.3390/app122312221.
[16] Tang X, Wang Y J, Ma J J, et al. Interacting-enhancing feature transformer for cross-modal remote-sensing image and text retrieval[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5611715. DOI: 10.1109/TGRS.2023.3280546.
[17] Yuan Z Q, Zhang W K, Fu K, et al. Exploring a fine-grained multiscale method for cross-modal remote sensing image retrieval[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 4404119. DOI: 10.1109/TGRS.2021.3078451.
[18] Wang Y J, Ma J J, Li M T, et al. Multi-scale interactive transformer for remote sensing cross-modal image-text retrieval[C]//IGARSS 2022-2022 IEEE International Geoscience and Remote Sensing Symposium. Kuala Lumpur, Malaysia. IEEE, 2022: 839-842. DOI: 10.1109/IGARSS46834.2022.9883252.
[19] Zhang R Y, Nie J, Song N, et al. Remote sensing image-text retrieval method based on joint layout and semantic representation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(2): 671-683. DOI: 10.13700/j.bh.1001-5965.2022.0527. (in Chinese)
[20] Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-ResNet and the impact of residual connections on learning[C]//Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. February 4-9, 2017, San Francisco, California, USA. ACM, 2017: 4278-4284. DOI: 10.5555/3298023.3298188.
[21] Devlin J, Chang M W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. 2018. arXiv: 1810.04805. (2018-10-11)[2024-04-01]. http://arxiv.org/abs/1810.04805.
[22] Qu B, Li X L, Tao D C, et al. Deep semantic understanding of high resolution remote sensing image[C]//2016 International Conference on Computer, Information and Telecommunication Systems (CITS). Kunming, China. IEEE, 2016: 1-5. DOI: 10.1109/CITS.2016.7546397.
[23] Faghri F, Fleet D J, Kiros J R, et al. VSE++: improving visual-semantic embeddings with hard negatives[EB/OL]. 2017. arXiv: 1707.05612. (2017-07-18)[2024-04-01]. http://arxiv.org/abs/1707.05612.
[24] Lee K H, Chen X, Hua G, et al. Stacked cross attention for image-text matching[C]//European Conference on Computer Vision. Cham: Springer, 2018: 212-228. DOI: 10.1007/978-3-030-01225-0_13.
[25] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. DOI: 10.1109/TPAMI.2016.2577031.
[26] Wang Z H, Liu X H, Li H S, et al. CAMP: cross-modal adaptive message passing for text-image retrieval[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South). IEEE, 2019: 5763-5772. DOI: 10.1109/ICCV.2019.00586.
[27] Wang T, Xu X, Yang Y, et al. Matching images and text with multi-modal tensor fusion and re-ranking[C]//Proceedings of the 27th ACM International Conference on Multimedia. October 21-25, 2019, Nice, France. ACM, 2019: 12-20. DOI: 10.1145/3343031.3350875.