A method to extract forest cover information by fusing Transformer and UNet

doi:10.7523/j.ucas.2023.049

Abstract

Abstract: Forest cover information extraction is an essential task in forest remote sensing and is important for forest resource management, ecological and environmental protection, and climate change research. Traditional methods based on convolutional neural networks extract local features effectively but struggle to capture long-range dependencies and global context information. To address this problem, we propose DiUNet, a forest cover information extraction method that fuses Transformer and UNet. The method embeds Transformer modules in the UNet network to strengthen its perception of long-range dependencies and global context information. In addition, because forest cover is fragmented, irregular, and variable in scale, the method uses relative position encoding to add positional information, which improves the model's ability to capture spatial information at different levels and scales. We constructed a forest cover information dataset based on Landsat 8 and CDL data layers and carried out in-depth experimental analysis on it. In the comparative experiments, DiUNet achieved the best results on precision, recall, F₁ score, intersection over union (IoU), and frequency-weighted intersection over union (FWIoU): 91.22%, 92.66%, 91.94%, 85.08%, and 81.65%, respectively. It also performed well in the generalization experiments. These results indicate that DiUNet outperforms existing methods in forest cover information extraction and has high robustness and generalization ability.

Key words: semantic segmentation, UNet, Transformer, forest cover information, forest remote sensing
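The five evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes a two-class (forest / non-forest) pixel mask with forest as the positive class, and it uses the common definition of FWIoU as the per-class IoU weighted by each class's share of ground-truth pixels; the abstract does not state the exact definitions used, so the function name, the pixel counts, and the class setup below are hypothetical. Under these definitions IoU = F₁ / (2 − F₁) for the positive class, and the reported F₁ (91.94%) and IoU (85.08%) agree with that identity to within rounding.

```python
def binary_segmentation_metrics(tp, fp, fn, tn):
    """Pixel-level metrics for a forest / non-forest mask.

    tp, fp, fn, tn are pixel counts with forest as the positive class.
    Assumes every denominator below is non-zero (both classes occur).
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)          # share of predicted forest that is forest
    recall = tp / (tp + fn)             # share of true forest that was found
    f1 = 2 * precision * recall / (precision + recall)

    iou_forest = tp / (tp + fp + fn)    # IoU of the positive (forest) class
    iou_background = tn / (tn + fp + fn)

    # Ground-truth frequency of each class, used as the FWIoU weights.
    freq_forest = (tp + fn) / total
    freq_background = (tn + fp) / total
    fwiou = freq_forest * iou_forest + freq_background * iou_background

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "iou": iou_forest,
        "fwiou": fwiou,
    }


# Hypothetical pixel counts, chosen only to exercise the formulas.
metrics = binary_segmentation_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With multi-class labels the same quantities would normally be read off a full confusion matrix, one row and column per class, rather than from four counts.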

CLC number: TP753

LIAO Lingcen, LIU Wei, LIU Shibin. A method to extract forest cover information by fusing Transformer and UNet[J]. Journal of University of Chinese Academy of Sciences, 2025, 42(3): 350-360.
