
中国科学院大学学报 ›› 2025, Vol. 42 ›› Issue (4): 496-507. DOI: 10.7523/j.ucas.2023.089

• 电子信息与计算机科学 •

基于多尺度语义先验的街景图像修复算法

曾建顺1,2, 吕炎杰3, 覃驭楚3   

  1. 中国科学院空天信息创新研究院 数字地球国家重点实验室, 北京 100094;
    2. 中国科学院大学电子电气与通信工程学院, 北京 100049;
    3. 可持续发展大数据国际研究中心, 北京 100094
  • 收稿日期:2023-09-21 修回日期:2023-11-30 发布日期:2023-12-12
  • 通讯作者: 覃驭楚, E-mail: qinyc@aircas.ac.cn
  • 基金资助:
    中国科学院A类战略性先导科技专项(XDA19030102)资助

Multi-scale semantic prior features guided street view image inpainting algorithm

ZENG Jianshun1,2, LYU Yanjie3, QIN Yuchu3   

  1. State Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China;
    2. School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China;
    3. International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China
  • Received:2023-09-21 Revised:2023-11-30 Published:2023-12-12

摘要: 城市街景图像作为重要的空间数据形式,在地图服务、城市环境三维重建与制图等领域具有广泛的应用价值。由于采集的街景图像面临干扰目标遮挡和隐私安全等问题,因而需要进行精细的预处理。针对街景图像预处理中的上述问题,提出基于多尺度语义先验特征引导的街景图像修复算法,用于生成更加真实自然的静态街景图像。首先,构建语义先验网络,学习输入图像缺失区域的多尺度语义先验以增强上下文信息;然后,语义增强生成网络利用先验转移模块自适应地融合多尺度语义先验和图像特征,同时引入多级注意力转移机制细化图像纹理信息;最后,采用马尔可夫判别网络,通过对抗训练区分生成图像与真实图像,使重建后的街景图像具有更强的真实感。基于Apolloscape数据集的实验表明,该算法在图像语义结构连贯性和细节纹理等方面取得显著提升,在解决街景图像隐私问题的同时,也可为实景化城市应用提供更可靠的基础数据。

关键词: 城市街景, 图像修复, 城市实景化, 生成对抗网络, 深度学习, 移动目标去除

Abstract: Urban street view imagery, as a crucial form of spatial data, has a wide range of applications in mapping services, urban 3D reconstruction, and cartography. However, collected street view images often suffer from occlusion by distracting objects and from privacy concerns, and therefore require meticulous preprocessing. To address these challenges, we propose a street view image inpainting algorithm guided by multi-scale semantic priors for generating more realistic and natural static street view images. First, a semantic prior network is designed to learn multi-scale semantic priors for the missing regions of the input image to enhance contextual information. The semantic enhancement generator then adaptively fuses the multi-scale semantic priors with image features through a prior transfer module, and introduces a multi-level attention transfer mechanism to refine image texture. Finally, a Markov discriminator is adopted to distinguish generated images from real images through adversarial training, making the reconstructed street view images more realistic. Experiments on the Apolloscape dataset demonstrate that the proposed algorithm achieves significant improvements in semantic structural coherence and texture detail, addressing privacy concerns in street view imagery while providing a more reliable data basis for realistic urban applications.
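
The abstract does not give the exact discriminator configuration. As a rough illustration of the adversarial-training step it describes, the sketch below implements a generic PatchGAN-style discriminator (the usual realization of a Markovian discriminator) with a hinge adversarial loss in PyTorch. The layer widths, the hinge formulation, and all identifiers (PatchDiscriminator, d_hinge_loss, g_hinge_loss) are illustrative assumptions, not the authors' implementation.

# Minimal PyTorch sketch of a PatchGAN-style ("Markov") discriminator with a
# hinge adversarial loss; the configuration is an assumption for illustration only.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Maps an image to a grid of patch-level real/fake scores, so each score
    depends only on a local receptive field (the Markovian assumption)."""
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        layers, ch = [], in_ch
        for i, out in enumerate([base, base * 2, base * 4, base * 8]):
            layers += [
                nn.Conv2d(ch, out, kernel_size=4, stride=2 if i < 3 else 1, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
            ]
            ch = out
        layers.append(nn.Conv2d(ch, 1, kernel_size=4, stride=1, padding=1))  # patch scores
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

def d_hinge_loss(d_real, d_fake):
    # Discriminator objective: push scores of real patches above +1, fake patches below -1.
    return torch.relu(1.0 - d_real).mean() + torch.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake):
    # Generator objective: raise the discriminator's scores on inpainted images.
    return -d_fake.mean()

if __name__ == "__main__":
    disc = PatchDiscriminator()
    real = torch.randn(2, 3, 256, 256)   # ground-truth street view crops
    fake = torch.randn(2, 3, 256, 256)   # stand-in for the generator's inpainted output
    loss_d = d_hinge_loss(disc(real), disc(fake.detach()))
    loss_g = g_hinge_loss(disc(fake))
    print(loss_d.item(), loss_g.item())

Because each output score only sees a local patch, the discriminator penalizes unrealistic local texture, which complements the generator's multi-scale semantic priors that handle global structure.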

Key words: street view image, image inpainting, realization of urban complex environment, generative adversarial network, deep learning, moving object removal
