
中国科学院大学学报 ›› 2025, Vol. 42 ›› Issue (4): 496-507. DOI: 10.7523/j.ucas.2023.089

• 电子信息与计算机科学 •

基于多尺度语义先验的街景图像修复算法

曾建顺1,2, 吕炎杰3, 覃驭楚3   

  1. 中国科学院空天信息创新研究院 数字地球国家重点实验室, 北京 100094;
    2. 中国科学院大学电子电气与通信工程学院, 北京 100049;
    3. 可持续发展大数据国际研究中心, 北京 100094
  • 收稿日期:2023-09-21 修回日期:2023-11-30 发布日期:2023-12-12
  • 通讯作者: 覃驭楚, E-mail: qinyc@aircas.ac.cn
  • 基金资助:
    中国科学院A类战略性先导科技专项(XDA19030102)资助

Multi-scale semantic prior features guided street view image inpainting algorithm

ZENG Jianshun1,2, LYU Yanjie3, QIN Yuchu3   

  1. State Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China;
    2. School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China;
    3. International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China
  • Received:2023-09-21 Revised:2023-11-30 Published:2023-12-12

摘要: 城市街景图像作为重要的空间数据形式,在地图服务、城市环境三维重建与制图等领域具有广泛的应用价值。由于采集的街景图像面临干扰目标遮挡和隐私安全等问题,因而需要进行精细的预处理。针对街景图像预处理中的上述问题,提出基于多尺度语义先验特征引导的街景图像修复算法,用于生成更加真实自然的静态街景图像。首先,构建语义先验网络,学习输入图像缺失区域的多尺度语义先验以增强上下文信息;然后,语义增强生成网络利用先验转移模块自适应地融合多尺度语义先验和图像特征,同时引入多级注意力转移机制细化图像纹理信息;最后,采用马尔可夫判别网络,通过对抗训练区分生成图像与真实图像,使重建后的街景图像具有更强的真实感。基于Apolloscape数据集的实验表明,该算法在图像语义结构连贯性和细节纹理等方面取得显著提升,在解决街景图像隐私问题的同时,也可为实景化城市应用提供更可靠的基础数据。

关键词: 城市街景, 图像修复, 城市实景化, 生成对抗网络, 深度学习, 移动目标去除

Abstract: Urban street view imagery, as a crucial form of spatial data, has a wide range of applications in mapping services, urban 3D reconstruction, and cartography. However, collected street view images often suffer from occlusion by distracting objects and from privacy concerns, and therefore require meticulous preprocessing. To address these challenges, we propose a street view image inpainting algorithm guided by multi-scale semantic priors for generating more realistic and natural static street view images. First, a semantic prior network is designed to learn multi-scale semantic priors for the missing regions of the input image to enhance contextual information. The semantic enhancement generator then adaptively fuses the multi-scale semantic priors with image features through a prior transfer module, and introduces a multi-level attention transfer mechanism to refine image texture. Finally, a Markov discriminator is adopted to distinguish generated images from real images through adversarial training, making the reconstructed street view images more realistic. Experiments on the Apolloscape dataset demonstrate that the proposed algorithm achieves significant improvements in semantic structural coherence and texture detail, addressing privacy concerns in street view imagery while providing a more reliable data basis for realistic urban applications.
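
The abstract does not give the exact discriminator configuration. As a rough illustration of the adversarial-training step it describes, the sketch below implements a generic PatchGAN-style discriminator (the usual realization of a Markovian discriminator) with a hinge adversarial loss in PyTorch. The layer widths, the hinge formulation, and all identifiers (PatchDiscriminator, d_hinge_loss, g_hinge_loss) are illustrative assumptions, not the authors' implementation.

# Minimal PyTorch sketch of a PatchGAN-style ("Markov") discriminator with a
# hinge adversarial loss; the configuration is an assumption for illustration only.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Maps an image to a grid of patch-level real/fake scores, so each score
    depends only on a local receptive field (the Markovian assumption)."""
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        layers, ch = [], in_ch
        for i, out in enumerate([base, base * 2, base * 4, base * 8]):
            layers += [
                nn.Conv2d(ch, out, kernel_size=4, stride=2 if i < 3 else 1, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
            ]
            ch = out
        layers.append(nn.Conv2d(ch, 1, kernel_size=4, stride=1, padding=1))  # patch scores
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

def d_hinge_loss(d_real, d_fake):
    # Discriminator objective: push scores of real patches above +1, fake patches below -1.
    return torch.relu(1.0 - d_real).mean() + torch.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake):
    # Generator objective: raise the discriminator's scores on inpainted images.
    return -d_fake.mean()

if __name__ == "__main__":
    disc = PatchDiscriminator()
    real = torch.randn(2, 3, 256, 256)   # ground-truth street view crops
    fake = torch.randn(2, 3, 256, 256)   # stand-in for the generator's inpainted output
    loss_d = d_hinge_loss(disc(real), disc(fake.detach()))
    loss_g = g_hinge_loss(disc(fake))
    print(loss_d.item(), loss_g.item())

Because each output score only sees a local patch, the discriminator penalizes unrealistic local texture, which complements the generator's multi-scale semantic priors that handle global structure.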

Key words: street view image, image inpainting, realization of urban complex environment, generative adversarial network, deep learning, moving object removal
