Seamless Image Completion via GAN Inversion

doi:10.7523/j.ucas.2022.075

Abstract

Abstract: Image completion, widely used in unwanted object removal and media editing, aims to recover corrupted images smoothly and with consistent semantics. This paper builds on generative adversarial network (GAN) inversion, which uses a pre-trained GAN model as an effective prior to fill the missing regions with photo-realistic textures. However, existing GAN inversion methods overlook the fact that image completion is a generative task with hard constraints, so the final images, stitched from generated and original content, show noticeable color and semantic discontinuities. To address this, this paper designs a novel bi-directional perceptual generator and a pre-modulation network that complete images seamlessly. The bi-directional perceptual generator makes full use of the extended latent space, helping the model perceive the non-missing regions of the input image at the level of data representations. The pre-modulation network uses a multiscale structure to give the style vectors more discriminative semantics. Experiments on the Places2 and CelebA-HQ datasets show that the proposed method not only bridges GAN inversion and image completion but also outperforms current mainstream algorithms, reducing FID by up to 49.2%.

Key words: image completion, generative adversarial network, GAN inversion, deep learning, unwanted object removal
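The general idea the abstract starts from, using a pre-trained generator as a prior and then stitching its output into the hole, can be illustrated with a minimal sketch. It assumes a hypothetical toy linear "generator" G(z) = Az in place of a pre-trained StyleGAN, and uses plain optimization-based inversion (gradient descent on z, measured against the known pixels only). The names `generate`, `masked_loss`, `invert`, and `complete` are illustrative, and none of this is the paper's bi-directional perceptual generator or pre-modulation network. A mask value of 1 marks a missing pixel and 0 a known one.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def generate(A: Matrix, z: Vector) -> Vector:
    """Hypothetical toy generator G(z) = A z standing in for a pre-trained GAN."""
    return [sum(a * zj for a, zj in zip(row, z)) for row in A]


def masked_loss(A: Matrix, z: Vector, x: Vector, mask: List[int]) -> float:
    """Squared error between G(z) and x, counted on known pixels (mask == 0) only."""
    g = generate(A, z)
    return sum((1 - m) * (gi - xi) ** 2 for gi, xi, m in zip(g, x, mask))


def invert(A: Matrix, x: Vector, mask: List[int],
           lr: float = 0.05, steps: int = 500) -> Vector:
    """Optimization-based inversion: gradient descent on z to minimize masked_loss.

    Assumes lr is small enough for the given A; too large a step diverges.
    """
    k = len(A[0])
    z = [0.0] * k  # start from the latent origin
    for _ in range(steps):
        g = generate(A, z)
        # Residual on known pixels; missing pixels contribute nothing to the gradient.
        r = [(1 - m) * (gi - xi) for gi, xi, m in zip(g, x, mask)]
        # dL/dz_j = 2 * sum_i r_i * A[i][j]
        grad = [2.0 * sum(ri * row[j] for ri, row in zip(r, A)) for j in range(k)]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


def complete(A: Matrix, x: Vector, mask: List[int]) -> Vector:
    """Fill the hole with G(z*) while copying every known pixel from x unchanged."""
    g = generate(A, invert(A, x, mask))
    # Hard constraint: the output equals the input wherever mask == 0.
    return [gi if m else xi for gi, xi, m in zip(g, x, mask)]


if __name__ == "__main__":
    A = [[1, 0], [0, 1], [1, 1], [1, -1], [2, 1], [0.5, 2]]
    x_clean = generate(A, [0.8, -0.3])  # an image the toy generator can reproduce
    mask = [0, 0, 1, 0, 1, 0]  # pixels 2 and 4 are missing
    x = [0.0 if m else v for v, m in zip(x_clean, mask)]  # corrupted input
    y = complete(A, x, mask)
    print([round(v, 3) for v in y])
```

The last line of `complete` is where the hard constraint lives: known pixels are copied from the input and only the hole takes generator output. In this toy the image lies exactly in the generator's range, so `masked_loss` falls to nearly zero and the stitch is seamless. When a real image cannot be reproduced exactly by the generator, the residual on known pixels stays positive and the pasted fill can disagree with its surroundings at the hole boundary, which is one way to see the color and semantic discontinuities described above.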

CLC number: TP391

YU Yongsheng, LUO Tiejian. Seamless image completion via GAN inversion[J]. Journal of University of Chinese Academy of Sciences, 2024, 41(5): 705-714.
