Efficient Compression of Face Frontalization GAN Model
Lei Wei, Weigen Qiu, Lichen Zhang
Guangdong University of Technology, Guangzhou, Guangdong
Received: Feb. 23rd, 2021; accepted: Mar. 18th, 2021; published: Mar. 26th, 2021
ABSTRACT
GANs perform extremely well at frontal face image generation; the generated frontal faces are highly realistic and have attracted wide attention from researchers. However, this powerful generation ability comes at the cost of enormous computation and storage, and the more complex the GAN structure, the greater its computational demand, which greatly limits interactive deployment. To make deployment more convenient and reduce the computational requirements of GANs, this paper proposes a general compression algorithm. The algorithm compresses a face frontalization GAN, reducing both the model size and the inference time of its generator. Experiments show that the compressed GAN still produces images of good quality while its computation and storage costs are greatly reduced compared with the original network.
Keywords: GAN, Image Generation, Face Frontalization, Model Compression
The goal of a face frontalization GAN is to learn a mapping function G from profile faces X to frontal faces Y. It can be trained with paired data $\{x_i, y_i\}_{i=1}^{N}$, where $x_i \in X$ and $y_i \in Y$, and N is the number of training images. The learning objective is formalized below, where $\mathbb{E}_{x,y} \triangleq \mathbb{E}_{x,y \sim p_{\mathrm{data}}(x,y)}$ and $\|\cdot\|_1$ denotes the L1 norm.
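The full objective is given in the equation referenced above; as a minimal sketch of the paired L1 reconstruction term it describes (assuming the generator G is a callable and batches are NumPy arrays; the names are illustrative, not the paper's):

```python
import numpy as np

def l1_reconstruction(G, x, y):
    """Monte-Carlo estimate of E_{x,y}[ ||G(x) - y||_1 ]: the mean
    absolute difference between generated frontal faces G(x) and
    ground-truth frontal faces y over a paired batch."""
    return np.abs(G(x) - y).mean()

# Toy usage with an identity "generator" on dummy paired data.
x = np.array([1.0, 2.0])
y = np.array([2.0, 4.0])
loss = l1_reconstruction(lambda t: t, x, y)  # |1-2| and |2-4| average to 1.5
```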
Given the possible channel configurations $\{c_1, c_2, \cdots, c_K\}$, where K is the number of layers to prune, we use neural architecture search to find the optimal channel configuration $\{c_1^*, c_2^*, \cdots, c_K^*\} = \arg\min_{c_1, c_2, \cdots, c_K} \mathcal{L}$, s.t. $\mathrm{MACs} < F_t$, where $F_t$ is the computation budget. A straightforward approach would be to enumerate all possible channel configurations, train each to convergence, and then evaluate them and pick the best-performing generator. However, the number of possible configurations grows exponentially with K, and each configuration may need different hyperparameters for the learning rate and loss weights at each stage, which makes this extremely time-consuming.
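To make the intractability concrete, here is a sketch (pure Python; the layer widths, the MACs model, and the budget are illustrative assumptions, not the paper's) of the naive approach that enumerates every channel configuration satisfying $\mathrm{MACs} < F_t$:

```python
import itertools

SPATIAL = 64 * 64  # illustrative feature-map size

def macs_of(channels, in_ch=3):
    """Rough MACs estimate for a chain of 1x1 convolutions with the
    given per-layer output channel counts (illustration only)."""
    total, prev = 0, in_ch
    for c in channels:
        total += prev * c * SPATIAL
        prev = c
    return total

def feasible_configs(choices_per_layer, budget):
    """Brute-force search: enumerate all |choices|^K configurations
    (exponential in the number of pruned layers K) and keep those
    under the MACs budget."""
    return [cfg for cfg in itertools.product(*choices_per_layer)
            if macs_of(cfg) < budget]

# 3 layers with 2 width choices each -> 2**3 = 8 candidates already;
# with, say, 20 layers and 4 choices the space is 4**20 ≈ 10^12.
configs = feasible_configs([(8, 16)] * 3, budget=10_000_000)
```

Each surviving candidate would still have to be trained to convergence before it could be evaluated, which is what the decoupled approach in the next subsection avoids.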
3.3. Decoupling Training and Search
To address this problem, our model follows recent work on one-shot neural architecture search [23] [25] [36] and decouples model training from architecture search. We first train a "once-for-all" network [25] that supports different channel numbers. Every sub-network with a different number of channels goes through the same training process and can run independently, and the sub-networks share weights with the "once-for-all" network. Figure 2 illustrates the overall framework. Suppose the original teacher generator has $\{c_k^0\}_{k=1}^{K}$ channels. For a given channel configuration $\{c_k\}_{k=1}^{K}$ with $c_k \le c_k^0$, the weights of the sub-network are obtained by extracting the first $c_k$ channels from the corresponding weight tensors of the "once-for-all" network. Following Guo et al. [36], at each training step we randomly sample a sub-network with a certain channel configuration, compute its output and gradients, and update the extracted weights using the learning objective (Eq. 5). Because the weights of the first few channels are updated more frequently, they play a more critical role among all the weights.
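The channel-extraction scheme above can be sketched as follows (NumPy, with a 1x1-convolution layer reduced to a matrix product; the layer sizes are hypothetical): a sub-network's weights are the leading slices of the shared "once-for-all" tensor, so the narrow sub-network's output is exactly a prefix of the wide one's.

```python
import numpy as np

rng = np.random.default_rng(0)
# Shared "once-for-all" weight for one layer at its maximum width:
# 16 output channels x 8 input channels (a 1x1 conv is a matmul).
W_full = rng.standard_normal((16, 8))

def subnet_forward(x, out_c, in_c=8):
    """Forward through the sub-network obtained by extracting the
    FIRST out_c output (and in_c input) channels of the shared
    weight tensor -- no separate copy of the weights is stored."""
    return W_full[:out_c, :in_c] @ x[:in_c]

x = rng.standard_normal(8)
narrow = subnet_forward(x, out_c=4)   # a sampled narrow configuration
wide = subnet_forward(x, out_c=16)    # the full-width configuration
# Weight sharing: the narrow output is a prefix of the wide output.
prefix_shared = np.allclose(narrow, wide[:4])
```

At each training step one samples `out_c` at random and backpropagates only through the extracted slice; since every sampled configuration includes the leading channels, those weights receive the most updates, matching the observation at the end of this subsection.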
Wei, L., Qiu, W. and Zhang, L. (2021) 人脸转正GAN模型的高效压缩 (Efficient Compression of Face Frontalization GAN Model). 计算机科学与应用 (Computer Science and Application), 11(03), 661-671. https://doi.org/10.12677/CSA.2021.113068
References
[1] Ferrari, C., Lisanti, G., Berretti, S. and Del Bimbo, A. (2016) Effective 3D Based Frontalization for Unconstrained Face Recognition. 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, 1047-1052. https://doi.org/10.1109/ICPR.2016.7899774
[2] Hassner, T., Harel, S., Paz, E. and Enbar, R. (2015) Effective Face Frontalization in Unconstrained Images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, 7-12 June 2015, 4295-4304.
[3] Jeni, L.A. and Cohn, J.F. (2016) Person-Independent 3D Gaze Estimation Using Face Frontalization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Las Vegas, 26 June-1 July 2016, 87-95.
[4] Booth, J., Roussos, A., Ponniah, A., et al. (2018) Large Scale 3D Morphable Models. International Journal of Computer Vision, 126, 233-254. https://doi.org/10.1007/s11263-017-1009-7
[5] Booth, J., Roussos, A., Zafeiriou, S., Ponniah, A. and Dunaway, D. (2016) A 3D Morphable Model Learnt from 10,000 Faces. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 26 June-1 July 2016, 5543-5552.
[6] Cao, J., Hu, Y., Zhang, H., et al. (2018) Learning a High Fidelity Pose Invariant Model for High-Resolution Face Frontalization.
[7] Huang, R., Zhang, S., Li, T., et al. (2017) Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, 22-29 October 2017, 2439-2448. https://doi.org/10.1109/ICCV.2017.267
[8] Tian, Y., Peng, X., Zhao, L., et al. (2018) CR-GAN: Learning Complete Representations for Multi-View Generation. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence Main Track, Stockholm, 13-19 July 2018, 942-948. https://doi.org/10.24963/ijcai.2018/131
[9] Yin, X., Yu, X., Sohn, K., Liu, X.M. and Chandraker, M. (2017) Towards Large-Pose Face Frontalization in the Wild. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, 22-29 October 2017, 3990-3999. https://doi.org/10.1109/ICCV.2017.430
[10] Zhao, J., Cheng, Y., Xu, Y., Xiong, L., Li, J., Zhao, F., Jayashree, K., Pranata, S., Shen, S., Xing, J., et al. (2018) Towards Pose Invariant Face Recognition in the Wild. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, 18-23 June 2018, 2207-2216. https://doi.org/10.1109/CVPR.2018.00235
[11] Li, M., Lin, J., Ding, Y., et al. (2020) GAN Compression: Efficient Architectures for Interactive Conditional GANs. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 14-19 June 2020, 5284-5294. https://doi.org/10.1109/CVPR42600.2020.00533
[12] He, Y.H., Zhang, X.Y. and Sun, J. (2017) Channel Pruning for Accelerating Very Deep Neural Networks. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, 22-29 October 2017, 1389-1397.
[13] Shen, Y., Luo, P., Yan, J., Wang, X. and Tang, X. (2018) FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, 18-23 June 2018, 821-830. https://doi.org/10.1109/CVPR.2018.00092
[14] He, Y.H., Lin, J., Liu, Z.J., Wang, H.R., Li, L.-J. and Han, S. (2018) AMC: AutoML for Model Compression and Acceleration on Mobile Devices. Proceedings of the European Conference on Computer Vision (ECCV), Munich, 8-14 September 2018, 784-800. https://doi.org/10.1007/978-3-030-01234-2_48
[15] Liu, Z.C., Mu, H.Y., Zhang, X.Y., Guo, Z.C., Yang, X., Cheng, K.-T. and Sun, J. (2019) MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, 27 October-2 November 2019, 3296-3305. https://doi.org/10.1109/ICCV.2019.00339
[16] Hinton, G., Vinyals, O. and Dean, J. (2015) Distilling the Knowledge in a Neural Network.
[17] Chen, G., Choi, W., Yu, X., et al. (2017) Learning Efficient Object Detection Models with Knowledge Distillation. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, 4-9 December 2017, 742-751.
[18] Aguinaldo, A., Chiang, P.-Y., Gain, A., Patil, A., Pearson, K. and Feizi, S. (2019) Compressing GANs Using Knowledge Distillation.
[19] Zoph, B. and Le, Q.V. (2016) Neural Architecture Search with Reinforcement Learning.
[20] Liu, C., Zoph, B., Neumann, M., et al. (2018) Progressive Neural Architecture Search. Proceedings of the European Conference on Computer Vision (ECCV), Munich, 8-14 September 2018, 19-34. https://doi.org/10.1007/978-3-030-01246-5_2
[21] Liu, H., Simonyan, K., Vinyals, O., et al. (2017) Hierarchical Representations for Efficient Architecture Search.
[22] Liu, H., Simonyan, K. and Yang, Y. (2018) DARTS: Differentiable Architecture Search.
[23] Cai, H., Zhu, L. and Han, S. (2018) ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware.
[24] Wu, B.C., et al. (2019) FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, 15-20 June 2019, 10734-10742. https://doi.org/10.1109/CVPR.2019.01099
[25] Cai, H., Gan, C., Wang, T., et al. (2019) Once-for-All: Train One Network and Specialize It for Efficient Deployment.
[26] Azadi, S., Olsson, C., Darrell, T., et al. (2018) Discriminator Rejection Sampling.
[27] Chen, Y., Wang, N. and Zhang, Z. (2018) DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer. Proceedings of the AAAI Conference on Artificial Intelligence, 32, 2852-2859. https://ojs.aaai.org/index.php/AAAI/article/view/11783
[28] Long, J., Shelhamer, E. and Darrell, T. (2015) Fully Convolutional Networks for Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, 7-12 June 2015, 3431-3440. https://doi.org/10.1109/CVPR.2015.7298965
[29] He, K.M., Zhang, X.Y., Ren, S.Q. and Sun, J. (2016) Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 26 June-1 July 2016, 770-778. https://doi.org/10.1109/CVPR.2016.90
[30] Ronneberger, O., Fischer, P. and Brox, T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, 5-9 October 2015, 234-241. https://doi.org/10.1007/978-3-319-24574-4_28
[31] Howard, A.G., Zhu, M.L., Chen, B., Kalenichenko, D., et al. (2017) MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.
[32] Johnson, J., et al. (2016) Perceptual Losses for Real-Time Style Transfer and Super-Resolution. European Conference on Computer Vision (ECCV), Springer, Cham, 694-711. https://doi.org/10.1007/978-3-319-46475-6_43
[33] Liu, Z., Li, J.G., Shen, Z.Q., Huang, G., Yan, S.M. and Zhang, C.S. (2017) Learning Efficient Convolutional Networks through Network Slimming. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, 22-29 October 2017, 2736-2744.
[34] Zhuang, Z.W., Tan, M.K., Zhuang, B.H., Liu, J., Guo, Y., Wu, Q.Y., et al. (2018) Discrimination-Aware Channel Pruning for Deep Neural Networks. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, 3-8 December 2018, 875-886.
[35] Luo, J.H., Wu, J. and Lin, W. (2017) ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, 22-29 October 2017, 5058-5066. https://doi.org/10.1109/ICCV.2017.541
[36] Guo, Z.C., Zhang, X.Y., Mu, H.Y., Heng, W., Liu, Z.C., Wei, Y.C. and Sun, J. (2019) Single Path One-Shot Neural Architecture Search with Uniform Sampling.
[37] Gross, R., Matthews, I., Cohn, J., et al. (2010) Multi-PIE. Image and Vision Computing, 28, 807-813. https://doi.org/10.1016/j.imavis.2009.08.002
[38] Gao, W., Cao, B., Shan, S., et al. (2007) The CAS-PEAL Large-Scale Chinese Face Database and Baseline Evaluations. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 38, 149-161. https://doi.org/10.1109/TSMCA.2007.909557