PERFORMANCE ANALYSIS OF STATE-OF-THE-ART MODELS FOR POSE-GUIDED PERSON IMAGE GENERATION
DOI:
https://doi.org/10.34218/IJCET_16_01_258
Keywords:
DeepFashion, Evaluation Metrics, Market-1501, Performance Analysis, Pose-guided Person Image Generation
Abstract
Pose-guided person image generation has become a central research topic in computer vision: deep-learning models synthesize a realistic image of a person in a specified target pose. This work compares the performance of current state-of-the-art models on two benchmark datasets, DeepFashion and Market-1501. These datasets offer rich variation in pose, clothing, and background, which makes them suitable for quantifying model robustness. The evaluation focuses on three key metrics, the Structural Similarity Index (SSIM), the Fréchet Inception Distance (FID), and the Inception Score (IS), to estimate the quality, realism, and diversity of the generated images. Our results identify the strengths and weaknesses of each model and provide insights for future work on pose-guided image synthesis. We also highlight the challenges posed by human body deformation and structural alignment, which remain the areas most in need of improvement.
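To make the three metrics named in the abstract concrete, the following is a minimal Python sketch of their textbook definitions. It is not the evaluation code used in this work. All function names are hypothetical, and several simplifications are assumed: SSIM is computed over one global window rather than averaged over local windows, the Inception Score takes class probabilities that are presumed to come from a pretrained classifier, and the Fréchet distance is shown only for scalar features, whereas FID in practice uses high-dimensional Inception features with full covariance matrices.

```python
import math


def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM for two equal-length flat lists of pixel values.

    Assumption: one global window; the usual SSIM averages local windows.
    """
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants from the standard SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def inception_score(probs):
    """exp of the mean KL divergence between p(y|x) and the marginal p(y).

    probs: one class-probability list per generated image (each sums to 1).
    Assumption: the probabilities come from a pretrained classifier.
    """
    n = len(probs)
    k = len(probs[0])
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    kl_total = 0.0
    for p in probs:
        # Terms with p_j == 0 contribute nothing to the KL divergence.
        kl_total += sum(
            pj * math.log(pj / marginal[j]) for j, pj in enumerate(p) if pj > 0
        )
    return math.exp(kl_total / n)


def frechet_distance_1d(real_feats, fake_feats):
    """Frechet distance between 1-D Gaussians fitted to scalar features.

    Assumption: scalar features, so the distance reduces to
    (mu_r - mu_f)^2 + (sd_r - sd_f)^2.
    """
    def mean_std(values):
        m = sum(values) / len(values)
        return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

    mu_r, sd_r = mean_std(real_feats)
    mu_f, sd_f = mean_std(fake_feats)
    return (mu_r - mu_f) ** 2 + (sd_r - sd_f) ** 2
```

Higher SSIM and IS indicate better results, while a lower Fréchet distance indicates that generated and real feature distributions are closer. For example, `inception_score` is lowest when every image receives the same class distribution, and it grows as predictions become both confident and varied across images.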
License
Copyright (c) 2025 Biponjot Kaur, Sarbjeet Singh (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.