GENERATIVE AI FOR DATA-CENTRIC AI DEVELOPMENT
Keywords:
Data-centric, AI, Healthcare, Development
Abstract
In this paper, we explore the role of generative AI in accelerating data-centric AI development. Key applications include synthetic data generation, automated segmentation, and domain-specific solutions in healthcare and autonomous driving. The paper presents strategies for ensuring realism, mitigating bias, and preserving privacy, and discusses future directions in underserved domains and collaborative dataset generation.
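As a minimal sketch of synthetic tabular data generation, the Python below uses about the simplest generative model available. It fits a multivariate Gaussian to numeric rows and samples new rows that share the fitted mean and covariance. This is an illustrative assumption, not the method described in the paper. Practical pipelines would more likely use deep generative models such as GANs, VAEs, or diffusion models, handle categorical columns, and add privacy safeguards. All function names (`column_means`, `covariance`, `cholesky`, `synthesize`) and the toy data are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def column_means(rows: Matrix) -> List[float]:
    """Mean of each column in a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows: Matrix, means: List[float]) -> Matrix:
    """Sample covariance matrix (denominator n - 1)."""
    n, d = len(rows), len(means)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        centered = [x - m for x, m in zip(row, means)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j]
    return [[v / (n - 1) for v in r] for r in cov]


def cholesky(a: Matrix, jitter: float = 1e-9) -> Matrix:
    """Lower-triangular L with L @ L^T ~= a; jitter keeps the diagonal positive."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(a[i][i] - s, 0.0) + jitter)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def synthesize(rows: Matrix, n_samples: int, seed: int = 0) -> Matrix:
    """Fit a multivariate Gaussian to `rows` and draw `n_samples` synthetic rows."""
    rng = random.Random(seed)
    means = column_means(rows)
    low = cholesky(covariance(rows, means))
    d = len(means)
    out = []
    for _ in range(n_samples):
        # Independent standard normal draws.
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # x = mu + L z has (approximately) the fitted mean and covariance.
        out.append([means[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                    for i in range(d)])
    return out


if __name__ == "__main__":
    # Hypothetical toy data: two strongly correlated numeric features.
    real = [[1.0, 2.0], [2.0, 4.1], [3.0, 6.2], [4.0, 7.9], [5.0, 10.1]]
    fake = synthesize(real, n_samples=1000)
    print(column_means(fake))  # should be close to the real column means
```

A Gaussian fit preserves means and pairwise linear correlations but not nonlinear structure or rare subgroups. This is one reason realism and bias checks still matter when synthetic data feeds model development.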
License
Copyright (c) 2025 Meethun Panda (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.