ENHANCING DATA ANALYTICS WITH RETRIEVAL-AUGMENTED GENERATION WORKLOADS
DOI:
https://doi.org/10.34218/IJCET_16_01_244
Keywords:
Retrieval-Augmented Generation, Data Analytics Pipeline, Serverless Computing, Enterprise Analytics Integration, Real-time Processing
Abstract
This article presents an approach to enhancing data analytics by integrating Retrieval-Augmented Generation (RAG) workloads into traditional analytics pipelines. It introduces a framework that combines three elements to address the growing challenges of processing and analyzing large-scale data: retrieval mechanisms based on vector search and embedding models, generative pipelines built on transformer-based architectures, and integration layers. In testing across multiple data categories, including financial, healthcare, manufacturing, and customer behavior datasets, the system achieved 91.7% accuracy in semantic search and a 93.4% improvement in pattern recognition using dense vector embeddings. The retrieval component, powered by a neural network architecture with 875 million parameters and trained on 623 million records, significantly improved precision and recall metrics. The proposed framework incorporates a serverless computing architecture and automated pipeline management, enhancing performance across various operational metrics. The RAG-enhanced system showed a 76.3% reduction in latency compared to traditional methods while processing datasets of up to 1.2 petabytes. The implementation considerations and guidelines offer a practical roadmap for organizations seeking to modernize their analytics infrastructure while maintaining security and scalability.
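The retrieve-then-generate pattern summarized in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: the bag-of-words `embed` function is a toy stand-in for an embedding model, and the sample documents and the names `retrieve` and `build_prompt` are hypothetical. It only shows how a query vector is compared with document vectors by cosine similarity and how the best matches become context for a generative model.

```python
import math
from collections import Counter

# Hypothetical sample corpus; the article's datasets are not reproduced here.
DOCUMENTS = [
    "quarterly revenue rose in the financial segment",
    "patient readmission rates fell in the healthcare dataset",
    "machine downtime increased on the manufacturing line",
]

# Shared vocabulary so every text maps to a vector of the same length.
VOCABULARY = sorted({word for doc in DOCUMENTS for word in doc.lower().split()})


def embed(text, vocabulary):
    """Toy stand-in for an embedding model: a word-count vector."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, vocabulary, top_k=2):
    """Return the top_k documents ranked by similarity to the query."""
    query_vec = embed(query, vocabulary)
    scored = [
        (cosine_similarity(query_vec, embed(doc, vocabulary)), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(query, documents, vocabulary, top_k=2):
    """Assemble retrieved context and the question into one prompt string."""
    context = retrieve(query, documents, vocabulary, top_k)
    context_block = "\n".join(f"- {doc}" for doc in context)
    return f"Context:\n{context_block}\n\nQuestion: {query}"
```

In a production setting the word-count vectors would be replaced by dense embeddings from a trained model, the linear scan by a vector index, and the prompt would be passed to a transformer-based generator.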
License
Copyright (c) 2025 Abhilash Nagilla (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.