ENHANCING DATA ANALYTICS WITH RETRIEVAL-AUGMENTED GENERATION WORKLOADS

Authors

  • Abhilash Nagilla, JNTU-H, India

DOI:

https://doi.org/10.34218/IJCET_16_01_244

Keywords:

Retrieval-Augmented Generation, Data Analytics Pipeline, Serverless Computing, Enterprise Analytics Integration, Real-time Processing

Abstract

This article presents an approach to enhancing data analytics by integrating Retrieval-Augmented Generation (RAG) workloads into traditional analytics pipelines. It introduces a framework that combines three elements to address the growing challenges of processing and analyzing large-scale data: retrieval mechanisms built on vector search and embedding models, generative pipelines based on transformer architectures, and integration layers. In testing across multiple data categories, including financial, healthcare, manufacturing, and customer behavior datasets, the system achieved 91.7% accuracy in semantic search and a 93.4% improvement in pattern recognition using dense vector embeddings. The retrieval component, powered by a neural network with 875 million parameters trained on 623 million records, significantly improved precision and recall. The framework also incorporates a serverless computing architecture and automated pipeline management, which enhanced performance across various operational metrics. The RAG-enhanced system showed a 76.3% reduction in latency compared to traditional methods while processing datasets of up to 1.2 petabytes. The implementation considerations and guidelines offer a practical roadmap for organizations seeking to modernize their analytics infrastructure while maintaining security and scalability.
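
The retrieval step the abstract describes (vector search over dense embeddings, feeding a generative model) can be illustrated with a minimal sketch. This is not the article's implementation: the three-dimensional vectors stand in for the output of a real embedding model, and the names `cosine_similarity`, `top_k`, `build_prompt`, and the sample records are all hypothetical, chosen only to show the general shape of a RAG retrieval stage.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def build_prompt(question, retrieved, documents):
    """Assemble the retrieved passages and the question into one prompt string."""
    context = "\n".join(f"- {documents[doc_id]}" for doc_id, _ in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Hypothetical records; a real pipeline would hold far more, in a vector store.
documents = {
    "fin-1": "Q3 revenue rose on higher card volume.",
    "mfg-1": "Line 4 defect rate fell after recalibration.",
    "hc-1": "Readmission rates dropped in the cardiology unit.",
}

# Hand-made stand-ins for embeddings (assumption: a real embedding model
# would produce vectors with hundreds of dimensions).
index = {
    "fin-1": [0.9, 0.1, 0.0],
    "mfg-1": [0.1, 0.9, 0.1],
    "hc-1": [0.0, 0.2, 0.9],
}

question = "How did revenue change last quarter?"
query_vec = [0.8, 0.2, 0.1]  # assumed embedding of the question

retrieved = top_k(query_vec, index, k=2)
prompt = build_prompt(question, retrieved, documents)
print(prompt)
```

In a full pipeline the assembled prompt would be passed to a generative model. At scale, the brute-force scan in `top_k` would typically be replaced by an approximate nearest-neighbor index.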

Published

2025-02-17

How to Cite

Abhilash Nagilla. (2025). ENHANCING DATA ANALYTICS WITH RETRIEVAL-AUGMENTED GENERATION WORKLOADS. INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING AND TECHNOLOGY, 16(01), 3524-3543. https://doi.org/10.34218/IJCET_16_01_244