NEXT-GENERATION DATA PIPELINES: AUTOMATING WORKFLOWS WITH PYTHON WHEELS

Authors

  • Prakash Babu Sankuri, USA

Keywords:

Python Wheels, ETL/ELT Architecture, Workflow Optimization, Data Engineering

Abstract

This technical article explores the evolution and implementation of next-generation data pipelines, focusing on the role of Python wheels in automating workflows. It examines how data engineering practice has progressed from traditional ETL processes to ELT architectures in response to increasingly complex analytical requirements. It then investigates the fundamental components of Python wheels, their integration with existing systems, and their impact on deployment efficiency and resource optimization. Through an analysis of implementation best practices, performance optimization techniques, and real-world applications, the article demonstrates how wheel-based automation has changed data pipeline management across various industries. The topics covered include dependency management, caching strategies, parallel processing, and memory optimization, along with the challenges organizations face in maintaining data lineage and regulatory compliance. The article explains how Python wheels have become a cornerstone technology in modern data engineering, enabling organizations to build more resilient, maintainable, and efficient data pipelines while supporting operational excellence and security.

Published

2025-01-17

How to Cite

Prakash Babu Sankuri. (2025). NEXT-GENERATION DATA PIPELINES: AUTOMATING WORKFLOWS WITH PYTHON WHEELS. INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING AND TECHNOLOGY, 16(01), 821-834. https://ijcet.in/index.php/ijcet/article/view/251