A UNIFIED FRAMEWORK FOR CHAOS ENGINEERING AND SLO MANAGEMENT IN MULTI-CLOUD ENVIRONMENTS

Authors

  • Dileep Kumar Reddy Lankala, Southern Arkansas University, USA

DOI:

https://doi.org/10.34218/IJCET_16_01_238

Keywords:

Chaos Engineering, Cloud Computing, Multi-Cloud Architecture, Service Level Objectives (SLOs), System Reliability

Abstract

This article introduces a comprehensive framework that integrates chaos engineering with Service Level Objective (SLO) management in multi-cloud environments. The framework addresses the growing challenges of maintaining consistent performance and reliability across distributed systems that span multiple cloud providers. By combining controlled fault injection with real-time monitoring and automated incident response, the framework enables organizations to proactively identify system weaknesses and enhance resilience. The approach incorporates pattern-recognition algorithms, automated remediation procedures, and intelligent load-balancing capabilities to maintain performance across diverse cloud platforms. Through case studies in the financial services and e-commerce sectors, the framework demonstrates significant improvements in system reliability, incident response times, and overall operational efficiency. The implementation results validate the effectiveness of integrating chaos engineering principles with SLO management for maintaining robust cloud systems.

Published

2025-02-17

How to Cite

Dileep Kumar Reddy Lankala. (2025). A UNIFIED FRAMEWORK FOR CHAOS ENGINEERING AND SLO MANAGEMENT IN MULTI-CLOUD ENVIRONMENTS. INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING AND TECHNOLOGY, 16(01), 3425-3440. https://doi.org/10.34218/IJCET_16_01_238