A UNIFIED FRAMEWORK FOR CHAOS ENGINEERING AND SLO MANAGEMENT IN MULTI-CLOUD ENVIRONMENTS
DOI:
https://doi.org/10.34218/IJCET_16_01_238
Keywords:
Chaos Engineering, Cloud Computing, Multi-Cloud Architecture, Service Level Objectives (SLOs), System Reliability
Abstract
This article introduces a framework that integrates chaos engineering with Service Level Objective (SLO) management in multi-cloud environments. The framework addresses the growing challenge of maintaining consistent performance and reliability across distributed systems that span multiple cloud providers. By combining controlled fault injection with real-time monitoring and automated incident response, it enables organizations to identify system weaknesses proactively and improve resilience. The approach incorporates pattern recognition algorithms, automated remediation procedures, and intelligent load balancing to maintain performance across diverse cloud platforms. In case studies from the financial services and e-commerce sectors, the framework demonstrates significant improvements in system reliability, incident response times, and overall operational efficiency. The implementation results support the effectiveness of integrating chaos engineering principles with SLO management for maintaining robust cloud systems.
License
Copyright (c) 2025 Dileep Kumar Reddy Lankala (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.