RELIABLE AI SYSTEMS IN HEALTHCARE (AI MEETS SRE)

Authors

  • Vijaybhasker Pagidoju, Lead Site Reliability Engineer/Architect, Centene Corporation, Saint Charles, MO, USA

DOI:

https://doi.org/10.34218/IJCET_16_02_003

Keywords:

AI Reliability, Site Reliability Engineering (SRE), Predictive Monitoring, AIOps, Healthcare AI, Anomaly Detection, Machine Learning For Healthcare

Abstract

Modern healthcare increasingly relies on artificial intelligence (AI) for critical tasks such as diagnosis, risk prediction, and patient monitoring. Ensuring the reliability of these AI systems is paramount, especially in life-critical applications. This paper bridges Site Reliability Engineering (SRE) principles with clinical AI systems to enhance their robustness and uptime. We propose a comprehensive AIOps framework for predictive monitoring of healthcare AI, leveraging machine learning models and SRE practices to anticipate failures and maintain performance. Key contributions include an architecture that integrates real-time telemetry from AI models with automated anomaly detection, an algorithm for predictive error budgeting, and a case study using a hospital patient monitoring AI system. We demonstrate through experiments on public healthcare datasets that our approach can detect performance drifts and system anomalies before they impact patient care. The results show improved response time to incidents and higher system availability compared to traditional monitoring. We discuss the implications for industry adoption, emphasizing how an AI-driven SRE paradigm can proactively ensure the safety and reliability of AI in healthcare. Finally, we outline best practices and recommend avenues for integrating our framework into existing healthcare operations.
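The abstract names predictive error budgeting without giving its details here. As a rough illustration of the underlying idea only, the sketch below uses the standard SRE burn-rate calculation and a naive constant-rate projection. It is not the paper's algorithm; the function name, the fields, and the assumption of steady traffic and a steady error rate are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    burn_rate: float            # 1.0 means the budget is spent exactly on pace
    consumed_fraction: float    # share of the window's budget already spent
    projected_exhausted: bool   # True if the current pace overspends the budget


def error_budget_status(slo_target, total_requests, failed_requests,
                        elapsed_fraction):
    """Hypothetical helper: summarize error-budget use within an SLO window.

    slo_target       -- target success ratio, e.g. 0.99
    total_requests   -- requests observed so far in the window
    failed_requests  -- failed requests observed so far
    elapsed_fraction -- portion of the SLO window that has elapsed, in (0, 1]
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if not 0.0 < elapsed_fraction <= 1.0:
        raise ValueError("elapsed_fraction must be in (0, 1]")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    allowed_error_rate = 1.0 - slo_target
    observed_error_rate = failed_requests / total_requests

    # Burn rate: how fast the budget is being spent relative to the SLO pace.
    burn_rate = observed_error_rate / allowed_error_rate

    # Assumption: traffic and error rate stay constant over the window, so
    # the spent share of the budget grows linearly with elapsed time.
    consumed_fraction = burn_rate * elapsed_fraction

    return BudgetStatus(
        burn_rate=burn_rate,
        consumed_fraction=consumed_fraction,
        projected_exhausted=burn_rate > 1.0,
    )


status = error_budget_status(
    slo_target=0.99, total_requests=10_000, failed_requests=200,
    elapsed_fraction=0.25,
)
print(status)
```

In this example the service fails 2% of requests against a 1% allowance, so the budget is burning at about twice the sustainable pace and roughly half of it is gone a quarter of the way through the window. A predictive approach would replace the constant-rate assumption with a forecast of the error rate.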

Published

2023-03-10

How to Cite

Vijaybhasker Pagidoju. (2023). RELIABLE AI SYSTEMS IN HEALTHCARE (AI MEETS SRE). INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING AND TECHNOLOGY, 16(2), 37-58. https://doi.org/10.34218/IJCET_16_02_003