AI-DRIVEN DYNAMIC DEPENDENCY GRAPH GENERATION FOR PREDICTIVE OBSERVABILITY IN DISTRIBUTED SYSTEMS
DOI:
https://doi.org/10.34218/IJCET_16_01_197
Keywords:
AI-driven Observability, Dynamic Dependency Graphs, Graph Neural Networks (GNNs), Predictive System Monitoring, Distributed Systems Dependencies, Time-series Telemetry Analysis, Cascading Failure Prediction, Incident Management Optimization, Mean Time To Resolution (MTTR) Reduction
Abstract
Modern distributed systems have intricate interdependencies that traditional observability tools struggle to capture. This paper introduces an AI-driven approach to dynamic dependency graph generation that uses time-series telemetry data and graph neural networks (GNNs) to model complex system interactions. By mapping dependencies dynamically in real time, the technique identifies latent bottlenecks, predicts cascading failures, and offers actionable insights for proactive system management. The proposed solution comprises a pipeline for telemetry preprocessing, feature extraction, and dependency inference, validated against real-world distributed architectures. Experimental results demonstrate improved accuracy in dependency mapping and significant reductions in mean time to resolution (MTTR) for incident management. The study highlights the potential of AI to advance predictive observability and operational resilience.
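To make the three pipeline stages named in the abstract (telemetry preprocessing, feature extraction, dependency inference) more concrete, the following is a minimal illustrative sketch, not the paper's method. It deliberately swaps the GNN for a much simpler stand-in, lagged cross-correlation between per-service time series, and assumes that a service whose metric consistently leads another's by a few samples is a candidate upstream dependency. The function names (zscore, lagged_corr, infer_edges), the max_lag and threshold parameters, and the synthetic "db" and "api" series are all hypothetical.

```python
import numpy as np


def zscore(x):
    """Preprocessing: standardize a series to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)


def lagged_corr(a, b, lag):
    """Feature: Pearson correlation between a[t] and b[t + lag], lag >= 1."""
    return float(np.mean(zscore(a[:-lag]) * zscore(b[lag:])))


def infer_edges(series, max_lag=3, threshold=0.8):
    """Inference: return {(src, dst): (best_lag, corr)} where src leads dst.

    An edge is kept only if the strongest positive-lag correlation
    reaches the (assumed) threshold.
    """
    edges = {}
    names = list(series)
    for src in names:
        for dst in names:
            if src == dst:
                continue
            candidates = [
                (lag, lagged_corr(series[src], series[dst], lag))
                for lag in range(1, max_lag + 1)
            ]
            best = max(candidates, key=lambda pair: pair[1])
            if best[1] >= threshold:
                edges[(src, dst)] = best
    return edges


if __name__ == "__main__":
    # Synthetic telemetry (assumption): "api" latency follows "db" latency
    # two samples later, plus a small amount of independent noise.
    rng = np.random.default_rng(0)
    db = rng.normal(size=500)
    api = np.concatenate([np.zeros(2), db[:-2]]) + 0.1 * rng.normal(size=500)
    for (src, dst), (lag, corr) in infer_edges({"db": db, "api": api}).items():
        print(f"{src} -> {dst} (lag={lag}, corr={corr:.2f})")
```

A graph built this way captures only pairwise, linear lead-lag relationships. The learned GNN-based approach described in the abstract is presumably intended to go beyond that, so this sketch should be read as an intuition aid rather than a baseline from the paper.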
License
Copyright (c) 2025 Anusha Reddy Narapureddy (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.