AI-DRIVEN DYNAMIC DEPENDENCY GRAPH GENERATION FOR PREDICTIVE OBSERVABILITY IN DISTRIBUTED SYSTEMS

Anusha Reddy Narapureddy

doi:10.34218/IJCET_16_01_197

Authors

Anusha Reddy Narapureddy, IEEE Senior Member, USA

DOI:

https://doi.org/10.34218/IJCET_16_01_197

Keywords:

AI-driven Observability, Dynamic Dependency Graphs, Graph Neural Networks (GNNs), Predictive System Monitoring, Distributed Systems Dependencies, Time-series Telemetry Analysis, Cascading Failure Prediction, Incident Management Optimization, Mean Time To Resolution (MTTR) Reduction

Abstract

Modern distributed systems have intricate interdependencies that traditional observability tools struggle to capture. This paper introduces an AI-driven approach to dynamic dependency graph generation that uses time-series telemetry data and graph neural networks (GNNs) to model complex system interactions. By mapping dependencies in real time, the technique identifies latent bottlenecks, predicts cascading failures, and offers actionable insights for proactive system management. The proposed solution comprises a pipeline for telemetry preprocessing, feature extraction, and dependency inference, validated against real-world distributed architectures. Experimental results demonstrate improved accuracy in dependency mapping and significant reductions in mean time to resolution (MTTR) for incident management. The study highlights the potential of AI to advance predictive observability and operational resilience.
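
The three pipeline stages named in the abstract (telemetry preprocessing, feature extraction, dependency inference) can be illustrated with a minimal sketch. This is not the paper's method: the GNN is replaced here by a simple lagged-correlation heuristic so that the example stays self-contained. All function names, the service names, the sample latency values, and the `lag` and `threshold` parameters are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def zscore(series):
    """Preprocessing: rescale a telemetry series to zero mean and unit variance."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def lagged_corr(upstream, downstream, lag):
    """Feature extraction: Pearson correlation of upstream[t] with downstream[t + lag]."""
    a = zscore(upstream[:len(upstream) - lag])
    b = zscore(downstream[lag:])
    return sum(x * y for x, y in zip(a, b)) / len(a)


def infer_edges(telemetry, lag=1, threshold=0.8):
    """Dependency inference (heuristic stand-in for the GNN): keep a directed
    edge src -> dst when src's series predicts dst's series `lag` steps later."""
    edges = set()
    for src, s in telemetry.items():
        for dst, d in telemetry.items():
            if src != dst and lagged_corr(s, d, lag) >= threshold:
                edges.add((src, dst))
    return edges


def blast_radius(edges, start):
    """Services reachable from `start` along inferred edges: a crude proxy
    for which services a failure at `start` could cascade to."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if src == node and dst != start and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


# Hypothetical latency samples: "api" echoes "db" one step later,
# and "web" echoes "api" one step later.
db = [1, 5, 2, 8, 3, 9, 1, 7, 2, 6]
api = [0] + db[:-1]
web = [0] + api[:-1]
telemetry = {"db": db, "api": api, "web": web}

edges = infer_edges(telemetry)
print(sorted(edges))
print(blast_radius(edges, "db"))
```

Recomputing `infer_edges` over a sliding window of recent telemetry would make the graph dynamic in the sense the abstract describes. A learned model such as a GNN would replace the fixed threshold with edge scores trained on observed incidents.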

