AN ENHANCED ECN-BASED CONGESTION MITIGATION WITH TRAFFIC REROUTING FOR CLOS DATA CENTER NETWORKS
Keywords:
Explicit Congestion Notification (ECN), Equal-Cost Multi-Path (ECMP), Congestion Experienced (CE), AI Data Centers, Congestion Control, Spine-Leaf Networks, Clos
Abstract
Explicit Congestion Notification (ECN) is widely used in modern data center networks to signal congestion without incurring packet loss. ECN is effective in limiting queue buildup while preserving high throughput. However, congestion signals are generated only after queue buildup has already begun, which can delay mitigation in large-scale AI data centers, where synchronized, high-throughput traffic flows are common. This paper examines the limitations of conventional ECN behavior in spine-leaf data center fabrics and proposes an enhanced ECN-based mechanism that uses in-network feedback to dynamically adapt Equal-Cost Multi-Path (ECMP) path selection. By proactively rerouting traffic away from congested paths before congestion spreads across the fabric, the proposed approach aims to improve job completion time, reduce unnecessary rate throttling, and make better use of available network resources. Incorporating real-time buffer telemetry into the path-selection process is a further extension that can improve fabric stability and load distribution. This work shows the potential of closer integration between congestion notification and path-selection mechanisms to shorten congestion resolution time and improve the efficiency of AI training and inference workloads in modern data center environments.
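The abstract does not detail how CE feedback is turned into path decisions. As an illustration only, the Python sketch below models one plausible version under stated assumptions. Conventional ECN is approximated by DCTCP-style step marking at a single queue threshold. Echoed CE marks (in practice carried back by TCP ECE flags or RoCEv2 congestion notification packets) are abstracted as a per-path moving average. Buffer telemetry is the instantaneous queue occupancy. A flow leaves its hashed ECMP path when a blended score crosses a threshold. All names and constants (Path, CongestionAwareSelector, ecmp_index, ECN_THRESHOLD_PKTS, REROUTE_SCORE, and so on) are hypothetical and do not describe the paper's actual mechanism.

```python
import zlib
from dataclasses import dataclass

# Hypothetical tuning constants (not taken from the paper).
ECN_THRESHOLD_PKTS = 20  # step-marking threshold K: mark CE once queue > K packets
EWMA_GAIN = 0.25         # weight given to the newest CE feedback sample
TELEMETRY_WEIGHT = 0.5   # share of the path score taken from buffer occupancy
REROUTE_SCORE = 0.5      # a flow leaves its path when the score exceeds this


@dataclass
class Path:
    """One equal-cost uplink (e.g. leaf -> spine) as seen by the ingress leaf."""
    name: str
    queue_pkts: int = 0     # instantaneous egress queue depth (buffer telemetry)
    buffer_pkts: int = 100  # assumed buffer capacity, in packets
    ce_ewma: float = 0.0    # smoothed fraction of packets returned CE-marked

    def mark_ce(self) -> bool:
        # Conventional ECN: the CE mark appears only after the queue has built up.
        return self.queue_pkts > ECN_THRESHOLD_PKTS

    def update_feedback(self, ce_marked: bool) -> None:
        # Fold one echoed CE/non-CE observation into the moving average.
        sample = 1.0 if ce_marked else 0.0
        self.ce_ewma = (1 - EWMA_GAIN) * self.ce_ewma + EWMA_GAIN * sample

    def score(self) -> float:
        # Blend delayed ECN feedback with current buffer occupancy (telemetry).
        occupancy = self.queue_pkts / self.buffer_pkts
        return (1 - TELEMETRY_WEIGHT) * self.ce_ewma + TELEMETRY_WEIGHT * occupancy


def ecmp_index(five_tuple: tuple, n_paths: int) -> int:
    """Static ECMP: hash the flow 5-tuple onto one of n_paths equal-cost paths."""
    key = "|".join(map(str, five_tuple)).encode()
    return zlib.crc32(key) % n_paths


class CongestionAwareSelector:
    """ECMP that moves a flow off its hashed path when that path looks congested."""

    def __init__(self, paths: list[Path]) -> None:
        self.paths = paths
        self.pinned: dict[tuple, int] = {}  # flow -> path chosen by a reroute

    def select(self, five_tuple: tuple) -> int:
        # Current path: a previous reroute decision, else the static ECMP choice.
        idx = self.pinned.get(five_tuple, ecmp_index(five_tuple, len(self.paths)))
        if self.paths[idx].score() > REROUTE_SCORE:
            # Pick the least-congested equal-cost path (possibly the same one).
            idx = min(range(len(self.paths)), key=lambda i: self.paths[i].score())
            self.pinned[five_tuple] = idx
        return idx


if __name__ == "__main__":
    paths = [Path("spine1", queue_pkts=80), Path("spine2", queue_pkts=5)]
    for _ in range(4):  # four rounds of echoed ECN feedback per path
        for p in paths:
            p.update_feedback(p.mark_ce())
    selector = CongestionAwareSelector(paths)
    flow = ("10.0.0.1", "10.0.1.1", 49152, 4791, "UDP")
    print(paths[selector.select(flow)].name)  # spine2
```

Blending occupancy into the score lets a path's score rise as soon as its queue grows, rather than only after CE marks have been echoed back, which mirrors the abstract's motivation for adding buffer telemetry. The sketch ignores packet reordering: moving a flow mid-stream can reorder packets, which load balancers such as CONGA avoid by rerouting only at flowlet boundaries.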
License
Copyright (c) 2026 Mohitkumar Savaliya (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.