Enterprise Data Center Readiness for AI
Enterprise data centers face unique challenges that require significant upgrades to power, cooling, and networking systems before AI infrastructure can be deployed.
Artificial Intelligence (AI) is revolutionizing industries by enabling new capabilities and efficiencies. However, the integration of AI into enterprise data centers presents unique challenges that require significant upgrades in power, cooling, and networking systems. This article explores the key areas that data centers must address to be ready for AI workloads.
1. Increased Power Demands
AI workloads, particularly those running on Graphics Processing Units (GPUs), demand significantly more power than traditional server workloads. AI GPU cabinets can consume between 30 and 100 kW, which is 3 to 10 times more than traditional server cabinets. For instance, an NVIDIA DGX H200 server alone can consume up to 10.2 kW. This necessitates substantial upgrades to the electromechanical infrastructure of data centers to support these power requirements.
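To put these figures in perspective, here is a minimal back-of-the-envelope sketch that multiplies a per-server draw by a server count. The servers-per-rack values and the ~10 kW traditional cabinet figure are illustrative assumptions; only the 10.2 kW per-server figure comes from the example above.

```python
# Rough rack power estimate for AI GPU servers vs. a traditional cabinet.
# Figures are illustrative assumptions based on the numbers in this article,
# not vendor specifications.

DGX_SERVER_KW = 10.2        # approximate max draw of one AI GPU server (per the example above)
TRADITIONAL_RACK_KW = 10.0  # typical traditional server cabinet (assumed)

def rack_power_kw(servers_per_rack: int, server_kw: float = DGX_SERVER_KW) -> float:
    """Estimate total rack power from server count and per-server draw."""
    return servers_per_rack * server_kw

for servers in (3, 4, 8):
    kw = rack_power_kw(servers)
    print(f"{servers} GPU servers per rack ~ {kw:.1f} kW "
          f"({kw / TRADITIONAL_RACK_KW:.1f}x a traditional ~{TRADITIONAL_RACK_KW:.0f} kW cabinet)")
```

Even a modest four-server configuration lands around 40 kW per rack, which is why the power envelope, not floor space, is usually the first constraint operators hit.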
2. Advanced Cooling Requirements
The high power consumption of AI GPUs also leads to increased heat generation, which traditional air-cooling systems cannot handle effectively beyond roughly 10 kW per rack. As a result, data centers must adopt liquid or hybrid cooling technologies to manage the excess heat generated by GPUs.
The choice between in-row cooling and liquid cooling depends on the specific cooling requirements and the power density of the server racks; a simple selection sketch follows the list below.
1. In-row cooling is typically used when the power density of the server racks is moderate, usually up to 30 kW per rack. This method places cooling units directly within the rows of server racks, providing targeted cooling right where it is needed. In-row cooling is effective for managing the heat generated by high-density AI GPU servers, ensuring that hot air is cooled immediately and optimal operating temperatures are maintained. It is less invasive than liquid cooling and can be integrated into existing data center infrastructure with minimal modifications.
2. Liquid cooling is necessary when dealing with extremely high power densities, often exceeding 30 kW per rack. This method uses liquid to absorb and dissipate heat directly from the GPUs, which is far more efficient than air cooling. Liquid cooling is essential for managing the heat generated by AI GPU servers in scenarios where air-based systems are insufficient. However, implementing liquid cooling is more invasive and may require significant modifications to the existing data center infrastructure.
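As a rough illustration of how these thresholds could drive a design decision, the sketch below maps rack power density to a cooling approach. The ~10 kW air-cooling limit and ~30 kW in-row cut-off follow the discussion above; treating them as hard boundaries is a simplification, since actual limits vary by vendor, facility, and airflow design.

```python
# Illustrative cooling-method selector based on rack power density.
# Thresholds follow the rough guidance in this article (room air up to ~10 kW,
# in-row up to ~30 kW, liquid beyond); real designs vary by vendor and site.

def cooling_method(rack_kw: float) -> str:
    """Suggest a cooling approach for a given rack power density in kW."""
    if rack_kw <= 10:
        return "traditional room/perimeter air cooling"
    if rack_kw <= 30:
        return "in-row cooling"
    return "liquid (or hybrid liquid/air) cooling"

for kw in (8, 25, 60, 100):
    print(f"{kw:>3} kW per rack -> {cooling_method(kw)}")
```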
3. Complex Networking Needs
AI infrastructure requires advanced networking technologies to handle the high demands of AI workloads. Unlike traditional application loads that may only need one or two networks, AI computing typically requires four separate networks: compute, storage, in-band management, and out-of-band management. These networks carry GPU-to-GPU communication, storage access, and overall system management. Additionally, the bandwidth requirements for AI workloads are significantly higher, with communication between GPUs requiring up to 800 Gbps.
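As a quick way to visualize this separation of concerns, the sketch below lists the four networks with a one-line purpose for each; the descriptions are paraphrased summaries, not a formal reference architecture.

```python
# Illustrative inventory of the four separate networks an AI cluster relies on.
# The one-line purposes paraphrase this article; they are not a formal
# reference architecture.

AI_CLUSTER_NETWORKS = {
    "compute": "GPU-to-GPU traffic (highest bandwidth, up to 800 Gbps per link)",
    "storage": "access to training data, models, and checkpoints",
    "in-band management": "provisioning, monitoring, and orchestration via the host OS",
    "out-of-band management": "BMC/console access independent of the host OS",
}

for name, purpose in AI_CLUSTER_NETWORKS.items():
    print(f"{name:>24}: {purpose}")
```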
To manage the 800 Gbps bandwidth required for GPU communication in AI infrastructure, several advanced networking protocols can be utilized. Here are some examples:
InfiniBand: This high-performance networking protocol is designed for data-intensive applications and is commonly used in AI infrastructure. It provides low-latency and high-throughput communication, making it suitable for the demanding bandwidth requirements of GPU communication.
Ethernet: While traditional Ethernet is not sufficient for 800 Gbps links, higher-speed variants such as 400 Gigabit Ethernet (400GbE) and 800 Gigabit Ethernet (800GbE) are emerging to meet these high-speed requirements.
These protocols are essential for ensuring efficient and reliable communication between GPUs, enabling the high-speed data transfer necessary for AI workloads.
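To show what these link speeds mean in practice, the short sketch below counts how many 400 Gbps or 800 Gbps ports would be needed to carry a node's aggregate GPU bandwidth. The 8-GPU node and one 800 Gbps port per GPU are illustrative assumptions, not a specific vendor design.

```python
import math

# How many fabric links are needed to carry a node's aggregate GPU bandwidth?
# The target assumes an 8-GPU node at 800 Gbps per GPU (illustrative assumption);
# link speeds reflect 400GbE and 800GbE class ports.

TARGET_GBPS = 8 * 800  # per-node aggregate, illustrative assumption

for link_gbps in (400, 800):
    links = math.ceil(TARGET_GBPS / link_gbps)
    print(f"{link_gbps} Gbps links: {links} needed per node to carry {TARGET_GBPS} Gbps")
```

The port count, and the cabling and switch capacity behind it, is often what forces a redesign of the existing network plant rather than a simple upgrade.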
Final Words and Next Steps
Preparing enterprise data centers for AI is a complex and demanding task that involves significant upgrades in power, cooling, and networking systems. By addressing these challenges and staying ahead of future trends, data center operators can ensure they are ready to support the next generation of AI-driven applications and services. As AI continues to evolve, data centers must remain adaptable and forward-thinking to meet the ever-growing demands of this transformative technology.
In my next article, I’ll outline the steps needed to create a data center readiness program to address these challenges for adoption of AI infrastructure.
Let me know in the comments what power, cooling, or network changes you're considering as you deploy AI infrastructure in your data centers.


