# AI Data Center Power: How 100 MW Campuses Are Rewriting Grid Engineering
#data center · #power grid · #ai infrastructure · #engineering · #energy
@nikolatesla | 2026-05-13
Five years ago, a 10 MW data center was a large data center. Today, hyperscalers are building AI training campuses that consume 100 MW, 200 MW, and in some cases more, from a single contiguous site. The engineering required to supply, condition, and distribute that much power — and the stress it places on regional electricity grids — is reshaping how utilities, grid operators, and infrastructure developers think about large-scale power.

## GPU Cluster Power Density: The kW/Rack Progression

The underlying driver is accelerator power density. An NVIDIA H100 SXM5 GPU has a thermal design power of 700 watts. A rack of 8 H100s, including host servers, networking, and cooling overhead, draws approximately 10 to 12 kW. That figure is now the baseline, not the ceiling.

The H200 and GB200 systems push further. NVIDIA's NVL72 — a rack-scale unit containing 72 B200 GPUs — has a total power consumption of approximately 120 kW per rack. This is not a marginal increase over a conventional 10-12 kW server rack; it is an order-of-magnitude step. A 1,000-rack deployment of NVL72 systems would require 120 MW for the compute alone, before cooling, networking, or facility overhead.

The progression over time:

- 2018 (V100 era): 5–8 kW per GPU server rack
- 2021 (A100 era): 8–12 kW per GPU rack
- 2023 (H100 era): 10–15 kW per GPU rack
- 2025 (H200/B200 era): 50–120 kW per rack

This trajectory means that AI infrastructure is now a qualitatively different power problem from conventional enterprise IT, not just a quantitative one. Cooling systems designed for 20 kW per rack — still standard in most commercial colocation facilities — are inadequate. The entire physical infrastructure stack requires redesign.

## Campus-Scale UPS and Grid Interconnect Design

A 100 MW data center campus is comparable in electrical scale to a medium-sized factory or a small city district. The design of the power delivery infrastructure reflects this.

**Substation design.** A 100 MW campus typically connects to the transmission grid at 115 kV or 230 kV, with on-site substations stepping down to 13.8 kV for distribution across the campus. For the largest campuses in Virginia's data center corridor, 500 kV transmission connections are now in use. The step-down transformer banks are engineered with N+1 redundancy: if one transformer fails, full capacity remains available. For Tier IV facilities (99.995% availability target), N+2 is standard.

**Uninterruptible Power Supply at scale.** Traditional UPS architecture — centralized battery banks behind a static transfer switch — does not scale efficiently to 100 MW. Modern AI campuses increasingly use **distributed UPS** architectures, where battery backup is integrated at the rack or row level rather than at the facility level. This reduces distribution losses, improves fault isolation, and allows the UPS battery capacity to be sized precisely for the IT load rather than over-provisioned for an entire building.

Lithium iron phosphate (LFP) batteries have largely replaced lead-acid in large-scale UPS applications over the last four years. LFP offers higher energy density, longer cycle life (3,000 to 5,000 cycles vs. roughly 500 for lead-acid), and better thermal stability. A 100 MW campus with a 15-minute UPS runtime (standard for generator start capability) requires approximately 25 MWh of battery storage.
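To make these figures concrete, here is a minimal back-of-the-envelope sketch in Python reproducing the arithmetic above for a hypothetical campus. The rack count, the PUE multiplier, the 50 MVA transformer unit size, and all function names are illustrative assumptions, not figures from any specific operator.

```python
import math

# Back-of-the-envelope sizing for a hypothetical AI campus.
# Only the 120 kW/rack NVL72 figure, the 15-minute UPS runtime,
# and the N+1 convention come from the text above; everything
# else is an illustrative assumption.

def campus_power_mw(racks: int, kw_per_rack: float, pue: float) -> float:
    """Total facility draw in MW: IT load scaled by the PUE overhead factor."""
    it_load_mw = racks * kw_per_rack / 1000.0
    return it_load_mw * pue

def ups_energy_mwh(facility_mw: float, runtime_min: float) -> float:
    """Battery energy needed to carry the load until generators come online."""
    return facility_mw * runtime_min / 60.0

def transformers_needed(load_mva: float, unit_mva: float, spares: int) -> int:
    """N+k transformer bank: enough units to carry the load, plus k spares."""
    return math.ceil(load_mva / unit_mva) + spares

if __name__ == "__main__":
    # 1,000 NVL72 racks at 120 kW each, assuming a liquid-cooled PUE of 1.10
    facility = campus_power_mw(racks=1_000, kw_per_rack=120.0, pue=1.10)
    print(f"facility draw: {facility:.0f} MW")                      # -> 132 MW

    # 15-minute runtime sized against a 100 MW facility, as in the text
    print(f"UPS energy:    {ups_energy_mwh(100.0, 15):.0f} MWh")    # -> 25 MWh

    # N+1 bank of assumed 50 MVA units (treating MW ~ MVA for simplicity)
    print(f"transformers:  {transformers_needed(facility, 50.0, 1)}")  # -> 4
```

Under these assumptions a 1,000-rack NVL72 deployment lands at roughly 132 MW of facility draw, which is exactly the scale at which the interconnection process described next becomes the binding constraint.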
**Grid interconnect agreements.** Connecting at 100+ MW scale requires transmission network impact studies — formal analyses by the grid operator that assess whether the new load causes congestion, requires transmission upgrades, or requires new generation capacity. In congested grid areas, these studies can take 18 to 36 months and frequently require the data center developer to fund transmission upgrades costing tens to hundreds of millions of dollars.

## On-Site Generation: Gas Turbines and SMRs

The unreliability of grid power in some markets — combined with the sheer scale of the demand — has pushed the largest AI campuses toward on-site generation strategies.

**Gas turbines** are the near-term solution. Simple-cycle aeroderivative gas turbines (General Electric LM6000, Siemens SGT-A65) can be installed at data center sites in 12 to 18 months, provide fast-start capability (minutes to full power), and can serve as primary generation rather than just backup. Several large campuses in Texas and the Southeast US have adopted this model, operating as behind-the-meter generators that reduce grid exposure. The carbon profile of this approach is poor — simple-cycle turbines are markedly less efficient than combined-cycle plants, and every MWh comes with combustion emissions — but the operational reliability argument is strong.

**Small modular reactors** are the longer-term aspiration. Microsoft's 20-year power purchase agreement with Constellation Energy for the Three Mile Island Unit 1 restart, and Amazon's direct investment in X-energy's SMR program, signal that hyperscalers are taking nuclear seriously as an AI power source. A 300 MWe SMR sited near a data center campus could supply roughly three 100 MW facilities continuously. The economics depend entirely on construction cost certainty — the Vogtle Unit 3 and 4 experience ($35 billion for 2.2 GW) is not reassuring — but SMR proponents argue that factory-built modular designs change this dynamic.

## Power Usage Effectiveness (PUE) Evolution

PUE is the core efficiency metric of data center power: total facility energy divided by IT equipment energy. A PUE of 1.0 would mean all power goes to computation; real facilities carry overhead from cooling, lighting, and distribution losses.

Traditional air-cooled facilities aimed for PUE between 1.5 and 1.8. The hyperscaler buildout drove this down to 1.1 to 1.2 through evaporative cooling, optimization algorithms, and economies of scale. Google's DeepMind-optimized cooling systems in its Iowa data centers achieved sub-1.1 PUE through ML-driven setpoint management.

AI campuses introduce a complication. At 120 kW per rack, conventional air cooling cannot remove heat fast enough. **Direct liquid cooling** — running chilled water directly to cold plates on GPU packages — becomes necessary. Liquid cooling carries higher infrastructure complexity and upfront cost, but its thermal efficiency is better and it enables much higher rack power density. Accounting for the chiller plant required, liquid-cooled facilities typically achieve PUEs of 1.05 to 1.15 — competitive with or better than best-practice air-cooled facilities.

Immersion cooling — submerging servers in tanks of dielectric fluid — takes this further, with PUEs approaching 1.03 in well-designed installations. The limiting factor is operational: immersion cooling changes maintenance procedures entirely (pulling servers from fluid tanks is nothing like rack-and-stack), and hyperscalers are cautious about adopting new operational models at the pace AI deployment demands.
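Because PUE values read as small decimals, their absolute impact at campus scale is easy to underestimate. The short sketch below converts the PUE figures quoted in this section into annual overhead energy, assuming a flat 100 MW IT load (an idealization; real utilization varies):

```python
# Annual cooling/distribution overhead implied by PUE at a fixed IT load.
# The PUE values are the ones quoted above (midpoints where a range is
# given); the flat 100 MW IT load is a simplifying assumption.

HOURS_PER_YEAR = 8_760
IT_LOAD_MW = 100.0  # assumed constant IT draw

cooling_regimes = {
    "legacy air-cooled (1.5-1.8)": 1.65,
    "hyperscaler air (1.1-1.2)":   1.15,
    "direct liquid (1.05-1.15)":   1.10,
    "immersion":                   1.03,
}

for name, pue in cooling_regimes.items():
    overhead_mw = IT_LOAD_MW * (pue - 1.0)  # non-IT facility load in MW
    overhead_gwh = overhead_mw * HOURS_PER_YEAR / 1_000.0
    print(f"{name:>28}: {overhead_gwh:5.0f} GWh/yr overhead")
```

The spread between the legacy and immersion cases works out to more than 500 GWh per year for a single campus, which is why cooling architecture has moved from an operational detail to a first-order design decision.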
## Transmission Congestion in Virginia, Texas, and Ireland

The concentration of data center demand in a small number of geographic markets has created visible grid stress.

**Northern Virginia** — specifically Loudoun County, the self-styled "Data Center Alley" — hosts more data center capacity than any other county on Earth. Dominion Energy, the local utility, has been managing a queue of several hundred data center interconnection requests totaling more than 35,000 MW of new load — roughly equivalent to adding the electricity demand of Portugal. The transmission constraints are sufficiently severe that new permits in some sub-markets have been paused pending infrastructure expansion.

**ERCOT (Texas)** faces a different challenge: a deregulated market with rapid load growth and increasing weather volatility. The February 2021 freeze (Winter Storm Uri) demonstrated the fragility of the system. Data centers on the ERCOT grid are now routinely asked to curtail voluntarily during peak demand events. The long-term solution — additional transmission capacity and generation — is moving slowly relative to demand growth.

**Ireland** hosts a significant share of European hyperscaler capacity, partly due to corporate tax advantages and EU data sovereignty requirements. EirGrid, the transmission operator, has estimated that data centers could account for 32% of Ireland's total electricity demand by 2030 — a figure that creates material risks for grid stability and has driven EirGrid to restrict new data center connections in Dublin until transmission infrastructure catches up.

## Utility Co-Investment Models

The scale and permanence of AI data center load is creating new financial relationships between data centers and utilities. Unlike most industrial customers, hyperscalers have long operating horizons (20+ years), creditworthy balance sheets, and predictable demand profiles. This makes them attractive counterparties for utility infrastructure investment.

Several US utilities now offer **co-investment structures** in which the data center developer funds transmission or substation upgrades and receives a portion of the capacity as rate base credit. Virginia's regulatory framework has been adapted to allow Dominion to recover data-center-specific transmission upgrade costs more efficiently. In exchange, data center operators receive committed power delivery timelines — typically 3 to 5 years from agreement to energization for large sites.

The engineering reality of AI data center power is that demand is growing faster than the infrastructure built to supply it. Grid operators, utilities, and data center developers are all running to catch up with deployment timelines driven by model training schedules and competitive pressure. The engineering solutions exist — liquid cooling, distributed UPS, on-site generation, co-investment transmission models. The constraint is not imagination but the speed at which century-old electrical infrastructure can be safely upgraded.