HBM vs GDDR: Memory Trade-offs

The AI hardware boom created a memory problem that GDDR wasn't designed to solve.

Graphics cards historically used GDDR — Graphics Double Data Rate memory — chips mounted on the same PCB, connected via a wide but conventional parallel bus. GDDR6 offers good bandwidth per dollar for gaming workloads, where you need fast access to a texture cache but the total memory isn't huge.

Training a large language model is a different problem. You need bandwidth. Constant, massive bandwidth. Moving billions of parameters in and out of memory at every forward and backward pass is the bottleneck — not compute cycles.

## What HBM Actually Does Differently

HBM (High Bandwidth Memory) solves this by putting the memory dies directly on top of the compute die in a 3D stack, connected through silicon interposers via thousands of TSVs (Through-Silicon Vias). Instead of routing signals off the chip to PCB traces and back, the path is millimeters through silicon.

The bandwidth numbers tell the story. GDDR6X on the RTX 4090 reaches about 1 TB/s. HBM3E on NVIDIA's H200 hits 4.8 TB/s. That's nearly 5x more bandwidth in the same thermal envelope — with less power per bit transferred.

## Why You Don't See HBM in Gaming GPUs

The manufacturing process for HBM is significantly more complex and expensive. You're essentially doing advanced packaging — stacking DRAM dies on a logic interposer alongside the GPU die. The yields are lower, the supply chain is concentrated (SK Hynix and Samsung dominate, with Micron entering), and the per-GB cost is roughly 3-4x GDDR6.

For a $500 gaming GPU targeting 16GB of memory, the math doesn't work. But for a $30,000 AI accelerator targeting peak inference throughput, bandwidth-per-watt matters far more than bandwidth-per-dollar.

## The Capacity Problem

HBM stacks are physically large and there's a limit to how many you can fit around a compute die on an interposer or advanced package. NVIDIA's H100 has 80GB HBM3. The H200 pushed to 141GB HBM3E. But fitting more HBM alongside a GPU die requires larger and more expensive packages.

This is why the "memory wall" is a real concern in AI. As model sizes grow, bandwidth and capacity become the limiting factor — not flops. The competition among SK Hynix, Samsung, and Micron to produce HBM4 with higher capacity stacks and lower per-bit power is directly tied to who wins the AI hardware race.

## LPDDR and the Edge Inference Case

For edge inference — running models on phones, cars, embedded devices — neither HBM nor GDDR makes sense. LPDDR (Low Power DDR), the same memory class used in smartphones, offers better power efficiency than GDDR at lower bandwidth levels.

Apple's unified memory architecture, for instance, uses LPDDR5X with a very wide bus to achieve reasonable bandwidth at mobile power levels. It's not 4 TB/s, but it's fast enough for on-device inference of 7B-13B parameter models.

The memory hierarchy question — HBM for data center, GDDR for gaming, LPDDR for edge — reflects the fundamental trade-off: bandwidth, capacity, power, and cost don't all optimize in the same direction.

HBM vs GDDR: Memory Trade-offs

// COMMENTS

ON THIS PAGE