3D Chip Stacking in 2026: How HBM, SRAM on Logic, and Die Stacking Are Changing Computing
#semiconductor
#hbm
#3d
#chipstacking
#2026
@nikolatesla | 2026-05-12 20:52:57
The memory bandwidth wall is the central engineering constraint of the current AI compute era. Training large neural networks requires moving enormous volumes of data between memory and compute units, and the rate at which that data can be moved has failed to keep pace with the rate at which compute density has increased. A modern AI accelerator can perform calculations vastly faster than data can be fed to it from conventional DRAM. The solution the semiconductor industry has converged on is architectural: rather than moving data over long, slow buses, stack the memory directly on top of the logic die.

## High Bandwidth Memory: The AI Enabler

High Bandwidth Memory (HBM) addresses the memory bandwidth problem through vertical stacking. DRAM dies are stacked on top of each other using through-silicon vias (TSVs): tiny vertical electrical connections that pass through the silicon die itself. The stacked DRAM assembly is then placed on a silicon interposer alongside the logic die (the GPU or AI accelerator), with very short, wide buses connecting memory to compute.

Where conventional GDDR6 memory might provide 400-500 GB/s of bandwidth, HBM3e, the standard in 2026, provides over 1 TB/s per stack. NVIDIA's H100 and H200 accelerators use HBM3 and HBM3e respectively; the Blackwell B100/B200 series pushes to 8 HBM3e stacks per GPU.

## TSMC CoWoS and Advanced Packaging

The silicon interposer that connects HBM stacks to logic dies is itself a sophisticated piece of engineering. TSMC's Chip on Wafer on Substrate (CoWoS) technology places multiple chiplets on a common silicon or organic interposer, enabling the massive bandwidth between adjacent chips that AI training requires. CoWoS is now a bottleneck in AI accelerator supply chains: TSMC's advanced packaging capacity is constrained, and major hyperscalers are securing long-term CoWoS capacity allocations years in advance.
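The HBM bandwidth figures quoted above are easy to sanity-check with back-of-the-envelope arithmetic. The numbers below are the nominal values from this article, not vendor datasheet figures:

```python
# Rough comparison of a conventional GDDR6 subsystem vs. stacked HBM3e,
# using the nominal figures quoted in the text (assumptions, not datasheets).

GDDR6_GBPS = 500          # upper end of the 400-500 GB/s range for GDDR6
HBM3E_STACK_GBPS = 1000   # "over 1 TB/s per stack" -> 1 TB/s as a floor
STACKS_PER_GPU = 8        # Blackwell-class accelerator with 8 HBM3e stacks

aggregate = HBM3E_STACK_GBPS * STACKS_PER_GPU
print(f"GDDR6 subsystem: ~{GDDR6_GBPS} GB/s")
print(f"HBM3e, 8 stacks: ~{aggregate} GB/s ({aggregate / 1000:.0f} TB/s)")
print(f"Speedup:         ~{aggregate / GDDR6_GBPS:.0f}x")
```

Even taking 1 TB/s as a conservative floor per stack, the aggregate is an order of magnitude beyond what a GDDR6 bus can deliver, which is why HBM capacity, not compute, often sizes an accelerator.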
The packaging step has become nearly as strategically important as the fab step.

## AMD 3D V-Cache: SRAM on Logic

AMD's 3D V-Cache technology demonstrates a different application of vertical stacking: placing additional SRAM cache directly on top of a processor's compute dies. The EPYC Genoa-X server processors use 3D V-Cache to increase L3 cache capacity dramatically, which reduces cache misses and improves performance for workloads that benefit from large working sets. The same technology appears in Ryzen gaming processors.

The insight is that stacking SRAM, which has different electrical characteristics from DRAM, on top of existing logic dies avoids the cost of integrating that SRAM into the logic process node, while still gaining the bandwidth advantage of physical proximity.

## Intel Foveros and Heterogeneous Integration

Intel's Foveros technology enables face-to-face die stacking with active logic on both the top and bottom dies. This is more ambitious than stacking passive memory on active logic: it involves connecting two active logic dies through TSVs, enabling different parts of a chip's architecture to be fabricated on different process nodes and then integrated.

Intel's Meteor Lake client processors use Foveros to integrate compute tiles, graphics tiles, and IO tiles fabricated on different process nodes. The economic logic is compelling: fabricate performance-critical compute on the most advanced (expensive) node, and use cheaper mature nodes for the IO and analog functions that don't benefit from advanced nodes.

## Power Density and Thermal Challenges

Stacking dies creates a thermal management problem that becomes acute at high power densities. Heat generated by the bottom die must pass through the top die before reaching the heat spreader. SRAM on logic is manageable because cache SRAM generates relatively little heat; stacking high-power logic on logic creates serious thermal gradients.
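The thermal penalty of putting a die in the heat path can be sketched with a one-dimensional series-resistance model. All resistance and power values below are illustrative assumptions, not measurements of any real package:

```python
# 1-D thermal sketch: heat from a bottom logic die must cross the stacked
# top die before reaching the heat spreader. Each layer adds a thermal
# resistance (K/W) in series. All values are illustrative assumptions.

def junction_temp(power_w, resistances_k_per_w, ambient_c=45.0):
    """Junction temperature for a heat source behind a series
    thermal-resistance path to ambient."""
    return ambient_c + power_w * sum(resistances_k_per_w)

R_TOP_DIE = 0.08    # K/W through a stacked die (assumed)
R_TIM = 0.05        # thermal interface material (assumed)
R_SPREADER = 0.10   # heat spreader plus cooler (assumed)

# The same 150 W compute die, with and without a die stacked on top of it:
flat = junction_temp(150, [R_TIM, R_SPREADER])
stacked = junction_temp(150, [R_TOP_DIE, R_TIM, R_SPREADER])
print(f"flat die:    {flat:.1f} C")
print(f"stacked die: {stacked:.1f} C (+{stacked - flat:.1f} C from the stack)")
```

Every watt pays the extra resistance of the stacked die, which is why low-power SRAM on logic is tractable today while high-power logic on logic pushes designers toward in-silicon liquid cooling.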
The industry is developing microfluidic cooling approaches, with liquid cooling channels etched into the silicon itself, that may enable much higher power densities in future stacked architectures. In 2026, thermal constraints remain a binding limitation on the most aggressive vertical integration proposals.

## Changing the Economics of Chip Design

The deeper consequence of 3D stacking is how it reshapes the economics of chip design and manufacturing. When memory, logic, and IO can be separately optimized and then integrated, it becomes possible to mix process nodes in ways that reduce cost without sacrificing performance. A chipmaker can use 3nm for the compute die, where transistor density directly translates to performance, and 7nm or even 14nm for the IO die, where the bottleneck is interconnect bandwidth rather than transistor count.

This disaggregation also reduces the area of the most expensive dies, improving yield and reducing per-unit cost. The era of monolithic single-die chips for high-performance computing is ending; the era of heterogeneous integration through advanced packaging is accelerating.
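The yield argument for disaggregation can be sketched with the classical Poisson die-yield model, Y = exp(-D*A). The defect density and die areas below are illustrative assumptions, and the comparison deliberately ignores packaging yield loss:

```python
import math

# Poisson die-yield model: Y = exp(-D * A), with defect density D
# (defects/cm^2) and die area A (cm^2). With known-good-die testing,
# a defective chiplet wastes only its own area, so the expected silicon
# spent per good part is area / yield. All numbers are illustrative.

def die_yield(defect_density, area_cm2):
    return math.exp(-defect_density * area_cm2)

D = 0.1             # defects per cm^2 on an advanced node (assumed)
mono_area = 8.0     # one large monolithic die
chiplet_area = 2.0  # split into four smaller chiplets
n_chiplets = 4

mono_cost = mono_area / die_yield(D, mono_area)                    # cm^2/good die
chiplet_cost = n_chiplets * chiplet_area / die_yield(D, chiplet_area)

print(f"monolithic 8 cm^2 die yield: {die_yield(D, mono_area):.1%}")
print(f"single 2 cm^2 chiplet yield: {die_yield(D, chiplet_area):.1%}")
print(f"silicon per good part: monolithic {mono_cost:.1f} cm^2, "
      f"chiplets {chiplet_cost:.1f} cm^2")
```

The smaller dies yield far better individually, and because bad chiplets are screened out before assembly, the expected silicon cost per good part drops by nearly half in this sketch. Putting the IO chiplets on a cheaper mature node widens the gap further.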