Everyone thinks the AI bottleneck is compute. Well... not exactly.
From the A100 to the B200, compute scaled 8x. Memory bandwidth scaled 4x. Capacity only 2.4x.
The gap between what a chip can calculate & what you can actually feed it is the real wall.
A thread on memory: