Running open source models on idle GPUs can cut cost per token by 95% compared to frontier models, according to Michael Heinrich, Co-Founder and CEO of 0G.

The estimate rests on available supply. Michael says there are roughly 4 million underutilized GPUs globally, each equivalent to an H100, that could be pooled for distributed inference. The cost reduction comes from routing workloads through that idle capacity rather than through centralized frontier model providers.

Token volume is expected to keep rising, and cost per token has been falling, yet Michael points to two data points suggesting that frontier model pricing is already creating friction at scale. Uber exhausted its entire AI budget in four months. Citadel Securities recently released a chart showing token consumption declining for the first time, which Michael attributes directly to cost.

The underlying question he raises is whether the quality gap between frontier and open source models is large enough to justify the price gap for most business use cases. If it is not, open source models stand to absorb a growing share of the workloads that frontier models are currently too expensive to serve.
