Using idle GPUs to run open source models can cut cost per token by 95% compared to frontier models, according to Michael Heinrich, Co-Founder and CEO of 0G.
The estimate behind that figure comes from available supply. Michael says there is roughly 4 million H100-equivalent GPUs' worth of underutilized capacity worldwide that could be pooled for distributed inference. The cost reduction comes from routing workloads through that idle capacity instead of through centralized frontier model providers.
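A 95% cut in price per token means a fixed budget buys 1 / (1 - 0.95) = 20 times as many tokens. The Python sketch below works through that arithmetic. The $10-per-million-token frontier price and the $1,000 budget are hypothetical placeholders, not quoted rates from any provider. The helper names are also illustrative.

```python
def discounted_price(frontier_price_per_m: float, reduction: float) -> float:
    """Price per million tokens after a fractional reduction (0.95 means 95% cheaper)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return frontier_price_per_m * (1.0 - reduction)


def tokens_per_budget(budget: float, price_per_m: float) -> float:
    """Millions of tokens a fixed budget buys at a given price per million tokens."""
    return budget / price_per_m


# Hypothetical inputs for illustration only; not real provider pricing.
FRONTIER_PRICE_PER_M = 10.0  # assumed dollars per million tokens on a frontier model
REDUCTION = 0.95             # the 95% reduction cited in the post
BUDGET = 1_000.0             # assumed monthly budget in dollars

open_price = discounted_price(FRONTIER_PRICE_PER_M, REDUCTION)
frontier_tokens = tokens_per_budget(BUDGET, FRONTIER_PRICE_PER_M)
open_tokens = tokens_per_budget(BUDGET, open_price)
multiplier = open_tokens / frontier_tokens  # equals 1 / (1 - REDUCTION)

print(f"open-source price: ${open_price:.2f} per million tokens")
print(f"same budget buys {multiplier:.0f}x as many tokens")
```

With these assumptions the script prints a price of $0.50 per million tokens and a 20x multiplier. The multiplier depends only on the size of the reduction, not on the assumed price or budget.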
Michael expects token volume to keep rising even as cost per token falls. He points to two data points that suggest frontier model pricing is already creating friction at scale. Uber exhausted its entire AI budget in four months. Citadel Securities recently released a chart showing token consumption declining for the first time, which Michael attributes directly to cost.
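The tension between rising volume and falling price can be sketched as a budget runway calculation. The Python below models a hypothetical team with a $1.2M annual budget, 25 billion tokens in its first month, and 20% month-over-month volume growth. None of these figures come from the post or from Uber. The function `months_until_exhausted` and both per-token prices are illustrative assumptions.

```python
from typing import Optional


def months_until_exhausted(
    budget: float,
    price_per_m: float,
    start_volume_m: float,
    monthly_growth: float,
    max_months: int = 120,
) -> Optional[int]:
    """Return the month in which cumulative spend first exceeds the budget.

    Volumes are in millions of tokens; monthly_growth is fractional (0.20 = 20%).
    Returns None if the budget lasts longer than max_months.
    """
    spent = 0.0
    volume = start_volume_m
    for month in range(1, max_months + 1):
        spent += volume * price_per_m  # this month's token bill
        if spent > budget:
            return month
        volume *= 1.0 + monthly_growth  # compound growth in token volume
    return None


# Hypothetical scenario; every figure here is an illustrative assumption.
BUDGET = 1_200_000.0       # assumed annual budget in dollars
START_VOLUME_M = 25_000.0  # assumed first-month volume: 25 billion tokens
GROWTH = 0.20              # assumed 20% month-over-month growth
FRONTIER_PRICE_PER_M = 10.0  # assumed frontier price, dollars per million tokens
OPEN_PRICE_PER_M = 0.5       # the same price cut by 95%

frontier_months = months_until_exhausted(BUDGET, FRONTIER_PRICE_PER_M, START_VOLUME_M, GROWTH)
open_months = months_until_exhausted(BUDGET, OPEN_PRICE_PER_M, START_VOLUME_M, GROWTH)
print(f"frontier pricing: budget exceeded in month {frontier_months}")
print(f"95% cheaper: budget exceeded in month {open_months}")
```

Under these assumptions, the frontier budget is exceeded in month 4 and the discounted one in month 17. A token that is 20 times cheaper buys substantial runway, but compounding volume growth still catches up eventually. This is the tension between falling price and rising volume that the post describes.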
The underlying question he raises is whether the quality difference between frontier and open source models is meaningful enough to justify the price gap for most business use cases. If it is not, open source models stand to absorb a growing share of the workloads that frontier models are currently too expensive to serve.
And so the difference is that you use open source models, plus you utilize compute that's underutilized. We estimate there are about 4 million GPUs, in H100 equivalents, available in the world. By doing that, you can decrease the cost per token versus, let's say, a Fable 5 by 95%.
So one more thing I'll ask you and then I'll let you go. Do you think, or do you believe? Because I believe this, right. I shouldn't have said that. But do you believe that, because of the cost associated with some of the US frontier lab models, the open source models are going to get more usage? And that the money that would have been spent on tokens for the non-open-source models will get used for Syracuse Labs or the protection side of it? Because that's really important, and that's a differentiated service. Again, I don't want to name names because we have partnerships. But is Model A really that different from Model B and Model C at a scale that matters for business productivity?
Yeah, that's a great question. My answer would be that the number of tokens will for sure rise, but the cost per token will decrease over time. You have the Jevons paradox, right? I mean, everyone's talking about this. So cost will definitely go down, but usage will go up. But even now, you're starting to see the effects because these frontier models are so expensive. Uber, for example, blew through its entire AI budget in four months. And Citadel Securities just came out with a chart where, I think for the first time, token consumption is going down, because... Yeah.