Day 3: Async Inference
Move fast and build things
Happy Saturday. I promise I won’t focus every piece on technical concepts. But this is another good one we can learn from. This idea came to me very recently—so bear with me and my ignorance here.
Over the past three days, I’ve spent a lot of time writing about what we should spend our time doing as twenty-first-century humans. What should we be prioritizing, learning, enjoying, avoiding? I think this development provides yet another lens into that philosophy.
Some context: we all know what data centers are. They are the physical plants that provide outsourced compute and processing power for AI and ML models, and they are usually wholly owned by the developers of the models they serve (Meta, OpenAI, etc.). We could go down the deep rabbit hole of the issues they present, from energy and water scarcity to construction to a pure lack of available land (see the space- and ocean-based GPU facilities from Ezra/Phillip at Starcloud and Sam/Eric at NetworkOcean).
The problem I want to focus on is one of time and cost. Right now, enterprises and consumers alike are most focused on two aspects of model performance: speed to output and the quality of that output. However, as more and more businesses leverage AI for >50% of their operations, price becomes an issue. Take automating even a small number of tasks at Google or NVIDIA; at their scale, with current market prices, using AI for that much work would cost billions of dollars per day. That is neither practical nor easy to justify, and it is precisely why more portions of large enterprises haven’t been automated, even with obvious room for it.
The term “async inference” is not universally accepted. In fact, a handful of people I know use those words to describe both what their company specifically focuses on (I will not name them, out of pure fear) and an emerging tech sector, as seen in Baseten’s graphic below. In plain English, the way we solve the above problem is to prioritize instead the amount of work an AI can complete and the cost of completing it; speed and quality matter less, relatively speaking, as we continue to scale. Async inference describes the processing (the “thinking”) that background agents can do on a longer time scale to support a significantly higher number of tasks than is currently feasible.
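For the technically curious, here is a rough sketch of that idea in Python. It is only an illustration under my own assumptions, not how any particular company does it: `fake_model_call`, `worker`, `run_async_inference`, and `run` are hypothetical names, and the "model" is a stand-in that just echoes its input. The point is the shape: tasks go into a queue, background workers drain it in batches, and nobody waits on any single answer.

```python
import asyncio


async def fake_model_call(prompts):
    """Hypothetical stand-in for a batched model call (not a real API)."""
    await asyncio.sleep(0)  # yield control, as slow background work would
    return [f"done: {p}" for p in prompts]


async def worker(queue, results, batch_size):
    """Background agent: pull tasks in batches, with no per-task deadline."""
    while True:
        batch = [await queue.get()]  # wait for at least one task
        while len(batch) < batch_size and not queue.empty():
            batch.append(queue.get_nowait())  # fill the batch if work is waiting
        outputs = await fake_model_call([prompt for _, prompt in batch])
        for (task_id, _), output in zip(batch, outputs):
            results[task_id] = output
            queue.task_done()


async def run_async_inference(prompts, batch_size=4, n_workers=2):
    """Queue every task up front, then let the workers finish in the background."""
    queue = asyncio.Queue()
    results = {}
    for task_id, prompt in enumerate(prompts):
        queue.put_nowait((task_id, prompt))
    workers = [
        asyncio.create_task(worker(queue, results, batch_size))
        for _ in range(n_workers)
    ]
    await queue.join()  # returns once every queued task is marked done
    for w in workers:
        w.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return [results[i] for i in range(len(prompts))]


def run(prompts, **kwargs):
    """Convenience wrapper so the sketch can be called from ordinary code."""
    return asyncio.run(run_async_inference(prompts, **kwargs))
```

Calling `run(["summarize report", "tag invoice"])` returns the results in the order the tasks were submitted. The design choice worth noticing is that throughput is tuned with `batch_size` and `n_workers`, while nothing in the code promises how quickly any one task comes back.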
Async inference, I believe, provides a perfect frame for what I mentioned at the top, because it represents a metaphorical shift away from the culture of speed in work and society that inevitably breaks everything. In an abstract sense, the way humans have done their greatest work throughout history has been to think, slowly, for a long time. Even in sectors like finance and tech, unicorn (>$1B valuation) founders are, on average, not nearly as young as the media paints them. Ilya Strebulaev’s post linked here shows that the average age is 35, and the general trend, despite the noise from 18-year-olds dropping out of school, is toward older founders. In my little aviation world, I look to Wilbur and Orville Wright, who were 36 and 32, respectively, when they flew the first powered aircraft after five years of building it.
Admittedly, I feed into this young (and predominantly male) culture of moving fast and breaking things. It does seem to be the direction our world is headed, even with what I just described. But as a collective, we can be aware of (and, if at all possible, try to mitigate) the societal movement towards speed and the lack of expert knowledge behind some of the most crucial political and market decisions. In whatever you do, try to find a balance between these competing forces: move quickly if something excites you, but also do your diligence and fully understand whatever it is, so you (1) make an informed, conscious decision and (2) make your contribution or work more impactful than if you approached it with limited knowledge.



YES. The "move fast and break things" motto feels akin to "only the good die young" -- the assumption that youthful energy (which is awesome) can only destroy things. Move fast and build things, yes. What happens after the things are broken? What new things can be built? You also speak to the deeper waters of thought that can only emerge with slow time -- the nervous system settles (as in meditation) and a deeper quality of thought emerges -- innovative, substantial, not bound to a stopwatch. David Lynch talks about this with regard to his TM practice. Once again, I feel like you are pulling back the curtain on the Great Oz myths of modernity and tech -- it's all teenagers! It moves at the speed of light that leaves everyone else in the dust! It is your critical thinking that bridges the gap between inside and outside tech, fusing knowledge with philosophy, theory and practice. xo