The future of compute & AI is heterogeneous.
That future is being built by Callosum.
Today we launch with new breakthroughs made possible by heterogeneity, new scaling principles & a roadmap reimagining compute & AI infrastructure:
At @CallosumAI we are aggressively scaling up hiring in our London office 🇬🇧
We are building a more pluralistic vision of AI: co-evolving diverse chips and new architectures for the next generation of intelligence that is economically viable at scale.
We've opened roles
@CallosumAI is hiring in London!
We're building the infrastructure for heterogeneous compute - making many models, on many chips, behave as a single coherent, co-evolving system.
Five MTS roles open across the AI infra stack. Link below!
Great news that @UKSovereignAI has invested in Ineffable Intelligence, led by David Silver, joining @CallosumAI in the portfolio.
These are the technologies that will define the next decade (+more) of progress.
Congratulations to David and the team!
🇬🇧 Sovereign AI is backing @IneffableLabs 🇬🇧
We can announce Sovereign AI has invested in Ineffable. Led by David Silver, Ineffable is building learning systems that can go beyond the limits of human knowledge, learning continuously to push the frontiers of science and
Our first investment?
💥 @CallosumAI - @DanAkarca & @achterbrain 💥
Proud to be backing Danyal, Jascha, and the team as they build one of the defining layers of next-gen AI systems. 🔥
Incredibly bullish on the future of tech + AI in London.
Just to name a few:
• OpenAI just announced (last week) that London will become its largest research hub outside San Francisco
• Anthropic kicked off a 100+ person hiring spree across London and Dublin in 2025
• xAI
We are at a unique moment in time for AI & compute: New accelerators / chips, HPC hardware, and new algorithms have each made strides, but we are not yet orchestrating them as a heterogeneous stack. That is what @CallosumAI is built to do, and today we are sharing our vision 🧵
Today we launched @CallosumAI.
We are building the infrastructure where heterogeneous chips & intelligence co-evolve to solve the world's hardest problems.
Today we present our first results.
Across four large problem spaces, we break SOTA and deliver orders-of-magnitude
Everything here is early evidence for a deeper thesis: as the problems we need to solve grow in difficulty, the systems that solve them must grow in diversity.
Heterogeneous systems - diverse models on diverse hardware, co-evolved end-to-end - unlock scaling territory that
None of these results came from a bigger model.
12× cheaper deep context. New web SOTA with open-source, 3× cheaper and faster. 2.4× cache speedups. 1,767× faster tool calling.
All from heterogeneity - mixed models, mixed chips, mixed scales - co-evolved end-to-end.
This changes what small models can do. 8 × 1B models generating grammar-constrained candidate tool calls with a naive voting scheme: 42.27% accuracy on structured data extraction - a +11-point improvement over a single greedy pass from the same model and +2 points over an 8B
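A minimal sketch of what naive voting over candidate tool calls could look like, assuming each small model returns one already-valid JSON object. The function name `vote` and the example candidates are hypothetical illustrations, not Callosum's implementation.

```python
import json
from collections import Counter


def vote(candidates):
    """Pick the most frequent candidate tool call (hypothetical helper).

    Candidates are compared by canonical JSON so that key order
    does not split votes between otherwise identical calls.
    """
    keys = [json.dumps(c, sort_keys=True) for c in candidates]
    winner, count = Counter(keys).most_common(1)[0]
    return json.loads(winner), count


# Illustrative candidates, as if produced by several small models.
candidates = [
    {"tool": "get_weather", "args": {"city": "London"}},
    {"args": {"city": "London"}, "tool": "get_weather"},  # same call, different key order
    {"tool": "get_weather", "args": {"city": "Paris"}},
    {"tool": "get_weather", "args": {"city": "London"}},
]

best_call, votes = vote(candidates)
```

Because every candidate is grammar-constrained, each vote is a well-formed call, so the vote only has to resolve disagreement about content, not about syntax.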
We moved the entire operation on-die on @awscloud Inferentia2. JSON schemas compile into finite state machines, and a custom NKI kernel performs constrained decoding entirely in NeuronCore SBUF. The mask lives in on-chip SRAM, right alongside the logits.
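To make the idea concrete, here is a toy CPU-side sketch of FSM-constrained decoding in plain Python. It is not the NKI kernel and says nothing about on-chip memory layout; the vocabulary, the hand-written transition table `FSM`, and all function names are assumptions made up for illustration. A real system would compile the table from a JSON schema.

```python
import math

# Hypothetical toy vocabulary; the last token is deliberately invalid JSON.
VOCAB = ['{', '"name"', ':', '"get_weather"', '"search"', '}', 'hello']

# Hand-written FSM for {"name": <tool>}: state -> {token_id: next_state}.
FSM = {
    0: {0: 1},        # expect '{'
    1: {1: 2},        # expect '"name"'
    2: {2: 3},        # expect ':'
    3: {3: 4, 4: 4},  # expect one of the two tool names
    4: {5: 5},        # expect '}'
}
ACCEPT = 5


def apply_mask(logits, state):
    """Set logits of tokens the FSM disallows in this state to -inf."""
    allowed = FSM[state]
    return [l if i in allowed else -math.inf for i, l in enumerate(logits)]


def constrained_decode(logits_per_step):
    """Greedy decoding where every step is restricted by the FSM."""
    state, out = 0, []
    for logits in logits_per_step:
        if state == ACCEPT:
            break
        masked = apply_mask(logits, state)
        tok = max(range(len(masked)), key=masked.__getitem__)
        state = FSM[state][tok]
        out.append(VOCAB[tok])
    return "".join(out), state == ACCEPT


# The unconstrained model "prefers" the invalid token at every step.
step_logits = [0.0, 0.0, 0.0, 0.1, 0.5, 0.0, 9.0]
text, accepted = constrained_decode([step_logits] * 5)
```

Even though the raw logits favour `hello` at every step, the mask leaves only schema-valid tokens, so the decoded string is always parseable. The per-step work is a table lookup plus a mask over the logits, which is the part the post describes moving next to the logits on-chip.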
O(1) scaling. 1.4μs at
Finally, Tool Calling Problems. Every class of problem above shares a common dependency: tool calling. It's how agents act on the world. Get it wrong and performance breaks. Get it right but too slowly, and the economics break instead.
The bottleneck: grammar enforcement is
This already extends to real workflows. 20% speedup out of the box on a podcast generation task with large system prompts - before any deeper optimisations.
And eviction is just the beginning. Topology-awareness enables pre-fetching context before it's needed, hierarchical