Open-source models are now production-ready for powering agents.
In Droid, GLM-4.6 achieves 43.5% on Terminal-Bench. It outperforms Sonnet 4 in Claude Code and approaches frontier performance.
Sparse mixture-of-experts architectures make self-hosting practical:
• GPT-OSS-120B: 38% on
Starting today, you can use any open-source model to power your Droids.
Droids achieve the highest Terminal-Bench scores across all open-source models. We find GLM-4.6 to be the most performant: its score in Droid beats that of Sonnet 4 in Claude Code.












