Feature hasn't been suggested before.
Describe the enhancement you want to request
What feature would you like to see?
A knowledge_harness tool that lets cheap worker models (Kimi K2.5, GLM-5, etc.) call a SOTA model (Claude, GPT, Gemini) for "aha" moments and deep-reasoning solutions to the task, with the output capped (~10k tokens). Instead of running the expensive model on every iteration, the worker just asks it what should be done.
Why does this belong in OpenCode?
The real cost in agentic coding isn't the first plan, where people usually focus. It's the 50 follow-up iterations: the obvious edits, fixes, test failures, refactors. Running Opus/GPT for all of that gets expensive fast.
How it works
Before writing code, the SOTA model looks at the codebase already in the context window and returns a blueprint of what to do, why, and how. The worker model then takes that blueprint along with the code and just outputs the actual implementation tokens. Output costs drop because the heavy coding is done by cheaper models.
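The flow above can be sketched roughly as follows. This is only an illustration, not OpenCode's API: `ModelFn`, `knowledge_harness`, `HarnessResult`, and `worker_max_tokens` are hypothetical names, and the two models are passed in as plain callables so that any provider client could be wrapped behind them.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: (prompt, max_output_tokens) -> generated text.
ModelFn = Callable[[str, int], str]

# Output cap for the SOTA call, taken from the proposal (~10k tokens).
BLUEPRINT_TOKEN_CAP = 10_000


@dataclass
class HarnessResult:
    blueprint: str       # what to do, why, and how (from the SOTA model)
    implementation: str  # the actual code (from the worker model)


def knowledge_harness(
    task: str,
    context: str,
    sota: ModelFn,
    worker: ModelFn,
    worker_max_tokens: int = 4_000,  # assumed default, not from the proposal
) -> HarnessResult:
    """Ask the SOTA model for a capped blueprint, then let the worker implement it."""
    blueprint_prompt = (
        "Return a blueprint of what to do, why, and how. Do not write the full code.\n\n"
        f"Task:\n{task}\n\nCodebase context:\n{context}"
    )
    blueprint = sota(blueprint_prompt, BLUEPRINT_TOKEN_CAP)

    worker_prompt = (
        "Implement the blueprint below against the given code.\n\n"
        f"Blueprint:\n{blueprint}\n\nCode:\n{context}"
    )
    implementation = worker(worker_prompt, worker_max_tokens)
    return HarnessResult(blueprint=blueprint, implementation=implementation)
```

The expensive model only ever produces the capped blueprint; all implementation tokens come from the worker, which is where the savings are supposed to come from.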
Cost math
Harness call: ~$0.40-0.50 (30-50k input + 10k capped output)
Worker iteration: ~$0.07
Typical session (10 harness calls + 40 worker iterations): 10 × $0.40-0.50 + 40 × $0.07 ≈ $6.80-7.80, vs $20+ running Opus directly, while retaining the decisions Opus would make in practice.
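For anyone who wants to plug in their own numbers, the session arithmetic is just the per-call figures above multiplied out. The function name and constants below are illustrative only.

```python
# Per-call figures quoted in the cost math above (USD).
HARNESS_CALL_COST = (0.40, 0.50)  # low and high estimate per harness call
WORKER_ITERATION_COST = 0.07      # per worker iteration


def session_cost(harness_calls: int, worker_iterations: int) -> tuple[float, float]:
    """Return the (low, high) estimated session cost in USD."""
    worker_total = worker_iterations * WORKER_ITERATION_COST
    low = harness_calls * HARNESS_CALL_COST[0] + worker_total
    high = harness_calls * HARNESS_CALL_COST[1] + worker_total
    return round(low, 2), round(high, 2)


low, high = session_cost(harness_calls=10, worker_iterations=40)
print(f"${low:.2f} to ${high:.2f}")  # prints "$6.80 to $7.80"
```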
What I'd build
- knowledge_harness tool callable by any agent
- Two modes: on-demand (worker/user triggers) and always-on (every follow-up)
- Predictive cost estimation before each call ("~$0.45, proceed?")
- Session budget tracking: tell it "I have $2 today" and the system rations calls
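A minimal sketch of the last two items, cost estimation and budget rationing. Everything here is hypothetical (`Pricing`, `estimate_call_cost`, `SessionBudget`). The example prices of $5 per million input tokens and $25 per million output tokens are an assumption chosen because they reproduce the ~$0.40-0.50 per-call figure above; real prices depend on the provider and model. The estimate treats the output cap as a worst case, so it is an upper bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float   # USD per million input tokens (assumed value)
    output_per_mtok: float  # USD per million output tokens (assumed value)


def estimate_call_cost(input_tokens: int, max_output_tokens: int, pricing: Pricing) -> float:
    """Upper-bound cost of one call, assuming the full output cap is used."""
    total = input_tokens * pricing.input_per_mtok + max_output_tokens * pricing.output_per_mtok
    return total / 1_000_000


@dataclass
class SessionBudget:
    limit_usd: float
    spent_usd: float = 0.0

    @property
    def remaining_usd(self) -> float:
        return self.limit_usd - self.spent_usd

    def confirm_prompt(self, estimate: float) -> str:
        """Text shown to the user before a harness call."""
        return f"~${estimate:.2f}, proceed? (${self.remaining_usd:.2f} left)"

    def try_spend(self, estimate: float) -> bool:
        """Reserve the estimate if it fits in the remaining budget."""
        if estimate > self.remaining_usd:
            return False
        self.spent_usd += estimate
        return True


pricing = Pricing(input_per_mtok=5.0, output_per_mtok=25.0)  # assumed example prices
budget = SessionBudget(limit_usd=2.0)                         # "I have $2 today"
estimate = estimate_call_cost(40_000, 10_000, pricing)
print(budget.confirm_prompt(estimate))
```

With these assumed prices a $2 budget covers four harness calls at 40k input tokens each, and the fifth is refused. A real implementation would probably reconcile the estimate against the actual usage reported by the provider after each call.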
Related issues
#8456 - task-type model routing (this is the dynamic, on-demand version)
#6651 - dynamic model selection for subagents via Task tool
#10354 - near-real-time cost visibility
#11138 - configurable per-session token budget
#9649 - multi-agent coding (different models for different strengths)
#7602 - model fallback/failover support
#2804 - user leaving OpenCode because they can't track costs