fix(subagent): resolve runtime model from subagent default instead of parent primary#72984
fix(subagent): resolve runtime model from subagent default instead of parent primary#72984joeykrug wants to merge 25 commits into
Conversation
Greptile SummaryFixes a real spawn-time race condition: when the gateway reads a subagent's session entry before the model write has flushed, Confidence Score: 4/5Safe to merge; the race-condition fix is correct and well-tested, with one intentional behavioral change worth calling out in release notes. No logic errors or security issues found. The only concern is the precedence reorder in src/agents/model-selection.ts — the Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/agents/model-selection.ts
Line: 244-248
Comment:
**Breaking precedence change for `agentConfig.model`**
After the reorder, any agent whose subagents relied on inheriting the agent's own primary model (no explicit `subagents.model` set on the agent, but a global `agents.defaults.subagents.model` does exist) will silently switch to the global default. Before this change `agentConfig.model` won over the global default; now it's the last resort. The behavior is documented as intentional, but it is a silent regression for that configuration pattern — users won't get a warning, and the previously-used model changes without any explicit action on their part.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(subagent): resolve runtime model fro..." | Re-trigger Greptile |
| return ( | ||
| normalizeModelSelection(agentConfig?.subagents?.model) ?? | ||
| normalizeModelSelection(agentConfig?.model) ?? | ||
| normalizeModelSelection(params.cfg.agents?.defaults?.subagents?.model) | ||
| normalizeModelSelection(params.cfg.agents?.defaults?.subagents?.model) ?? | ||
| normalizeModelSelection(agentConfig?.model) | ||
| ); |
There was a problem hiding this comment.
Breaking precedence change for
agentConfig.model
After the reorder, any agent whose subagents relied on inheriting the agent's own primary model (no explicit subagents.model set on the agent, but a global agents.defaults.subagents.model does exist) will silently switch to the global default. Before this change agentConfig.model won over the global default; now it's the last resort. The behavior is documented as intentional, but it is a silent regression for that configuration pattern — users won't get a warning, and the previously-used model changes without any explicit action on their part.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/model-selection.ts
Line: 244-248
Comment:
**Breaking precedence change for `agentConfig.model`**
After the reorder, any agent whose subagents relied on inheriting the agent's own primary model (no explicit `subagents.model` set on the agent, but a global `agents.defaults.subagents.model` does exist) will silently switch to the global default. Before this change `agentConfig.model` won over the global default; now it's the last resort. The behavior is documented as intentional, but it is a silent regression for that configuration pattern — users won't get a warning, and the previously-used model changes without any explicit action on their part.
How can I resolve this? If you propose a fix, please make it concise.46612ec to
9d60b36
Compare
|
Codex review: needs real behavior proof before merge. Reviewed June 11, 2026, 1:00 PM ET / 17:00 UTC. Summary PR surface: Source +140, Tests +533. Total +673 across 10 files. Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path. Review metrics: none identified. Stored data model Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Next step before merge
Review detailsBest possible solution: Retry the Codex review after fixing the execution failure. Do we have a high-confidence way to reproduce the issue? Unclear. The review failed before ClawSweeper could establish a reproduction path. Is this the best way to solve the issue? Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction. AGENTS.md: unclear because the file could not be read completely. Codex review notes: model internal, reasoning high; reviewed against 575cae59d486. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +140, Tests +533. Total +673 across 10 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
… parent primary `resolveSessionModelRef` had no subagent awareness — when `entry.model` was empty (race between spawn-side write and gateway read), it fell through to `resolveDefaultModelForAgent`, returning the parent agent's primary model instead of the configured subagent default. Also reorder the `??` chain in `resolveSubagentConfiguredModelSelection`: global `agents.defaults.subagents.model` now takes precedence over the spawning agent's own `model` field, matching the documented behavior. Tests cover both the runtime fallback and the spawn-time chain order.
The Pi runtime harness initialization writes the parent agent's primary model into the child session entry, overwriting the value that `resolveSubagentSpawnModelSelection` correctly resolved during spawn. This was the production failure mode covered up by the read-side fallback in fbbd3c2d5f. When `entry.subagentRole` is set or `entry.spawnDepth >= 1`, prefer `agents.list[<id>].subagents.model.primary` (with the standard chain fallbacks via `resolveSubagentConfiguredModelSelection`) over the parent agent's primary `model.primary`. `getReplyFromConfig` resolved the run model via `resolveDefaultModel` (which goes through `resolveDefaultModelForAgent` and is not subagent-aware), and then post-run persistence in `updateSessionStoreAfterAgentRun` wrote that model back to `entry.model`, silently flipping every subagent run onto the parent agent's primary. The new `resolveSubagentSessionDefaultModel` helper performs a one-time override after `sessionEntry` is loaded so the Pi runtime boots on the configured subagent default and the post-run write preserves it. Tests cover the subagent path and a sanity assertion that non-subagent sessions still inherit the parent's primary.
Main re-introduced the inline 'type Command' export after the prior fix
extracted it to a separate 'export type { Command }' line; the merge kept
both, causing TS2300 'Duplicate identifier' under the extension package
boundary tsc compile. Keep only the inline form (consistent with main).
The reply-time fallback (resolveSubagentSessionDefaultModel) and the gateway-side fallback (resolveSessionModelRef) both fed the configured subagent selection straight into parseModelRef, so a configured alias such as 'gpt' was parsed as a bare model under the default provider instead of being resolved through agents.defaults.models. That made the race fallback disagree with resolveSubagentSpawnModelSelection, which already runs alias resolution. Switch both helpers to resolveModelRefFromString with a buildModelAliasIndex alias index built from the same default provider, matching the spawn-side contract. Adds regression tests in both files (configured subagent alias 'gpt' resolves to openai/gpt-5.4) and a CHANGELOG entry.
63c26bd to
b45a2f4
Compare
7496b24 to
98db52b
Compare
…icts Resolve 5 conflicts preserving the subagent-model-resolution feature while adopting upstream's refactors: - get-reply.ts: keep the subagent-default boot block on upstream's timing-wrapped resolver; defaultProvider/defaultModel/aliasIndex become let-bound so the subagent block can reassign them. - model-selection.ts / model-selection-resolve.ts / cron run-model-selection aggregator: adopt upstream's canonical resolveSubagentConfiguredModelSelection (same subagents.model -> defaults.subagents.model -> agent primary precedence the PR introduced, plus upstream's includeAgentPrimary knob); drop the now-stale resolve-file import/re-exports. - transport-stream.ts, preaction.test.ts, embedded-agent-runner/runs.test.ts: take upstream's unrelated churn (Gemini signature payload helper, programLocal rename, file rename + new imports). Finding: ClawSweeper's image-model one-turn override precedence defect is moot after this merge -- upstream commit 55a0c9b removed the image-model override plumbing (opts.modelOverride / hasAppliedImageModelOverride) from get-reply.ts entirely. The surviving one-turn override is the heartbeat override, which the subagent-default block already gates on (!hasResolvedHeartbeatModelOverride). Added a focused regression in directive-handling.defaults.subagent.test.ts pinning that gating: a resolved one-turn override wins and the subagent default does not clobber it, while an ordinary subagent turn applies the subagent default. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The branch's prior main-merges caused git merge to silently revert unrelated upstream files (import-sort churn) in two files the PR never intended to touch: - extensions/memory-wiki/src/tool.test.ts - src/flows/doctor-lint-flow.ts Restore both to upstream/main so check-dependencies / build-artifacts pass, leaving only the PR's real subagent-model-resolution changes in the net diff. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts: # src/auto-reply/reply/stored-model-override.ts
Summary
Two-part fix for subagent model resolution:
resolveSessionModelRefhad no subagent awareness. Whenentry.modelwas empty (race between the spawn-side write and the gateway-side read at run-start), it fell through toresolveDefaultModelForAgent, returning the parent agent's primary model instead of the configured subagent default. Also reorders the??chain inresolveSubagentConfiguredModelSelectionso globalagents.defaults.subagents.modeltakes precedence over the spawning agent's ownmodelfield.getReplyFromConfigresolved the run model viaresolveDefaultModel(which goes throughresolveDefaultModelForAgentand is not subagent-aware), and post-run persistence inupdateSessionStoreAfterAgentRunthen wrote that model back toentry.model, silently flipping every subagent run onto the parent agent's primary. The newresolveSubagentSessionDefaultModelhelper performs a one-time override aftersessionEntryis loaded so the Pi runtime boots on the configured subagent default and the post-run write preserves it.Failure mode (before)
Config:
Spawn flow:
sessions_spawn(task=..., agentId=main)with no explicitmodel:.gpt-5.5) and writesentry.model/entry.modelProviderviapersistInitialChildSessionRuntimeModel.getReplyFromConfig, which resolvesdefaultProvider/defaultModelviaresolveDefaultModelForAgent— returning Opus (parent primary) — and uses that as the run'sprovider/model.updateSessionStoreAfterAgentRun → setSessionRuntimeModelwrites Opus back toentry.model, clobbering the spawn-time gpt-5.5.Result on disk:
~/.openclaw/agents/<agent>/sessions/sessions.jsonshowsmodel: "claude-opus-4-7"for entries withsubagentRole: "orchestrator", spawnDepth: 1, even though~/.openclaw/subagents/runs.jsoncorrectly recordedgpt-5.5. The read-side fallback in commit 1 only kicks in whileentry.modelis empty — once the post-run write fires,entry.modelis non-empty and the fallback is skipped.Fix
resolveSessionModelRef: whenentry.spawnDepth >= 1orentry.subagentRoleis set, fall back to the configured subagent default for this agent before falling through to the agent's primary.resolveSubagentConfiguredModelSelection: reorder??chain toagentConfig.subagents.model→agents.defaults.subagents.model→agentConfig.model.resolveSubagentSessionDefaultModel: new helper indirective-handling.defaults.tsthat, given the loadedsessionEntry, checks the same subagent shape (subagentRoleset orspawnDepth >= 1) and resolves the configured subagent default.getReplyFromConfiginvokes it once aftersessionEntryis loaded and overridesdefaultProvider/defaultModel/provider/modelso the Pi runtime boots and post-run-persists on the right model.Tests
src/gateway/session-utils.test.tscover the runtime subagent fallback (with and withoutentry.spawnDepth).agentConfig.subagents.modelis unset,agents.defaults.subagents.modelis set, andagentConfig.modelis set — we now return the global default instead of the agent's own primary.src/auto-reply/reply/directive-handling.defaults.subagent.test.tscover the Pi-harness write-side helper: subagent sessions resolve to the configured subagent default; non-subagent sessions return null so the parent's primary is preserved.Scope
Commit 1:
src/gateway/session-utils.ts,src/agents/model-selection.ts, and their tests.Commit 2:
src/auto-reply/reply/directive-handling.defaults.ts,src/auto-reply/reply/get-reply.ts, and one new test plus two existing test-mock updates.No drive-by changes.
Real behavior proof
anthropic/claude-opus-4-7) into~/.openclaw/agents/<agent>/sessions/sessions.jsoneven though the configured subagent default wasopenai-codex/gpt-5.5and the spawn-side resolver had already written the correct ref. Subsequent turns then ran on the wrong model.agents.list[main].model.primary = anthropic/claude-opus-4-7,agents.defaults.subagents.model.primary = openai-codex/gpt-5.5, andagents.list[main].subagents.model.primary = openai-codex/gpt-5.5. Subagent spawned viasessions_spawn(task=..., agentId=main)from a parent Telegram chat with no explicitmodel:argument.pnpm openclaw doctorto confirm config parses cleanly.pnpm openclaw gateway start.~/.openclaw/agents/main/sessions/sessions.jsonand~/.openclaw/subagents/runs.json.~/.openclaw/agents/main/sessions/sessions.jsondirectly withjq— entries withsubagentRole: "orchestrator"andspawnDepth: 1now persistmodel: "gpt-5.5"andmodelProvider: "openai-codex"instead ofclaude-opus-4-7.~/.openclaw/subagents/runs.jsoncontinues to recordgpt-5.5as before, and the gateway run trace prints theopenai-codex/gpt-5.5provider/model pair when the run starts. Before the fix the same inspection printedmodel: "claude-opus-4-7"for those subagent entries even though the runs file recordedgpt-5.5.openai-codex/gpt-5.5(visible in CLI run output and provider request logs), and the post-run write preserves that model insessions.json, so subsequent turns of the same subagent session keep usinggpt-5.5. Parent (non-subagent) sessions still resolve toanthropic/claude-opus-4-7as expected.subagents.modeland onlyagents.defaults.subagents.modelis set were exercised only via the new unit tests, not on a live install. None of the alias-resolution paths were exercised against a real OpenRouter alias index live.Post-rebase verification (2026-05-12)
Rebase merged
upstream/main(e7e3e903bf) into the branch and addressed the clawsweeper P2 finding onsrc/auto-reply/reply/get-reply.ts:404-416. Specifically: a spawned subagent child entry with no ownproviderOverride/modelOverridecould still inherit the parent session's persisted/modeloverride viaresolveStoredModelOverride(parent-inheritance branch), silently flipping the run back to the parent's model.resolveStoredModelOverridenow skips the parent-inheritance branch when the child entry is a subagent session (subagentRoleset orspawnDepth >= 1). Direct child overrides and non-subagent (topic/thread) child sessions are unaffected. New regression coverage insrc/auto-reply/reply/stored-model-override.test.ts.Commands run:
pnpm tsgo(core typecheck) -> exit 0, no diagnosticspnpm test src/auto-reply/reply/stored-model-override.test.ts src/auto-reply/reply/directive-handling.defaults.subagent.test.ts src/agents/model-selection.test.ts src/gateway/session-utils.test.ts src/cron/isolated-agent.model-formatting.test.ts-> 4 vitest shards passed (236 tests)pnpm test src/auto-reply/reply.block-streaming.test.ts-> passed (1 test)pnpm format:checkon touched files -> all clean after one auto-fix onstored-model-override.tsnode scripts/run-oxlint.mjson touched files -> 0 warnings, 0 errors