Skip to content

fix(agents): honor subagent cron fallbacks#82295

Closed
steipete wants to merge 2 commits into
mainfrom
fix/subagent-fallback-policy-74985
Closed

fix(agents): honor subagent cron fallbacks#82295
steipete wants to merge 2 commits into
mainfrom
fix/subagent-fallback-policy-74985

Conversation

@steipete

@steipete steipete commented May 15, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Fix isolated cron subagent fallback resolution so subagents.model.fallbacks drives both the outer model fallback runner and the embedded runner fallback flag.
  • Preserve payload fallback precedence, keep payload model overrides on their configured default model policy, and keep string subagent models strict.
  • Share cron subagent model-config selection between primary model resolution and fallback resolution, including metadata-only model config fallthrough coverage.
  • Add regression coverage and changelog entry for [Bug]: Embedded agent Kimi timeout with no fallback despite fallbacks configured in model_stack #74985.

Real behavior proof

Behavior addressed: #74985 isolated cron embedded runs with subagents.model.fallbacks now pass the fallback chain into model fallback execution and embedded failover state.
Real environment tested: local checkout, importing the real OpenClaw cron fallback resolver through the TS runtime loader.
Exact steps or command run after this patch: node --import tsx with a cron isolated agentTurn fixture that configures agents.defaults.subagents.model.fallbacks, then the same fixture with payload.model override; node scripts/run-vitest.mjs src/cron/isolated-agent.model-formatting.test.ts src/cron/isolated-agent/model-selection.test.ts src/cron/isolated-agent/run-fallback-policy.test.ts src/cron/isolated-agent/run.payload-fallbacks.test.ts src/agents/agent-scope.test.ts src/agents/pi-embedded-runner/run/failover-policy.test.ts; pnpm exec oxfmt --check --threads=1 CHANGELOG.md src/agents/agent-scope.ts src/agents/agent-scope.test.ts src/cron/isolated-agent.model-formatting.test.ts src/cron/isolated-agent/model-selection.ts src/cron/isolated-agent/run-model-selection.runtime.ts src/cron/isolated-agent/run.test-harness.ts src/cron/isolated-agent/run-fallback-policy.ts src/cron/isolated-agent/run-fallback-policy.test.ts src/cron/isolated-agent/run.payload-fallbacks.test.ts src/cron/isolated-agent/run-executor.ts src/agents/pi-embedded-runner/run.ts; git diff --check; codex-review --full-access.
Evidence after fix: Terminal output from the runtime resolver probe:

{
  "isolated": [
    "openai-codex/gpt-5.4",
    "zai/glm-5"
  ],
  "payloadOverride": [
    "openai/gpt-5.4"
  ]
}

Focused Vitest also passed 6 files / 134 tests after rebase onto origin/main; oxfmt passed; git diff --check passed; codex-review clean after accepted fixups.
Observed result after fix: isolated cron agent turns resolve to the configured subagent fallback chain, payload model overrides use the configured default model fallback policy instead of subagent fallbacks, embedded runs receive modelFallbacksOverride, and metadata-only subagents.model falls through to the next selected model config.
What was not tested: live Kimi/provider timeout E2E.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: M maintainer Maintainer-authored PR labels May 15, 2026
@clawsweeper

clawsweeper Bot commented May 15, 2026

Copy link
Copy Markdown
Contributor

Codex review: found issues before merge.

Summary
The PR adds subagent-aware fallback resolution for isolated cron agent turns, forwards that fallback override into embedded Pi runs, and adds regression coverage plus a changelog entry.

Reproducibility: yes. at source level. The linked failure is traceable through isolated cron fallback resolution and embedded failover state, and the PR diff shows a remaining resolver-order case that skips global subagent fallbacks when the parent agent has its own model.

Real behavior proof
Override: A maintainer applied proof: override for this PR.

Next step before merge
Protected maintainer-labeled PR with overlapping current-main implementation and a precedence correctness finding; maintainer should decide whether to close it as superseded or request a rebase/fix.

Security
Cleared: The diff is limited to TypeScript fallback plumbing, focused tests, and changelog text, with no concrete security or supply-chain regression found.

Review findings

  • [P2] Check global subagent fallbacks before the parent model — src/agents/agent-scope.ts:196
Review details

Best possible solution:

Use the current-main implementation as the baseline, then either close this stale protected branch as superseded or rebase it with fallback selection that preserves per-agent subagent, global subagent, then parent-agent model precedence.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level. The linked failure is traceable through isolated cron fallback resolution and embedded failover state, and the PR diff shows a remaining resolver-order case that skips global subagent fallbacks when the parent agent has its own model.

Is this the best way to solve the issue?

No, not as-is. Threading the fallback override into cron and embedded runs is the right narrow direction, but this branch's shared resolver still lets the parent agent model shadow global subagent fallback defaults; current main has a safer guarded implementation to start from.

Full review comments:

  • [P2] Check global subagent fallbacks before the parent model — src/agents/agent-scope.ts:196
    The new shared resolver walks agentConfig?.model before agents.defaults.subagents.model. In the common configuration from the related reports, the parent agent has its own model while the global subagent default carries the Kimi fallback chain; this returns the parent model's fallback policy, or [], and never reaches the configured global subagent fallbacks. Keep the global subagent default ahead of the parent agent model, or avoid sharing this helper for fallback policy.
    Confidence: 0.82

Overall correctness: patch is incorrect
Overall confidence: 0.78

Acceptance criteria:

  • node scripts/run-vitest.mjs src/cron/isolated-agent/run-fallback-policy.test.ts src/cron/isolated-agent/run.payload-fallbacks.test.ts src/agents/agent-scope.test.ts src/agents/pi-embedded-runner/run/fallbacks.test.ts
  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md src/agents/agent-scope.ts src/cron/isolated-agent/run-fallback-policy.ts src/cron/isolated-agent/run.payload-fallbacks.test.ts src/agents/pi-embedded-runner/run/fallbacks.ts

What I checked:

Likely related people:

  • steipete: Authored the current-main cron subagent fallback commit and this PR's branch commits on the same fallback-policy and embedded-runner path. (role: recent fallback and cron area contributor; confidence: high; commits: e22a7e45a40a, 0dcc93f6672a, e632eab373f3; files: src/cron/isolated-agent/run-fallback-policy.ts, src/cron/isolated-agent/run-executor.ts, src/agents/agent-scope.ts)
  • joeykrug: Opened the related subagent model precedence issue and an overlapping subagent model-resolution PR that describe the same global subagent default versus parent model ordering problem. (role: related subagent model precedence reporter and PR author; confidence: medium; files: src/agents/model-selection.ts, src/cron/isolated-agent/model-selection.ts)
  • Ayaan Zaidi: Recent history on src/agents/pi-embedded-runner includes room-event and model-run adjacent work, so this person is a secondary routing candidate for embedded-runner side effects. (role: recent adjacent embedded-runner contributor; confidence: low; commits: 503c3d139c9a, 4db5cb1a203d; files: src/agents/pi-embedded-runner)

Remaining risk / open question:

  • No live Kimi/provider timeout E2E was inspected; the verdict is based on source, PR context, and focused regression coverage.

Codex review notes: model gpt-5.5, reasoning high; reviewed against e22a7e45a40a.

@steipete steipete force-pushed the fix/subagent-fallback-policy-74985 branch from 757bd51 to e632eab Compare May 15, 2026 20:38
@steipete steipete added the proof: override Maintainer override for the external PR real behavior proof gate. label May 15, 2026
@steipete

Copy link
Copy Markdown
Contributor Author

Closing as superseded by current main.

The #74985 behavior is already fixed on main by e22a7e45a4 (fix(cron): honor subagent model fallbacks), and I posted the proof on #74985:

  • subagent cron fallback policy now resolves agents.defaults.subagents.model.fallbacks
  • run.payload-fallbacks.test.ts verifies those fallbacks reach the embedded runner as modelFallbacksOverride
  • focused regression bundle passed on e22a7e45a40a: 7 files / 121 tests

Thank you. The branch here still has useful related refactor/test work, but the user-visible fix this PR targets is no longer open against main.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling maintainer Maintainer-authored PR proof: override Maintainer override for the external PR real behavior proof gate. size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant