
fix(agent): prevent executor handoff role confusion #3541

Merged: esengine merged 3 commits into main-v2 from codex/executor-handoff-guard on Jun 8, 2026

@SivanCola SivanCola commented Jun 8, 2026

Collaborator

Refs #3490

Summary

  • strengthen the planner-to-executor handoff so executor models ignore planner-only limitations
  • add an executor handoff guard that retries when the executor answers as the planner instead of using tools
  • cover the Chinese planner-style confusion case with a coordinator regression test

Testing

  • go test ./internal/agent ./internal/boot ./internal/config ./internal/cli
  • cd desktop && go test . -run 'TestSettings|TestSetAgent|Test.*Settings|TestProviderViewFromEntry'
  • npm run check:css && npm run test:typecheck

@SivanCola SivanCola requested a review from esengine as a code owner June 8, 2026 07:32

@github-actions bot added the v2 and agent labels and removed the v2 label on Jun 8, 2026
@github-actions bot added the v2 label on Jun 8, 2026
Bare permission claims (no write access / 没有写入权限) are the exact
vocabulary a correctly-executing model uses to report a real blocker;
matching them mis-fired the planner-confusion guard and could hard-error
a legitimately-blocked task. Drop those phrases, keep role-identity ones,
and lock it with a regression test.
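
To make the scoping concrete, here is a minimal Go sketch of a matcher restricted to role-identity phrases. The phrase list and the function name `looksLikePlannerConfusion` are hypothetical; the PR text does not show the real table, so treat this as an illustration of the idea rather than the code that was committed.

```go
package main

import (
	"fmt"
	"strings"
)

// roleIdentityPhrases is a HYPOTHETICAL list; the real phrases are not shown
// in the PR. Each entry stands for "the executor claims to be the planner".
var roleIdentityPhrases = []string{
	"as the planner",
	"i am the planner",
	"我是规划器", // "I am the planner"
}

// looksLikePlannerConfusion reports whether an executor answer contains a
// role-identity phrase. Bare permission claims ("no write access",
// "没有写入权限") are deliberately absent from the list, so a genuine blocker
// report does not match.
func looksLikePlannerConfusion(answer string) bool {
	lower := strings.ToLower(answer)
	for _, phrase := range roleIdentityPhrases {
		if strings.Contains(lower, phrase) {
			return true
		}
	}
	return false
}

func main() {
	// Role-identity wording is flagged.
	fmt.Println(looksLikePlannerConfusion("As the planner, I can only propose steps."))
	// Legitimate blocker reports, in either language, are not flagged.
	fmt.Println(looksLikePlannerConfusion("Blocked: no write access to the target directory."))
	fmt.Println(looksLikePlannerConfusion("已阻塞：没有写入权限。"))
}
```

A regression test for this commit would presumably assert the last two cases: a blocker report that mentions missing write access must not trip the guard.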

@esengine esengine left a comment

Owner


Solid fix for the planner→executor confusion (#3490), with a regression test. I tightened the confusion matcher to role-identity phrases only — bare permission claims like "no write access" / "没有写入权限" were also matching legitimate blocker reports (the exact vocabulary the handoff prompt asks the executor to use), which could hard-error a genuinely-blocked task. Added a regression test for that. Full go test ./internal/agent is green.

@esengine esengine enabled auto-merge (squash) June 8, 2026 09:27
@esengine esengine disabled auto-merge June 8, 2026 09:33
The keyword list matched on vocabulary, so it both missed paraphrases
and false-flagged legitimate blocker reports ("no write access" is the
exact phrasing the handoff prompt asks the executor to use). Replace it
with a structural signal: if the executor reaches a final answer in
handoff mode having called zero tools, nudge once with the executor
instructions, then trust it — no keyword table, no hard error. Any tool
use (including read-only) exempts it, so only a true punt triggers the
nudge. Language-independent.

Tests cover both the nudge and the no-nudge-when-acting path; verified
end-to-end with a live deepseek-pro planner / deepseek-flash executor
run that wrote and ran the file without tripping the guard.
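
The structural guard described in this commit message could look roughly like the Go sketch below. The names `Turn`, `Executor`, `runHandoff`, and the wording of `executorNudge` are assumptions made for illustration; the actual coordinator code in internal/agent is not shown on this page.

```go
package main

import "fmt"

// Turn is a HYPOTHETICAL record of one executor attempt.
type Turn struct {
	ToolCalls int    // tool invocations made before the final answer
	Answer    string // final answer text
}

// Executor is a HYPOTHETICAL stand-in for the executor model: it receives the
// task plus optional extra instructions and returns one finished turn.
type Executor func(task, extra string) Turn

// executorNudge is assumed wording; the real reminder text is not in the PR.
const executorNudge = "You are the executor. Use your tools to carry out the task."

// runHandoff runs the executor once. If it reached a final answer without
// calling any tool, it is re-run exactly once with the nudge, and that second
// result is returned as-is: no keyword check and no hard error.
// The boolean reports whether the nudge was sent.
func runHandoff(exec Executor, task string) (Turn, bool) {
	first := exec(task, "")
	if first.ToolCalls > 0 {
		return first, false // any tool use, even read-only, exempts the turn
	}
	return exec(task, executorNudge), true
}

func main() {
	// punter answers without tools until it receives the nudge.
	punter := func(task, extra string) Turn {
		if extra == "" {
			return Turn{Answer: "I can only plan."}
		}
		return Turn{ToolCalls: 2, Answer: "wrote and ran the file"}
	}
	// actor uses a tool on its first attempt, so it is never nudged.
	actor := func(task, extra string) Turn {
		return Turn{ToolCalls: 1, Answer: "done"}
	}

	turn, nudged := runHandoff(punter, "write hello.txt")
	fmt.Println(turn.Answer, nudged)
	turn, nudged = runHandoff(actor, "write hello.txt")
	fmt.Println(turn.Answer, nudged)
}
```

Because the check counts tool calls rather than inspecting the answer text, it behaves the same for any language, and an executor that is still blocked after the single nudge has its report passed through unchanged.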
@esengine esengine merged commit 5cf486f into main-v2 Jun 8, 2026
9 checks passed
@esengine esengine deleted the codex/executor-handoff-guard branch June 8, 2026 10:04
dorokuma pushed a commit to dorokuma/DeepSeek-Reasonix that referenced this pull request Jun 10, 2026
…engine#3541)

* fix(agent): keep executor from answering as planner

* fix(agent): scope handoff-confusion match to role-identity phrases


* fix(agent): detect executor punt by behavior, not keywords


---------

Co-authored-by: reasonix <reasonix@deepseek.com>

Labels

agent Core agent loop (internal/agent, internal/control) v2 Go rewrite (1.x) — main-v2 branch, active development


2 participants