test(e2e): add subagent-delegation task forcing a task sub-agent #3725

Merged: esengine merged 1 commit into main-v2 from test/e2e-subagent-delegation on Jun 9, 2026

@esengine (Owner) commented Jun 9, 2026

Summary

Adds an e2e suite task (benchmarks/e2e/tasks/subagent-delegation) that reliably forces the agent to delegate via the task tool, so the suite actually exercises the sub-agent path end-to-end against the real provider.

  • Seeds workdir/data/{alpha,beta,gamma}.txt with arbitrary name=number lines (17, 28, 41).
  • The prompt mandates reading them through a single task sub-agent (not inline) and writing their sum to result.txt.
  • verify.sh grades result.txt == 86.
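
The fixture and its expected answer can be sketched roughly as below. This is an illustration only, assuming each file holds a single `name=number` line; the key names (`alpha_value` and so on) and the helpers `seed_workdir` and `sum_values` are hypothetical and are not taken from the committed task.

```shell
# Sketch only: key names and helper functions are assumptions,
# not the fixture committed in this PR.

# Create data/{alpha,beta,gamma}.txt under the given root directory.
seed_workdir() {
  root="$1"
  mkdir -p "$root/data"
  printf 'alpha_value=17\n' > "$root/data/alpha.txt"
  printf 'beta_value=28\n'  > "$root/data/beta.txt"
  printf 'gamma_value=41\n' > "$root/data/gamma.txt"
}

# Add up the number after '=' on every line of every data file.
sum_values() {
  root="$1"
  total=0
  for f in "$root"/data/*.txt; do
    while IFS= read -r line; do
      total=$(( total + ${line#*=} ))
    done < "$f"
  done
  printf '%s\n' "$total"
}
```

Calling `seed_workdir` on a scratch directory and then `sum_values` on the same directory prints the total of the three seeded values, which is the number the grader expects in result.txt.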

Why

The existing suite (compaction / fix-add-bug / fizzbuzz / palindrome) never reliably triggers delegation, so it gives no signal on the sub-agent machinery. Because e2ebench drives the headless reasonix run path, this task covers a fresh task delegation in headless mode — the exact path that regressed when persisted sub-agent transcripts made a parent session mandatory (fixed in #3586). The arbitrary numbers mean the answer can only come from actually reading the files; the run log shows the task invocation as corroboration.

Scope / notes

  • continue_from / fork_from need a persisted session, which headless run never has, so they are intentionally out of scope here (they correctly error in that path). Those still need manual TUI/serve/desktop verification.
  • The e2e suite is always copied from the default branch (main-v2), so this must land here to be picked up by /e2e. Until the fix from #3586 (Continue subagent transcripts) reaches main-v2, this task will fail for binaries built without the ephemeral fallback, which is the intended regression signal (accuracy is reported, not a hard gate).

Test plan

  • task.toml parses with the harness's BurntSushi/toml loader (no unknown keys).
  • Grader verified locally: 86 (incl. surrounding whitespace) → pass; 85/99/missing → fail.
  • go build ./cmd/e2ebench green.
  • Real-provider run requires /e2e after merge.
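
The grader behaviour listed above (86 with surrounding whitespace passes; 85, 99, or a missing file fails) could be reproduced with a check along these lines. This is a minimal sketch, not the contents of the committed verify.sh; the function name `grade` and its single file argument are assumptions made for illustration.

```shell
# Hypothetical stand-in for verify.sh, not the committed script.
# Succeeds only when the file exists and contains exactly "86",
# ignoring leading and trailing whitespace (including newlines).
grade() {
  file="$1"
  [ -f "$file" ] || return 1
  # Turn every whitespace run into one space, then trim both ends.
  words="$(tr -s '[:space:]' ' ' < "$file")"
  words="${words# }"
  words="${words% }"
  [ "$words" = "86" ]
}
```

Trimming only the ends, rather than deleting all whitespace, means content such as `8 6` still fails. A wrapper script would typically map the function's exit status straight onto the pass/fail result.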

@esengine esengine requested a review from SivanCola as a code owner June 9, 2026 15:27
@github-actions github-actions Bot added the v2 Go rewrite (1.x) — main-v2 branch, active development label Jun 9, 2026
The committed e2e suite has no task that reliably makes the model delegate,
so it never exercises the sub-agent path end-to-end. This task seeds three
files with arbitrary numbers and instructs the agent to read them via a
single `task` sub-agent (not inline), then write their sum to result.txt.

Because e2ebench drives the headless `reasonix run` path, this covers a fresh
`task` delegation there — the exact path that regressed when persisted
sub-agent transcripts made a parent session mandatory. continue_from/fork_from
need a persisted session and are out of scope for the headless harness.
@esengine esengine force-pushed the test/e2e-subagent-delegation branch from 99be7c3 to 15d215c Compare June 9, 2026 16:00
@esengine esengine merged commit 0234218 into main-v2 Jun 9, 2026
13 checks passed
@esengine esengine deleted the test/e2e-subagent-delegation branch June 9, 2026 16:04
SuMuxi66 pushed a commit to SuMuxi66/DeepSeek-Reasonix that referenced this pull request Jun 10, 2026
esengine#3725)
