test(e2e): add subagent-delegation task forcing a `task` sub-agent by esengine · Pull Request #3725 · esengine/DeepSeek-Reasonix

esengine · 2026-06-09T15:27:37Z

Summary

Adds an e2e suite task (benchmarks/e2e/tasks/subagent-delegation) that reliably forces the agent to delegate via the task tool, so the suite actually exercises the sub-agent path end-to-end against the real provider.

Seeds workdir/data/{alpha,beta,gamma}.txt with arbitrary name=number lines (17, 28, 41).
The prompt mandates reading them through a single task sub-agent (not inline) and writing their sum to result.txt.
verify.sh grades result.txt == 86.
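The expected answer follows from the seeded values: 17 + 28 + 41 = 86. A minimal Go sketch of that computation is below. It assumes each file holds a single name=number line; the key names (`alpha_value` and so on) and the helper `sumNameNumberLines` are hypothetical, and only the three numbers come from this PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sumNameNumberLines adds up the numeric right-hand sides of name=number
// lines across all given file contents. Blank lines are skipped; any other
// line that is not in name=number form is reported as an error.
func sumNameNumberLines(contents ...string) (int, error) {
	total := 0
	for _, content := range contents {
		for _, line := range strings.Split(content, "\n") {
			line = strings.TrimSpace(line)
			if line == "" {
				continue
			}
			_, value, ok := strings.Cut(line, "=")
			if !ok {
				return 0, fmt.Errorf("line %q is not in name=number form", line)
			}
			n, err := strconv.Atoi(strings.TrimSpace(value))
			if err != nil {
				return 0, fmt.Errorf("line %q: %w", line, err)
			}
			total += n
		}
	}
	return total, nil
}

func main() {
	// Hypothetical file contents: the keys are made up for illustration,
	// only the numbers 17, 28 and 41 are taken from the PR description.
	alpha := "alpha_value=17\n"
	beta := "beta_value=28\n"
	gamma := "gamma_value=41\n"

	total, err := sumNameNumberLines(alpha, beta, gamma)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(total) // prints 86
}
```

Because the numbers are arbitrary, a model that skips the file reads has no way to land on 86 by guessing a pattern.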

Why

The existing suite (compaction / fix-add-bug / fizzbuzz / palindrome) never reliably triggers delegation, so it gives no signal on the sub-agent machinery. Because e2ebench drives the headless reasonix run path, this task covers a fresh task delegation in headless mode — the exact path that regressed when persisted sub-agent transcripts made a parent session mandatory (fixed in #3586). The arbitrary numbers mean the answer can only come from actually reading the files; the run log shows the task invocation as corroboration.

Scope / notes

continue_from / fork_from need a persisted session, which headless run never has, so they are intentionally out of scope here (they correctly error in that path). Those still need manual TUI/serve/desktop verification.
The e2e suite is always copied from the default branch (main-v2), so this must land here to be picked up by /e2e. Until the fix from #3586 (Continue subagent transcripts) reaches main-v2, this task will fail for binaries built without the ephemeral fallback, which is the intended regression signal (accuracy is reported, not a hard gate).

Test plan

task.toml parses with the harness's BurntSushi/toml loader (no unknown keys).
Grader verified locally: 86 (incl. surrounding whitespace) → pass; 85/99/missing → fail.
go build ./cmd/e2ebench green.
Real-provider run requires /e2e after merge.
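The grading rule described in the test plan can be sketched in Go as follows. This is not the actual verify.sh, which is a shell script whose contents are not shown here; it only mirrors the stated behavior (exact match on 86 after trimming surrounding whitespace, with a missing file counted as a failure). The names `grade` and `gradeFile` are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// expected is the sum of the seeded values 17 + 28 + 41.
const expected = "86"

// grade reports whether the content equals the expected answer once
// leading and trailing whitespace is removed.
func grade(content string) bool {
	return strings.TrimSpace(content) == expected
}

// gradeFile reads the result file and grades it. A file that cannot be
// read (for example, one that was never written) counts as a failure.
func gradeFile(path string) bool {
	data, err := os.ReadFile(path)
	if err != nil {
		return false
	}
	return grade(string(data))
}

func main() {
	for _, content := range []string{"86", "  86\n", "85", "99"} {
		fmt.Printf("%q -> %v\n", content, grade(content))
	}
	// The path below is assumed not to exist, so this case fails.
	fmt.Println("missing file ->", gradeFile("no-such-dir/result.txt"))
}
```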

The committed e2e suite has no task that reliably makes the model delegate, so it never exercises the sub-agent path end-to-end. This task seeds three files with arbitrary numbers and instructs the agent to read them via a single `task` sub-agent (not inline), then write their sum to result.txt. Because e2ebench drives the headless `reasonix run` path, this covers a fresh `task` delegation there — the exact path that regressed when persisted sub-agent transcripts made a parent session mandatory. continue_from/fork_from need a persisted session and are out of scope for the headless harness.

esengine requested a review from SivanCola as a code owner June 9, 2026 15:27

github-actions Bot added the v2 label (Go rewrite (1.x), main-v2 branch, active development) Jun 9, 2026

esengine force-pushed the test/e2e-subagent-delegation branch from 99be7c3 to 15d215c June 9, 2026 16:00

esengine merged commit 0234218 into main-v2 Jun 9, 2026
13 checks passed

esengine deleted the test/e2e-subagent-delegation branch June 9, 2026 16:04

Bernardxu123 mentioned this pull request Jun 10, 2026

[Meta] Issues grouped review report: classified by module & sorted by priority #3275 (Open)

esengine merged 1 commit into main-v2 from test/e2e-subagent-delegation
