Skip to content

subagent example: parallel mode truncates output to 100 chars and drops failed-task diagnostics #4710

@TheSemicolon

Description

@TheSemicolon

Title

subagent example: parallel mode truncates output to 100 chars and drops failed-task diagnostics

Labels (suggested)

bug, area:examples, area:subagent

Affected versions

Reproduced on @earendil-works/pi-coding-agent 0.74.0, 0.74.1, and main HEAD (commit 3e5ad67e0f, 2026-05-07). Both bugs predate 0.74.0 — the file's last logic change is older than the 0.74.0 tag; only a package-scope rename has touched it since.


Summary

Two related defects in packages/coding-agent/examples/extensions/subagent/index.ts parallel mode result aggregator (lines 619–632) make the example unusable for orchestrator-style fan-outs:

  1. Output truncated to 100 chars per taskoutput.slice(0, 100) is the only path back to the parent model. Full text exists in details but the parent LLM never sees it, so synthesizing parallel research results is impossible.
  2. Failed-task diagnostics swallowed — only getFinalOutput(r.messages) is consulted. A child that exits non-zero before emitting message_end (unknown agent, model error, transport failure, abort) surfaces only "(no output)" to the parent — indistinguishable from a successful-but-silent run. The result object's errorMessage, stderr, and stopReason (all populated in such cases) are dropped.

The same file already implements the correct four-step fallback in single mode (line 652) and chain mode (line 541):

result.errorMessage || result.stderr || getFinalOutput(result.messages) || "(no output)"

This issue proposes mirroring that pattern in parallel mode and lifting the 100-char clamp to a documented byte cap.


Reproduction

Setup: any pi 0.74.x install with the subagent example loaded, plus one valid user-level agent producing more than ~100 chars of output (any built-in or example).

Invoke the subagent tool with two parallel tasks — one valid, one with a deliberately invalid agent name:

{
  "tasks": [
    { "agent": "<any-valid-agent>", "task": "produce a short paragraph (300+ chars) on any topic" },
    { "agent": "this-agent-does-not-exist", "task": "anything" }
  ]
}

This single invocation triggers both defects: the valid task's output is truncated to 100 chars, and the failed task's Unknown agent diagnostic is dropped.


Captured evidence

Real output from the repro above, run against pi 0.74.1.

Valid task's full output (380 chars; captured from details):

Semantic line breaks place a newline after each complete clause or sentence rather than wrapping at a fixed column width, so source structure mirrors meaning. Because each line is a self-contained unit, edits show up as isolated diff hunks instead of cascading reflow churn, making code review, blame, and conflict resolution dramatically cleaner.

Failed task's available diagnostic (runSingleAgent already constructs this at index.ts:257-265):

Unknown agent: "this-agent-does-not-exist". Available agents: [list elided].

What the parent model actually receives under current upstream code (the content[].text returned by the parallel tool):

Parallel: 1/2 succeeded

[docs-expert] completed: Semantic line breaks place a newline after each complete clause or sentence rather than wrapping at...

[this-agent-does-not-exist] failed: (no output)

The valid task is clipped at 100 chars mid-sentence; the failed task's diagnostic is gone.

What the parent model receives with the proposed patch applied (same invocation, same data):

Parallel: 1/2 succeeded

### [docs-expert] completed

Semantic line breaks place a newline after each complete clause or sentence rather than wrapping at a fixed column width, so source structure mirrors meaning. Because each line is a self-contained unit, edits show up as isolated diff hunks instead of cascading reflow churn, making code review, blame, and conflict resolution dramatically cleaner.

---

### [this-agent-does-not-exist] failed

Unknown agent: "this-agent-does-not-exist". Available agents: [list elided].

The valid task arrives intact (well under the 50 KB cap); the failed task's diagnostic — already computed by runSingleAgent and stored on result.errorMessage — is surfaced instead of dropped.


Why this matters

The subagent example is the natural substrate for orchestrator-style fan-outs (multi-perspective code review, parallel research, cross-domain investigation). Both defects above silently break the orchestrator pattern:

  • Defect 1: the parent LLM cannot synthesize parallel outputs it cannot see.
  • Defect 2: operator-facing errors (rate-limit hit, unknown agent name, model error, child abort) become invisible — the orchestrator misclassifies a hard failure as a silent success and produces a confident-but-wrong synthesis.

Defect 2 is particularly insidious because the same diagnostic the parallel branch discards is already correctly surfaced by single mode and chain mode in the same file — so any user comparing modes finds the inconsistency surprising rather than expected.


Proposed fix

Single change, ~20 lines, covering both defects in one place. Hoists a named module-scope constant for the cap, mirrors the existing single/chain-mode failure predicate and fallback chain, and tags failed tasks with their stopReason so the cause is visible without expanding TUI details.

--- a/packages/coding-agent/examples/extensions/subagent/index.ts
+++ b/packages/coding-agent/examples/extensions/subagent/index.ts
@@ -<existing module-scope constants block>
 const MAX_PARALLEL_TASKS = 8;
 const MAX_CONCURRENCY = 4;
+// Per-task byte cap for the parallel summary returned to the parent model.
+// Matches the DEFAULT_MAX_BYTES (50 KB) convention used by truncated-tool.ts
+// and tool-override.ts elsewhere in the examples. Large enough to carry a
+// structured review or research report; small enough to stay well under the
+// harness tool-result limit even with 8 tasks.
+const PER_TASK_OUTPUT_CAP = 50_000;
@@ -619,11 +625,28 @@
                 const successCount = results.filter((r) => r.exitCode === 0).length;
+                // Parallel summary needs to (a) return enough output for the parent
+                // model to actually synthesize results — a 100-char preview is
+                // unusable — and (b) surface diagnostics for failed children,
+                // mirroring the fallback already used by single mode (~line 652)
+                // and chain mode (~line 541). Without (b), a child that exits
+                // non-zero before emitting an assistant message_end (unknown agent,
+                // model error, transport failure, abort) surfaces only "(no output)"
+                // — indistinguishable from a silent success.
                 const summaries = results.map((r) => {
-                    const output = getFinalOutput(r.messages);
-                    const preview = output.slice(0, 100) + (output.length > 100 ? "..." : "");
-                    return `[${r.agent}] ${r.exitCode === 0 ? "completed" : "failed"}: ${preview || "(no output)"}`;
+                    const isError =
+                        r.exitCode !== 0 || r.stopReason === "error" || r.stopReason === "aborted";
+                    const output = isError
+                        ? r.errorMessage || r.stderr || getFinalOutput(r.messages) || "(no output)"
+                        : getFinalOutput(r.messages) || "(no output)";
+                    const status = isError
+                        ? `failed${r.stopReason && r.stopReason !== "end" ? ` (${r.stopReason})` : ""}`
+                        : "completed";
+                    const body =
+                        output.length > PER_TASK_OUTPUT_CAP
+                            ? `${output.slice(0, PER_TASK_OUTPUT_CAP)}\n\n…[truncated ${output.length - PER_TASK_OUTPUT_CAP} bytes; full output preserved in tool details]`
+                            : output;
+                    return `### [${r.agent}] ${status}\n\n${body}`;
                 });
                 return {
                     content: [
                         {
                             type: "text",
-                            text: `Parallel: ${successCount}/${results.length} succeeded\n\n${summaries.join("\n\n")}`,
+                            text: `Parallel: ${successCount}/${results.length} succeeded\n\n${summaries.join("\n\n---\n\n")}`,
                         },
                     ],
                     details: makeDetails("parallel")(results),

Cross-references in the same file that establish the precedent:

  • Single-mode fallback: index.ts:652
  • Chain-mode fallback: index.ts:541
  • Failure predicate (single mode): index.ts:678
  • Failure predicate (chain mode): index.ts:545
  • SingleResult shape: index.ts:142-155 (errorMessage?, stderr, stopReason? all populated on failure paths at index.ts:338-339)
  • getFinalOutput returns "" on empty messages (index.ts:163-173) — this is the state of a child that crashed before any message_end, which is why the fallback chain matters.

Tool-result shape change

The parallel mode content[].text shape changes from:

[agent] completed: <≤100-char preview>...

To:

### [agent] completed

<full output, up to 50 KB>

with --- separators between tasks. Intentional, model-friendlier format; calling out for any downstream snippets that may assert on the old preview format.


Open questions for maintainer

  1. Test pattern for examples/extensions/*? Happy to add a minimal node --test file alongside index.ts that constructs synthetic SingleResult[] and asserts on the rendered summary, if that fits the conventions.
  2. PER_TASK_OUTPUT_CAP configurable? Should this be wired through the extension's init options (alongside agentScope / confirmProjectAgents), or is a hard-coded module-scope constant preferable for an example?
  3. Harness tool-result content[].text cap? 50 KB was chosen to match the DEFAULT_MAX_BYTES precedent in truncated-tool.ts and tool-override.ts. If the harness applies its own cap to content[].text, the value should align (or stay comfortably under).
  4. TUI renderer parallel-branch predicate (exitCode > 0 at index.ts:896-897, treating -1 as running) differs from the single/chain-mode predicate this patch mirrors (exitCode !== 0 || stopReason in {error, aborted}). Should the renderer be unified in this PR or kept as a follow-up?
  5. PR shape preference if you would like one — the diff above is a single self-contained change; happy to open it as a PR on request. Two commits (preview-clamp first, diagnostics-fallback second) would split cleanly along the two conceptual fixes, but the line ranges overlap, so one squashed commit may be preferable.

Metadata

Metadata

Assignees

No one assigned

    Labels

    inprogressIssue is being worked on

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions