fix(agent): suppress tool-use assistant text#71589
Conversation
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Greptile SummaryThis PR fixes a bug where agent runs were emitting both intermediate tool-use assistant text and the final answer as user-visible replies. It expands Confidence Score: 4/5Safe to merge; the fix is well-scoped and the two P2 findings are non-blocking. Only P2-level findings: a gap in test coverage for the text_end channel and a theoretical miss of snake_case content-type variants. The core suppression logic is correct and the existing commentary phase behaviour is preserved. No files require special attention beyond the P2 notes above. Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.suppresses-commentary-phase-output.test.ts
Line: 10-40
Comment:
**Missing `blockReplyBreak: "text_end"` coverage**
The new regression test only exercises the `blockReplyBreak: "message_end"` path. The `handleMessageUpdate` code has a separate `text_end`-channel flush at lines 617–634 (`ctx.blockChunker?.drain` and the deferred `flushBlockReplyBuffer`). A parallel fixture with `blockReplyBreak: "text_end"` would confirm those paths are also suppressed for unphased tool-use messages and guard against future regressions there.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/agents/pi-embedded-subscribe.handlers.messages.ts
Line: 40
Comment:
**Snake-case content-type variants not covered**
`TOOL_CALL_CONTENT_TYPES` contains `"toolCall"`, `"toolUse"`, and `"functionCall"` (camelCase). Some providers (e.g. OpenAI-compatible adapters) may emit `"tool_call"`, `"tool_use"`, or `"function_call"` (snake_case). If any upstream adapter normalises content block types to snake_case before emitting events, the content-block branch of `hasToolUseContinuation` would miss them. The `stopReason === "toolUse"` path would still fire, so this only matters when `stopReason` is absent and the detection falls back to content scanning.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(agent): suppress tool-use assistant ..." | Re-trigger Greptile |
| it("suppresses assistant messages that continue into tool use without phase metadata", () => { | ||
| const onBlockReply = vi.fn(); | ||
| const onPartialReply = vi.fn(); | ||
| const { emit, subscription } = createSubscribedSessionHarness({ | ||
| runId: "run", | ||
| onBlockReply, | ||
| onPartialReply, | ||
| blockReplyBreak: "message_end", | ||
| }); | ||
|
|
||
| const toolUseMessage = { | ||
| role: "assistant", | ||
| content: [ | ||
| { type: "text", text: "I'll check this now." }, | ||
| { type: "toolCall", id: "call-1", name: "read", input: { path: "README.md" } }, | ||
| ], | ||
| stopReason: "toolUse", | ||
| } as AssistantMessage; | ||
|
|
||
| emit({ type: "message_start", message: toolUseMessage }); | ||
| emit({ | ||
| type: "message_update", | ||
| message: toolUseMessage, | ||
| assistantMessageEvent: { type: "text_delta", delta: "I'll check this now." }, | ||
| }); | ||
| emit({ type: "message_end", message: toolUseMessage }); | ||
|
|
||
| expect(onBlockReply).not.toHaveBeenCalled(); | ||
| expect(onPartialReply).not.toHaveBeenCalled(); | ||
| expect(subscription.assistantTexts).toEqual([]); | ||
| }); |
There was a problem hiding this comment.
Missing
blockReplyBreak: "text_end" coverage
The new regression test only exercises the blockReplyBreak: "message_end" path. The handleMessageUpdate code has a separate text_end-channel flush at lines 617–634 (ctx.blockChunker?.drain and the deferred flushBlockReplyBuffer). A parallel fixture with blockReplyBreak: "text_end" would confirm those paths are also suppressed for unphased tool-use messages and guard against future regressions there.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.suppresses-commentary-phase-output.test.ts
Line: 10-40
Comment:
**Missing `blockReplyBreak: "text_end"` coverage**
The new regression test only exercises the `blockReplyBreak: "message_end"` path. The `handleMessageUpdate` code has a separate `text_end`-channel flush at lines 617–634 (`ctx.blockChunker?.drain` and the deferred `flushBlockReplyBuffer`). A parallel fixture with `blockReplyBreak: "text_end"` would confirm those paths are also suppressed for unphased tool-use messages and guard against future regressions there.
How can I resolve this? If you propose a fix, please make it concise.| @@ -37,8 +37,37 @@ import { | |||
| promoteThinkingTagsToBlocks, | |||
| } from "./pi-embedded-utils.js"; | |||
|
|
|||
| const TOOL_CALL_CONTENT_TYPES = new Set(["toolCall", "toolUse", "functionCall"]); | |||
There was a problem hiding this comment.
Snake-case content-type variants not covered
TOOL_CALL_CONTENT_TYPES contains "toolCall", "toolUse", and "functionCall" (camelCase). Some providers (e.g. OpenAI-compatible adapters) may emit "tool_call", "tool_use", or "function_call" (snake_case). If any upstream adapter normalises content block types to snake_case before emitting events, the content-block branch of hasToolUseContinuation would miss them. The stopReason === "toolUse" path would still fire, so this only matters when stopReason is absent and the detection falls back to content scanning.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-subscribe.handlers.messages.ts
Line: 40
Comment:
**Snake-case content-type variants not covered**
`TOOL_CALL_CONTENT_TYPES` contains `"toolCall"`, `"toolUse"`, and `"functionCall"` (camelCase). Some providers (e.g. OpenAI-compatible adapters) may emit `"tool_call"`, `"tool_use"`, or `"function_call"` (snake_case). If any upstream adapter normalises content block types to snake_case before emitting events, the content-block branch of `hasToolUseContinuation` would miss them. The `stopReason === "toolUse"` path would still fire, so this only matters when `stopReason` is absent and the detection falls back to content scanning.
How can I resolve this? If you propose a fix, please make it concise.8aa46ee to
83f7a5d
Compare
|
Rebased this branch on current |
|
Pushed a follow-up commit: This addresses the remaining gap behind #71588. The existing suppression path covers commentary/tool-use assistant text once the phase is already resolved, but it does not fully cover the delayed-phase stream case:
The new commit keeps the narrow suppression behavior, but additionally clears suppressed delta/block/chunker buffers and rolls Local validation:
|
ba51b30 to
144cc6c
Compare
144cc6c to
3d3a202
Compare
|
Codex review: needs real behavior proof before merge. Latest ClawSweeper review: 2026-05-20 18:37 UTC / May 20, 2026, 2:37 PM ET. Workflow note: Future ClawSweeper reviews update this same comment in place. How this review workflow works
Summary Reproducibility: yes. Source inspection gives a high-confidence path: emit an assistant PR rating Rank-up moves:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. Real behavior proof Mantis proof suggestion Risk before merge
Maintainer options:
Next step before merge Security Review findings
Review detailsBest possible solution: Land one rebased embedded-subscribe suppression path that quarantines unclassified assistant text until visibility is known, recognizes the repo’s structured tool-call shapes, preserves explicit Do we have a high-confidence way to reproduce the issue? Yes. Source inspection gives a high-confidence path: emit an assistant Is this the best way to solve the issue? No. The handler is the right surface, but clearing buffers only after classification is too late for already emitted chunks; the safer fix is to quarantine phase-pending text before visible callbacks and cover the full structured tool-call contract. Label justifications:
Full review comments:
Overall correctness: patch is incorrect What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 447a3643c69b. |
844c674 to
edddde9
Compare
d02542f to
07b0b62
Compare
07b0b62 to
a6f2ffc
Compare
|
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat. Where did the egg go?
|
Summary
stopReason: "toolUse"or tool-call content) from user-visible assistant outputfinal_answerphase output deliverableFixes #71588.
Notes
The Feishu group trigger policy (
groupPolicy=open/requireMention=false) can make this easier to hit because more group messages start agent runs, but it is not the direct root cause. The direct issue is one agent run exposing both intermediate tool-use assistant text and the later final answer as user-visible replies.Validation
corepack pnpm vitest run src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.suppresses-commentary-phase-output.test.tscorepack pnpm tsgo:corecorepack pnpm exec oxlint --tsconfig tsconfig.oxlint.core.json src/agents/pi-embedded-subscribe.handlers.messages.ts src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.suppresses-commentary-phase-output.test.tscorepack pnpm exec oxfmt --check src/agents/pi-embedded-subscribe.handlers.messages.ts src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.suppresses-commentary-phase-output.test.tsFull
corepack pnpm lint:corecurrently fails on an unrelated pre-existingno-array-sortfinding insrc/agents/cli-runner.spawn.test.ts:1029.