Skip to content

fix: dedupe persistent toolset-failure notifications#2943

Merged
dgageot merged 3 commits into
docker:mainfrom
dgageot:board/9bf56e61f12ed8e5
Jun 1, 2026
Merged

fix: dedupe persistent toolset-failure notifications#2943
dgageot merged 3 commits into
docker:mainfrom
dgageot:board/9bf56e61f12ed8e5

Conversation

@dgageot

@dgageot dgageot commented Jun 1, 2026

Copy link
Copy Markdown
Member

Toolset initialization and Tools() listing failures were stacking persistent notifications in the TUI when a toolset remained down across multiple conversation turns. The root cause was that the agent side had no mechanism to suppress duplicate warnings, so the TUI received the same error on every turn and piled them on-screen.

This fix adds a once-per-streak guard to the StartableToolSet so a failure warning is emitted only once until the streak clears (either via a successful Start/Tools or an explicit Stop). The failureStreak type encapsulates the streak-tracking logic and is reused for both Start and Tools listing failures, keeping the warning deduplication uniform and reliable across all callers—including non-TUI consumers like ACP, which persist warnings into the transcript.

The TUI-layer dedup logic is removed as it becomes redundant; the agent-side guard is the single source of truth for this behavior.

Closes #2884.

dgageot added 3 commits June 1, 2026 12:09
Collapse the two duplicated once-per-streak warning guards (Start and
Tools listing) into a small reusable failureStreak type, removing four
bool fields and the repeated fail/reset/shouldReport logic.
The agent-side once-per-streak guard already ensures a toolset-failure
warning is emitted only once per streak, so the on-screen dedup in
handleShow was redundant defense-in-depth. Dropping it restores the
simpler notification component; an identical error reappearing after the
previous one auto-hides (10s) is acceptable.
@dgageot dgageot requested a review from a team as a code owner June 1, 2026 11:18

@docker-agent docker-agent left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: 🟢 APPROVE

Reviewed pkg/tools/startable.go, pkg/tools/startable_test.go, pkg/agent/agent.go, and pkg/agent/agent_test.go.

Summary: The failureStreak helper is correctly implemented — fail() only sets active/pending on the first call of a streak, reset() clears both fields, and shouldReport() returns true exactly once per streak. Mutex discipline is sound across Start, Stop, ShouldReportFailure, and ShouldReportListFailure. The Tools() method calls the inner toolset outside the lock (correct — avoids holding the lock during a potentially slow RPC) and updates the streak under the lock before any caller can observe it in the sequential collectTools loop. The listFlappyToolSet test helper correctly increments callIdx unconditionally before the nil check, so error sequences are consumed faithfully. Test coverage addresses the once-per-streak guarantee, recovery/reset behavior, and the Stop path. No bugs found in the changed code.

@aheritier aheritier added area/tui For features/issues/fixes related to the TUI area/mcp MCP protocol, MCP tool servers, integration kind/fix PR fixes a bug (maps to fix: commit prefix) labels Jun 1, 2026

@aheritier aheritier left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. CI green across all checks; changes are sound with appropriate test coverage for the de-duplicated list warnings and failure-streak reset behavior. No blocking concerns.

@dgageot dgageot merged commit 0c2cf04 into docker:main Jun 1, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/mcp MCP protocol, MCP tool servers, integration area/tui For features/issues/fixes related to the TUI kind/fix PR fixes a bug (maps to fix: commit prefix)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tui: persistent toolset-failure notifications stack without dedup, masking the screen on long MCP failure streaks

3 participants