feat(stdio): structured eviction notice when stale client stomped #10

Merged
frostbun merged 1 commit into beta from feat/stdio-eviction-notice
Jun 2, 2026

@tuha263 tuha263 commented May 27, 2026

Collaborator

Problem

In stdio transport (single-client by design), when a new MCP client (e.g. a second Claude Code session) connects to the Unity bridge, the existing client's TCP socket is silently closed. The stale client sees only "Connection closed before reading expected bytes", which the bridge explicitly classifies as isBenign and logs only at debug level. From the stomped Claude session's point of view, this looks identical to Unity crashing, the bridge dying, or a network blip; telling them apart requires manual inspection of Unity process state.

This is recurring enough that a downstream consumer wrote a dedicated rule (unity-forbidden-operations.md § "MCP timeout ≠ bridge disconnect — diagnose before escalating") to encode the manual diagnostic steps.

Fix

Two coordinated changes — must land together to be useful:

  1. C# (MCPForUnity/Editor/Services/Transport/Transports/StdioBridgeHost.cs): Before closing each stale client, send a structured eviction frame:

    {
      "status": "evicted",
      "reason": "stdio_single_client_stomped",
      "error": "Another MCP client connected from <ep>. Stdio transport allows only one MCP client at a time; this connection has been evicted. If you need concurrent MCP clients (e.g. multiple Claude Code sessions), switch the bridge to HTTP transport mode.",
      "new_client_endpoint": "127.0.0.1:54321",
      "evicted_at_unix_ms": 1748345678901
    }

    Best-effort write with 500ms timeout — does not block the accept loop if the socket is already half-dead.

  2. Python (Server/src/transport/legacy/unity_connection.py): Frame reader detects status="evicted" payloads and raises a new EvictedByOtherClientError (subclasses ConnectionError) instead of seeing the subsequent socket close as a generic IOException. Non-eviction frames pass through unchanged. Non-JSON / non-UTF8 payloads also pass through so the existing JSON-validation in send_command surfaces the correct error for malformed responses.
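The reader-side detection in step 2 can be sketched roughly as follows. This is a minimal illustration, not the code from unity_connection.py: EvictedByOtherClientError and check_for_eviction_frame are names that appear in this PR, but the function signature, the payload attribute, and returning the raw bytes for pass-through are assumptions made for the example.

```python
import json


class EvictedByOtherClientError(ConnectionError):
    """Raised when the bridge reports that another client took over the connection."""

    def __init__(self, payload: dict):
        # Assumption: the exception keeps the whole eviction payload for callers.
        self.payload = payload
        super().__init__(payload.get("error", "Evicted by another MCP client"))


def check_for_eviction_frame(raw: bytes) -> bytes:
    """Return raw unchanged unless it is a JSON object with status == "evicted"."""
    try:
        parsed = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Malformed payloads pass through so later validation reports them.
        return raw
    if isinstance(parsed, dict) and parsed.get("status") == "evicted":
        raise EvictedByOtherClientError(parsed)
    return raw
```

Because the exception subclasses ConnectionError, existing handlers that catch ConnectionError would still work, while callers that want the actionable message can catch the narrower type and read its payload.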

Downstream consumers (Claude Code sessions) now get a clear, actionable error: "Another MCP client connected from <ep>; switch to HTTP transport for concurrent clients" instead of "Connection closed before reading expected bytes."

Test plan

  • C# (StdioBridgeReconnectTests.cs): new test NewClient_SendsStructuredEvictionFrame_BeforeClosingStaleClient makes 2 sequential TCP connections to the bridge and asserts that the first receives an eviction frame with all required fields (status, reason, error, new_client_endpoint, evicted_at_unix_ms).
  • Python (Server/tests/integration/test_eviction_frame.py): 4 new tests covering typed exception raised with payload, non-eviction passthrough, non-JSON passthrough, and exception subclasses ConnectionError. All 4 pass locally; the existing test_transport_framing.py (4 tests) and test_transport_characterization.py (55 tests) still pass with 0 failures.

Notes

  • Companion change for Gap 2 (per-exception classification in Server/src/transport/unity_transport.py) is on a sibling branch feat/transport-error-classification — orthogonal change that can land independently.
  • Source review at consumer: plans/reports/review-260527-unity-mcp-error-clarity.md in DOTS-AI workspace.

When a new MCP client connects to the stdio bridge, the existing client
was silently closed — leaving the stomped client with a generic
"Connection closed before reading expected bytes" that classifies as
benign and offers no diagnostic guidance.

C# (StdioBridgeHost.cs): before closing each stale client, send a
structured JSON frame {status:"evicted", reason:"stdio_single_client_stomped",
error:<human msg>, new_client_endpoint, evicted_at_unix_ms}. Best-effort
write with 500ms tight timeout — never blocks the accept loop.
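The "best-effort write with a tight timeout" idea can be illustrated with a short Python analog. This is not the C# code in StdioBridgeHost.cs: the function name send_eviction_notice, the shortened error text, and the 8-byte big-endian length prefix are all assumptions chosen to make the sketch self-contained.

```python
import json
import socket
import struct
import time


def send_eviction_notice(stale: socket.socket, new_endpoint: str,
                         timeout_s: float = 0.5) -> bool:
    """Try to deliver an eviction frame; never raise, so the caller can always close."""
    payload = json.dumps({
        "status": "evicted",
        "reason": "stdio_single_client_stomped",
        "error": f"Another MCP client connected from {new_endpoint}.",
        "new_client_endpoint": new_endpoint,
        "evicted_at_unix_ms": time.time_ns() // 1_000_000,
    }).encode("utf-8")
    # Assumption: frames carry an 8-byte big-endian length prefix.
    frame = struct.pack(">Q", len(payload)) + payload
    try:
        # Bound the write so a half-dead socket cannot stall the accept loop.
        stale.settimeout(timeout_s)
        stale.sendall(frame)
        return True
    except OSError:
        # Covers timeouts, resets, and already-closed sockets.
        return False
```

The caller would close the stale socket regardless of the return value; the boolean is only useful for debug logging.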

Python (unity_connection.py): frame reader detects status="evicted"
payloads and raises a new EvictedByOtherClientError (ConnectionError
subclass) carrying the C#-side payload. Non-eviction frames pass
through unchanged. Non-JSON / non-UTF8 payloads also pass through so
the existing JSON validation in send_command surfaces the correct
error type for malformed responses.

Tests:
- C# StdioBridgeReconnectTests: verifies 2nd client connect triggers
  eviction frame on 1st client with all required fields (status,
  reason, error, new_client_endpoint, evicted_at_unix_ms).
- Python test_eviction_frame: verifies typed exception raised with
  payload fields, non-eviction frames pass through, non-JSON payloads
  pass through, EvictedByOtherClientError subclasses ConnectionError.

@frostbun frostbun left a comment

Triage merit-pass: structured eviction notice (C#+Python) with typed exception, best-effort 500ms write, ships integration + EditMode tests. Low risk.

@frostbun frostbun merged commit 154ed51 into beta Jun 2, 2026
1 check failed
@frostbun frostbun deleted the feat/stdio-eviction-notice branch June 2, 2026 02:39

@the1studio-code-reviewer the1studio-code-reviewer Bot left a comment

🤖 Claude Code Review

Decision: 💬 COMMENT

Summary

My review is complete. Here's a summary of what I found:

Review Summary

Overall: Well-motivated, well-scoped PR. The eviction frame approach is the right fix for silent socket closes being indistinguishable from crashes.

Key Issue (requesting changes)

UnicodeDecodeError not caught in Python code: The check_for_eviction_frame function catches json.JSONDecodeError but not UnicodeDecodeError. If a raw frame contains invalid UTF-8, .decode('utf-8') raises before json.loads is reached — causing an unhandled exception instead of the claimed passthrough behavior. This needs a fix and a corresponding test case.
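A small sketch of the gap described above, assuming the decode and parse happen inside one try block. Both functions are hypothetical reconstructions (check_naive and check_fixed are illustrative names, and the eviction check itself is omitted to keep the focus on the exception path); they are not the PR's actual code.

```python
import json


def check_naive(raw: bytes) -> bytes:
    """Only JSON errors are caught, so invalid UTF-8 escapes as UnicodeDecodeError."""
    try:
        json.loads(raw.decode("utf-8"))
    except json.JSONDecodeError:
        return raw
    return raw


def check_fixed(raw: bytes) -> bytes:
    """Invalid UTF-8 is treated like invalid JSON: the payload passes through."""
    try:
        json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return raw
    return raw


def test_non_utf8_payload_passes_through():
    # 0xFF can never start a valid UTF-8 sequence.
    bad = b"\xff\xfe\x00"
    assert check_fixed(bad) == bad
```

Both exception types derive from ValueError, so catching ValueError would also work, at the cost of being less explicit about which failures are expected.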

Suggestions (non-blocking)

  • Use DateTimeOffset.UtcNow.ToUnixTimeMilliseconds() instead of manual epoch math in C#
  • Consider a JSON serializer instead of string interpolation for the eviction frame
  • Add a non-UTF-8 test case to the Python test suite
  • Document buffer size/timeout assumptions in the C# test
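The first two suggestions concern C#, but the underlying point can be shown with a Python analog. The function names below are hypothetical and the frame is trimmed to a few fields; the sketch only illustrates why a serializer is safer than string interpolation and how a millisecond Unix timestamp can come from a library call instead of manual epoch math.

```python
import json
import time


def frame_by_interpolation(endpoint: str) -> str:
    # Fragile: a quote or backslash in the value produces invalid JSON.
    return f'{{"status": "evicted", "new_client_endpoint": "{endpoint}"}}'


def frame_by_serializer(endpoint: str) -> str:
    # The serializer escapes the value, so the frame always parses.
    return json.dumps({
        "status": "evicted",
        "new_client_endpoint": endpoint,
        "evicted_at_unix_ms": time.time_ns() // 1_000_000,
    })
```

An IP:port endpoint is unlikely to contain a quote, but the human-readable error field could, which is where escaping by the serializer pays off.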

Strengths

  • Good error hierarchy (EvictedByOtherClientError subclassing ConnectionError)
  • Best-effort 500ms timeout on eviction write
  • Actionable error message telling users to switch to HTTP transport
  • Solid test coverage on both sides

I was unable to read the C# file (StdioBridgeHost.cs) — it wasn't found at the expected path. My C# comments are based on the PR description and diff context. The author should verify my inline comments on that file still apply to the correct line numbers.
