Fix malformed Unicode in outgoing MCP responses #1447

@frankhommers

Description

A browser action can succeed, but the follow-up MCP response fails to serialize with:

the request body is not valid JSON: invalid high surrogate in string

This appears to happen when tool output includes malformed Unicode from page-derived text.

Reproduction

  • Start @playwright/mcp with --extension
  • Navigate to a page whose text content contains malformed Unicode (for example, a lone surrogate)
  • Perform a browser action such as browser_click
  • Observe that the action succeeds, but the returned MCP response fails

Observed behavior

The browser action completes, but the client receives an error during response handling:

invalid high surrogate in string

Expected behavior

If page-derived text contains malformed Unicode, MCP should still return a valid JSON-RPC response instead of failing serialization.

Likely root cause

The outgoing MCP message likely contains a JavaScript string that is not well-formed UTF-16, that is, a string with a lone (unpaired) surrogate. This most likely comes from page-derived strings such as accessibility snapshot text, console text, page title, URL, or other extracted content.
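As a minimal illustration of how such a string can arise, the sketch below truncates text in the middle of a surrogate pair, which is one plausible way page-derived text ends up malformed. The helper name `hasLoneSurrogate` is hypothetical and not part of @playwright/mcp. Note that `JSON.stringify` does not throw on a lone surrogate; since ES2019 it emits a `\uXXXX` escape, so the rejection may only surface when a stricter parser reads the body. Which component rejects it is not established by this report.

```javascript
// Hypothetical helper: true if the string contains an unpaired surrogate.
function hasLoneSurrogate(text) {
  return /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/.test(text);
}

// U+1F600 occupies two UTF-16 code units: \uD83D followed by \uDE00.
const emoji = '\u{1F600}';

// Slicing by code unit cuts the pair in half and leaves a lone high surrogate.
const truncated = ('title ' + emoji).slice(0, 7);

// Serialization succeeds, but the output carries an escaped lone surrogate
// that strict JSON parsers may refuse.
const json = JSON.stringify({ text: truncated });

console.log(hasLoneSurrogate(truncated)); // true
console.log(json);                        // {"text":"title \ud83d"}
```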

Evidence

A local wrapper that sanitizes outgoing JSON-RPC string values before transport serialization fixes the issue without changing browser behavior. That strongly suggests the bug is in outgoing message serialization, not in the browser action itself.

Suggested fix

Sanitize outgoing MCP message strings before transport serialization:

  • use String.prototype.toWellFormed() when available
  • otherwise replace lone surrogates with \uFFFD

Apply this at the transport boundary so it covers stdio, SSE, and streamable HTTP responses.
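The suggested fix could be sketched roughly as follows. This is an illustration under assumptions, not the actual @playwright/mcp or MCP SDK code: the names `replaceLoneSurrogates`, `toWellFormedString`, and `sanitizeMessage` are hypothetical, and where exactly the call belongs in each transport is left open.

```javascript
// Matches a high surrogate not followed by a low one, or a low surrogate
// not preceded by a high one. No `u` flag, so it works on code units.
const LONE_SURROGATE =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Fallback for runtimes without String.prototype.toWellFormed().
function replaceLoneSurrogates(text) {
  return text.replace(LONE_SURROGATE, '\uFFFD');
}

// Prefer the built-in when available, otherwise use the regex fallback.
function toWellFormedString(text) {
  return typeof text.toWellFormed === 'function'
    ? text.toWellFormed()
    : replaceLoneSurrogates(text);
}

// Recursively copy a JSON-like message, sanitizing every string value and
// object key. Non-string primitives are returned unchanged.
function sanitizeMessage(value) {
  if (typeof value === 'string') return toWellFormedString(value);
  if (Array.isArray(value)) return value.map(sanitizeMessage);
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const [key, child] of Object.entries(value))
      out[toWellFormedString(key)] = sanitizeMessage(child);
    return out;
  }
  return value;
}

// Example: a JSON-RPC response whose text ends in a lone high surrogate.
const response = {
  jsonrpc: '2.0',
  id: 1,
  result: { content: [{ type: 'text', text: 'Page title \uD83D' }] },
};
const body = JSON.stringify(sanitizeMessage(response));
```

Sanitizing a copy of the message just before `JSON.stringify` keeps the change local to serialization, so tool handlers and browser behavior stay untouched. The cost is one extra traversal per outgoing message.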
