Session corruption from invalid Unicode surrogates in browser tool output #7275

@CarlosMundim

Bug Description

Session transcripts can become corrupted when the browser tool captures text from a web page that contains invalid Unicode surrogates (a lone high or low surrogate that is not part of a valid pair). Once the transcript is corrupted, all subsequent API calls fail with:

400 {"type":"error","error":{"type":"invalid_request_error","message":"The request body is not valid JSON: invalid high surrogate in string"}}

The session becomes completely unusable: even /clear and other commands fail. The only recovery is a full session reset, which loses all context.

Steps to Reproduce

  1. Use the browser tool to interact with a web page containing special characters (emojis, CJK characters, etc.)
  2. Evaluate JavaScript that captures and returns text content
  3. Text containing an invalid surrogate is stored in the JSONL transcript
  4. The next API call fails with a JSON parse error
  5. The session is permanently broken
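
For illustration, here is a minimal TypeScript sketch of how steps 2 to 4 could play out. It assumes the invalid surrogate comes from truncating an emoji by UTF-16 code unit (the report does not confirm the exact source) and that the request body is built with `JSON.stringify`. `hasLoneSurrogate` is a hypothetical helper, not a Clawdbot function.

```typescript
// Hypothetical helper: returns true if the string contains a surrogate
// code unit that is not part of a valid high+low pair.
function hasLoneSurrogate(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: the next code unit must be a low surrogate.
      const next = text.charCodeAt(i + 1); // NaN when past the end
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair, skip the low half
        continue;
      }
      return true;
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return true;
    }
  }
  return false;
}

// U+1F600 is stored in a JavaScript string as the pair \uD83D \uDE00.
const emoji = "😀";

// Assumed cause: cutting the string by code unit keeps only the high half.
const truncated = "status: " + emoji.slice(0, 1);

// The lone surrogate survives serialization as an escape sequence.
const serialized = JSON.stringify({ text: truncated });

console.log(hasLoneSurrogate(emoji));     // false
console.log(hasLoneSurrogate(truncated)); // true
console.log(serialized);                  // {"text":"status: \ud83d"}
```

On Node.js 12 and later, `JSON.stringify` writes the lone surrogate as the escape `\ud83d`. A strict JSON parser on the receiving side may reject that escape, which would be consistent with the 400 error above, though that is an assumption about the failure path rather than something confirmed here.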

Observed Behavior

  • Session stuck in error loop
  • No auto-recovery attempted
  • User forced to /new or wait for session reset
  • Context lost without warning or compaction

Expected Behavior

  1. Browser tool output should be sanitized to remove/replace invalid Unicode surrogates
  2. Transcript writer should validate JSON before appending
  3. If corruption is detected, attempt auto-repair or graceful recovery
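
A rough TypeScript sketch of what these three points might look like. All function names here (`sanitizeSurrogates`, `toTranscriptLine`, `repairTranscript`) are hypothetical, and the transcript is assumed to be one JSON object per line; this is not Clawdbot's actual writer.

```typescript
const REPLACEMENT = "\uFFFD"; // Unicode replacement character

// (1) Replace every unpaired surrogate with U+FFFD.
// The first alternative consumes valid pairs so they are kept as-is.
function sanitizeSurrogates(text: string): string {
  return text.replace(
    /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g,
    (match: string): string => (match.length === 2 ? match : REPLACEMENT)
  );
}

// (2) Hypothetical writer guard: sanitize every string value while
// serializing, and refuse entries that cannot be serialized at all.
// Note: object keys are not sanitized by this replacer.
function toTranscriptLine(entry: unknown): string {
  const line = JSON.stringify(entry, (_key: string, value: unknown) =>
    typeof value === "string" ? sanitizeSurrogates(value) : value
  );
  if (line === undefined) {
    throw new TypeError("transcript entry is not JSON-serializable");
  }
  return line;
}

// (3) Hypothetical repair pass over an existing JSONL transcript:
// re-serialize every parseable line through the sanitizer and count
// the lines that could not be parsed instead of failing the session.
function repairTranscript(jsonl: string): { repaired: string[]; dropped: number } {
  const repaired: string[] = [];
  let dropped = 0;
  for (const raw of jsonl.split("\n")) {
    if (raw.trim() === "") continue;
    try {
      repaired.push(toTranscriptLine(JSON.parse(raw)));
    } catch {
      dropped++;
    }
  }
  return { repaired, dropped };
}

// Usage: the lone high surrogate becomes U+FFFD, the emoji is untouched.
console.log(toTranscriptLine({ role: "tool", text: "ok 😀 bad \uD83D" }));
```

Replacing with U+FFFD rather than deleting keeps a visible marker that something was lost. Whether dropped lines should be discarded or quarantined is a design decision this sketch leaves open.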

Environment

  • Clawdbot: 2026.1.24-3
  • OS: Windows 11
  • Model: anthropic/claude-opus-4-5

Workaround

None currently. The session must be reset after corruption occurs.

Related

This may also affect other tools that capture external text content.
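
If that turns out to be the case, one option is to sanitize at a single shared boundary instead of per tool. A minimal sketch, assuming tool results are plain JSON-like values (strings, numbers, booleans, null, arrays, plain objects); `replaceLoneSurrogates` and `sanitizeToolResult` are hypothetical names, not an existing API.

```typescript
// Replace unpaired surrogates with U+FFFD, keeping valid pairs intact.
function replaceLoneSurrogates(text: string): string {
  return text.replace(
    /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g,
    (match: string): string => (match.length === 2 ? match : "\uFFFD")
  );
}

// Hypothetical boundary function: walk any JSON-like tool result and
// sanitize every string, including object keys.
// Assumption: no class instances (Date, Map, ...) appear in the input;
// they would be flattened to plain objects by this sketch.
function sanitizeToolResult(value: unknown): unknown {
  if (typeof value === "string") {
    return replaceLoneSurrogates(value);
  }
  if (Array.isArray(value)) {
    return value.map((item: unknown) => sanitizeToolResult(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[replaceLoneSurrogates(key)] = sanitizeToolResult(source[key]);
    }
    return out;
  }
  return value; // numbers, booleans, null, undefined pass through
}

// Usage: nested strings are cleaned, other values are left alone.
console.log(sanitizeToolResult({ title: "ok 😀", items: ["x\uDE00y", 3] }));
```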

Metadata

    Labels: bug (Something isn't working), stale (Marked as stale due to inactivity)
