Bug Description
Session transcripts can become corrupted when the browser tool captures text from web pages that contains unpaired Unicode surrogates (a high or low surrogate code unit without its partner). Once a transcript is corrupted, ALL subsequent API calls fail with:
400 {"type":"error","error":{"type":"invalid_request_error","message":"The request body is not valid JSON: invalid high surrogate in string"}}
The session becomes completely unusable: even /clear and other commands fail. The only recovery is a full session reset, which loses all context.
Steps to Reproduce
- Use the browser tool to interact with a web page containing special characters (emojis, CJK characters, etc.)
- Evaluate JavaScript that captures and returns text content
- Text containing an unpaired surrogate gets stored in the JSONL transcript
- The next API call fails with the JSON parse error above
- The session is permanently broken
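The steps above can be sketched in isolation. This is a minimal illustration of one plausible way an unpaired surrogate ends up in a transcript line (truncating a string in the middle of an emoji); the actual trigger inside the browser tool has not been confirmed. The names `hasLoneSurrogate`, `truncated`, and the `{ role, content }` entry shape are assumptions made for the example, not Clawdbot internals.

```typescript
// Hypothetical reproduction; none of these names come from Clawdbot.

// U+1F600 is stored in a JavaScript string as the surrogate pair \uD83D \uDE00.
const emoji = "\uD83D\uDE00";

// Truncating by UTF-16 code unit splits the pair and leaves a lone high surrogate.
const truncated = emoji.slice(0, 1);

// Returns true when the string contains a surrogate without its partner.
function hasLoneSurrogate(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code >= 0xd800 && code <= 0xdbff) {
      // High surrogate: it must be followed by a low surrogate.
      const next = text.charCodeAt(i + 1); // NaN past the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair, skip the low half
        continue;
      }
      return true;
    }
    if (code >= 0xdc00 && code <= 0xdfff) {
      // Low surrogate that was not consumed as part of a pair.
      return true;
    }
  }
  return false;
}

// JSON.stringify (ES2019 and later) escapes the lone surrogate instead of
// throwing, so the line is written to the transcript without any error.
const line = JSON.stringify({ role: "tool", content: truncated });

console.log(hasLoneSurrogate(emoji));     // false
console.log(hasLoneSurrogate(truncated)); // true
console.log(line);                        // {"role":"tool","content":"\ud83d"}
```

The serialized line contains the escape `\ud83d` with no low surrogate after it. JavaScript's own `JSON.parse` accepts that, which would explain why nothing fails locally, while a stricter JSON parser on the receiving side can reject it.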
Observed Behavior
- Session stuck in error loop
- No auto-recovery attempted
- User forced to /new or wait for session reset
- Context lost without warning or compaction
Expected Behavior
- Browser tool output should be sanitized to remove/replace invalid Unicode surrogates
- Transcript writer should validate JSON before appending
- If corruption is detected, attempt auto-repair or graceful recovery
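A minimal sketch of what the sanitizing and repair steps could look like, assuming transcript entries are plain JSON-serializable values written one per line. The names `sanitizeText`, `sanitizeDeep`, `serializeEntry`, and `repairLine` are hypothetical and do not refer to existing Clawdbot functions; replacing unpaired surrogates with U+FFFD is one possible policy, and dropping them would work as well.

```typescript
// Hypothetical helpers; not part of Clawdbot.

const REPLACEMENT = "\uFFFD"; // Unicode replacement character

// Replace every unpaired surrogate with U+FFFD; valid pairs pass through unchanged.
function sanitizeText(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code >= 0xd800 && code <= 0xdbff) {
      const next = text.charCodeAt(i + 1); // NaN past the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Well-formed pair: copy both halves.
        out += text.charAt(i) + text.charAt(i + 1);
        i++;
      } else {
        out += REPLACEMENT; // lone high surrogate
      }
    } else if (code >= 0xdc00 && code <= 0xdfff) {
      out += REPLACEMENT; // lone low surrogate
    } else {
      out += text.charAt(i);
    }
  }
  return out;
}

// Walk a JSON-like value and sanitize every string, including object keys.
function sanitizeDeep(value: unknown): unknown {
  if (typeof value === "string") return sanitizeText(value);
  if (Array.isArray(value)) return value.map(sanitizeDeep);
  if (value !== null && typeof value === "object") {
    const result: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      result[sanitizeText(key)] = sanitizeDeep(inner);
    }
    return result;
  }
  return value; // numbers, booleans, null
}

// Serialize one transcript entry as a JSONL line, sanitizing before writing.
function serializeEntry(entry: unknown): string {
  return JSON.stringify(sanitizeDeep(entry));
}

// Repair a line that was already written. JavaScript's JSON.parse accepts
// lone surrogate escapes, so the line can be read, cleaned, and re-serialized.
function repairLine(line: string): string {
  return serializeEntry(JSON.parse(line));
}
```

Under these assumptions, tool output would pass through `serializeEntry` before it is appended, and an already corrupted transcript could be recovered by running `repairLine` over each line rather than resetting the session.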
Environment
- Clawdbot: 2026.1.24-3
- OS: Windows 11
- Model: anthropic/claude-opus-4-5
Workaround
Currently none - must reset session after corruption occurs.
Related
This may also affect other tools that capture external text content.