Fix emoji encoding in recorded .har files#1557
Conversation
Add UnsafeRelaxedJsonEscaping to ProxyUtils.JsonSerializerOptions to preserve BMP non-ASCII characters as literal UTF-8 in JSON output. Add surrogate pair unescaping in HarGeneratorPlugin to convert \uXXXX\uYYYY escape sequences for non-BMP characters (like emoji U+1F600+) back to their literal UTF-8 representation. Co-authored-by: waldekmastykarz <11164679+waldekmastykarz@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This pull request fixes emoji and non-ASCII character encoding in recorded HAR files. Previously, emoji like 😀✨ were represented as unreadable escape sequences (\uD83D\uDE00\u2728) in HAR files. The fix addresses two issues:
-
BMP non-ASCII characters (U+0080–U+FFFF): System.Text.Json's default encoder escapes all non-ASCII characters. Fixed by configuring
JavaScriptEncoder.UnsafeRelaxedJsonEscapingin the globalProxyUtils.JsonSerializerOptions. -
Non-BMP characters (U+10000+, most emoji): Even with
UnsafeRelaxedJsonEscaping, characters above U+FFFF are still emitted as surrogate pair escapes. Fixed by post-processing the serialized JSON with a regex that decodes UTF-16 surrogate pairs back to literal UTF-8 characters.
Changes:
- Added
JavaScriptEncoder.UnsafeRelaxedJsonEscapingto global JSON serialization options - Implemented surrogate pair decoding in
HarGeneratorPluginto convert\uD83D\uDE00patterns to literal emoji - Added necessary imports for encoding and regex functionality
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| DevProxy.Abstractions/Utils/ProxyUtils.cs | Added UnsafeRelaxedJsonEscaping encoder to global JsonSerializerOptions for proper Unicode handling |
| DevProxy.Plugins/Generation/HarGeneratorPlugin.cs | Added regex-based surrogate pair decoder to convert UTF-16 escape sequences to literal emoji in HAR files |
garrytrinder
left a comment
There was a problem hiding this comment.
Tested: builds clean, BMP non-ASCII chars render correctly in HAR files (verified with accented characters), surrogate pair regex logic validated. UnsafeRelaxedJsonEscaping is appropriate for a CLI tool. LGTM.
Emoji and other non-ASCII characters in API responses are written as
\uXXXXescape sequences in HAR files, making them unreadable.Before:
"text": "Hello \uD83D\uDE00\u2728 World"After:
"text": "Hello 😀✨ World"Two issues at play:
BMP non-ASCII characters (U+0080–U+FFFF):
System.Text.Json's default encoder escapes all non-ASCII. Fixed by settingEncoder = JavaScriptEncoder.UnsafeRelaxedJsonEscapingonProxyUtils.JsonSerializerOptions. Safe here since JSON output goes to files, never embedded in HTML.Non-BMP characters (U+10000+, most emoji): Even
UnsafeRelaxedJsonEscapingstill emits surrogate pair escapes (\uD83D\uDE00) for characters above U+FFFF. Fixed by post-processing the serialized HAR JSON with a regex that decodes surrogate pairs back to literal UTF-8.Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.