Fix #67488: Handle Unicode smart quotes in tool call argument JSON#67551
Fix #67488: Handle Unicode smart quotes in tool call argument JSON#67551mushuiyu886 wants to merge 8 commits into
Conversation
When models (e.g., MiniMax, Moonshot) emit tool-call arguments using Unicode smart quotes (U+201C/U+201D) as JSON string delimiters instead of ASCII `"`, JSON.parse fails with a SyntaxError. This caused cron jobs using the edit tool on files with special characters to be marked as error even though the agent completed successfully. - Add normalizeStructuralUnicodeQuotes() that scans the raw JSON, tracks string boundaries using both ASCII and Unicode quote chars, and rewrites only structural Unicode quotes to ASCII `"` while preserving content-level smart quotes inside strings. - Update extractBalancedJsonPrefix() to recognize Unicode smart quotes as string delimiters for correct brace-depth tracking. - In tryExtractUsableToolCallArguments(), attempt the normalized version before falling back to balanced-prefix extraction. Fixes openclaw#67488
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a2d261ae2d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Greptile SummaryAdds a
Confidence Score: 4/5Mostly safe to merge for the primary MiniMax/Moonshot Unicode-delimiter fix, but a regression in the balanced-prefix repair path should be addressed first. The P1 finding is a real behavioural regression: inputs that were repaired correctly before this PR (ASCII-delimited JSON with U+201C/U+201D content characters plus ≤3-char trailing garbage) will now silently fail repair. This is plausible for CJK content — the exact scenario mentioned in the PR description. The fix is a one-liner change to the extractBalancedJsonPrefix call site. src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts — specifically the extractBalancedJsonPrefix(normalized) call at line 218 and the redundant double-normalization at line 236. Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts
Line: 218
Comment:
**Regression: balanced-prefix extraction uses corrupted `normalized` instead of `raw`**
When the raw JSON uses ASCII delimiters but has U+201C/U+201D characters as legitimate string content (common in CJK tool arguments, e.g. `{"content": "File with \u201Csmart\u201D quotes"}`), `normalizeStructuralUnicodeQuotes` treats the content smart quotes as structural delimiters and corrupts the string — producing e.g. `{"content": "File with "smart" quotes"}`. Passing that corrupted string to `extractBalancedJsonPrefix` then yields a candidate that `JSON.parse` rejects (line 238), so repair returns `undefined`.
Before this PR, `extractBalancedJsonPrefix(raw)` was called on the original string where U+201C/U+201D were invisible to the ASCII-only quote tracker, so the balanced prefix was extracted cleanly and `JSON.parse` succeeded. The change to `normalized` silently breaks that path for any input where:
1. `JSON.parse(raw)` fails for a reason unrelated to smart-quote delimiters (e.g. 1-3 char trailing garbage), AND
2. The ASCII-delimited strings contain U+201C/U+201D as content.
Suggested approach: fall back to `extractBalancedJsonPrefix(raw)` (the original behaviour) and only use `normalized` in `JSON.parse`, or try `raw`-extracted JSON first and only re-attempt with `normalized` if that parse fails:
```
// Prefer extracting from the original string so ASCII-delimited JSON with
// Unicode content characters is not corrupted by normalisation.
const extractedRaw = extractBalancedJsonPrefix(raw);
const extracted = extractedRaw ?? extractBalancedJsonPrefix(normalized);
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts
Line: 236-237
Comment:
**Redundant double-normalization**
`extracted.json` is a slice of `normalized`, which `normalizeStructuralUnicodeQuotes` already processed. Calling it again here is a no-op (all structural Unicode quotes were replaced in the first pass) and can be replaced with a direct assignment:
```suggestion
const candidateJson = extracted.json;
try {
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(cron): handle Unicode smart quotes i..." | Re-trigger Greptile |
Address review feedback from Greptile on PR openclaw#67551: 1. Prefer extractBalancedJsonPrefix(raw) over normalized to avoid corrupting CJK content smart quotes when the original JSON uses ASCII delimiters. Fall back to normalized only when raw extraction yields nothing. 2. Remove redundant double-normalization of extracted.json — it is already a slice of either raw or normalized. 3. Add fallback: if the raw-extracted JSON fails to parse, try normalizing Unicode quotes in the extracted slice and re-attempt.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 51a75c09e1
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Address Greptile P1 review feedback: both normalizeStructuralUnicodeQuotes
and extractBalancedJsonPrefix now track whether a string was opened with
ASCII `"` or a Unicode smart quote, and only close on the matching type.
Before this fix, `{"text":"hello\u201D world"}x` would incorrectly close
the string at the U+201D character even though the string was opened with
ASCII `"`, causing mis-tokenization and failed repair of otherwise
recoverable payloads containing CJK smart quotes in string content.
- normalizeStructuralUnicodeQuotes: only replaces Unicode quotes that
open/close their own strings; ASCII-opened strings with Unicode content
are left untouched.
- extractBalancedJsonPrefix: same opener-tracking logic applied.
|
Codex review: needs changes before merge. Summary Reproducibility: yes. A focused parser reproduction is enough: current main has no Unicode quote normalization and the shared scanner only tracks ASCII quotes, so structural smart-quoted argument JSON still fails before tool-call repair can produce usable args. Next step before merge Security Review findings
Review detailsBest possible solution: Ship a rebased narrow parser repair on current main that adds delimiter-aware smart-quote normalization in the shared helper/current runner structure, with focused tests and the required changelog entry. Do we have a high-confidence way to reproduce the issue? Yes. A focused parser reproduction is enough: current main has no Unicode quote normalization and the shared scanner only tracks ASCII quotes, so structural smart-quoted argument JSON still fails before tool-call repair can produce usable args. Is this the best way to solve the issue? No, not as-is. The repair direction is reasonable, but the implementation needs to move onto Full review comments:
Overall correctness: patch is incorrect Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 0a798af4fc7a. |
Summary
Fixes #67488
Cron jobs using the
edittool on files containing Chinese quotation marks, markdown bold syntax, or mixed CJK + ASCII content were incorrectly marked aserrorwith aJSON.parseSyntaxError, even though the agent completed successfully and delivery was sent.Root Cause
Some models (e.g., MiniMax, Moonshot) occasionally emit tool-call arguments using Unicode smart quotes (
",", U+201C/U+201D) as JSON string delimiters instead of ASCII". The existingJSON.parsecall intryExtractUsableToolCallArgumentsfails because JSON only recognizes ASCII"as a string delimiter.Additionally, the
extractBalancedJsonPrefixrepair function only tracked ASCII"for string boundaries, causing incorrect brace-depth counting when Unicode quotes were present.Changes
normalizeStructuralUnicodeQuotes(): New function that performs a character-level scan tracking string boundaries (recognizing both ASCII"and Unicode smart quotes). Only structural Unicode quotes at key/value boundaries are rewritten to ASCII", preserving content-level smart quotes inside strings.extractBalancedJsonPrefix(): Updated to recognize Unicode smart quotes (\u201C,\u201D,\u201E,\u201F) as string delimiters for correct brace-depth tracking.tryExtractUsableToolCallArguments(): After the initialJSON.parsefails, attempts normalization before falling back to balanced-prefix extraction. The balanced-prefix extraction also uses normalized input for correct results.Testing
Backward Compatibility
Fully backward compatible. The fix only activates when the initial
JSON.parsefails and the raw string contains Unicode smart quotes — no behavior change for standard ASCII JSON arguments.