Skip to content

Fix #67488: Handle Unicode smart quotes in tool call argument JSON#67551

Closed
mushuiyu886 wants to merge 8 commits into
openclaw:mainfrom
mushuiyu886:fix/issue-67488-cron-serialization-special-chars
Closed

Fix #67488: Handle Unicode smart quotes in tool call argument JSON#67551
mushuiyu886 wants to merge 8 commits into
openclaw:mainfrom
mushuiyu886:fix/issue-67488-cron-serialization-special-chars

Conversation

@mushuiyu886

Copy link
Copy Markdown
Contributor

Summary

Fixes #67488

Cron jobs using the edit tool on files containing Chinese quotation marks, markdown bold syntax, or mixed CJK + ASCII content were incorrectly marked as error with a JSON.parse SyntaxError, even though the agent completed successfully and delivery was sent.

Root Cause

Some models (e.g., MiniMax, Moonshot) occasionally emit tool-call arguments using Unicode smart quotes (", ", U+201C/U+201D) as JSON string delimiters instead of ASCII ". The existing JSON.parse call in tryExtractUsableToolCallArguments fails because JSON only recognizes ASCII " as a string delimiter.

Additionally, the extractBalancedJsonPrefix repair function only tracked ASCII " for string boundaries, causing incorrect brace-depth counting when Unicode quotes were present.

Changes

  • normalizeStructuralUnicodeQuotes(): New function that performs a character-level scan tracking string boundaries (recognizing both ASCII " and Unicode smart quotes). Only structural Unicode quotes at key/value boundaries are rewritten to ASCII ", preserving content-level smart quotes inside strings.
  • extractBalancedJsonPrefix(): Updated to recognize Unicode smart quotes (\u201C, \u201D, \u201E, \u201F) as string delimiters for correct brace-depth tracking.
  • tryExtractUsableToolCallArguments(): After the initial JSON.parse fails, attempts normalization before falling back to balanced-prefix extraction. The balanced-prefix extraction also uses normalized input for correct results.

Testing

  • The existing tool-call argument repair test suite covers the preserved/repaired paths.
  • Manual verification: the normalization correctly rewrites structural Unicode quotes while leaving content-level quotes intact.

Backward Compatibility

Fully backward compatible. The fix only activates when the initial JSON.parse fails and the raw string contains Unicode smart quotes — no behavior change for standard ASCII JSON arguments.

When models (e.g., MiniMax, Moonshot) emit tool-call arguments using
Unicode smart quotes (U+201C/U+201D) as JSON string delimiters instead
of ASCII `"`, JSON.parse fails with a SyntaxError. This caused cron
jobs using the edit tool on files with special characters to be marked
as error even though the agent completed successfully.

- Add normalizeStructuralUnicodeQuotes() that scans the raw JSON,
  tracks string boundaries using both ASCII and Unicode quote chars,
  and rewrites only structural Unicode quotes to ASCII `"` while
  preserving content-level smart quotes inside strings.
- Update extractBalancedJsonPrefix() to recognize Unicode smart quotes
  as string delimiters for correct brace-depth tracking.
- In tryExtractUsableToolCallArguments(), attempt the normalized version
  before falling back to balanced-prefix extraction.

Fixes openclaw#67488
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S labels Apr 16, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a2d261ae2d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts Outdated
@greptile-apps

greptile-apps Bot commented Apr 16, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

Adds a normalizeStructuralUnicodeQuotes repair step for models (MiniMax, Moonshot) that emit tool-call arguments with Unicode smart quotes (U+201C/U+201D) as JSON string delimiters instead of ASCII ". The extractBalancedJsonPrefix function is also updated to recognise those code-points for correct brace-depth tracking.

  • The core normalizeStructuralUnicodeQuotes logic is correct for the primary target case (fully Unicode-delimited JSON), but when JSON.parse(raw) fails for a different reason (e.g. 1-3 char trailing garbage) and the raw string uses ASCII delimiters with U+201C/U+201D as content (common in CJK text), the normalisation step corrupts those content characters before passing normalized to extractBalancedJsonPrefix. The old code used raw here and would have repaired these inputs successfully; the new code returns undefined instead.
  • normalizeStructuralUnicodeQuotes(extracted.json) at line 236 is a no-op because extracted.json is already a slice of the already-normalised string; it can be dropped.

Confidence Score: 4/5

Mostly safe to merge for the primary MiniMax/Moonshot Unicode-delimiter fix, but a regression in the balanced-prefix repair path should be addressed first.

The P1 finding is a real behavioural regression: inputs that were repaired correctly before this PR (ASCII-delimited JSON with U+201C/U+201D content characters plus ≤3-char trailing garbage) will now silently fail repair. This is plausible for CJK content — the exact scenario mentioned in the PR description. The fix is a one-liner change to the extractBalancedJsonPrefix call site.

src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts — specifically the extractBalancedJsonPrefix(normalized) call at line 218 and the redundant double-normalization at line 236.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts
Line: 218

Comment:
**Regression: balanced-prefix extraction uses corrupted `normalized` instead of `raw`**

When the raw JSON uses ASCII delimiters but has U+201C/U+201D characters as legitimate string content (common in CJK tool arguments, e.g. `{"content": "File with \u201Csmart\u201D quotes"}`), `normalizeStructuralUnicodeQuotes` treats the content smart quotes as structural delimiters and corrupts the string — producing e.g. `{"content": "File with "smart" quotes"}`. Passing that corrupted string to `extractBalancedJsonPrefix` then yields a candidate that `JSON.parse` rejects (line 238), so repair returns `undefined`.

Before this PR, `extractBalancedJsonPrefix(raw)` was called on the original string where U+201C/U+201D were invisible to the ASCII-only quote tracker, so the balanced prefix was extracted cleanly and `JSON.parse` succeeded. The change to `normalized` silently breaks that path for any input where:
1. `JSON.parse(raw)` fails for a reason unrelated to smart-quote delimiters (e.g. 1-3 char trailing garbage), AND
2. The ASCII-delimited strings contain U+201C/U+201D as content.

Suggested approach: fall back to `extractBalancedJsonPrefix(raw)` (the original behaviour) and only use `normalized` in `JSON.parse`, or try `raw`-extracted JSON first and only re-attempt with `normalized` if that parse fails:

```
// Prefer extracting from the original string so ASCII-delimited JSON with
// Unicode content characters is not corrupted by normalisation.
const extractedRaw = extractBalancedJsonPrefix(raw);
const extracted = extractedRaw ?? extractBalancedJsonPrefix(normalized);
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts
Line: 236-237

Comment:
**Redundant double-normalization**

`extracted.json` is a slice of `normalized`, which `normalizeStructuralUnicodeQuotes` already processed. Calling it again here is a no-op (all structural Unicode quotes were replaced in the first pass) and can be replaced with a direct assignment:

```suggestion
    const candidateJson = extracted.json;
    try {
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(cron): handle Unicode smart quotes i..." | Re-trigger Greptile

Comment thread src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts Outdated
Comment thread src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts Outdated
Address review feedback from Greptile on PR openclaw#67551:

1. Prefer extractBalancedJsonPrefix(raw) over normalized to avoid
   corrupting CJK content smart quotes when the original JSON uses
   ASCII delimiters. Fall back to normalized only when raw extraction
   yields nothing.

2. Remove redundant double-normalization of extracted.json — it is
   already a slice of either raw or normalized.

3. Add fallback: if the raw-extracted JSON fails to parse, try
   normalizing Unicode quotes in the extracted slice and re-attempt.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 51a75c09e1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts Outdated
Address Greptile P1 review feedback: both normalizeStructuralUnicodeQuotes
and extractBalancedJsonPrefix now track whether a string was opened with
ASCII `"` or a Unicode smart quote, and only close on the matching type.

Before this fix, `{"text":"hello\u201D world"}x` would incorrectly close
the string at the U+201D character even though the string was opened with
ASCII `"`, causing mis-tokenization and failed repair of otherwise
recoverable payloads containing CJK smart quotes in string content.

- normalizeStructuralUnicodeQuotes: only replaces Unicode quotes that
  open/close their own strings; ASCII-opened strings with Unicode content
  are left untouched.
- extractBalancedJsonPrefix: same opener-tracking logic applied.
@clawsweeper

clawsweeper Bot commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs changes before merge.

Summary
The PR changes embedded runner tool-call argument repair to normalize Unicode smart quotes in malformed argument JSON and teach balanced-prefix extraction to treat smart quotes as string delimiters.

Reproducibility: yes. A focused parser reproduction is enough: current main has no Unicode quote normalization and the shared scanner only tracks ASCII quotes, so structural smart-quoted argument JSON still fails before tool-call repair can produce usable args.

Next step before merge
This is a narrow, concrete parser repair whose remaining blocker is a current-main adaptation plus tests, so it is suitable for an automated repair lane.

Security
Cleared: The diff only changes TypeScript tool-call argument parsing and does not touch workflows, dependencies, lockfiles, install/build/release scripts, secrets, downloaded artifacts, or publishing metadata.

Review findings

  • [P2] Move smart-quote scanning onto the shared helper — src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts:120
Review details

Best possible solution:

Ship a rebased narrow parser repair on current main that adds delimiter-aware smart-quote normalization in the shared helper/current runner structure, with focused tests and the required changelog entry.

Do we have a high-confidence way to reproduce the issue?

Yes. A focused parser reproduction is enough: current main has no Unicode quote normalization and the shared scanner only tracks ASCII quotes, so structural smart-quoted argument JSON still fails before tool-call repair can produce usable args.

Is this the best way to solve the issue?

No, not as-is. The repair direction is reasonable, but the implementation needs to move onto src/shared/balanced-json.ts and the current runner call path instead of preserving the older local scanner layout.

Full review comments:

  • [P2] Move smart-quote scanning onto the shared helper — src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts:120
    Current main imports extractBalancedJsonPrefix from src/shared/balanced-json.ts, but this diff updates the older local scanner inside the runner. Rebase the change and put the delimiter-aware scan where current callers read it, otherwise the core parser fix will be lost or conflict-resolved away.
    Confidence: 0.88

Overall correctness: patch is incorrect
Overall confidence: 0.88

Acceptance criteria:

  • pnpm test src/shared/balanced-json.test.ts src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.test.ts
  • pnpm test src/agents/pi-embedded-runner/run/attempt.test.ts -- --runInBand is not allowed; use the repo wrapper without Jest flags if this broader runner surface is needed
  • pnpm check:changed

What I checked:

Likely related people:

  • steipete: Authored the shared balanced JSON extraction refactor and recently maintained the embedded stream wrapper/repair area that this PR must now target. (role: recent maintainer and shared-helper owner; confidence: high; commits: f006678f3c9a, 8879ed153d27, 2ba43f5d270c; files: src/shared/balanced-json.ts, src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts)
  • MonkeyLeeT: Authored the merged fix(agents): repair malformed tool-call args on openai-completions #70294 openai-completions repair enablement and focused tests in the same runner repair path. (role: adjacent behavior author; confidence: medium; commits: 49c7319ea5a2, 2ba43f5d270c; files: src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts, src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.test.ts)

Remaining risk / open question:

  • The parser-level gap is clear, but the final fix should still add focused regression tests for structural smart-quote delimiters, ASCII-delimited content smart quotes, and trailing-junk repair.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 0a798af4fc7a.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cron job result serialization fails on special characters in edit tool arguments

1 participant