fix(agent): recover gemma4 tool_call.arguments corrupted by leaked chat-template markers by Naroh091 · Pull Request #19887 · NousResearch/hermes-agent

Naroh091 · 2026-05-04T19:43:27Z

Summary

When using Gemma 4 served via vLLM (NaN Builders, and others), the harmony parser intermittently structures the tool call cleanly — tool_calls is non-empty and the upstream returns finish_reason='tool_calls' — but lets the model's chat-template quote markers (<|"|> in its full and partial forms) escape into the JSON of tool_call.arguments as literal characters.

Hermes' downstream json.loads(arguments) then fails. The retry sees the same deterministic corruption, fails again, and the run gets surfaced to the user as Response truncated (finish_reason='length') — model hit max output tokens despite no token cap being hit and the upstream having returned successfully.

This is a complement to the existing content-side recovery path proposed in #7449 — that one fires on if not tool_calls and content:, assuming the structured channel is safe. This PR handles the case where the structured channel is taken but the JSON inside arguments is corrupted.

Symptom

Captured payload (kanban_complete from a worker, gemma4 via NaN Builders, full 1658-byte arg string, captured via SSE pass-through proxy):

{"metadata": {"findings": [{"date": "<|\"|"2023-09-18<|\", "identifier": "<|\"|"BOE-A-2023-19616<|\", "title": "<|Resolución de 28 de julio de 2023, ...<|"}, ... ]}, "summary": "...", "task_id": "t_8345a55c"}

json.loads chokes at column 43 because "<|\"|" parses as a closed string <|"| followed by a bare 2023-09-18 token. Same payload retried twice (deterministic), same parse error twice, session aborts.

Four marker shapes turn up in the wild:

Marker (raw)	After JSON-escape inside arguments	Sanitizer rule
`<\|"\|>` opening (full)	`"<\|\"\|"`	replace with `"`
`<\|"\|>` closing (full)	`<\|\"`	replace with `"`
`<\|"\|>` opening (truncated, no inner `\"`)	`"<\|`	replace with `"`
`<\|"\|>` closing (truncated, no inner `\\`)	`<\|"`	replace with `"`

Approach

agent/transports/chat_completions.py::normalize_response already populates tool_calls. After that loop, if any tool_call.arguments contains <| and fails json.loads, run the marker-stripping substitution and accept the result only if the cleaned variant parses. Concretely:

if tool_calls:
    import json as _json_for_validation

    def _strip_gemma4_quote_markers(s: str) -> str:
        if not s or "<|" not in s:
            return s
        return (s
                .replace('"<|\\"|"', '"')
                .replace('<|\\"', '"')
                .replace('"<|', '"')
                .replace('<|"', '"'))

    for tc in tool_calls:
        args_str = tc.arguments
        if not args_str or "<|" not in args_str:
            continue
        try:
            _json_for_validation.loads(args_str)
            continue  # already valid
        except (ValueError, TypeError):
            pass
        cleaned = _strip_gemma4_quote_markers(args_str)
        if cleaned == args_str:
            continue
        try:
            _json_for_validation.loads(cleaned)
        except (ValueError, TypeError):
            continue  # cleaned still broken
        tc.arguments = cleaned

The double guard (original unparseable AND cleaned parseable) means already-valid args are never touched, even if they cosmetically contain <| substrings — legitimate user content (regex with alternation, certain code patterns) can do that.

Tests

TestChatCompletionsGemma4ArgsSanitization (6 cases) added to tests/agent/transports/test_chat_completions.py:

✅ Real-world full-marker payload (the dump shape above) → recovered.
✅ Full open marker + bare close on the same value → recovered.
✅ Bare-only markers that DON'T break JSON → arguments left intact (strict guard).
✅ Already-valid arguments → not mutated.
✅ <| present but cleaned variant still unparseable → original left for downstream error visibility.
✅ Multiple tool_calls in a single response, mixed clean/dirty → each handled independently.

Full test file: 68 passed in 4.11s (61 existing + 7 new).

Validation in production

Validated against 14 tool_calls captured from a failing gemma4 + NaN Builders session: 12 already-valid args untouched, 2 broken kanban_complete payloads (with nested metadata.findings) parse cleanly post-sanitization, 0 collateral damage. Worker resumes normally.

Relationship to #7449

Independent. #7449 (and the three follow-up issues in its thread) addresses Gemma 4's tool calls landing in message.content instead of tool_calls. This PR addresses the orthogonal case where tool_calls IS populated but its arguments JSON is corrupted. Either fix can land first; both are needed for full coverage.

…at-template markers When using Gemma 4 served via vLLM (Nous Builders, NaN Builders, others), the harmony parser intermittently structures the tool call cleanly (so ``tool_calls`` is non-empty and ``finish_reason='tool_calls'`` from the upstream) but lets the model's chat-template quote markers (``<|"|>`` in its full and partial forms) escape into ``tool_call.arguments`` as literal characters. Hermes' downstream ``json.loads(arguments)`` then fails. The retry sees the same deterministic corruption, fails again, and the run is surfaced to the user as ``Response truncated (finish_reason='length') — model hit max output tokens`` even though no token cap was hit and the upstream returned successfully. This is on top of the existing content-side recovery path (the structured channel was assumed safe). Add a conservative sanitizer that strips the leaked markers from structured ``tool_call.arguments``. To avoid masking unrelated bugs the sanitizer only rewrites ``arguments`` when both: * the ORIGINAL is unparseable as JSON, and * the CLEANED variant IS parseable as JSON. Already-valid args are never touched, even if they cosmetically contain ``<|`` substrings (legitimate user content can do this). Validated against captured failing payloads from a gemma4 + NaN Builders session (``kanban_complete`` with nested ``metadata.findings``): 12 already-valid tool_calls untouched, 2 broken tool_calls recovered, 0 collateral damage.

Naroh091 · 2026-05-04T19:53:35Z

For reviewer context: this is a defensive client-side fix on top of an upstream vLLM root cause that's separately tracked.

vllm-project/vllm#39069 — "gemma4_utils._parse_tool_arguments truncates string values containing internal quotes" — identifies the exact mechanism. The offline parser at vllm/tool_parsers/gemma4_utils.py does args_str.replace("<|\"|>", '"') and then falls back to a [^"]* regex when JSON parse fails. The combination produces exactly the corrupted JSON we observe in tool_call.arguments.

Two related upstream bugs round out the picture:

vllm-project/vllm#40911 — partial markers like <| leaking into content during reasoning→tool transitions (this is what feat: add Gemma 4 tool call parser #7449 in this repo addresses on the Hermes side).
vllm-project/vllm#39885 — same family of leak in streaming multi-turn.

Even after vllm#39069 lands, the Hermes-side sanitizer remains useful as defense-in-depth: it protects clients running older vLLM, custom-served Gemma 4 endpoints, and any provider that derives its tool-call parser from the same code. The conservative double-guard ("only rewrite when original is unparseable AND cleaned IS parseable") means it's a no-op once upstream is fixed — the original arguments will already parse, and the sanitizer will skip them.

Symptom: local Qwen3.5 via llama-server with --jinja occasionally generates "\nuser\n[<user_name>] ..." past the assistant stop boundary. Hermes formats user messages as "[<user_name>] ..." (gateway/run.py:5692), so the model learned that pattern as the user-turn delimiter; when sampling drifts past the assistant stop, it continues into a plausible user turn. The gateway delivers that verbatim to Telegram — the bot appears to literally echo "[ilyass] nudge" back at the user. Belt + suspenders: - run_agent.py: inject stop sequences ["\nuser\n[", "\n\nuser\n"] into chat_completions api_kwargs after _build_api_kwargs(). Cuts generation at the leak boundary, saving tokens too. Disable via HERMES_DISABLE_USER_PREFIX_STOP=1. - gateway/run.py: post-process final_response with a regex that detects and strips "\nuser\n[<name>] ..." patterns. Safety net for any leak that escapes the stop sequence (e.g., on streaming providers where stop detection differs). Logs a warning when it fires so we can monitor. Related upstream (all open, none merged): NousResearch#6272 (strip [CONTEXT COMPACTION] echoes), NousResearch#19887 (strip leaked template markers from gemma4 tool args), NousResearch#15369 (strip MEDIA directives), NousResearch#18529 (title leaks <thinking>).

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels May 4, 2026

0xarkstar mentioned this pull request May 18, 2026

feat: add Gemma 4 tool call parser #7449

Closed

Lx mentioned this pull request Jun 11, 2026

Gemma 4 tool calling support (parser availability & required configuration) #6626

Open

hugo-valcourt mentioned this pull request Jun 12, 2026

Gemma 4 26B (A4B MoE): long agentic contexts trigger parenthesized call:name(...) tool calls; gemma4 parser silently strips them (empty content, no recovery path) jundot/omlx#1846

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(agent): recover gemma4 tool_call.arguments corrupted by leaked chat-template markers#19887

fix(agent): recover gemma4 tool_call.arguments corrupted by leaked chat-template markers#19887
Naroh091 wants to merge 1 commit into
NousResearch:mainfrom
Naroh091:fix/gemma4-structured-tool-call-args

Naroh091 commented May 4, 2026 •

edited

Loading

Uh oh!

Naroh091 commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Naroh091 commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Symptom

Approach

Tests

Validation in production

Relationship to #7449

Uh oh!

Naroh091 commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Naroh091 commented May 4, 2026 •

edited

Loading