feat: add Gemma 4 tool call parser by 0xarkstar · Pull Request #7449 · NousResearch/hermes-agent

0xarkstar · 2026-04-10T23:39:03Z

Summary

Adds a client-side text parser for Gemma 4's tool call format, enabling reliable tool calling for Gemma 4 models (31B, 26B) in Hermes Agent.

Problem

Gemma 4 outputs tool calls in a unique text format:

<|tool_call>call:func_name(arg1: "value1", arg2: "value2")<tool_call|>

Currently, Hermes has no parser for this format. When using Gemma 4 via Nous API / OpenRouter:

The tool_calls field is always None (native tool calling doesn't work reliably through the proxy)
Tool call markup remains embedded in message.content
Hermes does not execute the tool → silent failure

This was reported in #6626.

Solution

New gemma4 parser (environments/tool_call_parsers/gemma4_parser.py) that extracts structured ChatCompletionMessageToolCall objects from Gemma 4's text output.

Supported formats:

Parenthesis syntax: call:name(key: "value", key2: 'value2')
Brace syntax (#6626): call:name{key: "value"}
Special quote tokens: <|"|> delimiters (as reported in #6626)
Python literals: strings (single/double quotes), ints, floats, nested dicts/lists
Unicode content (Korean, CJK, etc.)
Multiple tool calls in one response
Truncated/unclosed tags (graceful degradation)
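
For readers who want a feel for the extraction step, here is a minimal sketch of pulling the function name and raw argument text out of the formats listed above. It is not the code in gemma4_parser.py; the regex and the `extract_calls` name are illustrative assumptions, and argument parsing (quotes, literals, nesting) is deliberately left out.

```python
import re

# Hypothetical pattern, not taken from the PR.
# Matches <|tool_call>call:NAME(ARGS)<tool_call|> and the brace form NAME{ARGS}.
# The closing bracket + tag is optional (\Z) so truncated output still matches.
_CALL_RE = re.compile(
    r"<\|tool_call>\s*call:(?P<name>[\w.\-]+)\s*"
    r"[({](?P<args>.*?)"
    r"(?:[)}]\s*<tool_call\|>|\Z)",
    re.DOTALL,
)


def extract_calls(content: str) -> list[tuple[str, str]]:
    """Return (function_name, raw_argument_text) pairs found in content."""
    return [
        (m.group("name"), m.group("args").strip())
        for m in _CALL_RE.finditer(content)
    ]


if __name__ == "__main__":
    text = 'Sure. <|tool_call>call:search(query: "blockchain news")<tool_call|>'
    print(extract_calls(text))
```

Matching the closing bracket only when it is followed by `<tool_call|>` keeps a `)` or `}` inside an argument string from ending the match early. The sketch does not check that the opening and closing brackets are of the same kind.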

Testing

Real-world validation against Gemma 4 31B via Nous Research API:

| Test Case | Input | Result |
| --- | --- | --- |
| Single arg (double quotes) | `call:search(query: "blockchain news")` | ✅ |
| Single arg (single quotes) | `call:search(query: 'blockchain news')` | ✅ |
| Multiple args | `call:send_message(target='#ann', message='Hello')` | ✅ |
| Integer arg | `call:add_item(database='M', name='Alice', priority=1)` | ✅ |
| Nested dict | `call:create_db(title='M', props={'name': 'text'})` | ✅ |
| Unicode content | `call:send(target='research_team', message='hello world')` | ✅ |
| Empty args | `call:status()` | ✅ |
| Multiple tool calls | Two `<\|tool_call>` blocks | ✅ |
| Brace syntax (#6626) | `call:sf{pattern: "/var", target: "f"}` | ✅ |

Unit tests: 17 test cases added to tests/tools/test_tool_call_parsers.py

Changes

environments/tool_call_parsers/gemma4_parser.py — New parser (234 lines)
environments/tool_call_parsers/__init__.py — Register Gemma4ToolCallParser
tests/tools/test_tool_call_parsers.py — 17 test cases for TestGemma4Parser

Context

Discovered while building a Discord bot using Hermes + Gemma 4 31B via Nous API. The bot would describe tool calls in text instead of executing them, causing repeated retry loops. Switching to Qwen (which has a parser) worked immediately, confirming the parser gap.

Closes #6626

🤖 Generated with Claude Code

Additional fix: model-aware fallback parser in agent_loop.py

The fallback parser in agent_loop.py:268-289 was hardcoded to only detect <tool_call> markers and always use the "hermes" parser. This meant all other model families (Gemma 4, Kimi K2, DeepSeek V3, Mistral) whose tool calls went unparsed by the server would silently fail in the fallback path.

Changes:

Auto-detect tool call markers for all registered parser formats
Use server-configured parser (tool_parser) when available
Fall back to marker-based auto-detection when no parser is configured
Covers: Hermes/Qwen (<tool_call>), Gemma 4 (<|tool_call>), Kimi K2 (<|tool_calls_section_begin|>), DeepSeek V3 (<｜tool▁calls▁begin｜>), Mistral ([TOOL_CALLS])

This benefits not only Gemma 4 but all model families when the server-side parser fails or is unavailable.
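
A rough sketch of the selection order described above (configured parser first, marker detection second). The marker strings come from the list in this description. The parser names other than "hermes" are placeholders I made up for illustration, and `pick_parser` is a hypothetical helper, not the function in agent_loop.py.

```python
from typing import Optional

# (marker, parser name) pairs. Markers are from the PR description; the names
# "gemma4", "kimi_k2", "deepseek_v3" and "mistral" are assumed registry keys.
_TOOL_CALL_MARKERS = [
    ("<tool_call>", "hermes"),                      # Hermes / Qwen
    ("<|tool_call>", "gemma4"),                     # Gemma 4
    ("<|tool_calls_section_begin|>", "kimi_k2"),    # Kimi K2
    ("<｜tool▁calls▁begin｜>", "deepseek_v3"),       # DeepSeek V3
    ("[TOOL_CALLS]", "mistral"),                    # Mistral
]


def pick_parser(content: str, configured: Optional[str] = None) -> Optional[str]:
    """Prefer the server-configured parser, else detect one from markers."""
    if configured:
        return configured
    for marker, name in _TOOL_CALL_MARKERS:
        if marker in content:
            return name
    return None  # no known markup: treat content as plain text


if __name__ == "__main__":
    print(pick_parser("<|tool_call>call:status()<tool_call|>"))
```

None of these markers is a substring of another (for example, `<tool_call>` does not occur inside `<|tool_call>`), so the lookup order in this sketch does not change the result.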

Add a client-side parser for Gemma 4's text-based tool call format:

<|tool_call>call:func_name(arg1: "value1", arg2: "value2")<tool_call|>

Gemma 4 outputs tool calls in Python function-call-like syntax wrapped in <|tool_call> ... <tool_call|> tags. The native OpenAI-style `tools` parameter via OpenRouter often produces incomplete/truncated tool calls, making a dedicated text parser necessary.

Supported formats:
- Parenthesis syntax: call:name(key: "value", key2: 'value2')
- Brace syntax (issue NousResearch#6626): call:name{key: "value"}
- Special quote tokens: <|"|> delimiters
- Python literals: strings, ints, floats, nested dicts/lists
- Unicode/Korean content in arguments
- Multiple tool calls in one response
- Truncated/unclosed tags (graceful handling)

Tested against real Gemma 4 31B output via Nous Research API with 6 different prompt patterns (single arg, multi arg, complex JSON, multi tool call, nested args, Korean).

Closes NousResearch#6626

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The fallback parser in agent_loop.py was hardcoded to only detect Hermes-format tool calls (<tool_call>) and always use the "hermes" parser. This meant other model families (Gemma 4, Kimi K2, DeepSeek, Mistral) whose tool calls went unparsed by the server would silently fail in the fallback path.

Changes:
- Auto-detect tool call markers for all registered parser formats
- Use server-configured parser (tool_parser) when available
- Fall back to marker-based auto-detection when no parser is configured
- Covers: Hermes/Qwen, Gemma 4, Kimi K2, DeepSeek V3, Mistral

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

spprod35 · 2026-04-12T12:48:20Z

+1

raritytiks · 2026-04-17T11:58:27Z

this does not fix local tool calls for gemma-4-E4B-it over vllm.

@0xarkstar please see my PR to your branch 0xarkstar#1

Naroh091 · 2026-05-04T02:05:12Z

Tested this PR locally against gemma4 served via vLLM. The parser itself works for the documented cases, but I hit three follow-up issues running it in production with the hermes CLI.

1. Fallback parser isn't invoked from the production CLI path

The new _TOOL_CALL_MARKERS detection lives in environments/agent_loop.py, which is only imported by the RL/benchmark environments (hermes_base_env, terminalbench2_env, web_research_env, etc.). The actual hermes CLI goes through run_agent.py → agent/transports/chat_completions.py::normalize_response(), and that path never calls get_parser(...). So in CLI sessions, when vLLM's harmony parser intermittently leaks <|tool_call>...<tool_call|> markup into message.content with empty tool_calls, Hermes still treats it as a final text response and ends the session.

Reproduced with a worker session that did 16 successful native tool calls, then on the 17th call vLLM dropped the raw markup into content. With this PR alone, the session terminated. After mirroring the same fallback into normalize_response (~30 lines), the same payload extracts cleanly and the worker continues.
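
Roughly what I mean by mirroring the fallback, as a standalone sketch. The `Message` dataclass and `apply_text_fallback` are hypothetical stand-ins (the real normalize_response works on its own response types), and `parse` is whatever callable turns the markup into tool-call objects.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Message:
    """Hypothetical stand-in for the transport's normalized message."""
    content: str = ""
    tool_calls: list = field(default_factory=list)


def apply_text_fallback(msg: Message, parse: Callable[[str], list]) -> Message:
    """Recover tool calls from content when none arrived in structured form."""
    # Only act when the structured field is empty and the markup is present.
    if msg.tool_calls or not msg.content or "<|tool_call>" not in msg.content:
        return msg
    calls = parse(msg.content)
    if calls:
        msg.tool_calls = calls
        # Keep any prose that preceded the markup; drop the markup itself.
        msg.content = msg.content.split("<|tool_call>", 1)[0].strip()
    return msg


if __name__ == "__main__":
    fake_parse = lambda text: [{"name": "status", "arguments": "{}"}]
    leaked = Message(content="Checking. <|tool_call>call:status()<tool_call|>")
    print(apply_text_fallback(leaked, fake_parse))
```

If the parser returns nothing, the message is left untouched, so a plain text response that happens to mention the marker still reaches the user.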

2. Parser stores nested unquoted-key dicts/lists as raw strings

When the model emits brace-syntax args containing nested objects with bare keys, e.g.:

<|tool_call>call:kanban_complete{
  metadata:{findings:[
    {date:<|"|>2026-04-09<|"|>, identifier:<|"|>BOE-A-2026-7968<|"|>, title:<|"|>...<|"|>},
    ...
  ], sources_searched:<|"|>/path<|"|>},
  summary:<|"|>...<|"|>,
  task_id:<|"|>t_xxx<|"|>
}<tool_call|>

_parse_kwargs_to_dict correctly splits the top-level pairs, but for metadata's value, ast.literal_eval fails (bare keys aren't valid Python) and the except branch falls back to value = value_str.strip("'\""). As a result, metadata ends up as a literal string instead of a dict, and the tool rejects the call. The model then retries with an even bigger payload, eventually hitting finish_reason='length'.

Fix is small: replace the bare ast.literal_eval fallback with a recursive _parse_value helper that, on failure, descends into {...} (recursive _parse_kwargs_to_dict) and [...] (split on top-level commas, recurse per element). With that change metadata parses as the expected nested dict and kanban_complete succeeds. The 15 existing tests still pass.

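import ast

def _parse_value(value_str: str):
    """Parse one argument value, recursing into nested {...} and [...]."""
    value_str = value_str.strip()
    if not value_str:
        return ""
    # Fast path: anything that is already a valid Python literal.
    try:
        return ast.literal_eval(value_str)
    except (ValueError, SyntaxError):
        pass
    # Bare-key dict: reuse the existing key/value splitter on the inner text.
    if value_str.startswith("{") and value_str.endswith("}"):
        return _parse_kwargs_to_dict(value_str[1:-1])
    # List: split on top-level commas and parse each element.
    if value_str.startswith("[") and value_str.endswith("]"):
        return [_parse_value(x) for x in _split_top_level(value_str[1:-1])]
    # Last resort: keep it as a string and drop surrounding quotes.
    return value_str.strip("'\"")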
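
The `_split_top_level` helper used above is not shown in this comment. A minimal sketch of what such a splitter could look like, assuming it only needs to respect brackets and quoted strings (the name `split_top_level` and its exact behaviour are my assumptions, not the patch itself):

```python
def split_top_level(text: str, sep: str = ",") -> list[str]:
    """Split text on sep, ignoring separators inside brackets or quotes."""
    parts, buf = [], []
    depth = 0        # current nesting level of (), [], {}
    quote = None     # the quote character we are inside, if any
    escaped = False  # True right after a backslash inside a quoted string
    for ch in text:
        if quote:
            buf.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
            continue
        if ch in "\"'":
            quote = ch
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append("".join(buf).strip())
            buf = []
            continue
        buf.append(ch)
    tail = "".join(buf).strip()
    if tail:
        parts.append(tail)
    return parts


if __name__ == "__main__":
    print(split_top_level('a: 1, b: {x: [1, 2], y: "p, q"}, c: \'z\''))
```

Tracking depth and quote state in one pass means a comma inside `{...}`, `[...]` or a quoted string never splits an element, which is what the recursive descent needs.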

3. String parser is naïve about embedded quotes (causes a hard infinite loop)

_parse_brace_kwargs does a blind <|"|> → " substitution. When the content between markers contains its own " or ' (typical in shell commands), the result is a malformed Python string. Concrete repro from a session that locked into a tight loop:

<|tool_call>call:terminal{command:<|"|>grep -l "department: \"Ministerio de Justicia\"" /path/*.md | xargs grep -l "status: \"in_force\""<|"|>}<tool_call|>

After substitution the parser sees:

command:"grep -l "department: \"Ministerio..." | xargs grep -l "status: \"in_force\""

ast.literal_eval fails, falls back to value_str.strip("'\""), and the result is the command truncated mid-quote. Bash returns "unmatched quote", the model sees the error, retries with the same format, gets the same truncation, retries again. The session loops forever (no cross-turn dedup catches it).

Fix: capture each <|"|>...<|"|> segment as an opaque string with a non-printable placeholder (\x00HSTR\x00<n>\x00) BEFORE parsing the structure, then substitute the originals back in at the end. The parser never sees the inner quotes during structural parsing, so embedded " and ' no longer break anything. Same approach plays well with the recursive _parse_value from issue 2.
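
A sketch of the placeholder idea, assuming the placeholder shape given above. The helper names `protect_strings` and `restore_strings` are illustrative, not the names in my patch.

```python
import re

_QUOTE = '<|"|>'
# Non-greedy match of everything between one pair of quote tokens.
_SEGMENT_RE = re.compile(re.escape(_QUOTE) + r"(.*?)" + re.escape(_QUOTE), re.DOTALL)
_PLACEHOLDER_RE = re.compile(r"\x00HSTR\x00(\d+)\x00")


def protect_strings(text: str) -> tuple[str, list[str]]:
    """Swap each <|"|>...<|"|> segment for an opaque numbered placeholder."""
    saved: list[str] = []

    def stash(match: re.Match) -> str:
        saved.append(match.group(1))
        return f"\x00HSTR\x00{len(saved) - 1}\x00"

    return _SEGMENT_RE.sub(stash, text), saved


def restore_strings(value: str, saved: list[str]) -> str:
    """Put the original string contents back in place of placeholders."""
    return _PLACEHOLDER_RE.sub(lambda m: saved[int(m.group(1))], value)


if __name__ == "__main__":
    raw = 'command:<|"|>grep -l "status: \\"in_force\\"" /path/*.md<|"|>'
    masked, saved = protect_strings(raw)
    print(repr(masked))                   # no quote characters left to confuse the parser
    print(restore_strings(masked, saved)) # original command text, markers removed
```

Structural parsing runs on the masked text, and the restore step is applied to each parsed value afterwards, so the inner quotes are never seen by ast.literal_eval or the comma splitter.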

All three issues are reliably hit by tool calls with rich payloads (long shell commands, kanban_complete with metadata={findings: [...], decisions: [...]}). I'd be glad to send PRs for any subset; let me know if you'd prefer one combined or three separate.

Naroh091 · 2026-05-04T19:13:57Z

Following up on this thread — running the previous patch + fixes 1–3 against gemma4 via NaN Builders surfaces a fourth variant that none of the existing fallbacks catch.

4. Markers leak inside structured `tool_call.arguments`, not just `content`

The fallback from issue 1 fires on `if not tool_calls and content:`. It assumes the upstream either delivers a proper structured tool call or dumps the raw markup into `content`. In practice, vLLM's harmony parser for Gemma 4 has a third failure mode: it does emit structured `tool_calls` (so `tool_calls` is non-empty and the content path is bypassed), but the chat-template quote markers (`<|"|>` in their full and partial forms) leak straight into the JSON string of `tool_call.arguments`.

Hermes' downstream `json.loads(arguments)` fails and Hermes can't recover. Because the failure happens during retry of an existing tool call, it gets reported to the user as "Response truncated (finish_reason='length') — model hit max output tokens", which is doubly misleading: the upstream actually returned `finish_reason: "tool_calls"` cleanly with `completion_tokens=536`, and no token cap was hit.

Captured payload (kanban_complete from a worker, gemma4 via NaN Builders, full 1658-byte arg string, captured via SSE pass-through proxy):

```
{"metadata": {"findings": [{"date": "<|\"|"2023-09-18<|\", "identifier": "<|\"|"BOE-A-2023-19616<|\", "title": "<|Resolución de 28 de julio de 2023, de la Subsecretaría, ...<|"}, ... ]}, "summary": "...", "task_id": "t_8345a55c"}
```

`json.loads` chokes at `column 43` because `"<|"|"` parses as a closed string `<|"|` followed by a bare `2023-09-18` token. Same payload retried twice (deterministic), same parse error twice → session aborts. Four marker shapes turn up in the wild:

| Marker (raw) | After JSON-escape inside arguments | Fix |
| --- | --- | --- |
| `<\|"\|>` opening (full) | `"<\|\"\|"` | replace with `"` |
| `<\|"\|>` closing (full) | `<\|\"` | replace with `"` |
| `<\|"\|>` opening (truncated, no inner `"`) | `"<\|` | replace with `"` |
| `<\|"\|>` closing (truncated, no inner `\`) | `<\|"` | replace with `"` |

A small companion to the existing block in `agent/transports/chat_completions.py::normalize_response` handles all four, with a guard to never silently rewrite already-valid JSON:

```python
# Fragment: runs inside normalize_response() once tool_calls is populated.
if tool_calls:
    import json as _json_for_validation

    def _strip_gemma4_quote_markers(s: str) -> str:
        """Turn leaked <|"|> quote markers back into plain JSON quotes."""
        if not s or "<|" not in s:
            return s
        # Full forms first, so the bare patterns don't match part of them.
        return (s
                .replace('"<|\\"|"', '"')   # full open:  "<|\"|" → "
                .replace('<|\\"', '"')      # full close: <|\"    → "
                .replace('"<|', '"')        # bare open:  "<|     → "
                .replace('<|"', '"'))       # bare close: <|"     → "

    for tc in tool_calls:
        args_str = tc.arguments
        if not args_str or "<|" not in args_str:
            continue
        try:
            _json_for_validation.loads(args_str)
            continue   # already valid — never mutate
        except (ValueError, TypeError):
            pass
        cleaned = _strip_gemma4_quote_markers(args_str)
        if cleaned == args_str:
            continue
        try:
            _json_for_validation.loads(cleaned)
        except (ValueError, TypeError):
            continue   # cleaned still broken — leave original for downstream error
        tc.arguments = cleaned
```

Validated against 14 tool_calls captured from a failing gemma4 + NaN Builders session: 12 already-valid args untouched, 2 broken args (both `kanban_complete` payloads with nested `metadata.findings`) parse cleanly post-sanitization, 0 collateral damage.
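
For anyone who wants to poke at the guard in isolation, here is a self-contained check on a synthetic payload shaped like the capture. It assumes the raw arguments string carries the marker's inner quote as a JSON escape (`\"`); `sanitize_arguments` is a hypothetical wrapper name, not a function in the codebase.

```python
import json


def sanitize_arguments(args_str: str) -> str:
    """Return args_str with leaked quote markers removed, or unchanged if unsafe."""
    if not args_str or "<|" not in args_str:
        return args_str
    try:
        json.loads(args_str)
        return args_str              # already valid JSON: never mutate
    except ValueError:
        pass
    cleaned = (args_str
               .replace('"<|\\"|"', '"')   # full open
               .replace('<|\\"', '"')      # full close
               .replace('"<|', '"')        # bare open
               .replace('<|"', '"'))       # bare close
    try:
        json.loads(cleaned)
    except ValueError:
        return args_str              # still broken: leave the original alone
    return cleaned


if __name__ == "__main__":
    # Synthetic payload, not the captured one.
    broken = '{"date": "<|\\"|"2023-09-18<|\\", "title": "<|Resolución<|"}'
    print(json.loads(sanitize_arguments(broken)))
    # A valid payload that merely contains "<|" comes back untouched.
    print(sanitize_arguments('{"text": "a <| b"}'))
```

Validating both before and after the rewrite is what keeps the sanitizer from touching arguments that legitimately contain `<|`.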

0xarkstar · 2026-05-18T16:22:03Z

Closing this PR. Two material things have changed since 2026-04:

Target directory removed. Upstream commit 5af672c75 chore: remove Atropos RL environments and tinker-atropos integration (#26106) deleted environments/tool_call_parsers/ entirely. This PR's parser cannot be rebased onto current main — the registration site is gone.
Better-positioned PR exists. #19887 (fix(agent): recover gemma4 tool_call.arguments corrupted by leaked chat-template markers, by @Naroh091) implements the Gemma 4 fix at the modern code path, agent/transports/chat_completions.py::normalize_response(), which is exactly where the four issues raised in this thread (CLI fallback path, nested-dict parsing, embedded-quote handling, structured-args marker leakage) need to be addressed.

Thanks @spprod35, @raritytiks, @Naroh091 for testing and the detailed bug reports. The work is consolidating around #19887; please direct future fixes there.

0xarkstar and others added 2 commits April 11, 2026 08:37

0xarkstar mentioned this pull request Apr 11, 2026

feat: Gemma 4 tool calling support via Nous inference API #7457

Closed

raritytiks mentioned this pull request Apr 17, 2026

fix(gemma4): recover raw tool calls when parser import fails 0xarkstar/hermes-agent#1

Closed

alt-glitch added the type/feature (New feature or request), P3 (Low: cosmetic, nice to have), comp/agent (Core agent loop, run_agent.py, prompt builder), and provider/nous (Nous Research API (OAuth)) labels Apr 29, 2026

alt-glitch mentioned this pull request Apr 29, 2026

Gemma 4 tool calling support (parser availability & required configuration) #6626

Open

Naroh091 mentioned this pull request May 4, 2026

fix(agent): recover gemma4 tool_call.arguments corrupted by leaked chat-template markers #19887

Open

Werner1303 mentioned this pull request May 12, 2026

feat: heuristic tool-name normalizer for hallucinated tool names #24411

Open

0xarkstar closed this May 18, 2026

0xarkstar mentioned this pull request Jun 11, 2026

feat(gateway,skills): unified skill trigger framework with Discord components #16530

Closed

0xarkstar wants to merge 2 commits into NousResearch:main from 0xarkstar:feat/gemma4-tool-call-parser
