feat: add Gemma 4 tool call parser #7449

Closed
0xarkstar wants to merge 2 commits into NousResearch:main from 0xarkstar:feat/gemma4-tool-call-parser

Conversation

@0xarkstar 0xarkstar commented Apr 10, 2026

Summary

Adds a client-side text parser for Gemma 4's tool call format, enabling reliable tool calling for Gemma 4 models (31B, 26B) in Hermes Agent.

Problem

Gemma 4 outputs tool calls in a unique text format:

<|tool_call>call:func_name(arg1: "value1", arg2: "value2")<tool_call|>

Currently, Hermes has no parser for this format. When using Gemma 4 via Nous API / OpenRouter:

  • The tool_calls field is always None (native tool calling doesn't work reliably through the proxy)
  • Tool call markup remains embedded in message.content
  • Hermes does not execute the tool → silent failure

This was reported in #6626.
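
As a rough illustration of the format described above, the wrapper tags and the `call:name(...)` / `call:name{...}` shapes can be located with a regular expression. This is a minimal sketch, not the PR's parser: the pattern and the `find_gemma4_calls` helper are hypothetical, and argument parsing is left out entirely.

```python
import re

# Hypothetical pattern for illustration; the PR's actual regex is not shown in this thread.
_GEMMA4_CALL = re.compile(
    r"<\|tool_call>\s*call:(?P<name>[\w.\-]+)\s*"
    r"(?P<body>\(.*?\)|\{.*?\})\s*<tool_call\|>",
    re.DOTALL,
)


def find_gemma4_calls(content: str) -> list[tuple[str, str]]:
    """Return (function_name, raw_argument_text) pairs found in content."""
    return [
        # Drop the surrounding () or {} so only the key/value text remains.
        (m.group("name"), m.group("body")[1:-1])
        for m in _GEMMA4_CALL.finditer(content)
    ]


if __name__ == "__main__":
    text = 'Sure.<|tool_call>call:search(query: "blockchain news")<tool_call|>'
    print(find_gemma4_calls(text))
```

The raw argument text still has to be turned into a dict, which is where most of the edge cases discussed later in this thread come from.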

Solution

New gemma4 parser (environments/tool_call_parsers/gemma4_parser.py) that extracts structured ChatCompletionMessageToolCall objects from Gemma 4's text output.

Supported formats:

Testing

Real-world validation against Gemma 4 31B via Nous Research API:

| Test case | Input | Result |
|---|---|---|
| Single arg (double quotes) | `call:search(query: "blockchain news")` | |
| Single arg (single quotes) | `call:search(query: 'blockchain news')` | |
| Multiple args | `call:send_message(target='#ann', message='Hello')` | |
| Integer arg | `call:add_item(database='M', name='Alice', priority=1)` | |
| Nested dict | `call:create_db(title='M', props={'name': 'text'})` | |
| Unicode content | `call:send(target='research_team', message='hello world')` | |
| Empty args | `call:status()` | |
| Multiple tool calls | Two `<\|tool_call>` blocks | |
| Brace syntax (#6626) | `call:sf{pattern: "/var", target: "f"}` | |

Unit tests: 17 test cases added to tests/tools/test_tool_call_parsers.py

Changes

  • environments/tool_call_parsers/gemma4_parser.py — New parser (234 lines)
  • environments/tool_call_parsers/__init__.py — Register Gemma4ToolCallParser
  • tests/tools/test_tool_call_parsers.py — 17 test cases for TestGemma4Parser

Context

Discovered while building a Discord bot using Hermes + Gemma 4 31B via Nous API. The bot would describe tool calls in text instead of executing them, causing repeated retry loops. Switching to Qwen (which has a parser) worked immediately, confirming the parser gap.

Closes #6626

🤖 Generated with Claude Code

Additional fix: model-aware fallback parser in agent_loop.py

The fallback parser in agent_loop.py:268-289 was hardcoded to only detect <tool_call> markers and always use the "hermes" parser. This meant all other model families (Gemma 4, Kimi K2, DeepSeek V3, Mistral) whose tool calls went unparsed by the server would silently fail in the fallback path.

Changes:

  • Auto-detect tool call markers for all registered parser formats
  • Use server-configured parser (tool_parser) when available
  • Fall back to marker-based auto-detection when no parser is configured
  • Covers: Hermes/Qwen (<tool_call>), Gemma 4 (<|tool_call>), Kimi K2 (<|tool_calls_section_begin|>), DeepSeek V3 (<|tool▁calls▁begin|>), Mistral ([TOOL_CALLS])

This benefits not only Gemma 4 but all model families when the server-side parser fails or is unavailable.
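
The selection order described in the list above (configured parser first, marker detection second) could be sketched as follows. The marker strings come from the list; the parser names other than "hermes" and "gemma4" are assumed registry keys, and `MARKER_TO_PARSER` and `pick_parser` are hypothetical names rather than the code in agent_loop.py.

```python
from typing import Optional

# Marker strings are taken from the PR description. Parser names other than
# "hermes" and "gemma4" are assumptions made for this sketch.
MARKER_TO_PARSER = [
    ("<|tool_calls_section_begin|>", "kimi_k2"),
    ("<|tool▁calls▁begin|>", "deepseek_v3"),
    ("<|tool_call>", "gemma4"),
    ("[TOOL_CALLS]", "mistral"),
    ("<tool_call>", "hermes"),
]


def pick_parser(content: str, configured: Optional[str] = None) -> Optional[str]:
    """Choose a parser name for an unparsed response."""
    # A server-configured parser always takes precedence.
    if configured:
        return configured
    # Otherwise fall back to the first marker found in the text.
    for marker, parser_name in MARKER_TO_PARSER:
        if marker in content:
            return parser_name
    return None
```

Note that `<|tool_call>` and `<tool_call>` are not substrings of each other, so the Gemma 4 and Hermes markers cannot be confused by a plain `in` check.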

0xarkstar and others added 2 commits April 11, 2026 08:37
Add a client-side parser for Gemma 4's text-based tool call format:
  <|tool_call>call:func_name(arg1: "value1", arg2: "value2")<tool_call|>

Gemma 4 outputs tool calls in Python function-call-like syntax wrapped
in <|tool_call> ... <tool_call|> tags. The native OpenAI-style `tools`
parameter via OpenRouter often produces incomplete/truncated tool calls,
making a dedicated text parser necessary.

Supported formats:
- Parenthesis syntax: call:name(key: "value", key2: 'value2')
- Brace syntax (issue NousResearch#6626): call:name{key: "value"}
- Special quote tokens: <|"|> delimiters
- Python literals: strings, ints, floats, nested dicts/lists
- Unicode/Korean content in arguments
- Multiple tool calls in one response
- Truncated/unclosed tags (graceful handling)

Tested against real Gemma 4 31B output via Nous Research API with
6 different prompt patterns (single arg, multi arg, complex JSON,
multi tool call, nested args, Korean).

Closes NousResearch#6626

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The fallback parser in agent_loop.py was hardcoded to only detect
Hermes-format tool calls (<tool_call>) and always use the "hermes"
parser. This meant other model families (Gemma 4, Kimi K2, DeepSeek,
Mistral) whose tool calls went unparsed by the server would silently
fail in the fallback path.

Changes:
- Auto-detect tool call markers for all registered parser formats
- Use server-configured parser (tool_parser) when available
- Fall back to marker-based auto-detection when no parser is configured
- Covers: Hermes/Qwen, Gemma 4, Kimi K2, DeepSeek V3, Mistral

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@spprod35

+1

@raritytiks

this does not fix local tool calls for gemma-4-E4B-it over vllm.

@0xarkstar please see my PR to your branch 0xarkstar#1

@alt-glitch added the type/feature, P3, comp/agent, and provider/nous labels on Apr 29, 2026
@Naroh091

Naroh091 commented May 4, 2026

Tested this PR locally against gemma4 served via vLLM. The parser itself works for the documented cases, but I hit three follow-up issues running it in production with the hermes CLI.

1. Fallback parser isn't invoked from the production CLI path

The new _TOOL_CALL_MARKERS detection lives in environments/agent_loop.py, which is only imported by the RL/benchmark environments (hermes_base_env, terminalbench2_env, web_research_env, etc.). The actual hermes CLI goes through run_agent.py → agent/transports/chat_completions.py::normalize_response(), and that path never calls get_parser(...). So in CLI sessions, when vLLM's harmony parser intermittently leaks <|tool_call>...<tool_call|> markup into message.content with empty tool_calls, Hermes still treats it as a final text response and ends the session.

Reproduced with a worker session that did 16 successful native tool calls, then on the 17th call vLLM dropped the raw markup into content. With this PR alone, the session terminated. After mirroring the same fallback into normalize_response (~30 lines), the same payload extracts cleanly and the worker continues.
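
The shape of such a mirrored fallback might look roughly like this. It is a sketch only: `apply_text_fallback` and the `parse` callback are hypothetical names, and the real `normalize_response()` signature is not shown in this thread.

```python
from typing import Callable, List, Optional, Tuple

GEMMA4_OPEN = "<|tool_call>"


def apply_text_fallback(
    content: Optional[str],
    tool_calls: Optional[list],
    parse: Callable[[str], Tuple[str, List[dict]]],
) -> Tuple[Optional[str], Optional[list]]:
    """Recover tool calls from text when the server returned none.

    `parse` is a stand-in for the Gemma 4 parser: it takes the raw content
    and returns (remaining_text, parsed_tool_calls).
    """
    # Structured calls win; only touch responses that have none.
    if tool_calls or not content or GEMMA4_OPEN not in content:
        return content, tool_calls
    remaining, parsed = parse(content)
    # If nothing could be extracted, leave the response untouched.
    if not parsed:
        return content, tool_calls
    return remaining, parsed
```

Returning the original values whenever parsing yields nothing keeps the fallback from swallowing ordinary text that merely mentions the marker.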

2. Parser stores nested unquoted-key dicts/lists as raw strings

When the model emits brace-syntax args containing nested objects with bare keys, e.g.:

<|tool_call>call:kanban_complete{
  metadata:{findings:[
    {date:<|"|>2026-04-09<|"|>, identifier:<|"|>BOE-A-2026-7968<|"|>, title:<|"|>...<|"|>},
    ...
  ], sources_searched:<|"|>/path<|"|>},
  summary:<|"|>...<|"|>,
  task_id:<|"|>t_xxx<|"|>
}<tool_call|>

_parse_kwargs_to_dict correctly splits the top-level pairs, but for metadata's value, ast.literal_eval fails (bare keys aren't valid Python) and the except branch falls back to value = value_str.strip("'\""), so metadata ends up as a literal string instead of a dict and the tool rejects the call. The model then retries with an even bigger payload, eventually hitting finish_reason='length'.

Fix is small: replace the bare ast.literal_eval fallback with a recursive _parse_value helper that, on failure, descends into {...} (recursive _parse_kwargs_to_dict) and [...] (split on top-level commas, recurse per element). With that change metadata parses as the expected nested dict and kanban_complete succeeds. The 15 existing tests still pass.

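import ast

def _parse_value(value_str: str):
    """Parse one argument value, recursing into bare-key dicts and lists."""
    value_str = value_str.strip()
    if not value_str:
        return ""
    # Fast path: valid Python literals (quoted strings, numbers, quoted-key dicts).
    try:
        return ast.literal_eval(value_str)
    except (ValueError, SyntaxError):
        pass
    # Bare-key dict: reuse the key/value splitter on the inner text.
    if value_str.startswith("{") and value_str.endswith("}"):
        return _parse_kwargs_to_dict(value_str[1:-1])
    # List: split on top-level commas and parse each element recursively.
    if value_str.startswith("[") and value_str.endswith("]"):
        return [_parse_value(x) for x in _split_top_level(value_str[1:-1])]
    # Last resort: treat the text as a string and drop surrounding quotes.
    return value_str.strip("'\"")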
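
The `_split_top_level` helper called above is not shown in this thread. A minimal sketch of what it might do, assuming it splits on commas that sit outside any brackets or quoted strings (the name `split_top_level` and its exact behavior are assumptions):

```python
def split_top_level(text: str, sep: str = ",") -> list[str]:
    """Split text on sep, ignoring separators inside brackets or quotes."""
    parts: list[str] = []
    buf: list[str] = []
    depth = 0        # nesting level of (), [], {}
    quote = None     # the active quote character, if inside a string
    escaped = False  # True right after a backslash inside a string

    def flush() -> None:
        piece = "".join(buf).strip()
        if piece:
            parts.append(piece)
        buf.clear()

    for ch in text:
        if quote:
            buf.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
            continue
        if ch in "\"'":
            quote = ch
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == sep and depth == 0:
            flush()
        else:
            buf.append(ch)
    flush()
    return parts
```

Tracking quote state as well as bracket depth matters here, since a comma inside a string value would otherwise split one element into two.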

3. String parser is naïve about embedded quotes (causes a hard infinite loop)

_parse_brace_kwargs does a blind <|"|> → " substitution. When the content between markers contains its own " or ' (typical in shell commands), the result is a malformed Python string. Concrete repro from a session that locked into a tight loop:

<|tool_call>call:terminal{command:<|"|>grep -l "department: \"Ministerio de Justicia\"" /path/*.md | xargs grep -l "status: \"in_force\""<|"|>}<tool_call|>

After substitution the parser sees:

command:"grep -l "department: \"Ministerio..." | xargs grep -l "status: \"in_force\""

ast.literal_eval fails, falls back to value_str.strip("'\""), and the result is the command truncated mid-quote. Bash returns "unmatched quote", the model sees the error, retries with the same format, gets the same truncation, retries again. The session loops forever (no cross-turn dedup catches it).

Fix: capture each <|"|>...<|"|> segment as an opaque string with a non-printable placeholder (\x00HSTR\x00<n>\x00) BEFORE parsing the structure, then substitute the originals back in at the end. The parser never sees the inner quotes during structural parsing, so embedded " and ' no longer break anything. Same approach plays well with the recursive _parse_value from issue 2.
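
The placeholder approach can be sketched as a pair of helpers. The placeholder shape follows the description above; `protect_strings` and `restore_strings` are hypothetical names, and the structural parsing that would run between the two calls is omitted.

```python
import re

# Matches one <|"|>...<|"|> segment, including any quotes inside it.
_QUOTE_SEGMENT = re.compile(r'<\|"\|>(.*?)<\|"\|>', re.DOTALL)
# Matches the placeholder \x00HSTR\x00<n>\x00 and captures <n>.
_PLACEHOLDER = re.compile(r"\x00HSTR\x00(\d+)\x00")


def protect_strings(text: str) -> tuple[str, list[str]]:
    """Swap each quoted segment for an opaque placeholder."""
    saved: list[str] = []

    def stash(match: re.Match) -> str:
        saved.append(match.group(1))
        return f"\x00HSTR\x00{len(saved) - 1}\x00"

    return _QUOTE_SEGMENT.sub(stash, text), saved


def restore_strings(value: str, saved: list[str]) -> str:
    """Put the original segment contents back in place of placeholders."""
    return _PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], value)


if __name__ == "__main__":
    raw = 'command:<|"|>grep -l "status: \\"in_force\\"" /path/*.md<|"|>'
    masked, saved = protect_strings(raw)
    print(repr(masked))
    print(restore_strings(masked, saved))
```

Because the masked text contains no quote characters from the payload, the structural parser only ever sees keys, brackets, commas, and placeholders.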

All three issues are reliably hit by tool calls with rich payloads (long shell commands, kanban_complete with metadata={findings: [...], decisions: [...]}). I'd be glad to send PRs for any subset; let me know if you'd prefer one combined or three separate.

@Naroh091

Naroh091 commented May 4, 2026

Following up on this thread — running the previous patch + fixes 1–3 against gemma4 via NaN Builders surfaces a fourth variant that none of the existing fallbacks catch.

4. Markers leak inside structured tool_call.arguments, not just content

The fallback from issue 1 fires on if not tool_calls and content:, which assumes the upstream either delivers a proper structured tool call or dumps the raw markup into content. In practice, vLLM's harmony parser for Gemma 4 has a third failure mode: it does emit structured tool_calls (so tool_calls is non-empty and the content path is bypassed), but the chat-template quote markers (<|"|> in their full and partial forms) leak straight into the JSON string of tool_call.arguments.

Hermes' downstream json.loads(arguments) fails. Hermes can't recover, and because the failure happens during retry of an existing tool call, it gets reported to the user as Response truncated (finish_reason='length') — model hit max output tokens, which is doubly misleading: the upstream actually returned finish_reason: "tool_calls" cleanly with completion_tokens=536, no token cap was hit.

Captured payload (kanban_complete from a worker, gemma4 via NaN Builders, full 1658-byte arg string, captured via SSE pass-through proxy):

```
{"metadata": {"findings": [{"date": "<|\"|"2023-09-18<|\", "identifier": "<|\"|"BOE-A-2023-19616<|\", "title": "<|Resolución de 28 de julio de 2023, de la Subsecretaría, ...<|"}, ... ]}, "summary": "...", "task_id": "t_8345a55c"}
```

`json.loads` chokes at `column 43` because `"<|\"|"` parses as a closed string `<|"|` followed by a bare `2023-09-18` token. The same payload was retried twice and, deterministically, hit the same parse error both times, so the session aborts. Four marker shapes turn up in the wild:

  • `<|"|>` opening (full): appears in `arguments` after JSON-escaping as `"<|\"|"`; replace with `"`
  • `<|"|>` closing (full): appears as `<|\"`; replace with `"`
  • `<|"|>` opening (truncated, no inner `\"`): appears as `"<|`; replace with `"`
  • `<|"|>` closing (truncated, no inner `\"`): appears as `<|"`; replace with `"`
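
The parse failure described above can be reproduced on a reduced payload. This is an illustrative sketch: the payload is hypothetical (a single field cut down from the captured shape), and the backslashes are the JSON escapes the leaked markers are assumed to carry inside `arguments`.

```python
import json

# Reduced, hypothetical payload. The actual characters are:
#   {"date": "<|\"|"2023-09-18<|\"}
broken = '{"date": "<|\\"|"2023-09-18<|\\"}'

try:
    json.loads(broken)
    error_pos = None
except json.JSONDecodeError as exc:
    # Index of the first character the decoder rejects.
    error_pos = exc.pos

# Applying the full-open and full-close replacements yields valid JSON.
repaired = broken.replace('"<|\\"|"', '"').replace('<|\\"', '"')
parsed = json.loads(repaired)

if __name__ == "__main__":
    print(error_pos, repr(broken[error_pos]))
    print(parsed)
```

The decoder reads `<|"|` as a complete string value and then stops at the first digit of the date, which is the same failure shape as the captured payload.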

A small companion to the existing block in `agent/transports/chat_completions.py::normalize_response` handles all four, with a guard to never silently rewrite already-valid JSON:

```python
# Runs inside normalize_response(), after `tool_calls` has been populated.
import json as _json_for_validation

def _strip_gemma4_quote_markers(s: str) -> str:
    """Replace leaked Gemma 4 quote markers with a plain JSON quote."""
    if not s or "<|" not in s:
        return s
    return (s
            .replace('"<|\\"|"', '"')   # full open:  "<|\"|" → "
            .replace('<|\\"', '"')      # full close: <|\"    → "
            .replace('"<|', '"')        # bare open:  "<|     → "
            .replace('<|"', '"'))       # bare close: <|"     → "

if tool_calls:
    for tc in tool_calls:
        args_str = tc.arguments
        if not args_str or "<|" not in args_str:
            continue
        try:
            _json_for_validation.loads(args_str)
            continue   # already valid: never mutate
        except (ValueError, TypeError):
            pass
        cleaned = _strip_gemma4_quote_markers(args_str)
        if cleaned == args_str:
            continue
        try:
            _json_for_validation.loads(cleaned)
        except (ValueError, TypeError):
            continue   # cleaned still broken: leave original for downstream error
        tc.arguments = cleaned
```

Validated against 14 tool_calls captured from a failing gemma4 + NaN Builders session: 12 already-valid args untouched, 2 broken args (both `kanban_complete` payloads with nested `metadata.findings`) parse cleanly post-sanitization, 0 collateral damage.

@0xarkstar

Author

Closing this PR. Two material things have changed since 2026-04:

  1. Target directory removed. Upstream commit 5af672c75 chore: remove Atropos RL environments and tinker-atropos integration (#26106) deleted environments/tool_call_parsers/ entirely. This PR's parser cannot be rebased onto current main — the registration site is gone.

  2. Better-positioned PR exists. fix(agent): recover gemma4 tool_call.arguments corrupted by leaked chat-template markers #19887 (@Naroh091) implements the Gemma 4 fix at the modern code path — agent/transports/chat_completions.py::normalize_response() — exactly where the four issues raised in this thread (CLI fallback path, nested-dict parsing, embedded-quote handling, structured-args marker leakage) need to be addressed.

Thanks @spprod35, @raritytiks, @Naroh091 for testing and the detailed bug reports. The work is consolidating around #19887; please direct future fixes there.

Labels

  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • P3: Low (cosmetic, nice to have)
  • provider/nous: Nous Research API (OAuth)
  • type/feature: New feature or request

Development

Successfully merging this pull request may close these issues.

Gemma 4 tool calling support (parser availability & required configuration)

5 participants