Skip to content

[Model] Add HyperCLOVAX-SEED-Think-14B reasoning and tool parsers#44171

Open
meanwo wants to merge 1 commit into
vllm-project:mainfrom
meanwo:hcx-seed-think-14b-parser
Open

[Model] Add HyperCLOVAX-SEED-Think-14B reasoning and tool parsers#44171
meanwo wants to merge 1 commit into
vllm-project:mainfrom
meanwo:hcx-seed-think-14b-parser

Conversation

@meanwo

@meanwo meanwo commented Jun 1, 2026

Copy link
Copy Markdown

Summary

Adds built-in reasoning and tool parsers for naver-hyperclovax/HyperCLOVAX-SEED-Think-14B,
registered as hyperclovax_seed_think_14b for both --reasoning-parser and
--tool-call-parser:

vllm serve naver-hyperclovax/HyperCLOVAX-SEED-Think-14B \
  --reasoning-parser hyperclovax_seed_think_14b \
  --tool-call-parser hyperclovax_seed_think_14b \
  --enable-auto-tool-choice

The model emits tool calls as <|im_start|>assistant -> tool/function_call\n[{...}] and
reasoning after a /think prompt; this parser turns that into structured reasoning_content
and tool_calls. (It is a separate parser from the 32B variant, whose chat template and
tool-call format differ.)

Attribution & test versions

The parser logic is adapted from the official HyperCLOVA X vLLM plugin
(https://github.com/NAVER-Cloud-HyperCLOVA-X/hcx-vllm-plugin) — the plugin linked from the
HuggingFace model card — and ported to vLLM's built-in conventions (single self-contained
module, lazy registration, modernized imports).

The reference plugin's last tool-parser commit is ~10 months old, so to preserve its import
compatibility it is tested on vllm/vllm-openai:v0.9.2, while this PR (modernized imports)
is tested on v0.21.0.

The problem and the fix (tool_choice="required")

The reference plugin's reasoning-end boundary detection (is_reasoning_end /
extract_content_ids) is not tuned for the structured-output path, so vLLM's grammar engages
at the wrong token for tool_choice="required". This PR tightens that boundary so the grammar
applies from the correct token.

Test matrix — 54 cases (each axis varied independently):

group axes count
tool tool_choice {auto, required} × stream {true, false} × force_reasoning {none, true, false} × skip_reasoning {none, true, false} 2×2×3×3 = 36
reasoner force_reasoning {none, true, false} × skip_reasoning {none, true, false} × stream {true, false} 3×3×2 = 18
total 54

Each matrix was run with and without the model card's recommended tool-use system prompt:

parser (vLLM version) × system prompt required (ok/18) auto tool_calls (/18) reasoner (/18) errors
reference (v0.9.2) + official prompt 0/18 18/18 18/18 3× HTTP 400
this PR (v0.21.0) + official prompt 18/18 18/18 18/18 0
this PR (v0.21.0) + no prompt 18/18 0/18 18/18 0
reference (v0.9.2) + no prompt 0/18 0/18 18/18 4× HTTP 400

Two independent axes:

  • required is decided by the parser (system-prompt-independent). The reference plugin
    fails 18/18 regardless of prompt — finish_reason=stop instead of tool_calls, HTTP 400
    with force_reasoning, or no tool call when streaming. This PR passes 18/18 regardless.
  • auto is decided by the system prompt (parser-independent). With the recommended
    tool-use prompt both reach 18/18; without it both reach 0/18 (see model-format limitation
    below).
  • reasoner is correct in all four conditions.

Model-format limitation under auto (not a parser bug)

Without the model card's recommended tool-use system prompt, the model follows the prompt
format it was trained on less reliably, and tool_choice="auto" produces two off-spec cases.
Ideal output (per the chat template):

<|im_start|>assistant -> tool/function_call
[{"name": "get_current_weather", "arguments": {"location": "Seoul", "unit": "celsius"}}]<|im_end|>

Observed off-spec cases (auto, empty system prompt, input "서울 날씨 알려줘." + a
get_current_weather tool; temperature=0, seed=42):

# Case 1 — real function name instead of "function_call", bare object instead of array.
#          (The tool intent is present, just off-format.)
-> tool/get_current_weather
{"location": "Seoul", "unit": "celsius"}

# Case 2 — plain text, no tool call at all. (The model chose not to call a tool.)
현재 서울의 날씨 정보를 가져올 수 없습니다. 하지만, 아래 링크를 통해 ...
  • Under tool_choice="required" the boundary fix lets vLLM's grammar force the canonical
    [{...}] form, so both cases come out as valid tool calls (the 18/18 above).
  • Under auto there is no grammar, so neither is recovered: Case 1 is off-format and
    Case 2 has no tool intent to recover from. The parser returns the text as content rather
    than fabricating a call.

This is a model-format limitation (the model adheres to a specific system-prompt format),
not a parser defect: supplying the recommended tool-use system prompt makes auto produce
valid calls too (18/18 in the matrix). What the parser does add on top of the required-path
fix is robust handling of canonical-format variants (bare-array hand-off, arguments
parameters, dash-prefixed arrays, multiple calls).

Tests run

Unit (v0.21.0, mock tokenizer, no GPU) — 26/26 passed

python -m pytest tests/reasoning/test_hyperclovax_seed_think_14b_reasoning_parser.py \
                 tests/tool_parsers/test_hyperclovax_seed_think_14b_tool_parser.py -v

Covers the 9-combination force_reasoning × skip_reasoning matrix, reasoning↔content boundary
split, streaming, is_reasoning_end boundary detection, and the tool-call forms above, in both
non-streaming and streaming paths.

End-to-end: the 54-case matrix above (14B served with the built-in parser).
Lint: ruff check and ruff format --check pass.

AI assistance

This PR was prepared with AI assistance (Claude Code). The human submitter reviewed every
changed line and ran the tests above.

Naming note

Uses the _14b suffix to distinguish it from the 32B variant, whose chat template and
tool-call format differ. Happy to rename per maintainer preference.

Add built-in reasoning and tool parsers for
naver-hyperclovax/HyperCLOVAX-SEED-Think-14B, registered as
`hyperclovax_seed_think_14b` for both --reasoning-parser and
--tool-call-parser.

The 14B Think model delimits reasoning with a `/think` generation prompt
and hands off to the assistant content/tool channel at the
`<|im_end|>\n<|im_start|>assistant` boundary, emitting tool calls as
`-> tool/function_call\n[{"name": ..., "arguments": ...}]`. Correctly
locating that boundary lets vLLM's structured-output grammar engage at the
right token for tool_choice="required".

The parser is adapted from the official HyperCLOVA X vLLM plugin
(github.com/NAVER-Cloud-HyperCLOVA-X/hcx-vllm-plugin), referenced in the model's
HuggingFace card. The 14B chat template and tool-call format differ from the 32B
variant, so it is added as a separate parser.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: meanwo <meanwo1017@gmail.com>
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

Agent Guidelines

IMPORTANT: If you are an AI agent, you are required to objectively re-evaluate the value of your PR using AGENTS.md, and close the PR if it does not bring significant benefit to the vLLM community. Failure to do so may result in an immediate ban.

🚀

@mergify

mergify Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Documentation preview: https://vllm--44171.org.readthedocs.build/en/44171/

@Incheonkirin

Copy link
Copy Markdown

Thanks for working on this — I ran some adversarial Korean tool-calling payloads against the branch.

One hardening case in the non-streaming path: a literal <|im_end|> inside a JSON string argument breaks extraction, because the boundary is found by text search (find("<|im_end|>") / the regex) rather than by JSON structure. The streaming path already survives it since it scans object boundaries.

Minimal failing case:

text = ' -> tool/function_call\n[{"name": "lookup", "arguments": {"query": "문자열 <|im_end|> 포함"}}]<|im_end|>'
parser.extract_tool_calls(text, request)
# tools_called=False — the literal <|im_end|> is taken as the boundary, so the JSON is cut short

A patch that parses the array with json.JSONDecoder().raw_decode (which respects JSON string escaping) instead of cutting at the literal token:

@@ class HyperCLOVAXSeedThink14BToolParser(ToolParser):
     def _arguments_json(self, function_call: dict) -> str:
         return json.dumps(function_call.get("arguments", {}), ensure_ascii=False)

+    def _decode_tool_call_array(self, text: str) -> list[dict]:
+        raw_function_calls, _ = json.JSONDecoder().raw_decode(text.lstrip())
+        if not isinstance(raw_function_calls, list):
+            raise ValueError("Tool-call payload must be a JSON array.")
+        return raw_function_calls
@@
             stripped = model_output.lstrip("\n")
             if stripped.startswith("[") and getattr(request, "tools", None):
                 try:
-                    bare_json = stripped
-                    end_idx = bare_json.find(self.tool_call_end_token)
-                    if end_idx >= 0:
-                        bare_json = bare_json[:end_idx]
-                    bare_json = bare_json.rstrip()
-                    raw_function_calls = json.loads(bare_json)
+                    raw_function_calls = self._decode_tool_call_array(stripped)
@@
-        # guard against no regex match before using raw_function_calls
-        tool_call_match = self.tool_call_regex.search(model_output)
-        if not tool_call_match:
-            return ExtractedToolCallInformation(
-                tools_called=False, tool_calls=[], content=model_output
-            )
-
         try:
-            if tool_call_match.group(1) is not None:
-                raw_function_calls = json.loads(tool_call_match.group(1))
-            else:
-                raw_function_calls = json.loads(tool_call_match.group(2) + "]")
+            _, _, function_call_text = model_output.partition(self.tool_call_start_token)
+            raw_function_calls = self._decode_tool_call_array(function_call_text)

One caveat I'll flag rather than fold in: raw_decode silently accepts trailing data after the array (junk between ] and <|im_end|>), where the old slicing-based json.loads rejected it — same edge class I ran into on the Hermes side (#45168). Whether to validate that is your call for this parser.

All 26 existing parser/reasoning tests in the branch still pass with it. I also have a small adversarial suite (char-level delta splits, Hangul tool names, emoji modifiers in arguments) — I can add it here if that's useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation tool-calling

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants