Add reasoning capability detection and automatic polyfills #20

Draft
ochafik wants to merge 11 commits into main from issue-19-thinking-capabilities

@ochafik ochafik commented Dec 23, 2025

Summary

This PR adds reasoning capability detection and polyfills for chat templates. The library automatically transforms the canonical reasoning_content field into each template's native format, so applications can use a single input format across all models.

Documentation: See docs/CAPABILITIES_AND_POLYFILLS.md for detailed usage guide.

Key Features

1. ReasoningFormat Detection

Automatically detects which reasoning format a template supports:

| Format | Field/Structure | Example Models |
|---|---|---|
| REASONING_CONTENT_FIELD | message.reasoning_content | Qwen3, GLM-4.6/4.7 |
| THOUGHT_FIELD | message.thought | MiniCPM3 |
| THINKING_FIELD | message.thinking | GPT-OSS-120B |
| TOOL_PLAN_FIELD | message.tool_plan | Command-R7B (requires tool_calls) |
| THINKING_CONTENT_BLOCK | content[].type == "thinking" | Ministral, DeepSeek-R1 |
| THOUGHTS_CONTENT_BLOCK | content[].type == "thoughts" | Apertus, Kimi K2 |
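
To make the table concrete, here is a minimal sketch of how the formats could be modeled and mapped to their field names or block types. The enum spelling mirrors the table, but the `enum class` form and the two helper functions (`reasoning_field_name`, `reasoning_block_type`) are hypothetical and are not taken from this PR's code.

```cpp
#include <optional>
#include <string>

// Hypothetical mirror of the formats in the table above; the enum in this PR
// may differ in form, order, or scope.
enum class ReasoningFormat {
    NONE,
    REASONING_CONTENT_FIELD,
    THOUGHT_FIELD,
    THINKING_FIELD,
    TOOL_PLAN_FIELD,
    THINKING_CONTENT_BLOCK,
    THOUGHTS_CONTENT_BLOCK,
};

// Hypothetical helper: for field-based formats, the message key that carries
// the reasoning text. Content-block formats and NONE yield std::nullopt.
inline std::optional<std::string> reasoning_field_name(ReasoningFormat f) {
    switch (f) {
        case ReasoningFormat::REASONING_CONTENT_FIELD: return std::string("reasoning_content");
        case ReasoningFormat::THOUGHT_FIELD:           return std::string("thought");
        case ReasoningFormat::THINKING_FIELD:          return std::string("thinking");
        case ReasoningFormat::TOOL_PLAN_FIELD:         return std::string("tool_plan");
        default:                                       return std::nullopt;
    }
}

// Hypothetical helper: for content-block formats, the value of the block's
// "type" key. Field-based formats and NONE yield std::nullopt.
inline std::optional<std::string> reasoning_block_type(ReasoningFormat f) {
    switch (f) {
        case ReasoningFormat::THINKING_CONTENT_BLOCK: return std::string("thinking");
        case ReasoningFormat::THOUGHTS_CONTENT_BLOCK: return std::string("thoughts");
        default:                                      return std::nullopt;
    }
}
```

Splitting the lookup into two helpers keeps the field-based and block-based families apart, which is the same split the table makes in its Field/Structure column.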

2. Automatic Polyfills

When input uses the canonical reasoning_content field but the template expects a different format, the library automatically transforms it:

reasoning_content  →  thought                    (MiniCPM3)
reasoning_content  →  thinking                   (GPT-OSS-120B)
reasoning_content  →  tool_plan                  (Command-R7B, with tool_calls)
reasoning_content  →  content[{type:"thinking"}] (Ministral, DeepSeek-R1)
reasoning_content  →  content[{type:"thoughts"}] (Kimi K2, Apertus)
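
The mappings above can be sketched as a small transformation over a simplified message model. Everything here is illustrative: `Message`, `ContentBlock`, and `polyfill_reasoning` are hypothetical names, the library itself works on JSON values rather than this struct, and the choices to leave the message untouched when `tool_calls` is absent and to store block text in a generic `text` member are assumptions.

```cpp
#include <map>
#include <string>
#include <vector>

enum class ReasoningFormat {
    NONE, REASONING_CONTENT_FIELD, THOUGHT_FIELD, THINKING_FIELD,
    TOOL_PLAN_FIELD, THINKING_CONTENT_BLOCK, THOUGHTS_CONTENT_BLOCK,
};

// Simplified stand-ins for a JSON message (assumption, not the library's types).
struct ContentBlock {
    std::string type;
    std::string text;
};

struct Message {
    std::map<std::string, std::string> fields;  // role, content, reasoning_content, ...
    std::vector<ContentBlock> blocks;           // typed content blocks, if any
    bool has_tool_calls = false;
};

// Moves the reasoning text from reasoning_content to another top-level field.
inline void move_to_field(Message& msg, const std::string& key, const std::string& reasoning) {
    msg.fields.erase("reasoning_content");
    msg.fields[key] = reasoning;
}

// Moves the reasoning text into a typed block, followed by the string content (if any).
inline void move_to_block(Message& msg, const std::string& type, const std::string& reasoning) {
    msg.fields.erase("reasoning_content");
    std::vector<ContentBlock> blocks;
    blocks.push_back(ContentBlock{type, reasoning});
    auto content = msg.fields.find("content");
    if (content != msg.fields.end()) {
        blocks.push_back(ContentBlock{"text", content->second});
        msg.fields.erase(content);
    }
    msg.blocks = blocks;
}

// Returns a copy of msg with reasoning_content rewritten for the target format.
inline Message polyfill_reasoning(Message msg, ReasoningFormat target) {
    auto it = msg.fields.find("reasoning_content");
    if (it == msg.fields.end()) return msg;     // nothing to transform
    const std::string reasoning = it->second;   // copy before the field is erased
    switch (target) {
        case ReasoningFormat::THOUGHT_FIELD:  move_to_field(msg, "thought", reasoning); break;
        case ReasoningFormat::THINKING_FIELD: move_to_field(msg, "thinking", reasoning); break;
        case ReasoningFormat::TOOL_PLAN_FIELD:
            // Assumption: without tool_calls the message is left as-is.
            if (msg.has_tool_calls) move_to_field(msg, "tool_plan", reasoning);
            break;
        case ReasoningFormat::THINKING_CONTENT_BLOCK: move_to_block(msg, "thinking", reasoning); break;
        case ReasoningFormat::THOUGHTS_CONTENT_BLOCK: move_to_block(msg, "thoughts", reasoning); break;
        case ReasoningFormat::NONE:
        case ReasoningFormat::REASONING_CONTENT_FIELD:
            break;                              // already canonical, or no known target
    }
    return msg;
}
```

Taking the message by value keeps the caller's input in canonical form, so the same message could be rendered against several templates.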

3. New Capability Flags

struct chat_template_caps {
    // Reasoning capabilities
    bool supports_reasoning = false;
    ReasoningFormat reasoning_format = NONE;
    bool reasoning_requires_tools = false;        // Command-R7B pattern
    bool supports_clear_thinking = false;         // GLM-4.7 visibility control
    
    // Behavior detection
    bool supports_reasoning_without_content = false;
    bool supports_reasoning_with_content = false;
    bool respects_enable_reasoning = false;
    
    // Content format
    bool requires_typed_content_blocks = false;   // [{type:"text", text:...}]
};
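
The commit log for this PR describes the behavior flags as probes computed via template rendering. One way such a probe could work is to render a message that carries a unique marker and check whether the marker survives in the output. The sketch below assumes a hypothetical `Renderer` callback standing in for the real template engine; `ProbeMessage`, `ReasoningProbeResult`, and `probe_reasoning` are illustrative names, not this library's API.

```cpp
#include <functional>
#include <string>

// Minimal stand-in for an assistant message (assumption).
struct ProbeMessage {
    std::string content;
    std::string reasoning_content;
};

// Stand-in for "render this message through the chat template" (assumption).
using Renderer = std::function<std::string(const ProbeMessage&)>;

struct ReasoningProbeResult {
    bool supports_reasoning_without_content = false;
    bool supports_reasoning_with_content = false;
};

inline bool contains(const std::string& haystack, const std::string& needle) {
    return haystack.find(needle) != std::string::npos;
}

// Renders two probe messages and records which combinations keep the reasoning text.
inline ReasoningProbeResult probe_reasoning(const Renderer& render) {
    const std::string reasoning_marker = "<<REASONING_PROBE>>";
    const std::string content_marker = "<<CONTENT_PROBE>>";

    ReasoningProbeResult result;

    // Probe 1: reasoning with empty content.
    const std::string only_reasoning = render(ProbeMessage{"", reasoning_marker});
    result.supports_reasoning_without_content = contains(only_reasoning, reasoning_marker);

    // Probe 2: reasoning together with content; both markers must appear.
    const std::string both = render(ProbeMessage{content_marker, reasoning_marker});
    result.supports_reasoning_with_content =
        contains(both, reasoning_marker) && contains(both, content_marker);

    return result;
}
```

Markers that are unlikely to occur in any template text keep the probe from reporting false positives.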

4. Polyfill Control

Individual polyfills can be enabled/disabled:

chat_template_options opts;
opts.apply_polyfills = true;           // Master switch
opts.polyfill_reasoning = true;        // reasoning_content conversion
opts.polyfill_typed_content = true;    // String → content blocks
// ... other polyfills
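
As a sketch of how a master switch typically interacts with per-polyfill flags, the helpers below treat `apply_polyfills` as a gate that every individual flag is combined with. The struct is a reduced, hypothetical copy of the options shown above, and the AND semantics are an assumption about the intended behavior rather than something confirmed by this PR.

```cpp
// Reduced, hypothetical copy of the options shown above.
struct polyfill_options_sketch {
    bool apply_polyfills = true;         // master switch
    bool polyfill_reasoning = true;      // reasoning_content conversion
    bool polyfill_typed_content = true;  // string -> content blocks
};

// Assumption: an individual polyfill runs only when the master switch is also on.
inline bool should_polyfill_reasoning(const polyfill_options_sketch& opts) {
    return opts.apply_polyfills && opts.polyfill_reasoning;
}

inline bool should_polyfill_typed_content(const polyfill_options_sketch& opts) {
    return opts.apply_polyfills && opts.polyfill_typed_content;
}
```

Under this reading, a caller that already produces the template's native format would turn off `polyfill_reasoning` alone and leave the remaining polyfills active.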

5. tojson Enhancements

Added tojson(separators=(',', ':')) support for compact JSON output (used by Kimi K2 template).
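
To illustrate what the separators argument changes, here is a toy serializer for a flat object of string pairs. It mirrors the `(item_separator, key_separator)` pair from Python's `json.dumps`, whose defaults are `", "` and `": "`. The function name `dump_flat_object` is hypothetical, it does no string escaping, and it is unrelated to how this library implements `tojson`.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Pairs = std::vector<std::pair<std::string, std::string>>;

// Toy serializer for a flat object of string values (no escaping, sketch only).
// item_sep goes between entries, key_sep between a key and its value.
inline std::string dump_flat_object(const Pairs& obj,
                                    const std::string& item_sep = ", ",
                                    const std::string& key_sep = ": ") {
    std::string out = "{";
    for (std::size_t i = 0; i < obj.size(); ++i) {
        if (i > 0) out += item_sep;
        out += "\"" + obj[i].first + "\"" + key_sep + "\"" + obj[i].second + "\"";
    }
    out += "}";
    return out;
}
```

With `","` and `":"` the output contains no whitespace between tokens, which matters when a template embeds JSON in a prompt and the model was trained on the compact form.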

Models That Benefit

| Model Family | Reasoning Format | Polyfill Applied |
|---|---|---|
| Qwen3 | REASONING_CONTENT_FIELD | None (native) |
| GLM-4.6/4.7 | REASONING_CONTENT_FIELD | None (native) |
| MiniCPM3 | THOUGHT_FIELD | reasoning_content → thought |
| Command-R7B | TOOL_PLAN_FIELD | reasoning_content → tool_plan |
| DeepSeek-R1 | THINKING_CONTENT_BLOCK | → content blocks |
| Ministral | THINKING_CONTENT_BLOCK | → content blocks |
| Kimi K2 | THOUGHTS_CONTENT_BLOCK | → content blocks |

Test Infrastructure

Template-Independent Validation

Added _test_metadata in context JSON files for assertions that work across all templates:

{
  "_test_metadata": {
    "expected_strings": ["always present"],
    "expected_strings_if_supports_reasoning": ["reasoning text"],
    "expected_strings_if_supports_tool_calls": ["tool name"],
    "forbidden_strings": ["[object Object]"]
  }
}
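
The metadata above could drive a check along these lines: unconditional strings must always appear, conditional strings are required only when the template reports the matching capability, and forbidden strings must never appear. This is a sketch under assumptions. `TestMetadata`, `CapsSubset`, and `check_rendered_output` are hypothetical names, and the actual logic in test-supported-template.cpp may differ.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory form of the _test_metadata object shown above.
struct TestMetadata {
    std::vector<std::string> expected_strings;
    std::vector<std::string> expected_strings_if_supports_reasoning;
    std::vector<std::string> expected_strings_if_supports_tool_calls;
    std::vector<std::string> forbidden_strings;
};

// Only the capabilities this sketch needs (assumption).
struct CapsSubset {
    bool supports_reasoning = false;
    bool supports_tool_calls = false;
};

// Returns one human-readable failure per violated assertion; empty means pass.
inline std::vector<std::string> check_rendered_output(const std::string& rendered,
                                                      const TestMetadata& meta,
                                                      const CapsSubset& caps) {
    std::vector<std::string> failures;
    auto require_all = [&](const std::vector<std::string>& needles, const std::string& label) {
        for (const auto& needle : needles) {
            if (rendered.find(needle) == std::string::npos) {
                failures.push_back("missing " + label + ": " + needle);
            }
        }
    };

    require_all(meta.expected_strings, "expected string");
    if (caps.supports_reasoning) {
        require_all(meta.expected_strings_if_supports_reasoning, "reasoning string");
    }
    if (caps.supports_tool_calls) {
        require_all(meta.expected_strings_if_supports_tool_calls, "tool-call string");
    }
    for (const auto& needle : meta.forbidden_strings) {
        if (rendered.find(needle) != std::string::npos) {
            failures.push_back("forbidden string present: " + needle);
        }
    }
    return failures;
}
```

Returning the full list of failures, rather than stopping at the first, makes a single test run report every mismatch for a template.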

New Test Contexts

  • reasoning_only.json - Basic reasoning content
  • reasoning_multi_turn.json - Multi-turn conversation with reasoning
  • reasoning_position_based.json - Position-based visibility (Kimi K2)
  • reasoning_clear_thinking.json - clear_thinking=false behavior
  • reasoning_with_tools.json - Reasoning combined with tool calls
  • reasoning_disabled.json - enable_thinking=false
  • tool_plan_reasoning.json - TOOL_PLAN_FIELD pattern

llama.cpp Integration

This enables llama.cpp to:

  1. Detect reasoning format from model templates automatically
  2. Apply polyfills to convert reasoning_content to native formats
  3. Simplify parsers - output parsers only need to handle canonical format

See integration: sync-minja-reasoning branch

Test Results

All 880+ tests pass across Ubuntu, Windows, and macOS.


Closes #19

ochafik and others added 2 commits December 29, 2025 22:24
- Add supports_thinking flag to detect reasoning_content field support
- Add supports_disable_thinking, supports_reasoning_only, supports_reasoning_with_content flags
- Add reasoning_requires_tools flag for templates that only reason with tools
- Add tests for Qwen3-235B-A22B-Thinking-2507 and GLM-4.6
- Add model IDs: DeepSeek-V3.1, granite-3.3-2b-instruct, GLM-4.7

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…cture

ThinkingPattern detection & polyfills:
- Add polyfill logic to transform reasoning_content to template's native format
- Support for THOUGHT_FIELD (MiniCPM3), THINKING_FIELD (GPT-OSS), TOOL_PLAN_FIELD (Command-R7B)
- Add CONTENT_BLOCK patterns (Ministral/Apertus) with improved detection
- Improved content block detection: reject stringified output by checking for structural markers
- Add supports_clear_thinking detection for templates like GLM-4.7

Test infrastructure:
- Add test metadata (_test_metadata) to context JSON files for template-independent validation
- Add expected_strings/forbidden_strings checks to test-supported-template.cpp
- Support conditional checks: expected_strings_if_supports_thinking, _system_role, _tool_calls, _tool_responses
- Add ThinkingPattern capability tests to test-capabilities.cpp

New reasoning test contexts:
- reasoning_only.json - basic reasoning content
- reasoning_multi_turn.json - multi-turn conversation with reasoning
- reasoning_position_based.json - position-based visibility
- reasoning_clear_thinking.json - clear_thinking flag behavior
- reasoning_with_tools.json - reasoning with tool calls
- reasoning_disabled.json - enable_thinking=false

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ochafik ochafik force-pushed the issue-19-thinking-capabilities branch from 0c9ae97 to c12caa0 Compare December 29, 2025 22:28
ochafik and others added 7 commits December 29, 2025 22:40
Add the missing collapse_blank_lines function and regex include
that was lost during the rebase conflict resolution.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The template is already in MODEL_IDS and gets downloaded to build/tests/
during cmake configure. No need to commit it separately.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
API renames for consistency:
- ThinkingPattern → ReasoningFormat
- REASONING_CONTENT_FIELD → REASONING_CONTENT
- thinking_pattern → reasoning_format
- supports_thinking → supports_reasoning
- supports_clear_thinking → supports_reasoning_visibility

New behavior detection probes (computed via template rendering):
- supports_reasoning_without_content: Can emit reasoning with empty content
- supports_reasoning_with_content: Can emit both reasoning and content
- respects_enable_reasoning: Template honors enable_thinking=false

Added tool_plan_reasoning.json test context for TOOL_PLAN_FIELD format.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The name directly matches the input flag (clear_thinking).
… tojson separators

- Rename `requires_typed_content` to `requires_typed_content_blocks` for clarity
- Rename ReasoningFormat enum values:
  - REASONING_CONTENT → REASONING_CONTENT_FIELD
  - CONTENT_BLOCK_THINKING → THINKING_CONTENT_BLOCK
  - CONTENT_BLOCK_THOUGHTS → THOUGHTS_CONTENT_BLOCK
- Add `tojson(separators=...)` support (used by Kimi K2 template)
- Add Kimi K2 (moonshotai/Kimi-K2-Instruct) to test suite
- Add capabilities tests for reasoning_requires_tools behavior
- Add stringification checks to test contexts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The tools directory is optional and may not exist in all environments.
Check for its existence before adding it as a subdirectory.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Command-R7B is excluded from MODEL_IDS on Windows due to known issues
(google#40). The test-capabilities test
for ToolPlanField_CommandR7B should also be skipped on Windows.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ochafik ochafik force-pushed the issue-19-thinking-capabilities branch from 7556a25 to 19141bd Compare December 31, 2025 14:10
@ochafik ochafik changed the title Add thinking/reasoning capability detection Add reasoning capability detection and automatic polyfills Dec 31, 2025
Comprehensive documentation for:
- Capability detection (tools, reasoning, content formats)
- ReasoningFormat enum and detection priority
- Automatic polyfill system
- Usage examples in C++
- Integration guidance for llama.cpp

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ochafik ochafik force-pushed the issue-19-thinking-capabilities branch from 5026655 to 0d8d3f0 Compare December 31, 2025 15:19
More consistent with the flag name (enable_thinking) and the naming
pattern of other capability flags (supports_*).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@ochafik ochafik force-pushed the issue-19-thinking-capabilities branch from b36f5a1 to 3bf064d Compare December 31, 2025 15:35
Development

Successfully merging this pull request may close these issues.

Feature: Auto-detect template capabilities (supports_disable_thinking, supports_reasoning_only, etc.)