Organization: Zen LM (Hanzo AI × Zoo Labs Foundation)
Parameters: 1.04T total (1,044B), MoE with 32B active per token
License: Apache 2.0
Context Window: 256K tokens
Thinking Capacity: 96K-128K thinking tokens per step
Architecture: MoE (Mixture of Experts)
Model Overview
Zen Max is the largest model in the Zen family: a 1T+ parameter, reasoning-first language model designed for test-time scaling through extended thinking and tool calling.
Built as a thinking agent, Zen Max reasons step by step while using tools, executing 200-300 sequential tool calls without human intervention and maintaining coherent reasoning across hundreds of steps to solve complex problems.
Key Capabilities
1. Agentic Reasoning (HLE: 44.9%)
Extended chain-of-thought reasoning with `<think>` tags
Multi-step planning and execution
Adaptive reasoning with hypothesis generation and refinement
Think → search → code → verify → think cycles
2. Agentic Search & Browsing (BrowseComp: 60.2%)
Goal-directed web-based reasoning
200-300 sequential tool calls for information gathering
Real-world information collection and synthesis
Dynamic search → browser → reasoning loops
3. Agentic Coding (SWE-Bench Verified: 71.3%)
Multi-language support (100+ languages)
Agentic coding workflows with tool integration
Component-heavy web development (React, HTML)
Terminal automation (Terminal-Bench: 47.1%)
4. Mathematical Reasoning
AIME 2025: 99.1% (with Python)
HMMT 2025: 95.1% (with Python)
IMO-AnswerBench: 78.6%
GPQA-Diamond: 84.5%
Architecture Features
Test-Time Scaling
Thinking Tokens: 96K-128K per reasoning step
Extended Context: 256K tokens
Sequential Tool Calls: 200-300 without human intervention
Parallel Rollouts: Heavy mode with 8 simultaneous trajectories
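The sequential tool-calling behavior above can be sketched as a bounded agent loop: the model alternates thinking and tool calls until it produces an answer or exhausts its budget. This is a hypothetical illustration in plain Python; `run_agent`, `fake_model`, and `fake_tool` are stand-ins and not the real Zen Max API.

```python
# Hypothetical sketch of the bounded agent loop described above: the model
# alternates thinking and tool calls until it answers or hits the cap.
# `fake_model` and `fake_tool` are toy stand-ins, not the real Zen Max API.

MAX_TOOL_CALLS = 300  # sequential tool calls without human intervention

def run_agent(model_step, tool, task, max_tool_calls=MAX_TOOL_CALLS):
    """Run think -> tool -> think cycles until the model emits an answer."""
    state = {"task": task, "observations": []}
    for step in range(max_tool_calls):
        action = model_step(state)        # model thinks, then picks an action
        if action["type"] == "answer":    # done: return the final answer
            return action["content"], step
        result = tool(action["content"])  # execute the requested tool call
        state["observations"].append(result)
    return None, max_tool_calls           # budget exhausted

# Toy stand-ins: answer once three observations have been gathered.
def fake_model(state):
    if len(state["observations"]) >= 3:
        return {"type": "answer", "content": "done"}
    return {"type": "tool", "content": f"query {len(state['observations'])}"}

def fake_tool(query):
    return f"result for {query}"

answer, steps = run_agent(fake_model, fake_tool, "demo task")
```

The cap on tool calls is what makes the loop safe to run unattended: progress is bounded even when the model never converges on an answer.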
INT4 Quantization-Aware Training
Native INT4 inference support
2x generation speed improvement
State-of-the-art performance at INT4 precision
Optimized for low-bit quantization during post-training
Inference Efficiency
Quantization-aware training (QAT) for MoE components
INT4 weight-only quantization
~50% latency reduction
Minimal performance degradation
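The idea behind INT4 weight-only quantization can be shown with a minimal pure-Python sketch: map each weight to a signed 4-bit integer in [-8, 7] with one per-tensor scale, then dequantize at inference time. This is illustrative only, under the assumption of symmetric per-tensor quantization; it is not the actual Zen Max QAT pipeline.

```python
# Minimal sketch of symmetric INT4 weight-only quantization (the idea behind
# the QAT described above), in pure Python. Illustrative only; not the
# actual Zen Max quantization pipeline.

def quantize_int4(weights):
    """Map floats to signed 4-bit integers in [-8, 7] with a per-tensor scale."""
    scale = max(abs(w) for w in weights) / 7.0  # 7 = largest positive INT4 value
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from INT4 values."""
    return [v * scale for v in q]

weights = [0.31, -0.82, 0.05, 0.47, -0.19, 0.7]
q, scale = quantize_int4(weights)
recovered = dequantize_int4(q, scale)

# Each weight now needs 4 bits instead of 16, at the cost of a small
# rounding error bounded by half the quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
```

Quantization-aware training reduces the accuracy cost of this rounding by exposing the model to quantized weights during post-training, which is why INT4 performance stays close to BF16.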
Benchmark Performance
Reasoning Tasks
| Benchmark | Score | Notes |
|---|---|---|
| HLE (with tools) | 44.9% | vs human baseline 29.2% |
| AIME 2025 (with Python) | 99.1% | 75.2% without tools |
| HMMT 2025 (with Python) | 95.1% | 70.4% without tools |
| IMO-AnswerBench | 78.6% | Mathematical olympiad |
| GPQA-Diamond | 84.5% | Expert-level questions |
Agentic Search
| Benchmark | Score | Notes |
|---|---|---|
| BrowseComp | 60.2% | vs human 29.2% |
| BrowseComp-ZH | 62.3% | Chinese-language browsing |
| Seal-0 | 56.3% | Real-world information seeking |
| FinSearchComp-T3 | 47.4% | Financial search |
| Frames | 87.0% | Multi-step search |
Coding
| Benchmark | Score | Notes |
|---|---|---|
| SWE-Bench Verified | 71.3% | Software engineering |
| SWE-Multilingual | 61.1% | Multi-language coding |
| Multi-SWE-Bench | 41.9% | Multiple repositories |
| LiveCodeBench v6 | 83.1% | Competitive programming |
| Terminal-Bench | 47.1% | Shell automation |
General Capabilities
| Benchmark | Score | Notes |
|---|---|---|
| MMLU-Pro | 84.6% | Professional knowledge |
| MMLU-Redux | 94.4% | General knowledge |
| Longform Writing | 73.8% | Creative writing |
| HealthBench | 58.0% | Medical knowledge |
Training Approach
Architecture
1.04T parameter Mixture of Experts
32B active parameters per token
Extended thinking token support
Multi-modal reasoning capabilities
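The MoE layout above (32B active of 1,044B total) comes from top-k expert routing: a gate scores every expert for each token, but only the top-k experts actually run. A hypothetical pure-Python sketch of that routing step, with toy gate scores (not the real Zen Max gating network):

```python
# Hypothetical sketch of MoE top-k routing: a gate scores every expert per
# token, but only the top-k run, so active parameters stay a small fraction
# of the total. Toy values only; not the real Zen Max gating network.
import math

def route_token(gate_logits, k):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Toy gate scores for 8 experts; only the top 2 run for this token.
chosen = route_token([0.1, 2.0, -1.0, 0.5, 1.7, 0.0, -0.3, 0.9], k=2)

# Active fraction implied by the spec above: 32B active of 1,044B total.
active_fraction = 32 / 1044  # roughly 3%
```

This is why a 1T+ model can serve tokens at roughly the cost of a 32B dense model: per-token compute scales with the active experts, not the total parameter count.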
Zen Identity Fine-Tuning
Constitutional AI Training: Hanzo AI principles and values
Usage Examples
1. Agentic Research

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("zenlm/zen-max")
tokenizer = AutoTokenizer.from_pretrained("zenlm/zen-max")

# Enable thinking mode with tool access
messages = [
    {
        "role": "user",
        "content": "Research and analyze the latest developments in quantum computing, then write a comprehensive report."
    }
]

# The model will:
# 1. Think about a search strategy
# 2. Execute 50+ web searches
# 3. Browse relevant pages
# 4. Synthesize information
# 5. Generate a structured report
response = model.chat(tokenizer, messages, thinking_budget=128000, max_tool_calls=300)
```
2. Agentic Coding Workflow
```python
# Component-heavy web development
messages = [
    {
        "role": "user",
        "content": "Build a fully functional Word clone with React, including document editing, formatting, and export features."
    }
]

# The model will:
# 1. Plan the component architecture
# 2. Generate HTML/React code
# 3. Implement styling and interactions
# 4. Test and debug iteratively
# 5. Deliver a production-ready application
response = model.chat(tokenizer, messages, thinking_budget=96000, enable_tools=True)
```
3. Mathematical Problem Solving
```python
# PhD-level mathematics with Python
messages = [
    {
        "role": "user",
        "content": "Solve the hyperbolic space sampling problem involving the Lorentz model and Brownian bridge covariance."
    }
]

# The model will:
# 1. Analyze the mathematical structure
# 2. Execute Python computations
# 3. Derive closed-form solutions
# 4. Verify results numerically
response = model.chat(tokenizer, messages, thinking_budget=128000, python_enabled=True)
```
4. Heavy Mode (Parallel Reasoning)
```python
# 8 parallel trajectories with reflective aggregation
messages = [
    {
        "role": "user",
        "content": "Comprehensive analysis of climate change solutions across economics, technology, and policy."
    }
]

response = model.chat(
    tokenizer,
    messages,
    mode="heavy",  # 8 parallel rollouts
    thinking_budget=128000,
    enable_reflection=True
)
```
Limitations
Tool Call Limits: 300 steps may not suffice for extremely complex tasks
Context Management: automatic context hiding may drop important intermediate results
Quantization: optimized for INT4, but BF16 remains preferable when maximum accuracy matters
Training Data
Zen Fine-Tuning:
Zoo-Gym framework with RAIS technology
Constitutional AI alignment data
Multi-turn tool-calling trajectories
Agentic workflow demonstrations
Verification: human expert validation on HLE, AIME, and coding tasks
Citation
```bibtex
@misc{zenmax2025,
  title={Zen Max: Reasoning-First Language Model with Test-Time Scaling},
  author={Hanzo AI and Zoo Labs Foundation},
  year={2025},
  url={https://zenlm.org}
}
```