DeepSeek Sparse Attention — O(n²) → O(n·k), 2–3× faster inference
Thinking + Tool-Use — first open model to integrate both in one pass
1,800+ environments — 85,000+ complex agent instructions synthesized
Speciale variant — IMO, IOI, ICPC & CMO 2025 Gold Medal
GPT-5 level — V3.2 matches Kimi-K2-Thinking & GPT-5 on reasoning
671B / 37B active — same MoE efficiency, new attention mechanism
MIT License — open-source, commercial use permitted
Released December 1, 2025 · Sparse Attention · Agentic Reasoning

The Open-Source
Frontier Leap.

DeepSeek V3.2 rewrites what open-source AI can do. DeepSeek Sparse Attention slashes long-context costs. Integrated thinking+tool-use enables real agent workflows. And Speciale earns Gold Medals at IMO, IOI, ICPC, and CMO 2025.

Try in Chat Free → 🤗 Download Weights Read Paper ↗
671B
Total params
37B
Active per token
O(n·k)
DSA complexity
1,800+
Agent environments
4× Gold
IMO·IOI·ICPC·CMO
MIT
License
Model Variants

Three Variants for Every Task

V3.2 ships in three configurations covering production agentic workloads, early experimentation, and frontier research reasoning — all sharing the same underlying architecture.

⚙️ Production Default
V3.2 · MAIN 🚀
DeepSeek-V3.2
deepseek-ai/DeepSeek-V3.2 · Dec 1, 2025

The production model — built for real agentic workflows. First open-source model to integrate chain-of-thought thinking directly into tool-use, enabling true end-to-end reasoning agents. Trained on 1,800+ environments and 85,000+ synthesized complex instructions. Supports tool-use in both thinking and non-thinking modes. GPT-5-level performance across multiple reasoning benchmarks.

671B
Total params
37B
Active params
128K
Context
DSA
Attention
Download on Hugging Face →
V3.2-Exp · EARLY 🧪
DeepSeek-V3.2-Exp
deepseek-ai/DeepSeek-V3.2-Exp · Sep 29, 2025

Experimental release — the infrastructure testbed that shipped first. Introduced DeepSeek Sparse Attention (DSA) on top of V3.1-Terminus. Released to prepare the ecosystem and inference stack before the full V3.2 launch. Same architecture as V3.2 — same parameter count, same DSA mechanism, different post-training. Recommended for research into the DSA architecture itself.

671B
Total params
37B
Active params
Same
Architecture
Sep '25
Release
Download Exp →
🥇 IMO · IOI · ICPC · CMO Gold
SPECIALE · RESEARCH 🏆
DeepSeek-V3.2-Speciale
Served via temporary endpoint (expired Dec 15, 2025)

The research frontier variant — built by relaxing length constraints to allow unlimited test-time compute. Achieves Gemini-3.0-Pro-level reasoning and surpasses GPT-5 on complex tasks. Gold Medal at IMO 2025, IOI 2025, ICPC World Finals 2025, and CMO 2025. Intended for deep reasoning research — does not support tool-calling. Higher token usage than V3.2.

671B
Total params
Unlimited
Think budget
4× Gold
Competitions
No tools
Research only
View on Hugging Face →
V3.2 vs V4: DeepSeek-V3.2 was superseded by DeepSeek-V4 (April 24, 2026), which adds a 1M token context window, Compressed Sparse Attention (CSA), and native production agentic tooling. For new API projects, use deepseek-v4-flash or deepseek-v4-pro. V3.2 weights remain the best open-source option for self-hosted agentic deployments that require the V3.2 architecture specifically.
Technical Architecture

Three Innovations That Define V3.2

DeepSeek V3.2 doesn't just improve benchmark numbers — it solves structural problems that blocked open-source models from competing with frontier proprietary systems.

DeepSeek Sparse Attention (DSA)

Standard transformer attention scales as O(n²) with sequence length — the reason long contexts are expensive. DeepSeek V3.2 replaces dense self-attention with DSA: a selective, relevance-driven attention mechanism that identifies the top-k most relevant tokens before applying attention.

A lightweight scorer called the Lightning Indexer evaluates token relevance without running full attention, then attention is applied only to the selected k tokens. This transforms the computational profile from quadratic to approximately linear:

Complexity comparison
Standard Attention: O(n²) per layer
DSA Attention: O(n·k) per layer
k << n, so DSA ≈ O(n) effectively

Results: 2–3× faster inference, 30–40% lower memory usage, and dramatically better performance on long-context reasoning without the quality degradation seen in other sparse attention approaches. DSA is combined with Multi-Head Latent Attention (MLA) from V3 for KV cache compression.

Relative inference cost: Dense O(n²) = 100% · DSA O(n·k) ≈ 35–40%
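To make the mechanism concrete, here is a minimal PyTorch sketch of the top-k idea. The quadratic scores matrix below is a toy stand-in for the Lightning Indexer (whose whole point is to be far cheaper than full attention), causal masking is omitted, and nothing here is DeepSeek's actual kernel code.

import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, indexer_scores, top_k=64):
    """q, k, v: (n, d). indexer_scores: (n, n) cheap relevance estimates."""
    n, d = q.shape
    kk = min(top_k, n)
    # Select the top-k most relevant keys per query (the Lightning Indexer role)
    idx = indexer_scores.topk(kk, dim=-1).indices    # (n, kk)
    k_sel, v_sel = k[idx], v[idx]                    # (n, kk, d) each
    # Dense attention only over the selected tokens: O(n·k) instead of O(n²)
    logits = torch.einsum("nd,nkd->nk", q, k_sel) / d ** 0.5
    weights = F.softmax(logits, dim=-1)
    return torch.einsum("nk,nkd->nd", weights, v_sel)

n, d = 1024, 128
q, k, v = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
scores = q @ k.T  # toy scorer; a real indexer avoids this O(n²) cost
out = sparse_attention(q, k, v, scores, top_k=64)
print(out.shape)  # torch.Size([1024, 128])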
🔗
Integrated Thinking + Tool-Use

Before V3.2, reasoning and tool-use were separate behaviors — a model either reasoned deeply (like R1) or used tools effectively (like V3), but not both simultaneously. V3.2 is the first open-source model to integrate thinking directly into tool-use, enabling true end-to-end reasoning agents.

Three operating modes per request:

Tool-use modes
Non-thinking + tools: Fast, direct tool calls
Thinking + tools: Reason about when/how to use tools
Pure thinking: Deep reasoning, no tools (Speciale)

In "Thinking + tools" mode, V3.2 generates chain-of-thought reasoning about which tools to use, in what order, with what parameters — before making any API calls. This dramatically improves multi-step agent tasks where naive tool selection leads to failure cascades. In τ²-Bench, MCP-Universe, and Tool-Decathlon, V3.2 thinking mode performs competitively with GPT-5 High and Gemini 3.0 Pro.

🏗️
Agentic Task Synthesis Pipeline

Previous models trained on limited agentic data struggled with the long-tail of complex interactive environments — real-world agent scenarios that require multi-step planning, error recovery, and tool composition. V3.2 addresses this with a novel synthesis pipeline.

The pipeline systematically generates training data at scale:

Synthesis pipeline output
1,800+ distinct environments generated
85,000+ complex prompts synthesized
Covers code, search, web, file, API agents
Long-tail tasks deliberately emphasized

This synthesized data drives RL fine-tuning, significantly enhancing generalization and instruction-following robustness in complex interactive environments. DeepSeek's ablation shows that restricting RL to code and search alone doesn't improve agent benchmarks — the diversity of the synthetic environments is the key factor. The result is substantial improvements on τ²-Bench, MCP-Mark, and MCP-Universe versus any previous open-source model.

📐
MoE Architecture: V3.2 vs Predecessors

The core MoE structure is maintained from V3/V3.1: 671B total parameters, 37B activated per token. Each MoE layer uses Kᵣ=8 routed experts per token, chosen from 256 FFN experts plus 1 shared expert — resulting in 9 parallel expert computations per token.

Key changes in V3.2 vs V3.1:

Changes from V3.1 → V3.2
Attention: MLA only → MLA + DSA (new)
Expert routing: Auxiliary loss → Dynamic biasing
Training: General → Agentic synthesis pipeline
Modes: Think OR tools → Think AND tools

The dynamic expert biasing (replacing auxiliary load-balancing penalties) improves expert specialization — experts become more focused on specific task types, which aids interpretability and stability. Load balance is maintained without distorting the loss function, improving overall model quality. Sampling recommendation: temperature 1.0, top-p 0.95 for local deployment.
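A toy routing sketch of the pattern described above, under the assumption that V3.2 keeps DeepSeek's aux-loss-free recipe (a bias added only for expert selection, with gates computed from the raw affinities); the bias-update rule is omitted and all names are illustrative:

import torch

n_experts, top_k, d_model = 256, 8, 64
router = torch.nn.Linear(d_model, n_experts, bias=False)
expert_bias = torch.zeros(n_experts)  # updated online for load balance,
                                      # not through an auxiliary loss term

def route(x):
    """x: (tokens, d_model) -> top-8 expert indices and gating weights."""
    affinity = router(x)                                        # (tokens, 256)
    # The bias steers *selection* toward underused experts; gating weights
    # are still computed from the unbiased affinities.
    idx = (affinity + expert_bias).topk(top_k, dim=-1).indices  # (tokens, 8)
    gates = torch.softmax(affinity.gather(-1, idx), dim=-1)     # (tokens, 8)
    return idx, gates

x = torch.randn(4, d_model)
idx, gates = route(x)
# Per token: 8 routed experts + 1 always-on shared expert = 9 expert FFNs
print(idx.shape, gates.shape)  # torch.Size([4, 8]) torch.Size([4, 8])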

Performance Benchmarks

GPT-5 Level. Open Source.

V3.2 matches GPT-5 and Kimi-K2-Thinking across reasoning benchmarks. Speciale matches Gemini-3.0-Pro and surpasses GPT-5 on frontier tasks. All data from the official paper (arXiv:2512.02556).

HMMT Feb 2025
Harvard-MIT Math Tournament — elite high-school math competition
V3.2 competitive with GPT-5
Gemini 3.0 Pro: leads · GPT-5: strong · V3.2 / Kimi-K2: similar
HLE (text-only subset)
Humanity's Last Exam — expert-level research reasoning
V3.2 ≈ GPT-5
Gemini 3.0 Pro: leads · GPT-5: strong · DeepSeek-V3.2: ≈ GPT-5 · Kimi-K2-Thinking: ≈ V3.2
MATH-500 (V3.2-Exp vs V3.1)
Improvement from V3.1 to V3.2-Exp post-training
+8% jump
V3.2-Exp: 82.8% · V3.1 (base): 74.8%
LiveCodeBench (V3.2-Exp improvement)
Competitive programming - fresh problems from 2025
+5.2% improvement
V3.2-Exp: 34.4% · V3.1 (base): 29.2%
IOI 2025 — International Olympiad in Informatics
V3.2-Speciale: Gold Medal
🥇 Gold - Speciale
DeepSeek-V3.2-Speciale achieved gold-medal performance at IOI 2025 - the International Olympiad in Informatics, the world's most prestigious competitive programming competition for high school students. The model's combination of extended reasoning compute (relaxed length constraints) and deep agentic code training enabled it to match the performance of the best human competitors. IOI problems require not just code generation but algorithmic insight, complexity analysis, and implementation precision across multiple test cases.
ICPC World Finals 2025
Top competitive programming - university-level
🥇 Gold — Speciale
V3.2-Speciale earned gold at the ICPC World Finals - a team competition requiring solving 12+ complex algorithmic problems under time pressure. This result reflects both the raw algorithmic reasoning capacity and the extended thinking budget that Speciale was designed for. DeepSeek published the final ICPC submissions for transparency.
τ²-Bench (Tool + Reasoning)
Multi-turn, multi-tool agentic reasoning benchmark
V3.2 competitive w/ GPT-5 High
GPT-5 High: leads · V3.2 (Thinking): competitive · V3.2 SFT-only: baseline
MCP-Universe
MCP server ecosystem — real-world tool-use diversity
V3.2: strongest open-source
Gemini 3.0 Pro: leads · V3.2 (Thinking): best open-source
Long-Context Agent (with context management)
Context management enables 51.4% score vs 32.4% without
+19% from context mgmt
V3.2 + context mgmt: 51.4% · V3.2 without: 32.4%
Four Gold Medals — 2025 Competitions
V3.2-Speciale with relaxed length constraints
🥇 IMO · IOI · ICPC · CMO
🥇 IMO 2025 · International Math Olympiad
🥇 IOI 2025 · Int'l Olympiad in Informatics
🥇 ICPC 2025 · World Finals Programming
🥇 CMO 2025 · Chinese Math Olympiad
Speciale achieves Gemini-3.0-Pro-level performance and surpasses GPT-5 on complex frontier reasoning tasks. DeepSeek published the actual competition submissions for IMO, IOI, ICPC, and CMO for independent verification.
Core Capabilities

What V3.2 Delivers

Every major capability area is improved in V3.2 — from the raw attention mechanism to the training pipeline to the agentic behavior.

DeepSeek Sparse Attention

O(n·k) complexity via Lightning Indexer scoring. 2–3× faster inference, 30–40% less memory. Long-context reasoning without quality degradation. Handles 128K tokens efficiently where dense attention buckles.

DSA architecture
🤝
Think + Tool in One Pass

First open model to chain thinking and tool-use — reason about which tools to call, then call them, all in a single model pass. Three modes: non-thinking+tools, thinking+tools, thinking-only (Speciale).

Novel capability
🌍
1,800+ Agent Environments

Trained on 85,000+ synthesized complex instructions across 1,800+ distinct environments: code execution, web search, file operations, API integrations, and more. Long-tail agent tasks are deliberately emphasized.

Broadest coverage
🧠
Dynamic Expert Biasing

Replaces auxiliary load balance penalties with dynamic expert biasing — improving expert specialization while maintaining load balance. More interpretable stepwise reasoning and smoother agentic behavior.

MoE improvement
📏
Context Management

When reasoning approaches 80% of context window, context management kicks in with simple but effective strategies to extend the token budget. Improves long agent task scores from 32.4% to 51.4% — a 19-point gain.

+19% agent score
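The exact strategy isn't spelled out on this page; a hypothetical sketch of the 80%-threshold idea (every name below is invented for illustration):

# Hypothetical sketch of an 80%-threshold context-management pass.
# The real V3.2 strategy is described in the paper; names here are invented.
MAX_CTX = 131_072  # 128K-token window

def manage_context(messages, count_tokens, summarize):
    """Compress old tool outputs once usage crosses 80% of the window."""
    used = sum(count_tokens(m["content"]) for m in messages)
    if used < 0.8 * MAX_CTX:
        return messages  # budget fine; leave the transcript untouched
    head, tail = messages[:1], messages[-6:]  # keep system msg + recent turns
    middle = [
        {**m, "content": summarize(m["content"])} if m["role"] == "tool" else m
        for m in messages[1:-6]
    ]
    return head + middle + tail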
🏆
Competition-Grade Math

MATH-500 improved from 74.8% to 82.8% vs V3.1. Speciale achieves IMO and CMO Gold. Extended thinking budget with relaxed length constraints enables frontier-level mathematical reasoning.

+8% MATH-500
💻
Competitive Programming

LiveCodeBench improved from 29.2% to 34.4%. Speciale earns IOI and ICPC Gold — top performance in the world's hardest algorithmic contests. Understands complexity analysis, algorithmic patterns, and test-case design.

IOI Gold
🔧
Function Calling & JSON

Full OpenAI-compatible function calling across all thinking modes (except Speciale, which is research-only). JSON output mode for structured responses. Production-ready tool integration APIs.

API ready
🔌
OpenAI API Compatible

Same endpoint structure as OpenAI. Change base URL and API key. Existing streaming, function calling, and structured output integrations work without modification.

2-line migration
🔓
MIT License

Full MIT license on all V3.2 weights — commercial use, fine-tuning, distillation, and redistribution all permitted. Download from Hugging Face and deploy anywhere.

Commercial ✓
📊
MCP Ecosystem

Deep compatibility with the Model Context Protocol (MCP) ecosystem — supports MCP-Mark and MCP-Universe benchmarks. Designed for integration with MCP servers across file, database, web, and code environments.

MCP ready
🏗️
ESS Offload Architecture

Expert Storage Server (ESS) enables high-throughput, memory-efficient inference at 128K context by offloading expert weights on demand. Makes 671B MoE practical on realistic hardware setups.

Hardware efficient
DeepSeek V3 Family Timeline

From V3 to V4: The Full Journey

How the V3 architecture evolved through V3.1, V3.2-Exp, and V3.2 before being succeeded by V4.

December 26, 2024
DeepSeek-V3 — The Model That Started the Wave

Original V3 released with 671B parameters, 37B active. MoE + MLA architecture. Multi-token prediction. First open-source model to challenge GPT-4o head-to-head across coding and reasoning. Built at an estimated $5.5M — far less than frontier closed-source models.

671B MoE · Challenged GPT-4o
March 24, 2025
DeepSeek-V3-0324 — Reasoning Upgrade

Post-training improvements drawing on RL techniques from R1. Math and coding gains — outperforms GPT-4.5 on math and coding evaluations. Smarter tool calling. API update: deepseek-chat alias routed to this version.

GPT-4.5 beater on math · V3.0324
August 21, 2025
DeepSeek-V3.1 — Hybrid Thinking

Major architectural milestone. Combines V3 general capabilities with R1 chain-of-thought reasoning in a single model — switch between thinking and non-thinking modes via chat template. 128K context via two-phase extension (630B + 209B tokens). Stronger tool calling and agent performance than both V3-0324 and R1-0528.

Hybrid CoT · 128K context
September 22, 2025
DeepSeek-V3.1-Terminus — Stability Checkpoint

A small but significant refinement of the V3.1 checkpoint — improved training stability and base-model quality. Became the foundation for V3.2-Exp's DSA continued training. Not widely publicized as a separate release.

V3.1 stability refinement
September 29, 2025
DeepSeek-V3.2-Exp — DSA Infrastructure Prep

Experimental release introducing DeepSeek Sparse Attention (DSA) via continued training on V3.1-Terminus. Primary purpose: test inference infrastructure and ecosystem tools before the full V3.2 release. Benchmark performance on par with V3.1-Terminus — early release intentionally conservative. Identified and fixed a RoPE implementation discrepancy in November 2025.

First DSA model · Infrastructure prep
December 1, 2025
DeepSeek-V3.2 — The Full Release

The main event. DSA for efficient long-context attention. Integrated thinking+tool-use in a single model. 1,800+ environment agentic synthesis pipeline. GPT-5-level reasoning benchmarks. Three variants: V3.2 (production), V3.2-Exp (experimental), and V3.2-Speciale (research frontier). Context management for extended agent tasks. MIT licensed.

GPT-5 level · Agent reasoning · MIT
April 24, 2026
DeepSeek-V4 — Successor with 1M Context

V3.2 superseded by V4-Pro and V4-Flash. Key advances: 1M token context window, Compressed Sparse Attention (CSA) — the evolution of DSA — and Hierarchical Context Aggregation (HCA). Both V4 models trained on 32T+ tokens. V3.2 weights remain open-source and the best available for self-hosted deployments requiring the V3.2 architecture.

Superseded by V4 · V3.2 remains open-source
Use Cases

What V3.2 Enables

V3.2's combination of sparse attention, hybrid thinking, and agentic training opens up applications that previous models couldn't reliably handle.

01
Production AI Agents

The 1,800+ environment training and integrated thinking+tools make V3.2 the best open-source choice for building real production agents — code execution agents, search agents, file-processing agents. The thinking mode reasons about which tools to use before making any calls.

02
Long-Context Document Analysis

DSA's O(n·k) complexity makes 128K context affordable at inference time. Process entire research papers, legal contracts, codebases, or technical manuals in a single request without the latency and cost penalty of dense attention at scale.

03
MCP Server Integration

V3.2 was specifically designed for the Model Context Protocol ecosystem. Use it as the reasoning backbone for MCP-based agent systems — web browsing, database queries, code execution, and file management — all orchestrated through a single model.

04
Research Math & Proofs

MATH-500 at 82.8%, IMO Gold via Speciale. For mathematical research assistance — conjecture exploration, proof sketching, competition training — V3.2 delivers the best open-source results. Use Speciale for frontier-level work where token budget isn't a constraint.

05
Competitive Programming

IOI and ICPC Gold via Speciale; significant LiveCodeBench improvement in V3.2-Exp. For algorithm design, complexity analysis, and competitive programming practice, V3.2 matches the best closed-source models at the hardest levels.

06
Self-Hosted Agentic AI

MIT license + ESS offload architecture makes V3.2 the best self-hosted choice for enterprise agent infrastructure. Run full 671B reasoning agents on-premises — no API dependency, no data leaving your network.

API Integration

Use V3.2 Today

V3.2 is available via API and open-source weights. The Speciale endpoint expired Dec 15, 2025; for production use, deepseek-v4-pro supersedes V3.2.

# DeepSeek V3.2 via API
# Note: For new projects, deepseek-v4-flash is recommended

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com/v1"
)

# V3.2 was served via deepseek-chat (now routes to V4-Flash)
# Use deepseek-v4-flash for new integrations
response = client.chat.completions.create(
    model="deepseek-v4-flash",  # V4-Flash supersedes V3.2
    messages=[{
        "role": "user",
        "content": "Analyze this codebase and suggest refactors..."
    }],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
# Integrated thinking + tool-use — V3.2's key innovation
# The model reasons about tool use before calling tools

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://api.deepseek.com/v1"
)

tools = [{
    "type": "function",
    "function": {
        "name": "search_codebase",
        "description": "Search files in repository for a pattern",
        "parameters": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string"},
                "path": {"type": "string"}
            }
        }
    }
}]

# Thinking mode: model reasons about search strategy first
response = client.chat.completions.create(
    model="deepseek-v4-pro",  # V4-Pro for think+tools
    messages=[{
        "role": "user",
        "content": "Find all API endpoints that lack authentication"
    }],
    tools=tools,
    extra_body={"thinking": {"type": "enabled", "budget": "high"}}
)
# Model thinks about search strategy before calling tools
# Run V3.2 locally via Hugging Face
# Note: 671B requires 8×80GB GPU minimum
# pip install transformers accelerate

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# V3.2 shares architecture with V3.2-Exp
# See V3.2-Exp repo for inference code
model_id = "deepseek-ai/DeepSeek-V3.2-Exp"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"  # Distributes across GPUs
)

# Recommended sampling params (official)
gen_kwargs = {
    "temperature": 1.0,  # Official recommendation
    "top_p": 0.95,
    "max_new_tokens": 8192
}

# IMPORTANT: Non-interleaved RoPE layout required in indexer module
# See updated inference demo code on HF for correct implementation
# Upgrade from V3.2 to V4 — recommended for new projects
# V4 adds: 1M context, Compressed Sparse Attention, stronger benchmarks

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://api.deepseek.com/v1"
)

# Was: "deepseek-chat" (V3.2 era)
# Now: "deepseek-v4-flash" (faster, same cost, 1M context)

# Was: "deepseek-reasoner" (R1/V3.2 thinking)
# Now: "deepseek-v4-pro" with thinking enabled

response = client.chat.completions.create(
    model="deepseek-v4-pro",  # ← replaces V3.2 thinking
    messages=[{"role": "user", "content": "Your prompt..."}],
    extra_body={"thinking": {"type": "enabled", "budget": "max"}}
)

# July 24, 2026: deepseek-chat and deepseek-reasoner aliases retire
# Migrate before that date to avoid errors
Getting Started

How to Access V3.2

V3.2 is available via open-source weights and the DeepSeek API. For most production use cases, V4 is now recommended.

1
Try in chat (DeepThink)

Go to chat.deepseek.com. The platform now serves V4-Pro in Expert Mode — the successor with better benchmarks and 1M context. Toggle DeepThink for thinking mode.

2
Download V3.2-Exp weights

The V3.2-Exp weights are on Hugging Face. V3.2 and Speciale share the same architecture — refer to V3.2-Exp inference code. Requires 8×80GB GPUs for BF16.

3
Fix the RoPE layout

Critical: use non-interleaved RoPE layout in the indexer module, and interleaved in the MLA module. The original inference demo had this swapped — check for the November 2025 fix before running.
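For reference, the two layouts differ only in how dimensions are paired before rotation; a shape-level sketch (not the official demo code):

# Shape-level sketch of the two RoPE layouts (not the official demo code).
import torch

def rope_interleaved(x, cos, sin):
    """MLA convention here: rotate adjacent pairs (x0,x1), (x2,x3), ..."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return out.flatten(-2)

def rope_non_interleaved(x, cos, sin):
    """Indexer convention here: pair dim i with dim i + d/2."""
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)

# Feeding interleaved-layout activations to the non-interleaved function
# (or vice versa) rotates the wrong dimension pairs, which is exactly
# the bug fixed in the November 2025 demo-code update.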

4
Set sampling parameters

Official recommendation for local deployment: temperature=1.0, top_p=0.95. Do not use lower temperatures — they cause repetition. Note: V3.2 ignores most sampling parameters beyond temp/top_p.

5
Enable thinking + tools

Pass "thinking": {"type": "enabled", "budget": "high"} in extra_body to activate reasoning-before-tool-calling mode. This is V3.2's flagship capability — essential for complex multi-step agent tasks.

6
Migrate to V4 for production

For new production projects, use deepseek-v4-flash or deepseek-v4-pro. V4 adds 1M context, Compressed Sparse Attention, and stronger benchmarks. deepseek-chat alias retires July 24, 2026.

Model Comparison

V3.2 in Context

How V3.2 and Speciale compare against the field at the time of release (December 2025).

| Model | Reasoning | Agent perf | Context | Open source | Think+Tools | Competition |
|---|---|---|---|---|---|---|
| DeepSeek-V3.2 | ≈ GPT-5 | Best open | 128K | ✓ MIT | ✓ Integrated | Standard |
| V3.2-Speciale | > GPT-5 | No tools | 128K+ | ✓ MIT | ✗ Research | 🥇 IMO·IOI·ICPC·CMO |
| GPT-5 | ≈ V3.2 | Strong | 128K | ✗ Closed | | Silver |
| Gemini 3.0 Pro | Leads | Strong | 1M | ✗ Closed | | 🥇 IMO Gold |
| Kimi-K2-Thinking | ≈ V3.2 | Partial | 128K | ✗ Closed | | Partial |

Data from official paper (arXiv:2512.02556, Dec 2025). Speciale endpoint expired Dec 15, 2025. V4 supersedes V3.2 for API use from April 2026.

FAQ

Frequently Asked Questions

What is DeepSeek V3.2 and what makes it different from V3.1?

DeepSeek V3.2 (December 1, 2025) introduces three major improvements over V3.1: (1) DeepSeek Sparse Attention (DSA) — replacing O(n²) attention with O(n·k) using a Lightning Indexer scorer, delivering 2–3× faster inference and 30–40% lower memory for long contexts. (2) Integrated thinking + tool-use — the first open-source model where chain-of-thought reasoning is embedded directly into tool-calling workflows, enabling true reasoning agents. (3) Agentic synthesis pipeline — 1,800+ environments and 85,000+ complex instructions, dramatically improving agent task generalization.

What is DeepSeek Sparse Attention (DSA) and why does it matter?

Standard transformer attention computes relationships between every pair of tokens — O(n²) complexity that makes long-context inference quadratically more expensive as sequences grow. DSA uses a lightweight Lightning Indexer to score token relevance and select the top-k most relevant tokens before applying attention — transforming the complexity to O(n·k) where k << n. Results: 2–3× faster inference, 30–40% lower memory usage, and no quality degradation on long contexts. This makes 128K context practically affordable on real hardware, and laid the technical foundation for V4's Compressed Sparse Attention (CSA) for 1M context.

What is the V3.2-Speciale variant and can I still access it?

Speciale is a research variant built by relaxing length constraints — allowing unlimited test-time compute during reasoning. This pushed performance beyond GPT-5 on complex tasks, achieving Gemini-3.0-Pro-level reasoning and gold medals at IMO 2025, IOI 2025, ICPC World Finals 2025, and CMO 2025. The API endpoint (base_url ending in v3.2_speciale_expires_on_20251215) expired December 15, 2025. Speciale weights are on Hugging Face under MIT license — you can run it locally. It does not support tool calling (research use only).

Should I use V3.2 or V4 for new projects?

For new API projects: use DeepSeek V4. V4-Flash and V4-Pro (April 2026) supersede V3.2 across all production metrics: 1M token context (vs 128K), stronger benchmarks (80.6% SWE-bench for V4-Pro), and Compressed Sparse Attention (CSA) — the evolution of V3.2's DSA. The deepseek-chat alias (which routed to V3.2) retires July 24, 2026. For self-hosted deployment where you specifically need the V3.2 architecture, the MIT-licensed weights remain the best available open-source option for 671B-scale reasoning.

How does the V3.2 thinking + tool-use integration work?

V3.2 supports three modes via the API: (1) Non-thinking + tools: fast, direct tool calls without internal reasoning. (2) Thinking + tools: the model generates chain-of-thought reasoning about which tools to call, in what order, and with what parameters — before making any API calls. This dramatically improves complex multi-step tasks where naive tool selection cascades into failures. (3) Pure thinking (Speciale only): maximum reasoning depth with no tool-use. Enable thinking mode via extra_body={"thinking": {"type": "enabled", "budget": "high"}} in the API call.

What hardware do I need to run V3.2 locally?

V3.2 is a 671B MoE model — the same size as the V3 family. Minimum for BF16 inference: 8×80GB GPUs (e.g., 8×H100 80GB or 8×A100 80GB). The Expert Storage Server (ESS) offload architecture helps by keeping inactive expert weights on storage rather than always in GPU memory, making 128K context inference more practical. For quantized inference (FP8 or INT4), requirements drop — community implementations exist for 4×80GB setups. There is no consumer-hardware path for the full 671B model. For self-hosted reasoning on consumer hardware, use the R1 distilled models (7B–70B via Ollama).

What is the RoPE bug in V3.2-Exp and is it fixed?

A known implementation discrepancy was identified in November 2025: the input tensor to RoPE (Rotary Position Embedding) in the indexer module requires a non-interleaved layout, whereas RoPE in the MLA module expects an interleaved layout. Earlier inference demo code had these swapped, leading to degraded model performance — particularly on long sequences where RoPE position encoding matters most. The fix is in the updated inference demo code on Hugging Face. Before running V3.2-Exp locally, ensure you are using the post-November 2025 version of the inference code from the official repo.

Get Started

Open-source at the frontier.

Sparse attention that scales. Thinking agents that reason. Gold medals at the world's hardest competitions. MIT licensed. Download the weights or access via API today.

🤗 Download V3.2 → Read Paper Try Chat Free