DeepSeek R1 - the open-source reasoning model that matched OpenAI o1, crushed NVIDIA's stock price, and proved that frontier-level chain-of-thought AI didn't require proprietary infrastructure. Free. MIT licensed. Available to run on your laptop.
Released January 20, 2025, DeepSeek R1 was the first openly published reasoning model to achieve performance comparable to OpenAI's o1 — at 90–95% lower API cost, with full weights published under MIT license.
Most language models answer directly. Reasoning models like R1 think before answering — they generate a chain-of-thought (CoT) before producing a final response. The thinking process is visible in the <think>...</think> tags of every response.
This internal reasoning enables R1 to self-verify, backtrack on wrong paths, try multiple approaches, and produce far more accurate answers on math, coding, and logic problems — the same tasks where direct-answering models fail.
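A client can separate the visible reasoning from the final answer with a simple parse. A minimal Python sketch, assuming only the tag format described above:

```python
import re

def split_reasoning(text: str) -> tuple[str | None, str]:
    """Split an R1 response into (chain_of_thought, final_answer)."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        return None, text.strip()  # model skipped its reasoning phase
    return m.group(1).strip(), text[m.end():].strip()

cot, answer = split_reasoning("<think>2+2 is 4; double-check... yes.</think>4")
print(cot)     # "2+2 is 4; double-check... yes."
print(answer)  # "4"
```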
What shocked the AI world was how R1 learned to do this: purely through reinforcement learning, with no human-labeled reasoning examples as a starting point. The model discovered chain-of-thought reasoning organically as a strategy to maximize its reward signal.
Before R1, the assumption was that reasoning models required massive supervised datasets of human-written reasoning traces. R1-Zero disproved this: trained via large-scale RL on the base model alone, with a simple reward signal (is the answer correct?), the model spontaneously developed self-verification, reflection, and multi-step reasoning behaviors.
DeepSeek-R1-Zero was followed by DeepSeek-R1, which added cold-start data and multi-stage training to address readability issues — but the core insight remained: reasoning capability can emerge from RL alone, without human-curated CoT data. This finding is now considered one of the most significant AI research results of 2025.
R1's release on January 20, 2025 sent shockwaves through markets: one week later, on January 27, NVIDIA's stock dropped ~17% in a single day — the largest single-day loss in market cap in US stock market history at the time. The reasoning: if world-class AI could be trained for a fraction of the previous cost, demand for high-end GPU clusters might not grow as expected.
Within days of release, R1 climbed to #3 on LMSYS Chatbot Arena and claimed #1 in both coding and math categories. The DeepSeek iOS app became the #1 download in the App Store in 157 countries, and deepseek.com traffic jumped +2,256% in a single month.
Four stages of training — two RL phases and two SFT phases — producing a model that learns to reason from first principles while remaining human-readable.
Stage 1: A small set of carefully curated long chain-of-thought examples provides the model with a readable reasoning format before RL begins. This "cold start" data prevents the chaotic outputs (language mixing, endless repetition) seen in pure-RL R1-Zero, while preserving the model's ability to develop novel reasoning strategies via RL. The model learns the structure of good reasoning — not the content.
Cold-start data · SFT
Stage 2: Large-scale reinforcement learning on math and coding tasks. The reward signal is simple: if the final answer is correct, the response gets a positive reward. No hand-crafted reward shaping. The model explores its own reasoning strategies — developing self-verification, backtracking, and multi-approach problem-solving as emergent behaviors. GRPO (Group Relative Policy Optimization) keeps training stable without a critic model, as sketched below. This stage is where the core reasoning capability develops.
GRPO · Large-scale RL · Emergent CoT
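The group-relative trick is easy to see in miniature. A toy sketch (not DeepSeek's training code; the within-group normalization is the only claim it makes):

```python
import numpy as np

def grpo_advantages(rewards: list[float]) -> np.ndarray:
    """GRPO advantage: each sampled response is scored relative to the
    mean reward of its own group, replacing a learned critic/value model."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)  # epsilon guards zero-variance groups

# Four responses to one prompt; reward 1.0 if the final answer was correct.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # -> approximately [ 1. -1. -1.  1.]
```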
Stage 3: Using the RL-trained model from Stage 2, reasoning data is generated via rejection sampling, producing hundreds of thousands of high-quality CoT examples across math, code, science, and logic. These are combined with curated non-reasoning data (writing, factual Q&A, role-playing) for a second SFT stage that gives the model strong general-purpose capabilities while preserving its reasoning depth. This is where R1's versatility comes from — it's not only a math model.
800K reasoning samples · Rejection sampling
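In sketch form, the rejection-sampling loop is just "sample many, keep the verified ones". The generate and is_correct helpers below are hypothetical stand-ins for the Stage-2 model and an answer checker (exact match, unit tests, etc.):

```python
def collect_reasoning_sft_data(tasks, generate, is_correct, samples_per_task=16):
    """Sample many CoT responses per task and keep only those whose final
    answer verifies as correct; the survivors become SFT training pairs."""
    dataset = []
    for prompt, reference_answer in tasks:
        for _ in range(samples_per_task):
            response = generate(prompt)  # full <think>...</think> + answer
            if is_correct(response, reference_answer):
                dataset.append({"prompt": prompt, "response": response})
    return dataset
```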
Stage 4: A final RL stage applies alignment signals from both reasoning tasks (correctness reward) and general tasks (helpfulness and safety reward). This stage polishes the model's behavior for real-world deployment — making it helpful, harmless, and readable while preserving the deep reasoning capabilities developed in Stage 2. The result is DeepSeek-R1: a model that reasons like a mathematician but communicates like an assistant.
Alignment · Helpfulness · Safety
Benchmark results from the official R1 paper (arXiv:2501.12948). R1 matches or exceeds OpenAI o1 across math, coding, and reasoning — all published data, no cherry-picking.
Six dense models distilled from R1 — fine-tuned on 800K reasoning samples. Run on everything from consumer GPUs to cloud servers. Outperform models 10× their size.
Smallest distilled model — meaningful reasoning on 4GB RAM devices. Edge deployment and mobile applications. Reasoning ability far exceeds what 1.5B parameters should theoretically achieve.
Surpasses QwQ-32B-Preview — a 5× larger model — on AIME. Best price/performance for consumer GPU deployment. Excellent for math tutoring, problem-solving assistants, and coding on 8GB VRAM cards.
Llama-architecture variant — compatible with the vast Llama ecosystem of tools, quantization methods, and deployment frameworks. Best choice when Llama compatibility is required.
Outperforms QwQ-32B-Preview by a large margin despite being less than half the size. Best balance between capability and hardware requirements — professional GPU workstations can run this well.
Sets new SOTA for dense models — comparable to o1-mini. 72.6% AIME 2024, 94.3% MATH-500. Best open-source local reasoning model for developers with a 24–48GB GPU workstation. The recommended model for most local deployments.
Top-tier dense model performance. Best MATH-500 score (94.5%) among all distilled models. Strong coding: 57.5% LiveCodeBench, 1633 Codeforces rating. Best choice for enterprise servers with multi-GPU setup.
R1 excels anywhere human-like multi-step reasoning is needed — far beyond what standard language models can achieve.
Shows its work in visible <think> tags before every answer. The full reasoning chain — including backtracking, self-correction, and alternative approaches — is transparent and auditable.
97.3% MATH-500, 79.8% AIME 2024. Handles AMC, AIME, USAMO, Putnam, and Olympiad-level problems with step-by-step proofs. The DeepThink toggle in chat activates this mode.
IMO-level
2029 Codeforces Elo — competitive with top human performers. Designs algorithms from scratch, proves correctness, and analyzes complexity. Strong on dynamic programming, graph theory, and combinatorics.
2029 Elo
R1 naturally learned to check its own answers during training — without being taught to. It revisits conclusions, tests edge cases, and corrects errors before finalizing, reducing hallucination on factual math tasks.
Emergent behavior
71.5% GPQA Diamond — graduate-level science questions requiring expert-level knowledge in physics, chemistry, and biology. R1's structured reasoning outperforms models that rely on pattern matching alone.
PhD-level
Traces execution paths, identifies edge cases, and explains why bugs exist — not just where they are. R1's reasoning mode produces more reliable debugging than standard models on complex algorithmic errors.
Root cause
Constructs rigorous mathematical proofs with explicit logical steps suitable for academic use. Used in DeepSeekMath-V2's self-verification pipeline to achieve gold-medal level on IMO 2025.
Proof-grade
Added in the May 2025 update — R1 can now call tools and return structured JSON during reasoning workflows, enabling agentic applications where extended thinking drives tool-use decisions.
v0528 feature
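A hedged sketch of the tool-calling path, using the standard OpenAI tools schema (the get_weather tool is hypothetical, invented for illustration):

```python
from openai import OpenAI

client = OpenAI(api_key="<DEEPSEEK_API_KEY>", base_url="https://api.deepseek.com")

# Hypothetical tool definition; the schema is the standard OpenAI format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Should I pack an umbrella for Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # tool call chosen after reasoning
```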
The 1.5B–7B distilled models bring real reasoning to consumer devices. Run on MacBook M-series chips, RTX 3060 8GB, or any modern phone SoC. Reasoning AI without cloud dependency.
4GB VRAM
Full MIT license on R1 weights — commercial use, fine-tuning, distillation into other models, modification, redistribution. No royalties, no restrictions. The most permissive license for a frontier reasoning model.
Commercial ✓
Full OpenAI ChatCompletions API compatibility. Change the base URL and API key — all existing streaming, tool-calling, and structured output code works unchanged. Backward-compatible after R1-0528.
2-line migration
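Concretely, the migration in the OpenAI Python SDK looks like this (a sketch assuming an existing ChatCompletions codebase):

```python
from openai import OpenAI

# Before: client = OpenAI(api_key="<OPENAI_API_KEY>")
# After:  swap the key and point base_url at DeepSeek; nothing else changes.
client = OpenAI(api_key="<DEEPSEEK_API_KEY>", base_url="https://api.deepseek.com")
```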
At launch: $0.55/1M input tokens vs OpenAI o1's $15/1M input — roughly 90–95% cheaper at comparable performance. Democratizes access to reasoning AI for startups and academic labs.
~15× cheaper
R1's strengths are clear: any task where step-by-step reasoning, self-verification, or extended thinking produces better outcomes than direct answering.
AMC, AIME, Putnam, and IMO preparation with full step-by-step solutions. 79.8% AIME 2024 — better than most human competitors. Generates worked solutions, alternative approaches, and practice problem sets on demand.
Mathematicians and scientists use R1 to explore conjecture approaches, verify proof sketches, and identify gaps in arguments. The visible CoT makes its assumptions auditable — critical for academic use where correctness matters.
Codeforces 2029 Elo — R1 designs algorithms, proves complexity, and explains data structure choices. Ideal for competitive programming training, coding interview prep, and algorithm education.
For non-obvious bugs — race conditions, algorithmic errors, subtle type issues — R1's step-by-step reasoning traces through execution paths that standard models guess at. Particularly valuable for hard-to-reproduce bugs.
Financial modeling, risk analysis, and quantitative decision-making where reasoning chains need to be auditable. R1's visible CoT lets you check the logic, not just the conclusion — critical for high-stakes decisions.
The distilled 7B–32B models bring o1-class reasoning to air-gapped or private environments. Run R1-distill on a laptop with no API calls, no data leaving your machine — ideal for sensitive research or proprietary codebases.
API, local Ollama, or Hugging Face — all three paths to R1's reasoning capability.
How R1 compares to OpenAI o1, GPT-4o, and Claude 3.5 Sonnet at launch (January 2025).
| Model | MATH-500 | AIME 2024 | GPQA Diamond | LiveCodeBench | Open Source | API Cost /1M out |
|---|---|---|---|---|---|---|
| DeepSeek-R1 | 97.3% | 79.8% | 71.5% | 65.9% | ✓ MIT | ~$5.4 |
| OpenAI o1-1217 | 96.4% | 79.2% | 75.7% | 63.4% | ✗ Closed | $60.00 |
| GPT-4o | 76.6% | 9.3% | 53.6% | 32.9% | ✗ Closed | $10.00 |
| Claude 3.5 Sonnet | 78.3% | 16.0% | 65.0% | 38.9% | ✗ Closed | $15.00 |
Data from DeepSeek R1 paper (arXiv:2501.12948, Jan 2025). Cost estimates approximate at launch. Today: V4-Pro surpasses R1 on most benchmarks at lower cost via API.
Web chat, API, or local — all give you access to R1's reasoning capability.
Go to chat.deepseek.com and toggle the DeepThink button. No account required for basic use. The chat interface now runs V4-Pro's Think Max mode — the successor to R1 with stronger benchmarks.
Run ollama run deepseek-r1:7b for a 7B distilled model on 8GB VRAM. Or deepseek-r1:32b for near-o1-mini quality on a 24GB GPU. Zero API cost. Full privacy.
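Once pulled, Ollama also exposes an OpenAI-compatible endpoint on localhost, so the same client code runs fully offline (a sketch; port 11434 is Ollama's default, and the sampling values follow the tip further down):

```python
from openai import OpenAI

# Ollama serves an OpenAI-compatible API locally; the key is ignored.
client = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1")

resp = client.chat.completions.create(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "Is 1001 prime? Think carefully."}],
    temperature=0.6,  # recommended R1 sampling settings (see tips below)
    top_p=0.95,
)
# When self-hosting, the reasoning arrives inline as <think>...</think>.
print(resp.choices[0].message.content)
```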
Use model deepseek-reasoner (retiring July 24, 2026 — migrate to deepseek-v4-pro with thinking enabled). OpenAI-compatible endpoint. R1's reasoning_content field returns the full CoT.
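Against the official endpoint, the CoT comes back in a separate field (a minimal sketch with the OpenAI SDK; DeepSeek's endpoint passes the extra reasoning_content field through):

```python
from openai import OpenAI

client = OpenAI(api_key="<DEEPSEEK_API_KEY>", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
)
msg = resp.choices[0].message
print(msg.reasoning_content)  # the full chain-of-thought
print(msg.content)            # the final answer only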
R1 occasionally skips its reasoning phase on perceived "easy" questions. Append "Please start with <think>" or set assistant_prefix: "<think>\n" in API calls to always get full CoT.
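When self-hosting the open weights, the same trick can be applied at the template level (a sketch; the <|User|>/<|Assistant|> markers follow R1's published chat template):

```python
# Seed the assistant turn with "<think>\n" so the model must continue
# its reasoning phase instead of answering directly.
prompt = "<|User|>Is 91 prime?<|Assistant|><think>\n"
```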
In multi-turn conversations, only include the final content field in history — never the reasoning_content. Including the reasoning chain degrades future response quality significantly.
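In code, the history rule looks like this (a sketch, with a client configured as above):

```python
from openai import OpenAI

client = OpenAI(api_key="<DEEPSEEK_API_KEY>", base_url="https://api.deepseek.com")
history = [{"role": "user", "content": "Factor 3599."}]

resp = client.chat.completions.create(model="deepseek-reasoner", messages=history)
msg = resp.choices[0].message

# Append ONLY the final answer to history; never msg.reasoning_content.
history.append({"role": "assistant", "content": msg.content})
history.append({"role": "user", "content": "Now factor 3721."})
resp2 = client.chat.completions.create(model="deepseek-reasoner", messages=history)
```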
Official recommendation: use temperature 0.6, top-p 0.95 for R1. Values above 0.7 risk incoherent reasoning chains. Values below 0.5 cause repetition. R1 ignores most other sampling parameters.
DeepSeek R1, released January 20, 2025, was the first open-source reasoning model to match OpenAI o1's performance on math, coding, and reasoning benchmarks — published with full weights under MIT license and 90–95% cheaper API pricing. The significance: it proved that frontier reasoning AI didn't require proprietary infrastructure. Its base model was trained for an estimated $5.5M vs GPT-4's ~$100M, triggering questions about the economics of AI investment and contributing to NVIDIA's 17% single-day stock drop — the largest single-day US market cap loss on record at the time.
Standard models (DeepSeek-V3, V4-Flash) answer directly. R1 thinks before answering — generating an internal chain-of-thought in <think>...</think> tags. This visible reasoning process enables R1 to self-verify, backtrack, and try multiple approaches — producing far more accurate results on math, logic, and complex coding tasks. The tradeoff: R1 responses take 15–60+ seconds for hard problems, vs sub-second for standard models.
R1-Zero is trained via pure RL on the base model — no supervised fine-tuning, no human-labeled examples. It demonstrates that reasoning can emerge from RL alone, but has poor readability: language mixing, endless repetition, inconsistent formatting. R1 adds a cold-start SFT phase before RL and a second SFT stage afterward — fixing readability while preserving R1-Zero's core reasoning capability. Both are published open-source for research.
For most developers: R1-Distill-Qwen-32B — comparable to o1-mini at 72.6% AIME 2024, runs on a 24GB VRAM GPU (RTX 4090, A5000), via ollama run deepseek-r1:32b. For laptop use (16GB): Qwen-14B or Qwen-7B. For maximum privacy/portability: Qwen-7B on 8GB VRAM. For enterprise multi-GPU servers: Llama-70B (best math score: 94.5% MATH-500). For Llama-ecosystem compatibility: use the 8B or 70B Llama variants.
R1-0528 (May 28, 2025) is a minor version upgrade with significantly deeper reasoning. Key improvements: AIME 2025 accuracy rose from 70% to 87.5%; average tokens per AIME question nearly doubled (12K → 23K); function calling and JSON output were added (not in original R1); system prompts now supported; the need to manually force <think> was reduced. Fully backward-compatible — no API endpoint changes. The original deepseek-reasoner API endpoint serves R1-0528 after the update.
For most production reasoning tasks in 2026, DeepSeek V4-Pro with Think Max is recommended — it surpasses the original R1 on nearly all benchmarks (80.6% vs 49.2% on SWE-bench; both score ~97% on MATH-500) while supporting 1M token context and tool calling natively. V4-Pro uses the same OpenAI-compatible API. Use R1-0528 when: you specifically want the open research model by name, you're self-hosting the distilled variants, or you need the deepseek-reasoner alias (before its July 24, 2026 retirement).
Yes. Download the distilled model weights from Hugging Face and run locally via Ollama, vLLM, or Hugging Face Transformers. The MIT license permits completely offline deployment with no data sent externally. Recommended setup for private use: ollama run deepseek-r1:32b on a workstation with 24GB VRAM for near-o1-mini quality with full data privacy. The 7B and 14B models run on gaming GPUs (8–12GB VRAM) for lighter use cases.
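A minimal offline sketch with Hugging Face Transformers (the model ID matches the published distilled weights; dtype and device settings are assumptions about your hardware):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Is 221 prime?"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Recommended R1 sampling settings; leave headroom for long <think> chains.
out = model.generate(inputs, max_new_tokens=2048,
                     do_sample=True, temperature=0.6, top_p=0.95)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```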
DeepThink is free at chat.deepseek.com. Distilled R1 models run on a laptop. Weights are MIT licensed. The reasoning revolution is open source.