DeepSeek R1 - the open-source reasoning model that matched OpenAI o1, crushed NVIDIA's stock price, and proved that frontier-level chain-of-thought AI didn't require proprietary infrastructure. Free. MIT licensed. Available to run on your laptop.
Released January 20, 2025, DeepSeek R1 was the first openly published reasoning model to achieve performance comparable to OpenAI's o1 — at 90–95% lower API cost, with full weights published under MIT license.
Most language models answer directly. Reasoning models like R1 think before answering — they generate a chain-of-thought (CoT) before producing a final response. The thinking process is visible in the <think>...</think> tags of every response.
This internal reasoning enables R1 to self-verify, backtrack on wrong paths, try multiple approaches, and produce far more accurate answers on math, coding, and logic problems — the same tasks where direct-answering models fail.
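A client can separate the visible reasoning from the final answer with a simple parse. A minimal Python sketch, assuming only the tag format described above:

```python
import re

def split_reasoning(text: str) -> tuple[str | None, str]:
    """Split an R1 response into (chain_of_thought, final_answer)."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        return None, text.strip()  # model skipped its reasoning phase
    return m.group(1).strip(), text[m.end():].strip()

cot, answer = split_reasoning("<think>2+2 is 4; double-check... yes.</think>4")
print(cot)     # "2+2 is 4; double-check... yes."
print(answer)  # "4"
```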
What shocked the AI world was how R1 learned to do this: purely through reinforcement learning, with no human-labeled reasoning examples as a starting point. The model discovered chain-of-thought reasoning organically as a strategy to maximize its reward signal.
Before R1, the assumption was that reasoning models required massive supervised datasets of human-written reasoning traces. R1-Zero disproved this: trained via large-scale RL on the base model alone, with a simple reward signal (is the answer correct?), the model spontaneously developed self-verification, reflection, and multi-step reasoning behaviors.
DeepSeek-R1-Zero was followed by DeepSeek-R1, which added cold-start data and multi-stage training to address readability issues — but the core insight remained: reasoning capability can emerge from RL alone, without human-curated CoT data. This finding is now considered one of the most significant AI research results of 2025.
R1's release on January 20, 2025 sent shockwaves through markets: one week later, on January 27, NVIDIA's stock dropped ~17% in a single day — the largest single-day loss in market cap in US stock market history at the time. The reasoning: if world-class AI could be trained for a fraction of the previous cost, demand for high-end GPU clusters might not grow as expected.
Within days of release, R1 climbed to #3 on LMSYS Chatbot Arena and claimed #1 in both coding and math categories. The DeepSeek iOS app became the #1 download in the App Store in 157 countries, and deepseek.com traffic jumped +2,256% in a single month.
Four stages of training — two RL phases and two SFT phases — producing a model that learns to reason from first principles while remaining human-readable.
Stage 1: A small set of carefully curated long chain-of-thought examples provides the model with a readable reasoning format before RL begins. This "cold start" data prevents the chaotic outputs (language mixing, endless repetition) seen in pure-RL R1-Zero, while preserving the model's ability to develop novel reasoning strategies via RL. The model learns the structure of good reasoning — not the content.
Cold-start data · SFT
Stage 2: Large-scale reinforcement learning on math and coding tasks. The reward signal is simple: if the final answer is correct, the response gets a positive reward. No hand-crafted reward shaping. The model explores its own reasoning strategies — developing self-verification, backtracking, and multi-approach problem-solving as emergent behaviors. GRPO (Group Relative Policy Optimization) keeps training stable without a critic model, as sketched below. This stage is where the core reasoning capability develops.
GRPO · Large-scale RL · Emergent CoT
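The group-relative trick is easy to see in miniature. A toy sketch (not DeepSeek's training code; the within-group normalization is the only claim it makes):

```python
import numpy as np

def grpo_advantages(rewards: list[float]) -> np.ndarray:
    """GRPO advantage: each sampled response is scored relative to the
    mean reward of its own group, replacing a learned critic/value model."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)  # epsilon guards zero-variance groups

# Four responses to one prompt; reward 1.0 if the final answer was correct.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # -> approximately [ 1. -1. -1.  1.]
```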
Stage 3: Using the RL-trained model from Stage 2, reasoning data is generated via rejection sampling, producing hundreds of thousands of high-quality CoT examples across math, code, science, and logic. These are combined with curated non-reasoning data (writing, factual Q&A, role-playing) for a second SFT stage that gives the model strong general-purpose capabilities while preserving its reasoning depth. This is where R1's versatility comes from — it's not only a math model.
800K reasoning samples · Rejection sampling
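In sketch form, the rejection-sampling loop is just "sample many, keep the verified ones". The generate and is_correct helpers below are hypothetical stand-ins for the Stage-2 model and an answer checker (exact match, unit tests, etc.):

```python
def collect_reasoning_sft_data(tasks, generate, is_correct, samples_per_task=16):
    """Sample many CoT responses per task and keep only those whose final
    answer verifies as correct; the survivors become SFT training pairs."""
    dataset = []
    for prompt, reference_answer in tasks:
        for _ in range(samples_per_task):
            response = generate(prompt)  # full <think>...</think> + answer
            if is_correct(response, reference_answer):
                dataset.append({"prompt": prompt, "response": response})
    return dataset
```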
Stage 4: A final RL stage applies alignment signals from both reasoning tasks (correctness reward) and general tasks (helpfulness and safety reward). This stage polishes the model's behavior for real-world deployment — making it helpful, harmless, and readable while preserving the deep reasoning capabilities developed in Stage 2. The result is DeepSeek-R1: a model that reasons like a mathematician but communicates like an assistant.
Alignment · Helpfulness · Safety
Benchmark results from the official R1 paper (arXiv:2501.12948). R1 matches or exceeds OpenAI o1 across math, coding, and reasoning — all published data, no cherry-picking.
Six dense models distilled from R1 — fine-tuned on 800K reasoning samples. Run on everything from consumer GPUs to cloud servers. Outperform models 10× their size.
Smallest distilled model — meaningful reasoning on 4GB RAM devices. Edge deployment and mobile applications. Reasoning ability far exceeds what 1.5B parameters should theoretically achieve.
Surpasses QwQ-32B-Preview — a 5× larger model — on AIME. Best price/performance for consumer GPU deployment. Excellent for math tutoring, problem-solving assistants, and coding on 8GB VRAM cards.
Llama-architecture variant — compatible with the vast Llama ecosystem of tools, quantization methods, and deployment frameworks. Best choice when Llama compatibility is required.
Outperforms QwQ-32B-Preview by a large margin despite being less than half the size. Best balance between capability and hardware requirements — professional GPU workstations can run this well.
Sets new SOTA for dense models — comparable to o1-mini. 72.6% AIME 2024, 94.3% MATH-500. Best open-source local reasoning model for developers with a 24–48GB GPU workstation. The recommended model for most local deployments.
Top-tier dense model performance. Best MATH-500 score (94.5%) among all distilled models. Strong coding: 57.5% LiveCodeBench, 1633 Codeforces rating. Best choice for enterprise servers with multi-GPU setup.
R1 excels anywhere human-like multi-step reasoning is needed — far beyond what standard language models can achieve.
Shows its work in visible <think> tags before every answer. The full reasoning chain — including backtracking, self-correction, and alternative approaches — is transparent and auditable.
97.3% MATH-500, 79.8% AIME 2024. Handles AMC, AIME, USAMO, Putnam, and Olympiad-level problems with step-by-step proofs. The DeepThink toggle in chat activates this mode.
IMO-level
2029 Codeforces Elo — competitive with top human performers. Designs algorithms from scratch, proves correctness, and analyzes complexity. Strong on dynamic programming, graph theory, and combinatorics.
2029 Elo
R1 naturally learned to check its own answers during training — without being taught to. It revisits conclusions, tests edge cases, and corrects errors before finalizing, reducing hallucination on factual math tasks.
Emergent behavior
71.5% GPQA Diamond — graduate-level science questions requiring expert-level knowledge in physics, chemistry, and biology. R1's structured reasoning outperforms models that rely on pattern matching alone.
PhD-level
Traces execution paths, identifies edge cases, and explains why bugs exist — not just where they are. R1's reasoning mode produces more reliable debugging than standard models on complex algorithmic errors.
Root cause
Constructs rigorous mathematical proofs with explicit logical steps suitable for academic use. Used in DeepSeekMath-V2's self-verification pipeline to achieve gold-medal level on IMO 2025.
Proof-grade
Added in the May 2025 update — R1 can now call tools and return structured JSON during reasoning workflows, enabling agentic applications where extended thinking drives tool-use decisions.
v0528 feature
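A hedged sketch of the tool-calling path, using the standard OpenAI tools schema (the get_weather tool is hypothetical, invented for illustration):

```python
from openai import OpenAI

client = OpenAI(api_key="<DEEPSEEK_API_KEY>", base_url="https://api.deepseek.com")

# Hypothetical tool definition; the schema is the standard OpenAI format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Should I pack an umbrella for Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # tool call chosen after reasoning
```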
The 1.5B–7B distilled models bring real reasoning to consumer devices. Run on MacBook M-series chips, RTX 3060 8GB, or any modern phone SoC. Reasoning AI without cloud dependency.
4GB VRAM
Full MIT license on R1 weights — commercial use, fine-tuning, distillation into other models, modification, redistribution. No royalties, no restrictions. The most permissive license for a frontier reasoning model.
Commercial ✓
Full OpenAI ChatCompletions API compatibility. Change the base URL and API key — all existing streaming, tool-calling, and structured output code works unchanged. Backward-compatible after R1-0528.
2-line migration
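Concretely, the migration in the OpenAI Python SDK looks like this (a sketch assuming an existing ChatCompletions codebase):

```python
from openai import OpenAI

# Before: client = OpenAI(api_key="<OPENAI_API_KEY>")
# After:  swap the key and point base_url at DeepSeek; nothing else changes.
client = OpenAI(api_key="<DEEPSEEK_API_KEY>", base_url="https://api.deepseek.com")
```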
At launch: $0.55/1M input tokens vs OpenAI o1's $15/1M input — roughly 90–95% cheaper at comparable performance. Democratizes access to reasoning AI for startups and academic labs.
~15× cheaper
R1's strengths are clear: any task where step-by-step reasoning, self-verification, or extended thinking produces better outcomes than direct answering.
AMC, AIME, Putnam, and IMO preparation with full step-by-step solutions. 79.8% AIME 2024 — better than most human competitors. Generates worked solutions, alternative approaches, and practice problem sets on demand.
Mathematicians and scientists use R1 to explore conjecture approaches, verify proof sketches, and identify gaps in arguments. The visible CoT makes its assumptions auditable — critical for academic use where correctness matters.
Codeforces 2029 Elo — R1 designs algorithms, proves complexity, and explains data structure choices. Ideal for competitive programming training, coding interview prep, and algorithm education.
For non-obvious bugs — race conditions, algorithmic errors, subtle type issues — R1's step-by-step reasoning traces through execution paths that standard models guess at. Particularly valuable for hard-to-reproduce bugs.
Financial modeling, risk analysis, and quantitative decision-making where reasoning chains need to be auditable. R1's visible CoT lets you check the logic, not just the conclusion — critical for high-stakes decisions.
The distilled 7B–32B models bring o1-class reasoning to air-gapped or private environments. Run R1-distill on a laptop with no API calls, no data leaving your machine — ideal for sensitive research or proprietary codebases.
API, local Ollama, or Hugging Face — all three paths to R1's reasoning capability.
How R1 compares to OpenAI o1, GPT-4o, and Claude 3.5 Sonnet at launch (January 2025).
| Model | MATH-500 | AIME 2024 | GPQA Diamond | LiveCodeBench | Open Source | API Cost /1M out |
|---|---|---|---|---|---|---|
| DeepSeek-R1 | 97.3% | 79.8% | 71.5% | 65.9% | ✓ MIT | ~$5.4 |
| OpenAI o1-1217 | 96.4% | 79.2% | 75.7% | 63.4% | ✗ Closed | $60.00 |
| GPT-4o | 76.6% | 9.3% | 53.6% | 32.9% | ✗ Closed | $10.00 |
| Claude 3.5 Sonnet | 78.3% | 16.0% | 65.0% | 38.9% | ✗ Closed | $15.00 |
Data from DeepSeek R1 paper (arXiv:2501.12948, Jan 2025). Cost estimates approximate at launch. Today: V4-Pro surpasses R1 on most benchmarks at lower cost via API.
Web chat, API, or local — all give you access to R1's reasoning capability.
Go to chat.deepseek.com and toggle the DeepThink button. No account required for basic use. The chat interface now runs V4-Pro's Think Max mode — the successor to R1 with stronger benchmarks.
Run ollama run deepseek-r1:7b for a 7B distilled model on 8GB VRAM. Or deepseek-r1:32b for near-o1-mini quality on a 24GB GPU. Zero API cost. Full privacy.
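Once pulled, Ollama also exposes an OpenAI-compatible endpoint on localhost, so the same client code runs fully offline (a sketch; port 11434 is Ollama's default, and the sampling values follow the tip further down):

```python
from openai import OpenAI

# Ollama serves an OpenAI-compatible API locally; the key is ignored.
client = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1")

resp = client.chat.completions.create(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "Is 1001 prime? Think carefully."}],
    temperature=0.6,  # recommended R1 sampling settings (see tips below)
    top_p=0.95,
)
# When self-hosting, the reasoning arrives inline as <think>...</think>.
print(resp.choices[0].message.content)
```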
Use model deepseek-reasoner (retiring July 24, 2026 — migrate to deepseek-v4-pro with thinking enabled). OpenAI-compatible endpoint. R1's reasoning_content field returns the full CoT.
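Against the official endpoint, the CoT comes back in a separate field (a minimal sketch with the OpenAI SDK; DeepSeek's endpoint passes the extra reasoning_content field through):

```python
from openai import OpenAI

client = OpenAI(api_key="<DEEPSEEK_API_KEY>", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
)
msg = resp.choices[0].message
print(msg.reasoning_content)  # the full chain-of-thought
print(msg.content)            # the final answer only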
R1 occasionally skips its reasoning phase on perceived "easy" questions. Append "Please start with <think>" or set assistant_prefix: "<think>\n" in API calls to always get full CoT.
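When self-hosting the open weights, the same trick can be applied at the template level (a sketch; the <|User|>/<|Assistant|> markers follow R1's published chat template):

```python
# Seed the assistant turn with "<think>\n" so the model must continue
# its reasoning phase instead of answering directly.
prompt = "<|User|>Is 91 prime?<|Assistant|><think>\n"
```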
In multi-turn conversations, only include the final content field in history — never the reasoning_content. Including the reasoning chain degrades future response quality significantly.
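In code, the history rule looks like this (a sketch, with a client configured as above):

```python
from openai import OpenAI

client = OpenAI(api_key="<DEEPSEEK_API_KEY>", base_url="https://api.deepseek.com")
history = [{"role": "user", "content": "Factor 3599."}]

resp = client.chat.completions.create(model="deepseek-reasoner", messages=history)
msg = resp.choices[0].message

# Append ONLY the final answer to history; never msg.reasoning_content.
history.append({"role": "assistant", "content": msg.content})
history.append({"role": "user", "content": "Now factor 3721."})
resp2 = client.chat.completions.create(model="deepseek-reasoner", messages=history)
```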
Official recommendation: use temperature 0.6, top-p 0.95 for R1. Values above 0.7 risk incoherent reasoning chains. Values below 0.5 cause repetition. R1 ignores most other sampling parameters.
DeepSeek R1, released January 20, 2025, was the first open-source reasoning model to match OpenAI o1's performance on math, coding, and reasoning benchmarks — published with full weights under MIT license and 90–95% cheaper API pricing. The significance: it proved that frontier reasoning AI didn't require proprietary infrastructure. Its base model was trained for an estimated $5.5M vs GPT-4's ~$100M, triggering questions about the economics of AI investment and contributing to NVIDIA's 17% single-day stock drop — the largest single-day US market cap loss on record at the time.
Standard models (DeepSeek-V3, V4-Flash) answer directly. R1 thinks before answering — generating an internal chain-of-thought in <think>...</think> tags. This visible reasoning process enables R1 to self-verify, backtrack, and try multiple approaches — producing far more accurate results on math, logic, and complex coding tasks. The tradeoff: R1 responses take 15–60+ seconds for hard problems, vs sub-second for standard models.
R1-Zero is trained via pure RL on the base model — no supervised fine-tuning, no human-labeled examples. It demonstrates that reasoning can emerge from RL alone, but has poor readability: language mixing, endless repetition, inconsistent formatting. R1 adds a cold-start SFT phase before RL and a second SFT stage afterward — fixing readability while preserving R1-Zero's core reasoning capability. Both are published open-source for research.
For most developers: R1-Distill-Qwen-32B — comparable to o1-mini at 72.6% AIME 2024, runs on a 24GB VRAM GPU (RTX 4090, A5000), via ollama run deepseek-r1:32b. For laptop use (16GB): Qwen-14B or Qwen-7B. For maximum privacy/portability: Qwen-7B on 8GB VRAM. For enterprise multi-GPU servers: Llama-70B (best math score: 94.5% MATH-500). For Llama-ecosystem compatibility: use the 8B or 70B Llama variants.
R1-0528 (May 28, 2025) is a minor version upgrade with significantly deeper reasoning. Key improvements: AIME 2025 accuracy rose from 70% to 87.5%; average tokens per AIME question nearly doubled (12K → 23K); function calling and JSON output were added (not in original R1); system prompts now supported; the need to manually force <think> was reduced. Fully backward-compatible — no API endpoint changes. The original deepseek-reasoner API endpoint serves R1-0528 after the update.
For most production reasoning tasks in 2026, DeepSeek V4-Pro with Think Max is recommended — it surpasses the original R1 on nearly all benchmarks (80.6% vs 49.2% on SWE-bench; both score ~97% on MATH-500) while supporting 1M token context and tool calling natively. V4-Pro uses the same OpenAI-compatible API. Use R1-0528 when: you specifically want the open research model by name, you're self-hosting the distilled variants, or you need the deepseek-reasoner alias (before its July 24, 2026 retirement).
Yes. Download the distilled model weights from Hugging Face and run locally via Ollama, vLLM, or Hugging Face Transformers. The MIT license permits completely offline deployment with no data sent externally. Recommended setup for private use: ollama run deepseek-r1:32b on a workstation with 24GB VRAM for near-o1-mini quality with full data privacy. The 7B and 14B models run on gaming GPUs (8–12GB VRAM) for lighter use cases.
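A minimal offline sketch with Hugging Face Transformers (the model ID matches the published distilled weights; dtype and device settings are assumptions about your hardware):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Is 221 prime?"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Recommended R1 sampling settings; leave headroom for long <think> chains.
out = model.generate(inputs, max_new_tokens=2048,
                     do_sample=True, temperature=0.6, top_p=0.95)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```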
DeepThink is free at chat.deepseek.com. Distilled R1 models run on a laptop. Weights are MIT licensed. The reasoning revolution is open source.