DeepSeek-Coder-V2 breaks the barrier of closed-source code models. 90.2% HumanEval, 338 programming languages, 128K context window, and Fill-in-Middle support - all open-source with unrestricted commercial use.
DeepSeek-Coder-V2 ships in two MoE variants - the flagship 236B with GPT-4-Turbo-level performance, and the 16B Lite that punches far above its active parameter count.
The flagship variant. 236B total parameters, 21B active per token - large enough for frontier intelligence, efficient enough for practical deployment. First open-source model to achieve GPT-4-Turbo-level code generation. Tops the Aider code-fixing benchmark at 73.7%, beating all closed models. Pre-trained on 6 trillion additional tokens on top of DeepSeek-V2 base, including the largest open-source code corpus ever assembled.
The practical self-hosting variant. Only 2.4B active parameters - runs on a single RTX 4080 or equivalent 16GB VRAM GPU. Despite its small active parameter count, it outperforms much larger dense models. The 16B model is specifically trained with FIM (Fill-in-Middle) for IDE code completion, achieving an 86.4% mean FIM score across Python, Java, and JavaScript.
For new API integrations, deepseek-v4-flash is the recommended model. Coder-V2 remains the best dedicated FIM model for IDE completion workflows and self-hosted inference on 16GB VRAM.
When Coder-V2 launched in June 2024, it was the first open-source model to match or beat GPT-4-Turbo on coding tasks. Every score below was verified at launch.
From IDE autocomplete to full repository understanding — every capability a developer toolchain needs.
Generate complete, idiomatic code from natural language descriptions. 90.2% HumanEval, matching GPT-4o at launch. Works across all 338 supported languages with framework and convention awareness.
Insert code between existing prefix and suffix context using PSM (Prefix-Suffix-Middle) mode. 86.4% mean FIM accuracy across Python, Java, and JavaScript. Powers real IDE autocomplete in VS Code, Cursor, and JetBrains.
The 128K token context window processes entire multi-file projects. Understands cross-file imports, class hierarchies, and module dependencies. RepoBench evaluation confirms strong project-level completion.
Tops the Aider benchmark at 73.7% — #1 across all open and closed models at launch. Understands error semantics, runtime behavior, and logical flaws. Works with the Defects4J Java bug corpus and real-world repositories.
Review PRs and identify security vulnerabilities, performance bottlenecks, and code quality issues. Outputs structured feedback with severity levels and actionable fixes.
Generate unit tests, integration tests, and edge-case test suites that match the semantics of what code should do — not just what it does. Understands test frameworks for all major languages.
Expanded from 86 languages in Coder V1 to 338 — including Python, Java, C, C++, C#, TypeScript, Rust, Go, Swift, Kotlin, PHP, Ruby, Scala, R, MATLAB, SQL, Bash, Dockerfile, and 320+ more.
Uniquely strong at the intersection of math and code. 75.7% on the MATH benchmark, 94.9% on GSM8K, and 4/30 on AIME 2024 — enabling algorithm design, numerical methods, and scientific computing workflows.
Aligned with Group Relative Policy Optimization (GRPO) using compiler feedback and test cases as reward signals. Produces correct, runnable code — not just plausible-looking code that fails at execution.
Maintains near-perfect performance on NIAH (Needle in a Haystack) tests across the full 128K context window. Finds and uses information reliably at any position in long codebases.
Released publicly under a permissive license allowing both research and unrestricted commercial use. Download the weights, fine-tune on your codebase, and deploy in production — no restrictions.
Drop-in compatible with the OpenAI API format. Same endpoint structure, same streaming support, same function calling schema. Migrate existing integrations with a base URL and key swap, as shown below.
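The advertised two-line swap, sketched with the OpenAI Python SDK. The endpoint and model id are assumptions based on DeepSeek's public API docs; verify them before use:

```python
# The advertised two-line migration from an existing OpenAI SDK integration:
# only base_url and api_key change. Model id is an assumption; check the docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",   # was the OpenAI default
    api_key="YOUR_DEEPSEEK_API_KEY",       # was your OpenAI key
)

resp = client.chat.completions.create(
    model="deepseek-coder",                # assumed model id
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
print(resp.choices[0].message.content)
```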
Understanding the architectural decisions that give Coder-V2 its combination of performance and efficiency.
DeepSeek-Coder-V2 is built on the DeepSeekMoE framework — the same foundation as DeepSeek-V2. Two variants share the architecture but differ in scale: 236B total / 21B active and 16B total / 2.4B active.
Mixture-of-Experts routes each token to a small subset of specialized expert networks, activating only the most relevant experts per token. The 236B model thus has the representational capacity of its full parameter count but pays the inference cost of a ~21B dense model — the key to its cost-efficiency at frontier quality.
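To make the routing mechanism concrete, here is a toy top-k MoE layer in PyTorch. It is purely illustrative: every dimension is made up, and it is not DeepSeekMoE's actual implementation (which adds shared experts and load-balancing objectives):

```python
# Toy top-k MoE routing (illustrative only; not DeepSeekMoE's real code).
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)        # scores experts per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                  # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, -1) # keep only the top-k experts
        weights = weights.softmax(-1)                      # renormalize their scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens routed to expert e
                if mask.any():                             # only those tokens pay for it
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

print(ToyMoE()(torch.randn(10, 64)).shape)                 # torch.Size([10, 64])
```

Each token runs through only top_k of the n_experts FFNs, which is why compute scales with active parameters rather than total parameters.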
The Multi-Head Latent Attention (MLA) mechanism from DeepSeek-V2 compresses the KV cache into low-dimensional latent vectors, reducing memory footprint by up to 90% compared to standard multi-head attention — critical for the 128K context window.
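A back-of-envelope calculation shows where the savings come from. The head counts and latent width below are illustrative stand-ins, not the real DeepSeek-V2 configuration:

```python
# Rough KV-cache entry counts per token, per layer. All numbers are
# placeholders; the actual reduction depends on the real model config.
n_heads, head_dim = 32, 128
latent_dim = 512                      # assumed MLA latent width

mha = 2 * n_heads * head_dim          # full keys + values per token
mla = latent_dim                      # one compressed latent per token
print(f"MHA: {mha} floats, MLA: {mla} -> {1 - mla / mha:.0%} smaller")
```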
Coder-V2 continues pre-training from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens — the largest code-focused training corpus used in any open-source model to date.
The corpus is an improved version of the original DeepSeek-Coder dataset. A 1B parameter classifier scored mathematical relevance across Common Crawl, public GitHub repositories, and technical documentation. Ablation studies confirmed the new corpus provided measurable gains: +5.5% HumanEval and +4.4% MBPP on 1B models trained for 2T tokens.
The team discovered that initializing from a code model (rather than a general LLM) before math training produced significantly better mathematical reasoning — a key finding that influenced the design of subsequent DeepSeek models.
The 16B Lite model includes dedicated Fill-in-Middle (FIM) training using the PSM (Prefix-Suffix-Middle) mode at a 50% application rate. The 236B model uses only Next-Token-Prediction — FIM is excluded to focus training on the more powerful generation task.
FIM allows the model to predict a missing code segment given both the code that comes before and after it — exactly how IDE autocomplete works in practice. This is distinct from completion (where only prefix context is available) and significantly more useful for real coding workflows.
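A sketch of a PSM-style prompt, sent to a local Ollama instance in raw mode so Ollama's chat template does not rewrite it. The sentinel-token spellings follow the original DeepSeek-Coder release; verify them against the Lite Base tokenizer before relying on this:

```python
# Building a PSM (Prefix-Suffix-Middle) FIM prompt and sending it to a local
# Ollama server. Sentinel spellings are taken from the original DeepSeek-Coder
# README and may differ in your tokenizer; "raw" bypasses the chat template.
import requests

prefix = "def quicksort(arr):\n    if len(arr) <= 1:\n        return arr\n"
suffix = "\n    return quicksort(left) + [pivot] + quicksort(right)\n"
prompt = f"<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "deepseek-coder-v2", "prompt": prompt,
          "raw": True, "stream": False},
)
print(resp.json()["response"])   # the model's proposed middle segment
```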
The 86.4% mean FIM score across Python, Java, and JavaScript confirms the 16B model is well-suited for IDE integration despite its smaller size — making it the practical choice for VS Code, JetBrains, and Cursor plugins.
Language support expanded from 86 in the original Coder to 338 programming languages — a 4× increase driven by the larger and more diverse training corpus. All major general-purpose, scripting, compiled, and domain-specific languages are included.
Purpose-built for software engineering workflows — from individual developers to enterprise teams.
The 16B Lite model's FIM training makes it the best open-source choice for IDE autocomplete plugins. Low memory footprint (16GB VRAM), low latency, and 86.4% FIM accuracy. Integrates with Continue.dev, Cline, and custom VS Code extensions.
Build PR review bots that automatically flag security issues, performance problems, and code quality violations. The 236B model's Aider #1 score means it's the most capable at understanding and suggesting fixes for real-world code.
Migrate Python 2 → 3, upgrade Java 8 → 21, convert jQuery to modern React, or port COBOL to modern languages. The 128K context window processes entire legacy files without truncation.
Automatically generate comprehensive test coverage for existing functions and classes. The model understands testing frameworks (pytest, JUnit, Jest, RSpec) and generates tests that match the semantic intent of the code.
For algorithm design and competitive programming, the 236B model achieves 43.4% LiveCodeBench — matching GPT-4o on fresh competition problems. Suitable for Codeforces, LeetCode, and ICPC-style problem solving.
Generate REST APIs, gRPC services, and SDK wrappers from specifications. Understands OpenAPI/Swagger, Protocol Buffers, and common API design patterns across Python, Go, TypeScript, and Java ecosystems.
Local inference via Ollama, API access, or direct Hugging Face download — choose your setup.
Three ways to run Coder-V2: free chat, API, or self-hosted weights.
Go to chat.deepseek.com and use Expert Mode. The platform now runs V4-Pro under the hood — stronger than Coder-V2 on most tasks. Free, no account required for basic use.
Run ollama run deepseek-coder-v2 for the 16B Lite. Requires 16GB VRAM. Download via ollama.com. Zero API costs, full privacy, works offline.
Download deepseek-ai/DeepSeek-Coder-V2-Instruct (236B) or DeepSeek-Coder-V2-Lite-Instruct (16B). Use device_map="auto" for multi-GPU distribution.
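A minimal loading sketch with Transformers. The DeepSeek-V2 architecture ships custom modeling code, hence trust_remote_code=True; memory notes in the comments are approximate:

```python
# Loading the 16B Lite Instruct weights from Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # BF16 needs ~32 GB; quantize for 16 GB cards
    device_map="auto",            # shards across available GPUs automatically
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Merge two sorted lists in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
```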
Deploy via vllm serve deepseek-ai/DeepSeek-Coder-V2-Instruct for OpenAI-compatible API with batching, streaming, and load balancing. Best for team or production setups.
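Once the server is up, any OpenAI-compatible client can talk to it. A sketch assuming vLLM's default port:

```python
# Streaming completions from the self-hosted vLLM endpoint (default port 8000).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-Coder-V2-Instruct",   # must match the served model
    messages=[{"role": "user", "content": "Explain this regex: ^\\d{4}-\\d{2}$"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```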
Point Continue.dev, Cursor, or Cline at your local Ollama endpoint http://localhost:11434. Use the 16B Lite for FIM completion — it's purpose-trained for this workflow.
The permissive license allows commercial fine-tuning. Use Hugging Face PEFT + LoRA for efficient fine-tuning on the 16B model on your internal code. Ideal for proprietary language/framework specialization.
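A minimal sketch of the PEFT setup. The hyperparameters and target-module names are assumptions; inspect model.named_modules() on the real checkpoint to choose which projections to adapt:

```python
# LoRA adapter setup via Hugging Face PEFT. Target-module names are assumed;
# verify them against the actual DeepSeek-V2 module layout before training.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "o_proj"],   # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()         # typically well under 1% trainable
# Train the adapter with transformers.Trainer or trl's SFTTrainer as usual.
```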
How Coder-V2 compared at launch (June 2024) against leading closed-source models.
| Model | HumanEval | MBPP+ | MATH | LiveCodeBench | Aider | Open Source |
|---|---|---|---|---|---|---|
| DS-Coder-V2 (236B) | 90.2% | 76.2% | 75.7% | 43.4% | 73.7% ★ | ✓ Permissive |
| DS-Coder-V2 Lite (16B) | 81.1% | 65.4% | 61.8% | ~28% | 44.4% | ✓ Permissive |
| GPT-4o (May 2024) | 90.2% | 74.5% | 76.6% | 43.4% | 72.9% | ✗ Closed |
| GPT-4-Turbo-0409 | 87.1% | 72.9% | 73.4% | 44.6% | ~64% | ✗ Closed |
| Claude 3 Opus | 84.9% | 71.9% | 60.1% | 29.0% | ~55% | ✗ Closed |
| Gemini 1.5 Pro | 84.1% | 70.5% | 67.7% | ~30% | ~50% | ✗ Closed |
| Llama 3-70B | 81.7% | 66.4% | 50.4% | ~22% | ~35% | ✓ Meta |
★ Aider: DS-Coder-V2 ranked #1 across all models at launch · All scores from official paper (arXiv:2406.11931, June 2024)
DeepSeek-Coder-V2 was released June 17, 2024 as a dedicated code model built on DeepSeek-V2, pre-trained on 6T additional code tokens. It achieved GPT-4-Turbo-level performance on HumanEval (90.2%) and was the first open-source model to break 10% on SWE-bench. DeepSeek-V4 (April 2026) supersedes it for most tasks — V4-Flash scores 79.0% on SWE-bench vs Coder-V2's 12.7%, and V4-Pro scores 80.6%. For new API integrations, use V4-Flash or V4-Pro. Coder-V2 remains valuable for: self-hosting on 16GB VRAM (16B Lite), FIM/IDE completion workflows (the 16B has dedicated FIM training), and research use cases where the specific Coder-V2 architecture is relevant.
The 236B variant has 21B active parameters — frontier performance for code generation (90.2% HumanEval, 73.7% Aider), math reasoning (75.7% MATH), and code fixing. Requires 8×80GB GPU servers for full BF16 inference. The 16B Lite has only 2.4B active parameters, runs on a single GPU with 16GB+ VRAM (RTX 4080, A100, etc.), and uniquely includes FIM (Fill-in-Middle) training for IDE code completion. For self-hosted IDE autocomplete, the 16B Lite at 86.4% mean FIM score is the practical choice. For production code generation and review, the 236B is significantly better.
The easiest path: install Ollama and run ollama run deepseek-coder-v2. This downloads the quantized 16B Lite model. You need at least 16GB RAM (ideally GPU VRAM for speed). For Apple Silicon (M2 Pro, M3, M3 Max), the model runs entirely in unified memory — performance is excellent at 16GB and above. For Windows/Linux with an NVIDIA GPU, any RTX 3080 10GB or better works for quantized inference. The 236B model cannot run on consumer hardware — it requires professional multi-GPU servers.
FIM (Fill-in-Middle) is a training objective where the model predicts a missing segment given both prefix (code before) and suffix (code after) context. This is exactly how IDE code completion works — the cursor is in the middle of code, and the model fills in a specific gap. The 16B Lite is trained with FIM using PSM (Prefix-Suffix-Middle) mode at a 50% application rate. The 236B model uses only Next-Token-Prediction — the team chose this because FIM at 236B scale would require significantly more training compute, while the 16B achieves excellent 86.4% FIM accuracy suitable for IDE use. For FIM, always use the Base model variants (not Instruct), as the chat template can interfere with the FIM format.
Yes. DeepSeek-Coder-V2 is released under a permissive license allowing both research and unrestricted commercial use. You can: download and self-host the weights, fine-tune on your proprietary codebase, deploy in commercial products and SaaS applications, and distribute modified versions. There are no royalties and no requirement to release fine-tuned weights. Always check the specific license file in the Hugging Face repository for precise terms — huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct.
Any IDE plugin that supports a custom Ollama or OpenAI-compatible endpoint can use Coder-V2: Continue.dev (VS Code and JetBrains) — the most popular open-source AI code extension, supports local Ollama endpoints directly. Cline (VS Code) — agentic AI coding assistant with full Ollama support. Cursor — configure a custom API endpoint in the model settings. Tabby — open-source self-hosted code completion server that works well with Coder-V2. For all of these, point the endpoint at your local Ollama server (http://localhost:11434) and specify the model name. Use the Base (not Instruct) variant for FIM completion workflows.
90.2% HumanEval. 338 languages. FIM for IDE autocomplete. Commercial license. Download the weights, run it locally, and build without limits.