DeepSeek-Coder-V2 breaks the barrier of closed-source code models. 90.2% HumanEval, 338 programming languages, 128K context window, and Fill-in-Middle support - all open-source with unrestricted commercial use.
DeepSeek-Coder-V2 ships in two MoE variants - the flagship 236B with GPT-4-Turbo-level performance, and the 16B Lite that punches far above its active parameter count.
The flagship variant. 236B total parameters, 21B active per token - large enough for frontier intelligence, efficient enough for practical deployment. First open-source model to achieve GPT-4-Turbo-level code generation. Tops the Aider code-fixing benchmark at 73.7%, beating all closed models. Pre-trained on 6 trillion additional tokens on top of DeepSeek-V2 base, including the largest open-source code corpus ever assembled.
The practical self-hosting variant. Only 2.4B active parameters - runs on a single RTX 4080 or equivalent 16GB VRAM GPU. Despite its small active parameter count, it outperforms much larger dense models. The 16B model is specifically trained with FIM (Fill-in-Middle) for IDE code completion, achieving an 86.4% mean FIM score across Python, Java, and JavaScript.
For new API integrations, deepseek-v4-flash is the recommended model. Coder-V2 remains the best dedicated FIM model for IDE completion workflows and self-hosted inference on 16GB VRAM.
When Coder-V2 launched in June 2024, it was the first open-source model to match or beat GPT-4-Turbo on coding tasks. Every score below was verified at launch.
From IDE autocomplete to full repository understanding — every capability a developer toolchain needs.
Generate complete, idiomatic code from natural language descriptions. 90.2% HumanEval, matching GPT-4o at launch. Works across all 338 supported languages with framework and convention awareness.
Insert code between existing prefix and suffix context using PSM (Prefix-Suffix-Middle) mode. 86.4% mean FIM accuracy across Python, Java, and JavaScript. Powers real IDE autocomplete in VS Code, Cursor, and JetBrains.
The 128K token context window processes entire multi-file projects. Understands cross-file imports, class hierarchies, and module dependencies. RepoBench evaluation confirms strong project-level completion.
Tops the Aider benchmark at 73.7% — #1 across all open and closed models at launch. Understands error semantics, runtime behavior, and logical flaws. Works with the Defects4J Java bug corpus and real-world repositories.
Review PRs and identify security vulnerabilities, performance bottlenecks, and code quality issues. Outputs structured feedback with severity levels and actionable fixes.
Generate unit tests, integration tests, and edge-case test suites that match the semantics of what code should do — not just what it does. Understands test frameworks for all major languages.
Expanded from 86 languages in Coder V1 to 338 — including Python, Java, C, C++, C#, TypeScript, Rust, Go, Swift, Kotlin, PHP, Ruby, Scala, R, MATLAB, SQL, Bash, Dockerfile, and 320+ more.
Uniquely strong at the intersection of math and code. 75.7% on the MATH benchmark, 94.9% on GSM8K, and 4/30 on AIME 2024 — enabling algorithm design, numerical methods, and scientific computing workflows.
Aligned with Group Relative Policy Optimization (GRPO) using compiler feedback and test cases as reward signals. Produces correct, runnable code — not just plausible-looking code that fails at execution.
Maintains near-perfect performance on NIAH (Needle in a Haystack) tests across the full 128K context window. Finds and uses information reliably at any position in long codebases.
Released publicly under a permissive license allowing both research and unrestricted commercial use. Download the weights, fine-tune on your codebase, and deploy in production — no restrictions.
Drop-in compatible with the OpenAI API format. Same endpoint structure, same streaming support, same function calling schema. Migrate existing integrations with a base URL and key swap, as shown below.
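The advertised two-line swap, sketched with the OpenAI Python SDK. The endpoint and model id are assumptions based on DeepSeek's public API docs; verify them before use:

```python
# The advertised two-line migration from an existing OpenAI SDK integration:
# only base_url and api_key change. Model id is an assumption; check the docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",   # was the OpenAI default
    api_key="YOUR_DEEPSEEK_API_KEY",       # was your OpenAI key
)

resp = client.chat.completions.create(
    model="deepseek-coder",                # assumed model id
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
print(resp.choices[0].message.content)
```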
Understanding the architectural decisions that give Coder-V2 its combination of performance and efficiency.
DeepSeek-Coder-V2 is built on the DeepSeekMoE framework — the same foundation as DeepSeek-V2. Two variants share the architecture but differ in scale: 236B total / 21B active and 16B total / 2.4B active.
Mixture-of-Experts routes each token to a small subset of specialized expert networks, activating only the most relevant experts per token. The 236B model thus has the representational capacity of its full parameter count but pays the inference cost of a ~21B dense model — the key to its cost-efficiency at frontier quality.
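To make the routing mechanism concrete, here is a toy top-k MoE layer in PyTorch. It is purely illustrative: every dimension is made up, and it is not DeepSeekMoE's actual implementation (which adds shared experts and load-balancing objectives):

```python
# Toy top-k MoE routing (illustrative only; not DeepSeekMoE's real code).
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)        # scores experts per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                  # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, -1) # keep only the top-k experts
        weights = weights.softmax(-1)                      # renormalize their scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens routed to expert e
                if mask.any():                             # only those tokens pay for it
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

print(ToyMoE()(torch.randn(10, 64)).shape)                 # torch.Size([10, 64])
```

Each token runs through only top_k of the n_experts FFNs, which is why compute scales with active parameters rather than total parameters.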
The Multi-Head Latent Attention (MLA) mechanism from DeepSeek-V2 compresses the KV cache into low-dimensional latent vectors, reducing memory footprint by up to 90% compared to standard multi-head attention — critical for the 128K context window.
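A back-of-envelope calculation shows where the savings come from. The head counts and latent width below are illustrative stand-ins, not the real DeepSeek-V2 configuration:

```python
# Rough KV-cache entry counts per token, per layer. All numbers are
# placeholders; the actual reduction depends on the real model config.
n_heads, head_dim = 32, 128
latent_dim = 512                      # assumed MLA latent width

mha = 2 * n_heads * head_dim          # full keys + values per token
mla = latent_dim                      # one compressed latent per token
print(f"MHA: {mha} floats, MLA: {mla} -> {1 - mla / mha:.0%} smaller")
```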
Coder-V2 continues pre-training from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens — the largest code-focused training corpus used in any open-source model to date.
The corpus is an improved version of the original DeepSeek-Coder dataset. A 1B parameter classifier scored mathematical relevance across Common Crawl, public GitHub repositories, and technical documentation. Ablation studies confirmed the new corpus provided measurable gains: +5.5% HumanEval and +4.4% MBPP on 1B models trained for 2T tokens.
The team discovered that initializing from a code model (rather than a general LLM) before math training produced significantly better mathematical reasoning — a key finding that influenced the design of subsequent DeepSeek models.
The 16B Lite model includes dedicated Fill-in-Middle (FIM) training using the PSM (Prefix-Suffix-Middle) mode at a 50% application rate. The 236B model uses only Next-Token-Prediction — FIM is excluded to focus training on the more powerful generation task.
FIM allows the model to predict a missing code segment given both the code that comes before and after it — exactly how IDE autocomplete works in practice. This is distinct from completion (where only prefix context is available) and significantly more useful for real coding workflows.
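A sketch of a PSM-style prompt, sent to a local Ollama instance in raw mode so Ollama's chat template does not rewrite it. The sentinel-token spellings follow the original DeepSeek-Coder release; verify them against the Lite Base tokenizer before relying on this:

```python
# Building a PSM (Prefix-Suffix-Middle) FIM prompt and sending it to a local
# Ollama server. Sentinel spellings are taken from the original DeepSeek-Coder
# README and may differ in your tokenizer; "raw" bypasses the chat template.
import requests

prefix = "def quicksort(arr):\n    if len(arr) <= 1:\n        return arr\n"
suffix = "\n    return quicksort(left) + [pivot] + quicksort(right)\n"
prompt = f"<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "deepseek-coder-v2", "prompt": prompt,
          "raw": True, "stream": False},
)
print(resp.json()["response"])   # the model's proposed middle segment
```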
The 86.4% mean FIM score across Python, Java, and JavaScript confirms the 16B model is well-suited for IDE integration despite its smaller size — making it the practical choice for VS Code, JetBrains, and Cursor plugins.
Language support expanded from 86 in the original Coder to 338 programming languages — a 4× increase driven by the larger and more diverse training corpus. All major general-purpose, scripting, compiled, and domain-specific languages are included.
Purpose-built for software engineering workflows — from individual developers to enterprise teams.
The 16B Lite model's FIM training makes it the best open-source choice for IDE autocomplete plugins. Low memory footprint (16GB VRAM), low latency, and 86.4% FIM accuracy. Integrates with Continue.dev, Cline, and custom VS Code extensions.
Build PR review bots that automatically flag security issues, performance problems, and code quality violations. The 236B model's Aider #1 score means it's the most capable at understanding and suggesting fixes for real-world code.
Migrate Python 2 → 3, upgrade Java 8 → 21, convert jQuery to modern React, or port COBOL to modern languages. The 128K context window processes entire legacy files without truncation.
Automatically generate comprehensive test coverage for existing functions and classes. The model understands testing frameworks (pytest, JUnit, Jest, RSpec) and generates tests that match the semantic intent of the code.
For algorithm design and competitive programming, the 236B model achieves 43.4% LiveCodeBench — matching GPT-4o on fresh competition problems. Suitable for Codeforces, LeetCode, and ICPC-style problem solving.
Generate REST APIs, gRPC services, and SDK wrappers from specifications. Understands OpenAPI/Swagger, Protocol Buffers, and common API design patterns across Python, Go, TypeScript, and Java ecosystems.
Local inference via Ollama, API access, or direct Hugging Face download — choose your setup.
Three ways to run Coder-V2: free chat, API, or self-hosted weights.
Go to chat.deepseek.com and use Expert Mode. The platform now runs V4-Pro under the hood — stronger than Coder-V2 on most tasks. Free, no account required for basic use.
Run ollama run deepseek-coder-v2 for the 16B Lite. Requires 16GB VRAM. Download via ollama.com. Zero API costs, full privacy, works offline.
Download deepseek-ai/DeepSeek-Coder-V2-Instruct (236B) or DeepSeek-Coder-V2-Lite-Instruct (16B). Use device_map="auto" for multi-GPU distribution.
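A minimal loading sketch with Transformers. The DeepSeek-V2 architecture ships custom modeling code, hence trust_remote_code=True; memory notes in the comments are approximate:

```python
# Loading the 16B Lite Instruct weights from Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # BF16 needs ~32 GB; quantize for 16 GB cards
    device_map="auto",            # shards across available GPUs automatically
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Merge two sorted lists in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
```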
Deploy via vllm serve deepseek-ai/DeepSeek-Coder-V2-Instruct for OpenAI-compatible API with batching, streaming, and load balancing. Best for team or production setups.
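Once the server is up, any OpenAI-compatible client can talk to it. A sketch assuming vLLM's default port:

```python
# Streaming completions from the self-hosted vLLM endpoint (default port 8000).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-Coder-V2-Instruct",   # must match the served model
    messages=[{"role": "user", "content": "Explain this regex: ^\\d{4}-\\d{2}$"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```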
Point Continue.dev, Cursor, or Cline at your local Ollama endpoint http://localhost:11434. Use the 16B Lite for FIM completion — it's purpose-trained for this workflow.
The permissive license allows commercial fine-tuning. Use Hugging Face PEFT + LoRA for efficient fine-tuning on the 16B model on your internal code. Ideal for proprietary language/framework specialization.
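A minimal sketch of the PEFT setup. The hyperparameters and target-module names are assumptions; inspect model.named_modules() on the real checkpoint to choose which projections to adapt:

```python
# LoRA adapter setup via Hugging Face PEFT. Target-module names are assumed;
# verify them against the actual DeepSeek-V2 module layout before training.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "o_proj"],   # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()         # typically well under 1% trainable
# Train the adapter with transformers.Trainer or trl's SFTTrainer as usual.
```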
How Coder-V2 compared at launch (June 2024) against leading closed-source models.
| Model | HumanEval | MBPP+ | MATH | LiveCodeBench | Aider | Open Source |
|---|---|---|---|---|---|---|
| DS-Coder-V2 (236B) | 90.2% | 76.2% | 75.7% | 43.4% | 73.7% ★ | ✓ Permissive |
| DS-Coder-V2 Lite (16B) | 81.1% | 65.4% | 61.8% | ~28% | 44.4% | ✓ Permissive |
| GPT-4o (May 2024) | 90.2% | 74.5% | 76.6% | 43.4% | 72.9% | ✗ Closed |
| GPT-4-Turbo-0409 | 87.1% | 72.9% | 73.4% | 44.6% | ~64% | ✗ Closed |
| Claude 3 Opus | 84.9% | 71.9% | 60.1% | 29.0% | ~55% | ✗ Closed |
| Gemini 1.5 Pro | 84.1% | 70.5% | 67.7% | ~30% | ~50% | ✗ Closed |
| Llama 3-70B | 81.7% | 66.4% | 50.4% | ~22% | ~35% | ✓ Meta |
★ Aider: DS-Coder-V2 ranked #1 across all models at launch · All scores from official paper (arXiv:2406.11931, June 2024)
DeepSeek-Coder-V2 was released June 17, 2024 as a dedicated code model built on DeepSeek-V2, pre-trained on 6T additional code tokens. It achieved GPT-4-Turbo-level performance on HumanEval (90.2%) and was the first open-source model to break 10% on SWE-bench. DeepSeek-V4 (April 2026) supersedes it for most tasks — V4-Flash scores 79.0% on SWE-bench vs Coder-V2's 12.7%, and V4-Pro scores 80.6%. For new API integrations, use V4-Flash or V4-Pro. Coder-V2 remains valuable for: self-hosting on 16GB VRAM (16B Lite), FIM/IDE completion workflows (the 16B has dedicated FIM training), and research use cases where the specific Coder-V2 architecture is relevant.
The 236B variant has 21B active parameters — frontier performance for code generation (90.2% HumanEval, 73.7% Aider), math reasoning (75.7% MATH), and code fixing. Requires 8×80GB GPU servers for full BF16 inference. The 16B Lite has only 2.4B active parameters, runs on a single GPU with 16GB+ VRAM (RTX 4080, A100, etc.), and uniquely includes FIM (Fill-in-Middle) training for IDE code completion. For self-hosted IDE autocomplete, the 16B Lite at 86.4% mean FIM score is the practical choice. For production code generation and review, the 236B is significantly better.
The easiest path: install Ollama and run ollama run deepseek-coder-v2. This downloads the quantized 16B Lite model. You need at least 16GB RAM (ideally GPU VRAM for speed). For Apple Silicon (M2 Pro, M3, M3 Max), the model runs entirely in unified memory — performance is excellent at 16GB and above. For Windows/Linux with an NVIDIA GPU, any RTX 3080 10GB or better works for quantized inference. The 236B model cannot run on consumer hardware — it requires professional multi-GPU servers.
FIM (Fill-in-Middle) is a training objective where the model predicts a missing segment given both prefix (code before) and suffix (code after) context. This is exactly how IDE code completion works — the cursor is in the middle of code, and the model fills in a specific gap. The 16B Lite is trained with FIM using PSM (Prefix-Suffix-Middle) mode at a 50% application rate. The 236B model uses only Next-Token-Prediction — the team chose this because FIM at 236B scale would require significantly more training compute, while the 16B achieves excellent 86.4% FIM accuracy suitable for IDE use. For FIM, always use the Base model variants (not Instruct), as the chat template can interfere with the FIM format.
Yes. DeepSeek-Coder-V2 is released under a permissive license allowing both research and unrestricted commercial use. You can: download and self-host the weights, fine-tune on your proprietary codebase, deploy in commercial products and SaaS applications, and distribute modified versions. There are no royalties and no requirement to release fine-tuned weights. Always check the specific license file in the Hugging Face repository for precise terms — huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct.
Any IDE plugin that supports a custom Ollama or OpenAI-compatible endpoint can use Coder-V2: Continue.dev (VS Code and JetBrains) — the most popular open-source AI code extension, supports local Ollama endpoints directly. Cline (VS Code) — agentic AI coding assistant with full Ollama support. Cursor — configure a custom API endpoint in the model settings. Tabby — open-source self-hosted code completion server that works well with Coder-V2. For all of these, point the endpoint at your local Ollama server (http://localhost:11434) and specify the model name. Use the Base (not Instruct) variant for FIM completion workflows.
90.2% HumanEval. 338 languages. FIM for IDE autocomplete. Commercial license. Download the weights, run it locally, and build without limits.