V4 Live · July 24 deadline for legacy model migration

Build Smarter Apps with the DeepSeek API

OpenAI-compatible. Frontier intelligence. Starting at $0.14 per million input tokens, up to 100× cheaper than GPT-5, with automatic context caching and no monthly fees.

Get API Key Free · Read the Docs
$0.14 V4-Flash input / 1M tokens
5M Free tokens on signup
1M Context window
90% Cache hit discount
OpenAI Compatible API
deepseek-v4-pro $1.74 input · $3.48 output
deepseek-v4-flash $0.14 input · $0.28 output
Context Cache 90% off on cache hits
Context Window 1M tokens — both models
Max Output 384K tokens
Free Tier 5M tokens — no card required
OpenAI SDK drop-in compatible
Anthropic SDK format supported
Function Calling native tool use
Streaming SSE real-time responses
JSON Mode structured outputs
Legacy retirement July 24, 2026
Available Models

Choose Your Model

V4 is the current generation. Legacy aliases retire July 24, 2026 — migrate to explicit model IDs now.

V4 Series
V3 Series
Specialized
FLAGSHIP
DeepSeek-V4-Pro
model: "deepseek-v4-pro"

1.6 trillion parameter MoE model. 80.6% SWE-bench, #1 Codeforces (3,206), gold medal IMO 2025. Best open-weights model available. Three reasoning modes: Non-Think, Think High, Think Max.

Params: 1.6T · Context: 1M · Max output: 384K

Input: $1.74/1M · Output: $3.48/1M (list price; 75% off until May 31)

Use in API →
💨
FAST
DeepSeek-V4-Flash
model: "deepseek-v4-flash"

284B parameter MoE, 13B active. Within 1–2 points of Pro on most benchmarks at 12× lower cost. The recommended default for production workloads and high-volume pipelines.

Params: 284B · Context: 1M · Max output: 384K

Input: $0.14/1M · Output: $0.28/1M

Use in API →
LEGACY
Legacy Aliases
retiring: 2026-07-24

deepseek-chat → V4-Flash (non-think)

deepseek-reasoner → V4-Flash (think)

Migrate before July 24, 2026 to avoid service disruption. Only the model name needs to change; no other code changes are required.

Migration Guide →
🔵
STABLE
DeepSeek-V3.2
model: "deepseek-v3.2" (via OpenRouter)

671B parameter MoE (37B active). The previous generation flagship — still an excellent value at lower cost than V4. 128K context window. Gold medal on IMO and IOI 2025. Strong for general chat, RAG, and summarization where V4's improvements aren't needed.

Params: 671B · Context: 128K · Active: 37B

Input: $0.28/1M · Cache hit: $0.028/1M · Output: $0.42/1M

Use in API →
💻
CODE
DeepSeek-Coder V2
deepseek-coder-v2

Purpose-built for software engineering. 82.6% HumanEval — beats GPT-4o. Understands full repositories, not just snippets. 340+ languages with expert-level Python, Java, C++, and TypeScript support.

HumanEval: 82.6% · Languages: 340+ · Code tokens: 2T

Check current pricing at api-docs.deepseek.com

View on GitHub →
🧠
REASONING
DeepSeek-R1
deepseek-r1

Pure RL-trained reasoning model. 97.3% on MATH-500. Develops chain-of-thought organically. Best for competition math, logical proofs, and multi-step scientific reasoning.

MATH-500: 97.3% · Reasoning: CoT · Max output: 64K

Input: $0.55/1M · Output: $2.19/1M

View Paper →
👁️
VISION
DeepSeek-VL
deepseek-vl

Vision-language model for image understanding, OCR, chart analysis, and document parsing. Process screenshots, diagrams, and photos alongside text for multimodal workflows.

Multimodal · OCR and documents · API ready

Check current pricing at api-docs.deepseek.com

View on GitHub →
API Capabilities

Everything You Need

A complete developer platform — not just a model endpoint.

🔄
OpenAI-Compatible

Drop-in replacement for the OpenAI API. Change base_url and api_key — your existing code, SDKs, and integrations work without any other modification.

Zero migration effort
💾
Auto Context Caching

Automatic prompt prefix caching with no configuration needed. Repeated system prompts, documents, and conversation history are served from cache at a 90% discount on input tokens, so a request whose prompt is mostly cached pays roughly a tenth of the usual input cost.

90% cost reduction
Streaming (SSE)

Server-Sent Events streaming for real-time token delivery. Essential for chat UIs and interactive applications. Set stream=True in any request — no other changes.

Real-time output
🔧
Function Calling

Native tool use with OpenAI-compatible JSON schema definitions. Both V4-Pro and V4-Flash support multi-tool calls, parallel execution, and structured function responses for AI agent workflows.

Agent-ready
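
As a rough sketch of what an OpenAI-compatible tool definition and a local dispatcher might look like: the `get_weather` tool, its schema, and the `dispatch_tool_call` helper are hypothetical names used only for illustration, and the network request is left out so the snippet runs offline.

```python
import json

# Hypothetical tool, declared in the OpenAI-compatible JSON schema format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def get_weather(city: str) -> dict:
    """Stand-in implementation; a real tool would call a weather service."""
    return {"city": city, "forecast": "sunny"}


# Map tool names to local Python callables.
REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run one tool call and return its result as a JSON string.

    `arguments_json` is the JSON-encoded argument string that an
    OpenAI-style response carries in `tool_call.function.arguments`.
    """
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(arguments_json)
    return json.dumps(REGISTRY[name](**kwargs))


if __name__ == "__main__":
    print(dispatch_tool_call("get_weather", '{"city": "Paris"}'))
```

With the OpenAI SDK, `TOOLS` would be passed as `tools=TOOLS` to `client.chat.completions.create(...)`, and each result would go back to the model as a message with role `tool` and the matching `tool_call_id`.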
📐
JSON Mode

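Request syntactically valid JSON output through the response_format parameter. JSON mode constrains the syntax of the reply, so check required fields in your own code. Ideal for data extraction, classification pipelines, and any application requiring machine-readable responses.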

Structured output
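
A minimal sketch of a JSON-mode request plus a defensive parser. It assumes the endpoint accepts `response_format={"type": "json_object"}` as in the OpenAI chat-completions convention; the prompt, the key names, and the `parse_json_reply` helper are illustrative, not part of the API.

```python
import json

# Request keyword arguments for JSON mode. The prompt and the expected
# keys ('label', 'confidence') are made up for this example.
REQUEST_KWARGS = {
    "model": "deepseek-v4-flash",
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system",
         "content": "Reply in JSON with keys 'label' and 'confidence'."},
        {"role": "user",
         "content": "Classify: 'The checkout page is broken.'"},
    ],
}


def parse_json_reply(text: str, required_keys: set) -> dict:
    """Parse a model reply and check that the expected keys are present."""
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


if __name__ == "__main__":
    sample = '{"label": "bug_report", "confidence": 0.93}'  # made-up reply
    print(parse_json_reply(sample, {"label", "confidence"}))
```

In a live call, `REQUEST_KWARGS` would be unpacked into `client.chat.completions.create(**REQUEST_KWARGS)` and the reply text passed to `parse_json_reply`.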
🤔
Thinking Modes

Three reasoning effort levels: Non-Think (instant), Think High (analytical), Think Max (maximum effort). Control latency vs accuracy dynamically per-request without changing models.

Adaptive reasoning
📏
1M Token Context

Process entire codebases, books, legal contracts, or conversation histories in a single request. CSA+HCA architecture makes 1M-token inference economically viable — not just a paper spec.

Longest context
🔬
Anthropic API Format

In addition to OpenAI-compatible endpoints, DeepSeek supports the Anthropic Messages API format. Teams already using Claude can migrate with minimal code changes.

Multi-format
📊
Usage Analytics

Real-time dashboard tracking token consumption, cache hit rates, cost per model, and request latency. Response objects include usage.prompt_cache_hit_tokens for precise billing.

Cost visibility
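
One way to turn the usage fields into a cache hit rate, sketched under the assumption that the usage object has been converted to a plain dict (for example with `response.usage.model_dump()` in the OpenAI Python SDK). The `cache_hit_rate` helper and the sample numbers are hypothetical.

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of prompt tokens served from cache for one response."""
    prompt_tokens = usage.get("prompt_tokens", 0)
    if prompt_tokens == 0:
        return 0.0
    return usage.get("prompt_cache_hit_tokens", 0) / prompt_tokens


if __name__ == "__main__":
    # Made-up usage payload for illustration.
    usage = {
        "prompt_tokens": 50_500,
        "prompt_cache_hit_tokens": 50_000,
        "completion_tokens": 300,
    }
    print(f"{cache_hit_rate(usage):.1%}")  # 99.0%
```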
Quickstart

Up in 5 Minutes

From zero to your first API response — works with any OpenAI-compatible SDK.

1
Create your account

Go to platform.deepseek.com and sign up. You receive 5 million free tokens with no credit card required.

2
Generate your API key

Navigate to API Keys in your dashboard. Keys look like sk-xxxxxxxx.... Store the key in an environment variable and never commit it to source control.

3
Install the OpenAI SDK

Run pip install openai (Python) or npm install openai (Node.js). No DeepSeek-specific library needed — the official OpenAI SDK works.

4
Point to DeepSeek's endpoint

Set base_url="https://api.deepseek.com/v1" and your key. Use model deepseek-v4-flash or deepseek-v4-pro. That's the entire migration.

5
Optimize for caching

Place your system prompt first and keep it identical across requests. After the first call, the repeated prefix is billed at the cache-hit rate ($0.014/1M on V4-Flash instead of $0.14/1M), a 90% reduction. No code changes needed.

Python
# pip install openai
from openai import OpenAI
import os

client = OpenAI(
  api_key=os.getenv("DEEPSEEK_API_KEY"),
  base_url="https://api.deepseek.com/v1"
)

response = client.chat.completions.create(
  model="deepseek-v4-flash", # or deepseek-v4-pro
  messages=[
    {"role": "system",
     "content": "You are a helpful assistant."},
    {"role": "user",
     "content": "What is 1M-token context useful for?"}
  ],
  max_tokens=1024
)

print(response.choices[0].message.content)
# Check cache hit tokens:
print(response.usage.prompt_cache_hit_tokens)
Node.js
// npm install openai
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: 'https://api.deepseek.com/v1'
});

const res = await client.chat.completions.create({
  model: 'deepseek-v4-flash',
  messages: [
    { role: 'system', content: 'You are helpful.' },
    { role: 'user', content: 'Hello!' }
  ],
  max_tokens: 512
});

console.log(res.choices[0].message.content);
// Cache hits
console.log(res.usage?.prompt_cache_hit_tokens);
cURL
# Export DEEPSEEK_API_KEY in your shell first, or replace $DEEPSEEK_API_KEY with your key
curl https://api.deepseek.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "You are helpful."},
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 512
  }'
Stream
# Streaming: tokens arrive in real time
import os
from openai import OpenAI

client = OpenAI(
  api_key=os.getenv("DEEPSEEK_API_KEY"),  # read the key from the environment
  base_url="https://api.deepseek.com/v1"
)

stream = client.chat.completions.create(
  model="deepseek-v4-flash",
  messages=[{"role": "user", "content": "Count to 10"}],
  stream=True  # enable streaming
)

for chunk in stream:
  if not chunk.choices:  # skip chunks that carry no choices
    continue
  delta = chunk.choices[0].delta.content
  if delta:
    print(delta, end="", flush=True)
print()  # finish with a newline
Pricing

Pay Only for What You Use

No monthly subscription. No seat fees. New accounts get 5M tokens free.

Token Pricing
vs Competitors
Cost Calculator
🔥 75% OFF until May 31
Flagship Reasoning
V4-Pro
deepseek-v4-pro
Input (cache miss): $0.435/1M
Input (cache hit): $0.044/1M
Output tokens: $0.87/1M
Context window: 1M tokens
Max output: 384K
Use V4-Pro
Fast & Affordable
V4-Flash
deepseek-v4-flash
Input (cache miss): $0.14/1M
Input (cache hit): $0.014/1M
Output tokens: $0.28/1M
Context window: 1M tokens
Max output: 384K
Use V4-Flash
General Purpose
V3.2 (Chat)
deepseek-v3.2
Input (cache miss): $0.28/1M
Input (cache hit): $0.028/1M
Output tokens: $0.42/1M
Context window: 128K
Max output: 8K
Use V3.2
Free Tier
New Account
No card required
Free tokens: 5M tokens
Equivalent to: ~3,500 calls
Models included: All models
Card required? No
Min top-up: $2.00
Start Free →

Prices verified April 28, 2026 · Always confirm at api-docs.deepseek.com/quick_start/pricing

Model | Input /1M | Output /1M | Context | Open? | Cache?
V4-Flash | $0.14 | $0.28 | 1M | ✓ MIT | Auto
V4-Pro | $1.74 | $3.48 | 1M | ✓ MIT | Auto
V3.2 | $0.28 | $0.42 | 128K | ✓ MIT | Auto
GPT-5.5 | $5.00 | $20.00 | 128K | - | Manual
Claude Opus 4.7 | $5.00 | $25.00 | 200K | - | Manual
Gemini 3 Pro | $1.25 | $5.00 | 1M | - | Manual
GPT-5.4 | $10.00 | $30.00 | 128K | - | Manual
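
A rough monthly estimate can be sketched from the per-token prices listed above. The workload figures (calls per month, tokens per call, cache hit rate) are hypothetical inputs, and the `Price` and `monthly_cost` names are illustrative; only the V4-Flash prices come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Prices in USD per 1M tokens."""
    input_miss: float
    input_hit: float
    output: float


# V4-Flash prices as quoted on this page.
V4_FLASH = Price(input_miss=0.14, input_hit=0.014, output=0.28)


def monthly_cost(calls, input_tokens, output_tokens, hit_rate, price):
    """Return (monthly_cost, per_call_avg, cache_savings_fraction).

    `hit_rate` is the assumed share of input tokens served from cache.
    """
    hit = input_tokens * hit_rate
    miss = input_tokens - hit
    per_call = (miss * price.input_miss
                + hit * price.input_hit
                + output_tokens * price.output) / 1_000_000
    uncached = (input_tokens * price.input_miss
                + output_tokens * price.output) / 1_000_000
    savings = 1 - per_call / uncached if uncached else 0.0
    return calls * per_call, per_call, savings


if __name__ == "__main__":
    # Hypothetical workload: 100k calls, 2,000 input and 500 output
    # tokens per call, half of the input served from cache.
    total, per_call, savings = monthly_cost(100_000, 2_000, 500, 0.5, V4_FLASH)
    print(f"monthly: ${total:.2f}")
    print(f"per call: ${per_call:.6f}")
    print(f"cache savings: {savings:.0%}")
```

Output tokens are never discounted, so the overall saving is always smaller than the 90% discount on cached input.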
Context Caching

Automatic 90% Discount

DeepSeek caches repeated prompt prefixes for free. No configuration, no TTL settings, no code changes.

─── REQUEST 1 (cache cold) ───────────
system: "You are a Python expert..." ← MISS
docs: [entire codebase, 50K tokens] ← MISS
user: "Review auth.py" ← MISS
Cost: 50,500 tokens × $0.14/1M = $0.0071

─── REQUEST 2 (cache warm) ───────────
system: "You are a Python expert..." ← HIT ✓
docs: [entire codebase, 50K tokens] ← HIT ✓
user: "Review utils.py" ← MISS
Cost: 500 × $0.14/1M + 50,000 × $0.014/1M
Cost: $0.0008 (about 89% cheaper) ✓

Output: billed at $0.28/1M regardless
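
The input-cost arithmetic in this example can be checked with a few lines of Python. The token counts and V4-Flash prices are the ones used above; the `input_cost` helper is a hypothetical name.

```python
MISS_PRICE = 0.14   # USD per 1M input tokens on a cache miss (V4-Flash)
HIT_PRICE = 0.014   # USD per 1M input tokens on a cache hit (V4-Flash)


def input_cost(miss_tokens: int, hit_tokens: int) -> float:
    """Input-side cost in USD for one request (output is billed separately)."""
    return (miss_tokens * MISS_PRICE + hit_tokens * HIT_PRICE) / 1_000_000


cold = input_cost(50_500, 0)      # request 1: everything misses
warm = input_cost(500, 50_000)    # request 2: only the new query misses

print(f"cold: ${cold:.5f}")               # cold: $0.00707
print(f"warm: ${warm:.5f}")               # warm: $0.00077
print(f"saving: {1 - warm / cold:.1%}")   # saving: 89.1%
```

The saving lands slightly under 90% because the 500 uncached tokens are still billed at the full rate.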
Cache Hit Rate
90%+
Achievable with consistent system prompts
Cost Reduction
Up to 90%
$0.014/1M vs $0.14/1M on cache hits
How to Maximize Hits
1. Put system prompt first — always
2. Front-load static documents
3. Keep variable content (user queries) last
4. Consistent prefixes = automatic caching
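
The ordering rules above can be sketched as a small message builder. The `build_messages` and `shared_prefix_len` helpers are hypothetical, and comparing whole messages is only a proxy for the token-level prefix matching that the cache is described as doing.

```python
SYSTEM_PROMPT = "You are a Python expert."  # keep identical across requests


def build_messages(static_docs: str, user_query: str) -> list:
    """Order messages so static, cacheable content comes first."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},   # 1. system prompt first
        {"role": "user", "content": static_docs},       # 2. static documents next
        {"role": "user", "content": user_query},        # 3. variable content last
    ]


def shared_prefix_len(a: list, b: list) -> int:
    """Count the leading messages that are identical in two requests."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


if __name__ == "__main__":
    docs = "<contents of the codebase>"  # placeholder for the static documents
    first = build_messages(docs, "Review auth.py")
    second = build_messages(docs, "Review utils.py")
    print(shared_prefix_len(first, second))  # 2
```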
Monitoring
Check response.usage.prompt_cache_hit_tokens in every response to track your savings.
Integrations

Works With Your Stack

All OpenAI-compatible tools work out-of-the-box. Just change the base URL.

🐍 Python: pip install openai
🟨 Node.js: npm install openai
🦜 LangChain: pip install langchain
🦙 LlamaIndex: pip install llama-index
🤗 Open Weights: huggingface-cli
🦙 Ollama: ollama pull deepseek
Rate Limits & Errors

What to Expect

Dynamic concurrency limits and standard HTTP error codes. Here's how to handle them.

Rate Limits
Rate limit type: Dynamic concurrency
Limit signal: HTTP 429
Response header: Retry-After
Best practice: Exponential backoff
Strict per-key limit: No hard cap
Peak hour behavior: Higher latency
📊
API Limits
Context window: 1M tokens
Max output (V4): 384K tokens
Max output (V3.2): 8K tokens
Min account top-up: $2.00
Free credits: 5M tokens
Formats: OpenAI + Anthropic

Common Error Codes

401
Unauthorized

API key is missing, invalid, or expired. Verify the key in your environment and check it hasn't been revoked from the dashboard.

402
Insufficient Balance

Account balance too low. Add credits at platform.deepseek.com. Minimum top-up is $2.00.

404
Model Not Found

Check the model name string. Use deepseek-v4-flash or deepseek-v4-pro. Legacy aliases retire July 24, 2026.

429
Too Many Requests

Concurrency limit reached. Implement exponential backoff. Check the Retry-After header for the wait duration.

500
Server Error

Internal error on DeepSeek's side. Retry with backoff. Check service status at status.deepseek.com.

503
Service Unavailable

Peak load or maintenance. Most common during peak hours (UTC 08:00–16:00). Consider off-peak scheduling (UTC 16:30–00:30) for batch jobs.
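
A minimal sketch of the retry advice for 429, 500, and 503 responses. The `ApiError` class is a hypothetical stand-in for whatever exception your HTTP client raises, and the code assumes `Retry-After` carries a number of seconds; the base delay, cap, and attempt count are arbitrary choices.

```python
import random
import time

RETRYABLE = {429, 500, 503}


class ApiError(Exception):
    """Hypothetical error carrying an HTTP status and optional Retry-After."""

    def __init__(self, status_code, retry_after=None):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code
        self.retry_after = retry_after


def backoff_delay(attempt, retry_after=None, base=1.0, cap=30.0):
    """Seconds to wait before the next try (attempt is 0-based)."""
    if retry_after is not None:
        return float(retry_after)  # honor the server's hint when present
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(fn, max_attempts=5, sleep=time.sleep):
    """Call fn(), retrying retryable errors up to max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as err:
            is_last = attempt == max_attempts - 1
            if err.status_code not in RETRYABLE or is_last:
                raise
            sleep(backoff_delay(attempt, err.retry_after))


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky():
        # Fails twice with 429, then succeeds.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ApiError(429, retry_after=0)
        return "ok"

    print(call_with_retry(flaky))  # ok
```

Errors such as 401 or 402 are raised immediately, since retrying cannot fix a bad key or an empty balance. With the OpenAI Python SDK, the equivalent information is available on `openai.APIStatusError` through `status_code` and `response.headers`.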

FAQ

Developer Questions

Is the DeepSeek API really OpenAI-compatible? +

Yes. The API uses the same endpoint structure (/v1/chat/completions), the same request/response schema, and the same streaming format (SSE), and it works with the OpenAI Python and Node SDKs without modification. Change only base_url to https://api.deepseek.com/v1 and swap your API key. Function calling, JSON mode, system prompts, and multi-turn conversations use the same request format. The Anthropic Messages format is also supported as an alternative.

How do I migrate from deepseek-chat / deepseek-reasoner? +

Replace model="deepseek-chat" with model="deepseek-v4-flash" for standard mode, and model="deepseek-reasoner" with model="deepseek-v4-flash" (then enable thinking mode via extra_body if needed). No other changes required. The base URL, authentication, and request format remain identical. Deadline: July 24, 2026, 15:59 UTC — after this, legacy aliases will error with no fallback.
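
The alias mapping described here can be sketched as a small lookup for codebases that build model names dynamically. The `migrate_model` helper is hypothetical, and the exact `extra_body` payload that switches thinking mode on is not shown because this page does not specify it.

```python
# Legacy alias -> (explicit model ID, whether thinking mode was implied).
LEGACY_ALIASES = {
    "deepseek-chat": ("deepseek-v4-flash", False),
    "deepseek-reasoner": ("deepseek-v4-flash", True),
}


def migrate_model(name: str) -> tuple:
    """Return (explicit_model_id, wants_thinking) for a model name.

    Names that are not legacy aliases pass through unchanged with
    wants_thinking=False, meaning no change is implied.
    """
    return LEGACY_ALIASES.get(name, (name, False))


if __name__ == "__main__":
    for old in ("deepseek-chat", "deepseek-reasoner", "deepseek-v4-pro"):
        print(old, "->", migrate_model(old))
```

When `wants_thinking` is true, the caller still has to enable thinking mode on the request according to the API documentation.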

Do I need to configure context caching? +

No — caching is completely automatic. When a request shares the same prefix as a recent request, DeepSeek automatically serves those tokens from cache at 10% of the normal input price (e.g., $0.014/1M instead of $0.14/1M for V4-Flash). There's no setup, no cache keys, and no TTL configuration. To maximize savings: keep your system prompt identical across requests and place static documents before dynamic user content. Monitor cache hits via response.usage.prompt_cache_hit_tokens.

What's the difference between V4-Flash and V4-Pro? +

V4-Flash (284B total, 13B active) is 12× cheaper than V4-Pro and within 1–2 benchmark points on most coding and analysis tasks. V4-Pro (1.6T total, 49B active) is the flagship — it has the highest Codeforces score (3,206), 80.6% SWE-bench, and stronger reasoning on the most complex tasks. Start with Flash; only upgrade to Pro if your evaluation shows a meaningful quality gap for your specific workload. Both support 1M context, 384K output, and all three thinking modes.

Is there a free tier? How do I get API keys? +

Yes. Every new account at platform.deepseek.com receives 5 million free tokens with no credit card required. This is enough for ~3,500 typical API calls. After the free tier, the minimum top-up is $2.00. There are no monthly fees or seat licenses, so you pay only for tokens consumed. Granted credits are consumed before any topped-up balance.

How do rate limits work? +

DeepSeek uses dynamic concurrency-based rate limiting rather than fixed token-per-minute caps. When you reach the limit, you receive an HTTP 429 with a Retry-After header. Implement exponential backoff in your retry logic. The API doesn't enforce strict per-account request limits, but during peak demand periods (roughly UTC 08:00–16:00) you may see higher latency or temporary 503 errors. For batch jobs, scheduling during off-peak hours (UTC 16:30–00:30) can reduce both cost and latency.

Can I use DeepSeek with LangChain or LlamaIndex? +

Yes — any framework that supports the OpenAI API format works. For LangChain: use ChatOpenAI with openai_api_base="https://api.deepseek.com/v1". For LlamaIndex: configure OpenAI with the same base URL override. Claude Code, OpenClaw, and OpenCode are officially integrated with DeepSeek V4 as well.

Is it safe to send sensitive data to the DeepSeek API? +

DeepSeek is a Chinese company and data processed through the public API may be stored on servers subject to Chinese law. For non-sensitive workloads (code review, public data analysis, marketing copy) this is generally fine. For sensitive data (healthcare, financial, legal, PII), recommended approaches are: (1) self-host the open-weight models on your own infrastructure — the MIT license allows this at no cost, or (2) access DeepSeek models via compliant cloud providers (AWS Bedrock, Azure AI) that offer HIPAA/SOC2 compliance and data residency guarantees.

Start Building

Your First API Call in Under 5 Minutes

5 million free tokens. No credit card. OpenAI-compatible. The most affordable frontier AI API, period.

Get Free API Key · Read API Docs · GitHub ↗