$0.14/1M input tokens — V4-Flash, cheapest frontier API · 5M free tokens — new accounts, no credit card required · OpenAI compatible — change 2 lines of code, not the whole codebase · 90% cache discount — repeated prompts served at $0.014/1M · 1M token context — V4 Flash and V4 Pro, no extra charge · July 24 deadline — migrate from deepseek-chat before deprecation · 75% promo — V4-Pro $0.435/1M until May 31, 2026
platform.deepseek.com · Updated May 2026

Build with
DeepSeek API.

The most cost-effective frontier AI API in the world. V4-Flash at $0.14/1M tokens. V4-Pro with Think Max reasoning. OpenAI-compatible. 5M free tokens on signup. Zero monthly fees.

Get API Key Free → API Docs ↗ View Code Examples
$0.14 per 1M input tokens
5M free tokens on signup
1M token context window
90% cache hit discount
2 lines from OpenAI migration
$0 monthly subscription
Available Models

Every Model, Every Use Case

Four models via the DeepSeek API — from the ultra-affordable Flash workhorse to the frontier Pro reasoning model. All OpenAI-compatible.

⚡ Default Workhorse
V4 · FAST
V4-Flash
deepseek-v4-flash

The recommended starting point for all production workloads. 284B MoE (13B active per token). 79% SWE-bench. 83 tok/s. Best for chat, coding, summaries, RAG pipelines, and high-volume APIs where cost dominates.

$0.14
Input/1M
$0.28
Output/1M
1M
Context
384K
Max output
🧠 Flagship Intelligence
V4 · PRO 🧠
V4-Pro
deepseek-v4-pro

1.6T parameter flagship. 49B active per token. 80.6% SWE-bench. Codeforces #1 (3206). 97.3% MATH-500 with Think Max. Best for complex coding, agentic workflows, research, and tasks where maximum quality matters.

$0.435 promo
Input/1M
$0.87 promo
Output/1M
1M
Context
384K
Max output
LEGACY · STABLE 💬
V3.2 Chat
deepseek-chat (alias → V4-Flash)

The legacy alias currently routes to V4-Flash in non-thinking mode. Stable for existing integrations. Retires July 24, 2026 — migrate to deepseek-v4-flash before that date or calls will error.

V4-Flash
Routes to
Jul 24
Retires
8K
Max output
128K
Legacy ctx
LEGACY · STABLE 🔎
V3.2 Reasoner
deepseek-reasoner (alias → V4-Flash thinking)

The legacy reasoning alias routes to V4-Flash in thinking mode. Retires July 24, 2026 — migrate to deepseek-v4-flash or deepseek-v4-pro with thinking enabled. No silent fallback after retirement.

V4-Flash
Routes to
Jul 24
Retires
64K
CoT tokens
Thinking
Mode
⚠️ Migration deadline — July 24, 2026 15:59 UTC: deepseek-chat and deepseek-reasoner will return errors with no fallback after this timestamp. Update to deepseek-v4-flash or deepseek-v4-pro. Only the model parameter needs to change — base URL and auth stay identical.
Pricing — May 2026

Pay Per Token. Nothing Else.

No monthly subscription. No seat fees. No capacity minimums. You pay for exactly the tokens you use — and automatic context caching slashes repeated costs by 90%.

Model Context Max Output Input (cache hit) Input (cache miss) Output Notes
deepseek-v4-flash 1M tokens 384K $0.014/1M $0.14/1M $0.28/1M Recommended default
deepseek-v4-pro 1M tokens 384K $0.044/1M $0.435/1M (75% off) $0.87/1M (promo) Promo until May 31
deepseek-v4-pro (regular) 1M tokens 384K $0.174/1M $1.74/1M $3.48/1M After May 31
deepseek-chat (legacy) 128K 8K $0.014/1M $0.14/1M $0.28/1M ⚠️ Retires Jul 24
deepseek-reasoner (legacy) 128K 64K CoT $0.014/1M $0.14/1M $0.28/1M ⚠️ Retires Jul 24
36×
Cheaper than GPT-5.5 input
90%
Cache hit discount
5M
Free tokens on signup
$0
Monthly subscription
Context caching is automatic: Unlike other providers, DeepSeek caches repeated prompt prefixes automatically — no code changes required. System prompts, tool definitions, and shared document context that appear at the start of every request are served from cache at $0.014/1M (V4-Flash), a 90% reduction. A production app with a 500-token system prompt on 100K daily calls saves approximately $6.30/day on that prefix alone. At scale, this compounds enormously.
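To make that arithmetic concrete, here is a minimal Python sketch of the prefix-savings estimate (illustrative only; it assumes the 500-token prefix hits cache on every call after the first):

# Back-of-envelope cache savings, using the V4-Flash rates above.
PRICE_MISS = 0.14 / 1_000_000   # $ per input token, cache miss
PRICE_HIT = 0.014 / 1_000_000   # $ per input token, cache hit (90% off)

prefix_tokens = 500      # shared system prompt
daily_calls = 100_000

cost_uncached = prefix_tokens * daily_calls * PRICE_MISS   # $7.00/day
cost_cached = prefix_tokens * daily_calls * PRICE_HIT      # $0.70/day
print(f"Daily savings on the prefix: ${cost_uncached - cost_cached:.2f}")
# -> Daily savings on the prefix: $6.30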
Platform Features

Everything You Need to Build

The DeepSeek Platform provides every capability modern AI applications require — from streaming to function calling to structured output — all on an OpenAI-compatible foundation.

🔌
OpenAI & Anthropic Compatible

Supports both the OpenAI ChatCompletions format and the Anthropic Messages API format. Change base URL and API key — all existing streaming, tool, and JSON code works unchanged. See the sketch after this feature grid.

2-line migration
📡
Streaming SSE

Real-time token streaming over Server-Sent Events. V4-Flash delivers 83 tok/s — fast enough for smooth chat UIs, with a time-to-first-token of about 1.11s.

83 tok/s
🛠️
Function Calling / Tool Use

Define tools in the same JSON schema format as OpenAI. V4 adds an improved agentic task synthesis pipeline for significantly better real-world tool-use accuracy.

Agent native
📋
JSON Mode / Structured Output

Force valid JSON output with response_format: {"type": "json_object"}. Guaranteed parseable by JSON.parse() with no pre-processing.

Zero parsing errors
💾
Automatic Context Caching

Disk-based caching activates automatically. No SDK changes, no explicit cache keys. Repeated prompt prefixes hit at 90% discount. Stale cache cleared after 30 minutes of inactivity.

90% off, automatic
🧠
Three Thinking Modes

Control reasoning effort per request: Non-Think (instant), Think High (analytical), Think Max (frontier reasoning). Switch without changing models via the extra_body.thinking parameter.

Per-request control
🔍
reasoning_content Field

In thinking modes, the response includes a separate reasoning_content field containing the full chain-of-thought. Use for debugging, education, and verification.

Chain-of-thought visible
📊
Detailed Token Usage

Every API response includes usage.prompt_cache_hit_tokens and prompt_cache_miss_tokens for precise billing analysis.

Transparent billing
🔢
FIM (Fill-In-Middle)

Predict missing code segments from prefix + suffix context using PSM (Prefix-Suffix-Middle) mode. Powers IDE autocomplete integrations for VS Code, Cursor, and JetBrains. See the sketch after this feature grid.

IDE integration
🌍
1M Token Context

Both V4-Flash and V4-Pro support 1 million token context at no additional cost. Process entire codebases, legal documents, or book-length content in a single request.

No extra charge
📦
Batch API

Submit bulk inference jobs for offline processing at reduced cost. Ideal for data annotation, evaluation pipelines, and bulk document analysis where latency is not critical.

Offline workloads
🏦
Pay-As-You-Go Billing

Top up with balance and pay only for tokens used. No monthly minimums. Granted balance (from free credits) is consumed before purchased balance. Usage dashboard in the platform console.

No minimums
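To illustrate the Anthropic-format compatibility above, here is a minimal sketch using the official anthropic SDK pointed at DeepSeek. The base URL for the Anthropic-compatible endpoint is an assumption; check the API docs for the exact path.

# Hypothetical: Anthropic Messages format against the DeepSeek base URL.
import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com",  # assumed path, verify in the docs
)

message = client.messages.create(
    model="deepseek-v4-flash",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain MoE routing in one paragraph."}],
)
print(message.content[0].text)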
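And a sketch of FIM completion, assuming the standard OpenAI completions endpoint with a suffix parameter. The exact endpoint DeepSeek exposes for FIM may differ, so treat this as illustrative:

# Hypothetical FIM (PSM) call: the model fills in the middle between
# prompt (prefix) and suffix. Endpoint and params assumed, not confirmed here.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com/v1",
)

completion = client.completions.create(
    model="deepseek-v4-flash",
    prompt="def fib(n):\n    ",   # code before the cursor
    suffix="\n    return a",      # code after the cursor
    max_tokens=128,
)
print(completion.choices[0].text)  # the predicted middle segment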
API Integration

Integrate in 2 Minutes

DeepSeek is fully OpenAI-compatible. Change base URL and API key. Everything else stays the same.

# pip install openai — no extra packages needed
from openai import OpenAI
import os

# Only these two lines change from OpenAI
client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com/v1"
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",  # or deepseek-v4-pro
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain MoE architecture"}
    ],
    stream=False
)

print(response.choices[0].message.content)

# Check cache savings in token usage
usage = response.usage
print(f"Cache hit: {usage.prompt_cache_hit_tokens} tokens (90% off)")
print(f"Cache miss: {usage.prompt_cache_miss_tokens} tokens")
// npm install openai
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: 'https://api.deepseek.com/v1',
});

// Streaming — 83 tok/s on V4-Flash
const stream = await client.chat.completions.create({
  model: 'deepseek-v4-flash',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' }
  ],
  stream: true,
});

for await (const chunk of stream) {
  const text = chunk.choices[0]?.delta?.content ?? '';
  process.stdout.write(text);
}
# cURL — replace YOUR_KEY with your API key
curl https://api.deepseek.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      { "role": "user", "content": "Hello! What can you do?" }
    ],
    "stream": false
  }'

# JSON mode — force structured output
curl https://api.deepseek.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "response_format": {"type": "json_object"},
    "messages": [
      {"role": "user", "content": "Return a JSON with name and age"}
    ]
  }'
# Think Max — extended reasoning for hard problems
from openai import OpenAI

client = OpenAI(
    api_key="<your-key>",
    base_url="https://api.deepseek.com/v1"
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{
        "role": "user",
        "content": "Solve: find all prime p where p²+14 is prime"
    }],
    max_tokens=32768,
    extra_body={
        "thinking": {
            "type": "enabled",   # "disabled" for Non-Think mode
            "budget": "max"      # "high" (Think High) | "max" (Think Max)
        }
    }
)

# reasoning_content = internal chain-of-thought
chain = response.choices[0].message.reasoning_content
answer = response.choices[0].message.content

# IMPORTANT: In multi-turn, only include `content`
# in conversation history — NOT reasoning_content
# Function calling / tool use
from openai import OpenAI

client = OpenAI(
    api_key="<your-key>",
    base_url="https://api.deepseek.com/v1"
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)
API Migration

Migrate Before July 24, 2026

Legacy aliases deepseek-chat and deepseek-reasoner retire at 15:59 UTC on July 24, 2026. No grace period. No silent fallback.

Before: Legacy aliases (retiring)
⚠️ Retire Jul 24 2026
# These stop working July 24, 2026 15:59 UTC
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://api.deepseek.com/v1"
)

# ❌ Will error after retirement
response = client.chat.completions.create(
    model="deepseek-chat",  # ← retiring
    messages=[...]
)

# ❌ Also retiring
response = client.chat.completions.create(
    model="deepseek-reasoner",  # ← retiring
    messages=[...]
)
After: Explicit V4 model IDs
✓ Current & Stable
# Only the model name changes — nothing else
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://api.deepseek.com/v1"
)

# ✅ V4-Flash: fast, cost-efficient (was deepseek-chat)
response = client.chat.completions.create(
    model="deepseek-v4-flash",  # ← updated
    messages=[...]
)

# ✅ V4-Pro: frontier reasoning (was deepseek-reasoner)
response = client.chat.completions.create(
    model="deepseek-v4-pro",  # ← updated
    messages=[...]
)

Migration checklist

1
Find all legacy alias references

Search your codebase for deepseek-chat and deepseek-reasoner — not just in main API calls but also in retry logic, fallback handlers, error messages, and configuration files.

2
Choose replacement model

deepseek-chat → deepseek-v4-flash (same pricing, 1M context, faster). deepseek-reasoner → deepseek-v4-flash or deepseek-v4-pro with thinking.type: "enabled".

3
Verify context window handling

V4 supports 1M tokens vs legacy 128K. Older tooling may have context limits hardcoded. Update any max_tokens or context truncation logic to take advantage of the expanded window.

4
Test reasoning_content field

If you were using deepseek-reasoner for multi-turn, ensure your history-building code only includes the content field, not reasoning_content — including the latter in history degrades performance. A minimal sketch follows this checklist.

5
Deploy and monitor

Track cache hit rates via usage.prompt_cache_hit_tokens. A well-structured prompt with a stable system prefix should see 60–85% cache hit rates, cutting input costs by more than half.
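To make step 4 concrete, a minimal history-building sketch (the helper name is illustrative):

# Keep only role + content when appending an assistant turn to history.
# reasoning_content must NOT be carried into subsequent requests.
def append_assistant_turn(history: list, response) -> list:
    msg = response.choices[0].message
    # msg.reasoning_content exists in thinking modes -- deliberately dropped
    history.append({"role": "assistant", "content": msg.content})
    return history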

Getting Started

Go from Zero to Production

From account creation to a production API integration in under 5 minutes.

1
Create an account

Go to platform.deepseek.com. Sign up with Email or Google. No credit card required to start — every new account receives 5 million free tokens automatically deposited.

2
Generate an API key

In the platform console, go to API Keys and click Create Key. Copy and store it securely — it won't be shown again. Set it as the environment variable DEEPSEEK_API_KEY.

3
Install the SDK

Install via pip install openai (Python) or npm install openai (Node.js). No DeepSeek-specific SDK needed — the standard OpenAI client works directly.

4
Make your first call

Set base_url="https://api.deepseek.com/v1" and model="deepseek-v4-flash". That's it. If you're migrating from OpenAI, those are the only two changes needed.

5
Optimize with caching

Put your system prompt at the very start of every request. Stable prefixes hit cache at $0.014/1M (90% off). Use usage.prompt_cache_hit_tokens to monitor savings in production. A request-structuring sketch follows these steps.

6
Add top-up for production

Once free tokens are used, top up with balance via the billing dashboard. Pay-as-you-go. No monthly fees. Monitor usage and set alerts in the platform console to control spend.
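A sketch of the prefix structuring from step 5: stable content first, variable content last, so the cached prefix stays byte-identical across requests (prompt contents are illustrative):

# Stable prefix first (cached after the first request), variable suffix last.
SYSTEM_PROMPT = "You are a support agent for Acme. Follow the policy below."  # fixed
POLICY_DOC = open("policy.md").read()                                         # fixed

def build_messages(user_query: str) -> list:
    return [
        {"role": "system", "content": SYSTEM_PROMPT + "\n\n" + POLICY_DOC},  # cache hit
        {"role": "user", "content": user_query},                             # cache miss
    ]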

Provider Comparison

DeepSeek vs the Competition

API pricing and capability comparison across the major frontier AI providers — May 2026.

Provider Input / 1M Output / 1M Context Cache discount Free tier Open source
DeepSeek V4-Flash $0.14 $0.28 1M 90% (auto) 5M tokens ✓ MIT
DeepSeek V4-Pro $1.74 $3.48 1M 90% (auto) 5M tokens ✓ MIT
GPT-5.5 $5.00 $20.00 128K 50% manual Limited ✗ Closed
Claude Opus 4.7 $5.00 $25.00 200K Manual None ✗ Closed
Gemini 3.1 Pro $1.25 $5.00 1M Manual Limited ✗ Closed
Together / Fireworks ~$0.18 ~$0.35 1M Varies $5 min deposit DeepSeek weights

Prices from official provider documentation, May 2026. Third-party providers add ~25% premium for improved SLA and global latency.

FAQ

Frequently Asked Questions

How do I get started with the DeepSeek API?

Sign up at platform.deepseek.com with email or Google — no credit card required. Every new account receives 5 million free tokens automatically. Install the OpenAI SDK (pip install openai), set base_url="https://api.deepseek.com/v1" and your DEEPSEEK_API_KEY. Use model deepseek-v4-flash for general use. The full API documentation is at api-docs.deepseek.com.

What is the deadline for migrating from deepseek-chat?

The legacy aliases deepseek-chat and deepseek-reasoner are deprecated and will stop working at July 24, 2026 15:59 UTC — with no grace period and no silent fallback. Calls will return errors. The migration is a one-line change: update model to deepseek-v4-flash (replacing deepseek-chat) or deepseek-v4-pro (replacing deepseek-reasoner). The base URL, authentication, and request format are identical. Search your entire codebase including retry logic, fallback handlers, and configuration files — not just primary API calls.

How does context caching work and how do I maximize savings?

DeepSeek's caching is fully automatic — no code changes, no explicit cache keys, no SDK flags. Any prompt prefix that has been seen recently will be served from cache at a 90% discount ($0.014/1M vs $0.14/1M for V4-Flash). To maximize cache hits: (1) Put your system prompt and any fixed context at the very top of every request. (2) Keep the variable parts (user query, latest message) at the end. (3) Use usage.prompt_cache_hit_tokens in the response to monitor your actual cache hit rate. A well-designed production app with a 1,000-token system prompt on 100K daily requests can save hundreds of dollars monthly purely from caching.

What's the difference between V4-Flash and V4-Pro for API use?

V4-Flash: 284B MoE (13B active), $0.14/$0.28 per 1M, 83 tok/s, 79% SWE-bench. The recommended default for most production workloads — 12.4× cheaper per output token than Pro. Start here. V4-Pro: 1.6T MoE (49B active), $1.74/$3.48 per 1M (regular, $0.435/$0.87 promo until May 31), ~40 tok/s, 80.6% SWE-bench, Codeforces #1, IMO Gold with Think Max. Use when quality matters more than cost — complex multi-step agents, hard reasoning, or benchmark-critical tasks. Best practice: benchmark both on your actual task before committing. For most use cases, Flash is sufficient.

How do I use the thinking modes via API?

Pass the extra_body parameter to control reasoning effort per request. Three modes: (1) {"thinking": {"type": "disabled"}} — Non-Think, instant response, no CoT. (2) {"thinking": {"type": "enabled", "budget": "high"}} — Think High, structured analytical reasoning. (3) {"thinking": {"type": "enabled", "budget": "max"}} — Think Max, full reasoning chain, most tokens, best quality for hard problems. When using thinking modes, the response contains both content (the final answer) and reasoning_content (the chain-of-thought). Important: In multi-turn conversations, only include content in the history — never reasoning_content. Including the reasoning chain in history degrades future responses.

What are the rate limits on the DeepSeek API?

DeepSeek does not publish fixed public rate limits — limits are applied dynamically based on account status and server load. The free tier (5M tokens) is subject to stricter rate limiting than paid accounts. During peak demand (especially during model launches), the direct API may experience capacity constraints. For mission-critical production systems requiring 99.9%+ availability SLAs, DeepSeek recommends routing through cloud providers like AWS Bedrock or Azure AI, or inference providers like Together AI or Fireworks, which offer dedicated capacity at a modest premium (~25% higher per-token cost).
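Because limits are dynamic, client-side backoff is the usual defense. A minimal sketch; the retry policy shown is a common pattern, not an official recommendation:

import time
from openai import OpenAI, RateLimitError

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com/v1")

def chat_with_backoff(messages, retries=5):
    # Exponential backoff on 429s; re-raise once retries are exhausted.
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model="deepseek-v4-flash", messages=messages
            )
        except RateLimitError:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, ...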

Does DeepSeek have an off-peak discount?

Historically, DeepSeek offered 50–75% additional discounts for API calls made during 16:30–00:30 UTC on legacy V3 and R1 models. As of the V4 launch (April 24, 2026), the current official documentation shows the standard pricing table without an explicit off-peak tier listed. The 75% promotional discount on V4-Pro (until May 31, 2026) effectively provides a similar cost reduction for Pro users. Always check the official pricing page at api-docs.deepseek.com/quick_start/pricing for the most current rates — DeepSeek adjusts pricing and promotions regularly.

Can I access DeepSeek via AWS Bedrock or Azure?

Yes. DeepSeek models are available through multiple cloud providers for enterprises requiring data residency, compliance guarantees, or higher reliability SLAs: AWS Bedrock (HIPAA/SOC2 eligible, data residency in your region), Azure AI (GDPR-compliant EU region available), Together AI and Fireworks (faster US inference, OpenAI-compatible API). Cloud provider pricing is typically 20–30% higher per token than the direct DeepSeek API, but includes enterprise SLAs, compliance certifications, and dedicated capacity. For sensitive data (healthcare, financial, legal), cloud provider routing is strongly recommended over the direct DeepSeek API.

Get Started

Ready to build?

5 million free tokens. No credit card. OpenAI-compatible. The most cost-effective frontier AI API in the world — start in 2 minutes.

Get API Key Free → API Docs ↗ Try Chat Free