The most cost-effective frontier AI API in the world. V4-Flash at $0.14/1M tokens. V4-Pro with Think Max reasoning. OpenAI-compatible. 5M free tokens on signup. Zero monthly fees.
Four models via the DeepSeek API — from the ultra-affordable Flash workhorse to the frontier Pro reasoning model. All OpenAI-compatible.
The recommended starting point for all production workloads. 284B MoE (13B active per token). 79% SWE-bench. 83 tok/s. Best for chat, coding, summaries, RAG pipelines, and high-volume APIs where cost dominates.
1.6T parameter flagship. 49B active per token. 80.6% SWE-bench. Codeforces #1 (3206). 97.3% MATH-500 with Think Max. Best for complex coding, agentic workflows, research, and tasks where maximum quality matters.
The legacy alias currently routes to V4-Flash in non-thinking mode. Stable for existing integrations. Retires July 24, 2026 — migrate to deepseek-v4-flash before that date or calls will error.
The legacy reasoning alias routes to V4-Flash in thinking mode. Retires July 24, 2026 — migrate to deepseek-v4-flash or deepseek-v4-pro with thinking enabled. No silent fallback after retirement.
deepseek-chat and deepseek-reasoner will return errors with no fallback after this timestamp. Update to deepseek-v4-flash or deepseek-v4-pro. Only the model parameter needs to change — base URL and auth stay identical.
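Since only the model parameter changes, the migration can be a one-line lookup. A minimal sketch, assuming you centralize model names in your own code (the `LEGACY_MODEL_MAP` helper is ours, not part of any SDK):

```python
# Hypothetical mapping from retired legacy aliases to their V4 replacements.
LEGACY_MODEL_MAP = {
    "deepseek-chat": "deepseek-v4-flash",
    # deepseek-reasoner users can instead choose "deepseek-v4-pro" with thinking enabled.
    "deepseek-reasoner": "deepseek-v4-flash",
}

def migrate_model(model: str) -> str:
    """Return the V4 replacement for a legacy alias; pass through anything else."""
    return LEGACY_MODEL_MAP.get(model, model)
```

Base URL and auth stay identical, so this lookup is the entire change.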
No monthly subscription. No seat fees. No capacity minimums. You pay for exactly the tokens you use — and automatic context caching slashes repeated costs by 90%.
| Model | Context | Max Output | Input (cache hit) | Input (cache miss) | Output | Notes |
|---|---|---|---|---|---|---|
| deepseek-v4-flash | 1M tokens | 384K | $0.014/1M | $0.14/1M | $0.28/1M | Recommended default |
| deepseek-v4-pro | 1M tokens | 384K | $0.044/1M | $0.435/1M (75% off) | $0.87/1M (promo) | Promo until May 31 |
| deepseek-v4-pro (regular) | 1M tokens | 384K | $0.174/1M | $1.74/1M | $3.48/1M | After May 31 |
| deepseek-chat (legacy) | 128K | 8K | $0.014/1M | $0.14/1M | $0.28/1M | ⚠️ Retires Jul 24 |
| deepseek-reasoner (legacy) | 128K | 64K CoT | $0.014/1M | $0.14/1M | $0.28/1M | ⚠️ Retires Jul 24 |
The DeepSeek Platform provides every capability modern AI applications require — from streaming to function calling to structured output — all on an OpenAI-compatible foundation.
Supports both the OpenAI ChatCompletions format and the Anthropic Messages API format. Change base URL and API key — all existing streaming, tool, and JSON code works unchanged.
Real-time token streaming over Server-Sent Events. V4-Flash delivers 83 tok/s, fast enough for smooth chat UIs, with a time-to-first-token of 1.11s.
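Client-side, streaming means concatenating the content deltas as chunks arrive. A sketch using simulated chunk dicts in place of a live SSE stream (field names follow the OpenAI streaming format):

```python
def accumulate_stream(chunks) -> str:
    """Join the content deltas from streamed chat-completion chunks."""
    parts = []
    for chunk in chunks:
        # Each streamed chunk carries at most one incremental content delta.
        delta = chunk["choices"][0]["delta"].get("content")
        if delta is not None:
            parts.append(delta)
    return "".join(parts)

# Simulated stream: two content chunks, then a final chunk with an empty delta.
simulated = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo!"}}]},
    {"choices": [{"delta": {}}]},
]
```

With the real SDK, the same loop runs over the iterator returned by a `stream=True` call.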
Define tools in the same JSON schema format as OpenAI. V4 adds an improved agentic task synthesis pipeline for significantly better real-world tool-use accuracy.
Force valid JSON output with response_format: {"type": "json_object"}. Guaranteed parseable by JSON.parse() with no pre-processing.
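A sketch of a JSON-mode request body and the parse step, with no live call made (the payload shape follows the OpenAI format; the model output shown is simulated):

```python
import json

# Request body for JSON mode: response_format forces valid JSON output.
request_body = {
    "model": "deepseek-v4-flash",
    "messages": [
        {"role": "system",
         "content": 'Reply with a JSON object: {"city": ..., "country": ...}'},
        {"role": "user", "content": "Where is the Eiffel Tower?"},
    ],
    "response_format": {"type": "json_object"},
}

# With JSON mode on, the content string parses directly — no stripping of
# markdown fences or surrounding prose needed. Simulated model output:
content = '{"city": "Paris", "country": "France"}'
data = json.loads(content)
```

Tip: stating the desired JSON shape in the system prompt, as above, improves field-name consistency.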
Disk-based caching activates automatically. No SDK changes, no explicit cache keys. Repeated prompt prefixes hit at 90% discount. Stale cache cleared after 30 minutes of inactivity.
Control reasoning effort per request: Non-Think (instant), Think High (analytical), Think Max (frontier reasoning). Switch without changing models via the extra_body.thinking parameter.
In thinking modes, the response includes a separate reasoning_content field containing the full chain-of-thought. Use for debugging, education, and verification.
Every API response includes usage.prompt_cache_hit_tokens and prompt_cache_miss_tokens for precise billing analysis.
Predict missing code segments from prefix + suffix context using PSM (Prefix-Suffix-Middle) mode. Powers IDE autocomplete integrations for VS Code, Cursor, and JetBrains.
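A sketch of what a PSM request might carry. The field names `prompt` (the prefix) and `suffix` are assumptions based on common completions-style FIM APIs, not confirmed by this page — check the official docs for the exact shape:

```python
# Hypothetical FIM (fill-in-the-middle) request body: the model predicts the
# code between `prompt` (before the cursor) and `suffix` (after the cursor).
prefix = "def fib(n):\n    if n < 2:\n        return n\n"
suffix = "\nprint(fib(10))\n"

fim_request = {
    "model": "deepseek-v4-flash",
    "prompt": prefix,   # code before the cursor
    "suffix": suffix,   # code after the cursor
    "max_tokens": 64,
}
```

The model would complete the recursive case between the two fragments, which is exactly the pattern IDE autocomplete integrations rely on.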
Both V4-Flash and V4-Pro support 1 million token context at no additional cost. Process entire codebases, legal documents, or book-length content in a single request.
Submit bulk inference jobs for offline processing at reduced cost. Ideal for data annotation, evaluation pipelines, and bulk document analysis where latency is not critical.
Top up with balance and pay only for tokens used. No monthly minimums. Granted balance (from free credits) is consumed before purchased balance. Usage dashboard in the platform console.
DeepSeek is fully OpenAI-compatible. Change base URL and API key. Everything else stays the same.
Legacy aliases deepseek-chat and deepseek-reasoner retire at 15:59 UTC on July 24, 2026. No grace period. No silent fallback.
Migration checklist
Search your codebase for deepseek-chat and deepseek-reasoner — not just in main API calls but also in retry logic, fallback handlers, error messages, and configuration files.
deepseek-chat → deepseek-v4-flash (same pricing, 1M context, faster). deepseek-reasoner → deepseek-v4-flash or deepseek-v4-pro with thinking.type: "enabled".
V4 supports 1M tokens vs legacy 128K. Older tooling may have context limits hardcoded. Update any max_tokens or context truncation logic to take advantage of the expanded window.
If you were using deepseek-reasoner for multi-turn, ensure your history-building code only includes the content field, not reasoning_content — including the latter in history degrades performance.
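A defensive way to build history is to strip everything except role and content before appending an assistant turn. A minimal sketch (the helper name is ours):

```python
def to_history_message(assistant_message: dict) -> dict:
    """Keep only role and content; drop reasoning_content and any other extras."""
    return {"role": assistant_message["role"],
            "content": assistant_message["content"]}

# Simulated assistant message from a thinking-mode response:
raw = {
    "role": "assistant",
    "content": "The answer is 42.",
    "reasoning_content": "Let me think step by step...",  # must NOT go into history
}
history_entry = to_history_message(raw)
```

Running every assistant turn through this filter guarantees the chain-of-thought never leaks into later requests.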
Track cache hit rates via usage.prompt_cache_hit_tokens. A well-structured prompt with a stable system prefix should see 60–85% cache hit rates, cutting input costs by more than half.
From account creation to a production API integration in under 5 minutes.
Go to platform.deepseek.com. Sign up with Email or Google. No credit card required to start — every new account receives 5 million free tokens automatically deposited.
In the platform console, go to API Keys and click Create Key. Copy and store it securely — it won't be shown again. Set it as the environment variable DEEPSEEK_API_KEY.
Install via pip install openai (Python) or npm install openai (Node.js). No DeepSeek-specific SDK needed — the standard OpenAI client works directly.
Set base_url="https://api.deepseek.com/v1" and model="deepseek-v4-flash". That's it. If you're migrating from OpenAI, those are the only two changes needed.
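Under the hood the SDK sends a single POST; sketching the equivalent request with only the standard library (no network call is made here) makes the two migration changes explicit:

```python
import os

BASE_URL = "https://api.deepseek.com/v1"   # change #1: the base URL

def build_chat_request(prompt: str):
    """Build the (url, headers, body) triple the OpenAI-compatible API expects."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {os.environ.get('DEEPSEEK_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = {
        "model": "deepseek-v4-flash",      # change #2: the model name
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, body
```

Everything else — message format, auth scheme, response shape — matches OpenAI, which is why the stock SDK works unchanged.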
Put your system prompt at the very start of every request. Stable prefixes hit cache at $0.014/1M (90% off). Use usage.prompt_cache_hit_tokens to monitor savings in production.
Once free tokens are used, top up with balance via the billing dashboard. Pay-as-you-go. No monthly fees. Monitor usage and set alerts in the platform console to control spend.
API pricing and capability comparison across the major frontier AI providers — May 2026.
| Provider | Input / 1M | Output / 1M | Context | Cache discount | Free tier | Open source |
|---|---|---|---|---|---|---|
| DeepSeek V4-Flash | $0.14 | $0.28 | 1M | 90% (auto) | 5M tokens | ✓ MIT |
| DeepSeek V4-Pro | $1.74 | $3.48 | 1M | 90% (auto) | 5M tokens | ✓ MIT |
| GPT-5.5 | $5.00 | $20.00 | 128K | 50% manual | Limited | ✗ Closed |
| Claude Opus 4.7 | $5.00 | $25.00 | 200K | Manual | None | ✗ Closed |
| Gemini 3.1 Pro | $1.25 | $5.00 | 1M | Manual | Limited | ✗ Closed |
| Together / Fireworks | ~$0.18 | ~$0.35 | 1M | Varies | $5 min deposit | DeepSeek weights |
Prices from official provider documentation, May 2026. Third-party providers add ~25% premium for improved SLA and global latency.
Sign up at platform.deepseek.com with email or Google — no credit card required. Every new account receives 5 million free tokens automatically. Install the OpenAI SDK (pip install openai), set base_url="https://api.deepseek.com/v1" and your DEEPSEEK_API_KEY. Use model deepseek-v4-flash for general use. The full API documentation is at api-docs.deepseek.com.
The legacy aliases deepseek-chat and deepseek-reasoner are deprecated and will stop working on July 24, 2026 at 15:59 UTC, with no grace period and no silent fallback. Calls will return errors. The migration is a one-line change: update model to deepseek-v4-flash (replacing deepseek-chat) or deepseek-v4-pro (replacing deepseek-reasoner). The base URL, authentication, and request format are identical. Search your entire codebase, including retry logic, fallback handlers, and configuration files — not just primary API calls.
DeepSeek's caching is fully automatic — no code changes, no explicit cache keys, no SDK flags. Any prompt prefix that has been seen recently will be served from cache at a 90% discount ($0.014/1M vs $0.14/1M for V4-Flash). To maximize cache hits: (1) Put your system prompt and any fixed context at the very top of every request. (2) Keep the variable parts (user query, latest message) at the end. (3) Use usage.prompt_cache_hit_tokens in the response to monitor your actual cache hit rate. A well-designed production app with a 1,000-token system prompt on 100K daily requests can save hundreds of dollars monthly purely from caching.
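Effective input cost under caching is just a weighted average of hit and miss tokens. A sketch of the arithmetic using the V4-Flash prices above, including the 1,000-token-prompt example:

```python
# V4-Flash input prices per 1M tokens (from the pricing table above).
PRICE_HIT = 0.014    # cache hit: 90% off
PRICE_MISS = 0.14    # cache miss

def effective_input_cost(hit_tokens: int, miss_tokens: int) -> float:
    """Dollar cost of the input side, given the usage fields from the response."""
    return (hit_tokens * PRICE_HIT + miss_tokens * PRICE_MISS) / 1_000_000

# Example: a 1,000-token system prompt across 100,000 daily requests.
prefix_tokens = 1_000 * 100_000                    # 100M prefix tokens per day
uncached = effective_input_cost(0, prefix_tokens)  # $14.00/day if nothing caches
cached = effective_input_cost(prefix_tokens, 0)    # $1.40/day with a warm cache
```

The gap of $12.60/day is roughly $378/month on the prefix alone, consistent with the hundreds-of-dollars figure above.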
V4-Flash: 284B MoE (13B active), $0.14/$0.28 per 1M, 83 tok/s, 79% SWE-bench. The recommended default for most production workloads — 12.4× cheaper per output token than Pro. Start here. V4-Pro: 1.6T MoE (49B active), $1.74/$3.48 per 1M (regular, $0.435/$0.87 promo until May 31), ~40 tok/s, 80.6% SWE-bench, Codeforces #1, IMO Gold with Think Max. Use when quality matters more than cost — complex multi-step agents, hard reasoning, or benchmark-critical tasks. Best practice: benchmark both on your actual task before committing. For most use cases, Flash is sufficient.
Pass the extra_body parameter to control reasoning effort per request. Three modes: (1) {"thinking": {"type": "disabled"}} — Non-Think, instant response, no CoT. (2) {"thinking": {"type": "enabled", "budget": "high"}} — Think High, structured analytical reasoning. (3) {"thinking": {"type": "enabled", "budget": "max"}} — Think Max, full reasoning chain, most tokens, best quality for hard problems. When using thinking modes, the response contains both content (the final answer) and reasoning_content (the chain-of-thought). Important: In multi-turn conversations, only include content in the history — never reasoning_content. Including the reasoning chain in history degrades future responses.
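The three modes map to small extra_body payloads; a helper sketch (the function name is ours; the payload shapes follow the parameter description above):

```python
def thinking_extra_body(mode: str) -> dict:
    """Build the extra_body payload for a given reasoning mode."""
    if mode == "non-think":
        return {"thinking": {"type": "disabled"}}
    if mode == "think-high":
        return {"thinking": {"type": "enabled", "budget": "high"}}
    if mode == "think-max":
        return {"thinking": {"type": "enabled", "budget": "max"}}
    raise ValueError(f"unknown mode: {mode}")
```

Pass the result as `extra_body=thinking_extra_body("think-max")` on a chat completion call to switch effort per request without changing models.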
DeepSeek does not publish fixed public rate limits — limits are applied dynamically based on account status and server load. The free tier (5M tokens) is subject to stricter rate limiting than paid accounts. During peak demand (especially during model launches), the direct API may experience capacity constraints. For mission-critical production systems requiring 99.9%+ availability SLAs, DeepSeek recommends routing through cloud providers like AWS Bedrock or Azure AI, or inference providers like Together AI or Fireworks, which offer dedicated capacity at a modest premium (~25% higher per-token cost).
Historically, DeepSeek offered 50–75% additional discounts for API calls made during 16:30–00:30 UTC on legacy V3 and R1 models. As of the V4 launch (April 24, 2026), the current official documentation shows the standard pricing table without an explicit off-peak tier listed. The 75% promotional discount on V4-Pro (until May 31, 2026) effectively provides a similar cost reduction for Pro users. Always check the official pricing page at api-docs.deepseek.com/quick_start/pricing for the most current rates — DeepSeek adjusts pricing and promotions regularly.
Yes. DeepSeek models are available through multiple cloud providers for enterprises requiring data residency, compliance guarantees, or higher reliability SLAs: AWS Bedrock (HIPAA/SOC2 eligible, data residency in your region), Azure AI (GDPR-compliant EU region available), Together AI and Fireworks (faster US inference, OpenAI-compatible API). Cloud provider pricing is typically 20–30% higher per token than the direct DeepSeek API, but includes enterprise SLAs, compliance certifications, and dedicated capacity. For sensitive data (healthcare, financial, legal), cloud provider routing is strongly recommended over the direct DeepSeek API.
5 million free tokens. No credit card. OpenAI-compatible. The most cost-effective frontier AI API in the world — start in 2 minutes.