The most cost-effective frontier AI API in the world. V4-Flash at $0.14/1M tokens. V4-Pro with Think Max reasoning. OpenAI-compatible. 5M free tokens on signup. Zero monthly fees.
Four models via the DeepSeek API — from the ultra-affordable Flash workhorse to the frontier Pro reasoning model. All OpenAI-compatible.
The recommended starting point for all production workloads. 284B MoE (13B active per token). 79% SWE-bench. 83 tok/s. Best for chat, coding, summaries, RAG pipelines, and high-volume APIs where cost dominates.
1.6T parameter flagship. 49B active per token. 80.6% SWE-bench. Codeforces #1 (3206). 97.3% MATH-500 with Think Max. Best for complex coding, agentic workflows, research, and tasks where maximum quality matters.
The legacy alias currently routes to V4-Flash in non-thinking mode. Stable for existing integrations. Retires July 24, 2026 — migrate to deepseek-v4-flash before that date or calls will error.
The legacy reasoning alias routes to V4-Flash in thinking mode. Retires July 24, 2026 — migrate to deepseek-v4-flash or deepseek-v4-pro with thinking enabled. No silent fallback after retirement.
deepseek-chat and deepseek-reasoner will return errors with no fallback after this timestamp. Update to deepseek-v4-flash or deepseek-v4-pro. Only the model parameter needs to change — base URL and auth stay identical.
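Since only the model parameter changes, the migration can be a one-line lookup. A minimal sketch, assuming you centralize model names in your own code (the `LEGACY_MODEL_MAP` helper is ours, not part of any SDK):

```python
# Hypothetical mapping from retired legacy aliases to their V4 replacements.
LEGACY_MODEL_MAP = {
    "deepseek-chat": "deepseek-v4-flash",
    # deepseek-reasoner users can instead choose "deepseek-v4-pro" with thinking enabled.
    "deepseek-reasoner": "deepseek-v4-flash",
}

def migrate_model(model: str) -> str:
    """Return the V4 replacement for a legacy alias; pass through anything else."""
    return LEGACY_MODEL_MAP.get(model, model)
```

Base URL and auth stay identical, so this lookup is the entire change.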
No monthly subscription. No seat fees. No capacity minimums. You pay for exactly the tokens you use — and automatic context caching slashes repeated costs by 90%.
| Model | Context | Max Output | Input (cache hit) | Input (cache miss) | Output | Notes |
|---|---|---|---|---|---|---|
| deepseek-v4-flash | 1M tokens | 384K | $0.014/1M | $0.14/1M | $0.28/1M | Recommended default |
| deepseek-v4-pro | 1M tokens | 384K | $0.044/1M | $0.435/1M (75% off) | $0.87/1M (promo) | Promo until May 31 |
| deepseek-v4-pro (regular) | 1M tokens | 384K | $0.174/1M | $1.74/1M | $3.48/1M | After May 31 |
| deepseek-chat (legacy) | 128K | 8K | $0.014/1M | $0.14/1M | $0.28/1M | ⚠️ Retires Jul 24 |
| deepseek-reasoner (legacy) | 128K | 64K CoT | $0.014/1M | $0.14/1M | $0.28/1M | ⚠️ Retires Jul 24 |
The DeepSeek Platform provides every capability modern AI applications require — from streaming to function calling to structured output — all on an OpenAI-compatible foundation.
Supports both the OpenAI ChatCompletions format and the Anthropic Messages API format. Change base URL and API key — all existing streaming, tool, and JSON code works unchanged.
Real-time token streaming over Server-Sent Events. V4-Flash delivers 83 tok/s, fast enough for smooth chat UIs, with a time-to-first-token of 1.11s.
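Client-side, streaming means concatenating the content deltas as chunks arrive. A sketch using simulated chunk dicts in place of a live SSE stream (field names follow the OpenAI streaming format):

```python
def accumulate_stream(chunks) -> str:
    """Join the content deltas from streamed chat-completion chunks."""
    parts = []
    for chunk in chunks:
        # Each streamed chunk carries at most one incremental content delta.
        delta = chunk["choices"][0]["delta"].get("content")
        if delta is not None:
            parts.append(delta)
    return "".join(parts)

# Simulated stream: two content chunks, then a final chunk with an empty delta.
simulated = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo!"}}]},
    {"choices": [{"delta": {}}]},
]
```

With the real SDK, the same loop runs over the iterator returned by a `stream=True` call.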
Define tools in the same JSON schema format as OpenAI. V4 adds an improved agentic task synthesis pipeline for significantly better real-world tool-use accuracy.
Force valid JSON output with response_format: {"type": "json_object"}. Guaranteed parseable by JSON.parse() with no pre-processing.
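A sketch of a JSON-mode request body and the parse step, with no live call made (the payload shape follows the OpenAI format; the model output shown is simulated):

```python
import json

# Request body for JSON mode: response_format forces valid JSON output.
request_body = {
    "model": "deepseek-v4-flash",
    "messages": [
        {"role": "system",
         "content": 'Reply with a JSON object: {"city": ..., "country": ...}'},
        {"role": "user", "content": "Where is the Eiffel Tower?"},
    ],
    "response_format": {"type": "json_object"},
}

# With JSON mode on, the content string parses directly — no stripping of
# markdown fences or surrounding prose needed. Simulated model output:
content = '{"city": "Paris", "country": "France"}'
data = json.loads(content)
```

Tip: stating the desired JSON shape in the system prompt, as above, improves field-name consistency.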
Disk-based caching activates automatically. No SDK changes, no explicit cache keys. Repeated prompt prefixes hit at 90% discount. Stale cache cleared after 30 minutes of inactivity.
Control reasoning effort per request: Non-Think (instant), Think High (analytical), Think Max (frontier reasoning). Switch without changing models via the extra_body.thinking parameter.
In thinking modes, the response includes a separate reasoning_content field containing the full chain-of-thought. Use for debugging, education, and verification.
Every API response includes usage.prompt_cache_hit_tokens and prompt_cache_miss_tokens for precise billing analysis.
Predict missing code segments from prefix + suffix context using PSM (Prefix-Suffix-Middle) mode. Powers IDE autocomplete integrations for VS Code, Cursor, and JetBrains.
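A sketch of what a PSM request might carry. The field names `prompt` (the prefix) and `suffix` are assumptions based on common completions-style FIM APIs, not confirmed by this page — check the official docs for the exact shape:

```python
# Hypothetical FIM (fill-in-the-middle) request body: the model predicts the
# code between `prompt` (before the cursor) and `suffix` (after the cursor).
prefix = "def fib(n):\n    if n < 2:\n        return n\n"
suffix = "\nprint(fib(10))\n"

fim_request = {
    "model": "deepseek-v4-flash",
    "prompt": prefix,   # code before the cursor
    "suffix": suffix,   # code after the cursor
    "max_tokens": 64,
}
```

The model would complete the recursive case between the two fragments, which is exactly the pattern IDE autocomplete integrations rely on.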
Both V4-Flash and V4-Pro support 1 million token context at no additional cost. Process entire codebases, legal documents, or book-length content in a single request.
Submit bulk inference jobs for offline processing at reduced cost. Ideal for data annotation, evaluation pipelines, and bulk document analysis where latency is not critical.
Top up with balance and pay only for tokens used. No monthly minimums. Granted balance (from free credits) is consumed before purchased balance. Usage dashboard in the platform console.
DeepSeek is fully OpenAI-compatible. Change base URL and API key. Everything else stays the same.
Legacy aliases deepseek-chat and deepseek-reasoner retire at 15:59 UTC on July 24, 2026. No grace period. No silent fallback.
Migration checklist
Search your codebase for deepseek-chat and deepseek-reasoner — not just in main API calls but also in retry logic, fallback handlers, error messages, and configuration files.
deepseek-chat → deepseek-v4-flash (same pricing, 1M context, faster). deepseek-reasoner → deepseek-v4-flash or deepseek-v4-pro with thinking.type: "enabled".
V4 supports 1M tokens vs legacy 128K. Older tooling may have context limits hardcoded. Update any max_tokens or context truncation logic to take advantage of the expanded window.
If you were using deepseek-reasoner for multi-turn, ensure your history-building code only includes the content field, not reasoning_content — including the latter in history degrades performance.
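A defensive way to build history is to strip everything except role and content before appending an assistant turn. A minimal sketch (the helper name is ours):

```python
def to_history_message(assistant_message: dict) -> dict:
    """Keep only role and content; drop reasoning_content and any other extras."""
    return {"role": assistant_message["role"],
            "content": assistant_message["content"]}

# Simulated assistant message from a thinking-mode response:
raw = {
    "role": "assistant",
    "content": "The answer is 42.",
    "reasoning_content": "Let me think step by step...",  # must NOT go into history
}
history_entry = to_history_message(raw)
```

Running every assistant turn through this filter guarantees the chain-of-thought never leaks into later requests.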
Track cache hit rates via usage.prompt_cache_hit_tokens. A well-structured prompt with a stable system prefix should see 60–85% cache hit rates, cutting input costs by more than half.
From account creation to a production API integration in under 5 minutes.
Go to platform.deepseek.com. Sign up with Email or Google. No credit card required to start — every new account receives 5 million free tokens automatically deposited.
In the platform console, go to API Keys and click Create Key. Copy and store it securely — it won't be shown again. Set it as the environment variable DEEPSEEK_API_KEY.
Install via pip install openai (Python) or npm install openai (Node.js). No DeepSeek-specific SDK needed — the standard OpenAI client works directly.
Set base_url="https://api.deepseek.com/v1" and model="deepseek-v4-flash". That's it. If you're migrating from OpenAI, those are the only two changes needed.
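Under the hood the SDK sends a single POST; sketching the equivalent request with only the standard library (no network call is made here) makes the two migration changes explicit:

```python
import os

BASE_URL = "https://api.deepseek.com/v1"   # change #1: the base URL

def build_chat_request(prompt: str):
    """Build the (url, headers, body) triple the OpenAI-compatible API expects."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {os.environ.get('DEEPSEEK_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = {
        "model": "deepseek-v4-flash",      # change #2: the model name
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, body
```

Everything else — message format, auth scheme, response shape — matches OpenAI, which is why the stock SDK works unchanged.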
Put your system prompt at the very start of every request. Stable prefixes hit cache at $0.014/1M (90% off). Use usage.prompt_cache_hit_tokens to monitor savings in production.
Once free tokens are used, top up with balance via the billing dashboard. Pay-as-you-go. No monthly fees. Monitor usage and set alerts in the platform console to control spend.
API pricing and capability comparison across the major frontier AI providers — May 2026.
| Provider | Input / 1M | Output / 1M | Context | Cache discount | Free tier | Open source |
|---|---|---|---|---|---|---|
| DeepSeek V4-Flash | $0.14 | $0.28 | 1M | 90% (auto) | 5M tokens | ✓ MIT |
| DeepSeek V4-Pro | $1.74 | $3.48 | 1M | 90% (auto) | 5M tokens | ✓ MIT |
| GPT-5.5 | $5.00 | $20.00 | 128K | 50% manual | Limited | ✗ Closed |
| Claude Opus 4.7 | $5.00 | $25.00 | 200K | Manual | None | ✗ Closed |
| Gemini 3.1 Pro | $1.25 | $5.00 | 1M | Manual | Limited | ✗ Closed |
| Together / Fireworks | ~$0.18 | ~$0.35 | 1M | Varies | $5 min deposit | DeepSeek weights |
Prices from official provider documentation, May 2026. Third-party providers add ~25% premium for improved SLA and global latency.
Sign up at platform.deepseek.com with email or Google — no credit card required. Every new account receives 5 million free tokens automatically. Install the OpenAI SDK (pip install openai), set base_url="https://api.deepseek.com/v1" and your DEEPSEEK_API_KEY. Use model deepseek-v4-flash for general use. The full API documentation is at api-docs.deepseek.com.
The legacy aliases deepseek-chat and deepseek-reasoner are deprecated and will stop working on July 24, 2026 at 15:59 UTC, with no grace period and no silent fallback. Calls will return errors. The migration is a one-line change: update model to deepseek-v4-flash (replacing deepseek-chat) or deepseek-v4-pro (replacing deepseek-reasoner). The base URL, authentication, and request format are identical. Search your entire codebase, including retry logic, fallback handlers, and configuration files — not just primary API calls.
DeepSeek's caching is fully automatic — no code changes, no explicit cache keys, no SDK flags. Any prompt prefix that has been seen recently will be served from cache at a 90% discount ($0.014/1M vs $0.14/1M for V4-Flash). To maximize cache hits: (1) Put your system prompt and any fixed context at the very top of every request. (2) Keep the variable parts (user query, latest message) at the end. (3) Use usage.prompt_cache_hit_tokens in the response to monitor your actual cache hit rate. A well-designed production app with a 1,000-token system prompt on 100K daily requests can save hundreds of dollars monthly purely from caching.
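Effective input cost under caching is just a weighted average of hit and miss tokens. A sketch of the arithmetic using the V4-Flash prices above, including the 1,000-token-prompt example:

```python
# V4-Flash input prices per 1M tokens (from the pricing table above).
PRICE_HIT = 0.014    # cache hit: 90% off
PRICE_MISS = 0.14    # cache miss

def effective_input_cost(hit_tokens: int, miss_tokens: int) -> float:
    """Dollar cost of the input side, given the usage fields from the response."""
    return (hit_tokens * PRICE_HIT + miss_tokens * PRICE_MISS) / 1_000_000

# Example: a 1,000-token system prompt across 100,000 daily requests.
prefix_tokens = 1_000 * 100_000                    # 100M prefix tokens per day
uncached = effective_input_cost(0, prefix_tokens)  # $14.00/day if nothing caches
cached = effective_input_cost(prefix_tokens, 0)    # $1.40/day with a warm cache
```

The gap of $12.60/day is roughly $378/month on the prefix alone, consistent with the hundreds-of-dollars figure above.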
V4-Flash: 284B MoE (13B active), $0.14/$0.28 per 1M, 83 tok/s, 79% SWE-bench. The recommended default for most production workloads — 12.4× cheaper per output token than Pro. Start here. V4-Pro: 1.6T MoE (49B active), $1.74/$3.48 per 1M (regular, $0.435/$0.87 promo until May 31), ~40 tok/s, 80.6% SWE-bench, Codeforces #1, IMO Gold with Think Max. Use when quality matters more than cost — complex multi-step agents, hard reasoning, or benchmark-critical tasks. Best practice: benchmark both on your actual task before committing. For most use cases, Flash is sufficient.
Pass the extra_body parameter to control reasoning effort per request. Three modes: (1) {"thinking": {"type": "disabled"}} — Non-Think, instant response, no CoT. (2) {"thinking": {"type": "enabled", "budget": "high"}} — Think High, structured analytical reasoning. (3) {"thinking": {"type": "enabled", "budget": "max"}} — Think Max, full reasoning chain, most tokens, best quality for hard problems. When using thinking modes, the response contains both content (the final answer) and reasoning_content (the chain-of-thought). Important: In multi-turn conversations, only include content in the history — never reasoning_content. Including the reasoning chain in history degrades future responses.
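The three modes map to small extra_body payloads; a helper sketch (the function name is ours; the payload shapes follow the parameter description above):

```python
def thinking_extra_body(mode: str) -> dict:
    """Build the extra_body payload for a given reasoning mode."""
    if mode == "non-think":
        return {"thinking": {"type": "disabled"}}
    if mode == "think-high":
        return {"thinking": {"type": "enabled", "budget": "high"}}
    if mode == "think-max":
        return {"thinking": {"type": "enabled", "budget": "max"}}
    raise ValueError(f"unknown mode: {mode}")
```

Pass the result as `extra_body=thinking_extra_body("think-max")` on a chat completion call to switch effort per request without changing models.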
DeepSeek does not publish fixed public rate limits — limits are applied dynamically based on account status and server load. The free tier (5M tokens) is subject to stricter rate limiting than paid accounts. During peak demand (especially during model launches), the direct API may experience capacity constraints. For mission-critical production systems requiring 99.9%+ availability SLAs, DeepSeek recommends routing through cloud providers like AWS Bedrock or Azure AI, or inference providers like Together AI or Fireworks, which offer dedicated capacity at a modest premium (~25% higher per-token cost).
Historically, DeepSeek offered 50–75% additional discounts for API calls made during 16:30–00:30 UTC on legacy V3 and R1 models. As of the V4 launch (April 24, 2026), the current official documentation shows the standard pricing table without an explicit off-peak tier listed. The 75% promotional discount on V4-Pro (until May 31, 2026) effectively provides a similar cost reduction for Pro users. Always check the official pricing page at api-docs.deepseek.com/quick_start/pricing for the most current rates — DeepSeek adjusts pricing and promotions regularly.
Yes. DeepSeek models are available through multiple cloud providers for enterprises requiring data residency, compliance guarantees, or higher reliability SLAs: AWS Bedrock (HIPAA/SOC2 eligible, data residency in your region), Azure AI (GDPR-compliant EU region available), Together AI and Fireworks (faster US inference, OpenAI-compatible API). Cloud provider pricing is typically 20–30% higher per token than the direct DeepSeek API, but includes enterprise SLAs, compliance certifications, and dedicated capacity. For sensitive data (healthcare, financial, legal), cloud provider routing is strongly recommended over the direct DeepSeek API.
5 million free tokens. No credit card. OpenAI-compatible. The most cost-effective frontier AI API in the world — start in 2 minutes.