OpenAI-compatible. Frontier intelligence. Starting at $0.14 per million tokens, up to 100× cheaper than GPT-5, with automatic context caching and no monthly fees.
V4 is the current generation. Legacy aliases retire July 24, 2026 — migrate to explicit model IDs now.
1.6 trillion parameter MoE model. 80.6% SWE-bench, #1 Codeforces (3,206), gold medal IMO 2025. Best open-weights model available. Three reasoning modes: Non-Think, Think High, Think Max.
Input: $1.74/1M · Output: $3.48/1M (75% off until May 31)
Use in API →
284B parameter MoE, 13B active. Within 1–2 points of Pro on most benchmarks at 12× lower cost. The recommended default for production workloads and high-volume pipelines.
Input: $0.14/1M · Output: $0.28/1M
Use in API →
deepseek-chat → V4-Flash (non-think)
deepseek-reasoner → V4-Flash (think)
Migrate before July 24, 2026 to avoid service disruption. No other changes required.
⚠️ Update model name only — no other changes needed
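The alias swap above can be sketched as a one-line lookup (the helper name and mapping table are illustrative, restating the alias guidance on this page):

```python
# Legacy aliases retire July 24, 2026 -- map them to explicit V4 model IDs.
LEGACY_ALIASES = {
    "deepseek-chat": "deepseek-v4-flash",      # non-think mode
    "deepseek-reasoner": "deepseek-v4-flash",  # re-enable thinking per-request
}

def migrate_model(model: str) -> str:
    """Return the explicit V4 ID for a legacy alias; pass through anything else."""
    return LEGACY_ALIASES.get(model, model)
```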
Migration Guide →
671B parameter MoE (37B active). The previous generation flagship — still an excellent value at lower cost than V4. 128K context window. Gold medal on IMO and IOI 2025. Strong for general chat, RAG, and summarization where V4's improvements aren't needed.
Input: $0.28/1M · Cache hit: $0.028/1M · Output: $0.42/1M
Use in API →
Purpose-built for software engineering. 82.6% HumanEval — beats GPT-4o. Understands full repositories, not just snippets. 340+ languages with expert-level Python, Java, C++, and TypeScript support.
Check current pricing at api-docs.deepseek.com
View on GitHub →
Pure RL-trained reasoning model. 97.3% on MATH-500. Develops chain-of-thought organically. Best for competition math, logical proofs, and multi-step scientific reasoning.
Input: $0.55/1M · Output: $2.19/1M
View Paper →
Vision-language model for image understanding, OCR, chart analysis, and document parsing. Process screenshots, diagrams, and photos alongside text for multimodal workflows.
Check current pricing at api-docs.deepseek.com
View on GitHub →
A complete developer platform — not just a model endpoint.
Drop-in replacement for the OpenAI API. Change base_url and api_key — your existing code, SDKs, and integrations work without any other modification.
Automatic prompt prefix caching — no configuration needed. Repeated system prompts, documents, and conversation history serve from cache at 90% discount. Saves thousands of dollars at scale.
90% cost reduction
Server-Sent Events streaming for real-time token delivery. Essential for chat UIs and interactive applications. Set stream=True in any request — no other changes.
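A minimal streaming sketch with the OpenAI SDK; the client is assumed to be constructed with the DeepSeek base URL as described elsewhere on this page:

```python
def stream_reply(client, prompt: str) -> str:
    """Print tokens as they arrive over SSE and return the full reply."""
    stream = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # the only change versus a non-streaming request
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        parts.append(delta)
        print(delta, end="", flush=True)
    return "".join(parts)
```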
Native tool use with OpenAI-compatible JSON schema definitions. Both V4-Pro and V4-Flash support multi-tool calls, parallel execution, and structured function responses for AI agent workflows.
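As a sketch, a tool definition in the OpenAI JSON-schema style — the get_weather function and its fields are made up for illustration:

```python
# Illustrative OpenAI-style tool definition (function name and fields are made up).
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def tool_request(prompt: str) -> dict:
    """Request body for a tool-calling chat completion (sketch)."""
    return {
        "model": "deepseek-v4-flash",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }
```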
Agent-ready
Force structured JSON output with guaranteed schema adherence. Ideal for data extraction, classification pipelines, and any application requiring machine-readable responses.
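A request-body sketch for JSON mode using the OpenAI-style response_format field; the extraction schema spelled out in the system prompt is illustrative:

```python
def extraction_request(text: str) -> dict:
    """Chat-completion body forcing a JSON object response (sketch)."""
    return {
        "model": "deepseek-v4-flash",
        "response_format": {"type": "json_object"},  # OpenAI-compatible JSON mode
        "messages": [
            {"role": "system",
             "content": 'Return JSON with keys "name" and "email".'},
            {"role": "user", "content": text},
        ],
    }
```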
Structured output
Three reasoning effort levels: Non-Think (instant), Think High (analytical), Think Max (maximum effort). Control latency vs accuracy dynamically per-request without changing models.
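Per-request control might look like the sketch below. The page only says thinking mode is enabled via extra_body; the "thinking" field name and mode strings are assumptions, so check the API docs for the real parameter:

```python
REASONING_MODES = ("non-think", "think-high", "think-max")

def with_reasoning(body: dict, mode: str) -> dict:
    """Attach a reasoning-effort hint to a request body (field name assumed)."""
    if mode not in REASONING_MODES:
        raise ValueError(f"unknown mode: {mode}")
    return {**body, "extra_body": {"thinking": mode}}  # "thinking" is illustrative
```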
Adaptive reasoning
Process entire codebases, books, legal contracts, or conversation histories in a single request. CSA+HCA architecture makes 1M-token inference economically viable — not just a paper spec.
Longest context
In addition to OpenAI-compatible endpoints, DeepSeek supports the Anthropic Messages API format. Teams already using Claude can migrate with minimal code changes.
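For teams coming from Claude, an Anthropic Messages-format body looks like this sketch (max_tokens is required in that schema; the endpoint path for DeepSeek's Anthropic-compatible mode isn't given on this page):

```python
def anthropic_style_body(prompt: str, system: str = "") -> dict:
    """Anthropic Messages-format request body (sketch)."""
    body = {
        "model": "deepseek-v4-flash",
        "max_tokens": 1024,  # required by the Anthropic schema
        "messages": [{"role": "user", "content": prompt}],
    }
    if system:
        body["system"] = system  # system prompt is a top-level field, not a message
    return body
```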
Multi-format
Real-time dashboard tracking token consumption, cache hit rates, cost per model, and request latency. Response objects include usage.prompt_cache_hit_tokens for precise billing.
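Those usage fields make per-request cost accounting straightforward. A sketch using the V4-Flash prices from this page, assuming cache-hit tokens are a subset of prompt tokens and bill at 10% of the input rate:

```python
def input_cost_usd(prompt_tokens: int, cache_hit_tokens: int,
                   price_per_million: float = 0.14) -> float:
    """Input cost from usage.prompt_tokens and usage.prompt_cache_hit_tokens.

    Assumes cache hits are counted inside prompt_tokens and bill at 10%
    of the normal input price ($0.014/1M for V4-Flash).
    """
    missed = prompt_tokens - cache_hit_tokens
    return (missed * price_per_million
            + cache_hit_tokens * price_per_million * 0.10) / 1e6
```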
From zero to your first API response — works with any OpenAI-compatible SDK.
Go to platform.deepseek.com and sign up. You receive 5 million free tokens with no credit card required.
Navigate to API Keys in your dashboard. Your key looks like sk-xxxxxxxx.... Store it in your environment — never commit it to source control.
Run pip install openai (Python) or npm install openai (Node.js). No DeepSeek-specific library needed — the official OpenAI SDK works.
Set base_url="https://api.deepseek.com/v1" and your key. Use model deepseek-v4-flash or deepseek-v4-pro. That's the entire migration.
Place your system prompt first and keep it consistent across requests. After the first call, it's cached at $0.014/1M — a 90% reduction. No code changes needed.
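Putting the steps above together in one call — requires pip install openai; the environment variable name is illustrative:

```python
import os

def first_request() -> str:
    """One complete call following the quickstart above."""
    from openai import OpenAI  # official OpenAI SDK; no DeepSeek library needed
    client = OpenAI(
        base_url="https://api.deepseek.com/v1",
        api_key=os.environ["DEEPSEEK_API_KEY"],  # illustrative env var name
    )
    resp = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[
            # Keep the system prompt first and identical across requests:
            # after the first call it is served from cache at 90% off.
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Say hello in one sentence."},
        ],
    )
    return resp.choices[0].message.content
```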
No monthly subscription. No seat fees. New accounts get 5M tokens free.
Prices verified April 28, 2026 · Always confirm at api-docs.deepseek.com/quick_start/pricing
| Model | Input /1M | Output /1M | Context | Open? | Cache? |
|---|---|---|---|---|---|
| V4-Flash | $0.14 | $0.28 | 1M | ✓ MIT | Auto |
| V4-Pro | $1.74 | $3.48 | 1M | ✓ MIT | Auto |
| V3.2 | $0.28 | $0.42 | 128K | ✓ MIT | Auto |
| GPT-5.5 | $5.00 | $20.00 | 128K | ✗ | Manual |
| Claude Opus 4.7 | $5.00 | $25.00 | 200K | ✗ | Manual |
| Gemini 3 Pro | $1.25 | $5.00 | 1M | ✗ | Manual |
| GPT-5.4 | $10.00 | $30.00 | 128K | ✗ | Manual |
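To make the table concrete, a small cost helper with prices copied from the rows above (list prices only; no cache discount applied):

```python
PRICES_PER_MILLION = {  # (input, output) in USD, from the table above
    "deepseek-v4-flash": (0.14, 0.28),
    "deepseek-v4-pro": (1.74, 3.48),
    "deepseek-v3.2": (0.28, 0.42),
    "gpt-5.5": (5.00, 20.00),
    "claude-opus-4.7": (5.00, 25.00),
    "gemini-3-pro": (1.25, 5.00),
}

def job_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a job at list prices for the given token counts."""
    p_in, p_out = PRICES_PER_MILLION[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1e6
```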
DeepSeek caches repeated prompt prefixes for free. No configuration, no TTL settings, no code changes.
Read response.usage.prompt_cache_hit_tokens in every response to track your savings.
All OpenAI-compatible tools work out-of-the-box. Just change the base URL.
Dynamic concurrency limits and standard HTTP error codes. Here's how to handle them.
API key is missing, invalid, or expired. Verify the key in your environment and check it hasn't been revoked from the dashboard.
Account balance too low. Add credits at platform.deepseek.com. Minimum top-up is $2.00.
Check the model name string. Use deepseek-v4-flash or deepseek-v4-pro. Legacy aliases retire July 24, 2026.
Concurrency limit reached. Implement exponential backoff. Check the Retry-After header for the wait duration.
Internal error on DeepSeek's side. Retry with backoff. Check service status at status.deepseek.com.
Peak load or maintenance. Most common during peak hours (UTC 08:00–16:00). Consider off-peak scheduling (UTC 16:30–00:30) for batch jobs.
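A retry sketch matching the guidance above: exponential backoff with jitter for 429/5xx responses, honoring Retry-After when present. The retry_after attribute lookup is illustrative; adapt the except clause to your HTTP client's error types:

```python
import random
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Jittered exponential backoff: base * 2^attempt seconds, capped."""
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)

def call_with_retries(make_request, max_attempts: int = 5, base: float = 1.0):
    """Retry transient failures (429/5xx); honor Retry-After when sent."""
    for attempt in range(max_attempts):
        try:
            return make_request()
        except Exception as err:  # narrow to your SDK's RateLimitError/APIError
            if attempt == max_attempts - 1:
                raise
            retry_after = getattr(err, "retry_after", None)  # illustrative attr
            time.sleep(retry_after if retry_after else backoff_delay(attempt, base))
```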
Yes — fully. The API uses the same endpoint structure (/v1/chat/completions), the same request/response schema, the same streaming format (SSE), and supports both the OpenAI Python/Node SDKs without modification. Change only base_url to https://api.deepseek.com/v1 and swap your API key. Function calling, JSON mode, system prompts, and multi-turn conversations all work identically. The Anthropic Messages format is also supported as an alternative.
Replace model="deepseek-chat" with model="deepseek-v4-flash" for standard mode, and model="deepseek-reasoner" with model="deepseek-v4-flash" (then enable thinking mode via extra_body if needed). No other changes required. The base URL, authentication, and request format remain identical. Deadline: July 24, 2026, 15:59 UTC — after this, legacy aliases will error with no fallback.
No — caching is completely automatic. When a request shares the same prefix as a recent request, DeepSeek automatically serves those tokens from cache at 10% of the normal input price (e.g., $0.014/1M instead of $0.14/1M for V4-Flash). There's no setup, no cache keys, and no TTL configuration. To maximize savings: keep your system prompt identical across requests and place static documents before dynamic user content. Monitor cache hits via response.usage.prompt_cache_hit_tokens.
V4-Flash (284B total, 13B active) is 12× cheaper than V4-Pro and within 1–2 benchmark points on most coding and analysis tasks. V4-Pro (1.6T total, 49B active) is the flagship — it has the highest Codeforces score (3,206), 80.6% SWE-bench, and stronger reasoning on the most complex tasks. Start with Flash; only upgrade to Pro if your evaluation shows a meaningful quality gap for your specific workload. Both support 1M context, 384K output, and all three thinking modes.
Yes. Every new account at platform.deepseek.com receives 5 million free tokens with no credit card required. This is enough for ~3,500 typical API calls. After the free tier, the minimum top-up is $2.00. There are no monthly fees or seat licenses — you pay only for tokens consumed. Granted credits are used first before topped-up balance.
DeepSeek uses dynamic concurrency-based rate limiting rather than fixed token-per-minute caps. When you reach the limit, you receive an HTTP 429 with a Retry-After header. Implement exponential backoff in your retry logic. The API doesn't enforce strict per-account request limits, but during peak demand periods (roughly UTC 08:00–16:00) you may see higher latency or temporary 503 errors. For batch jobs, scheduling during off-peak hours (UTC 16:30–00:30) can reduce both cost and latency.
Yes — any framework that supports the OpenAI API format works. For LangChain: use ChatOpenAI with openai_api_base="https://api.deepseek.com/v1". For LlamaIndex: configure OpenAI with the same base URL override. Claude Code, OpenClaw, and OpenCode are officially integrated with DeepSeek V4 as well.
DeepSeek is a Chinese company and data processed through the public API may be stored on servers subject to Chinese law. For non-sensitive workloads (code review, public data analysis, marketing copy) this is generally fine. For sensitive data (healthcare, financial, legal, PII), recommended approaches are: (1) self-host the open-weight models on your own infrastructure — the MIT license allows this at no cost, or (2) access DeepSeek models via compliant cloud providers (AWS Bedrock, Azure AI) that offer HIPAA/SOC2 compliance and data residency guarantees.
5 million free tokens. No credit card. OpenAI-compatible. The most affordable frontier AI API, period.