DeepSeek API Reference
Complete developer reference for the DeepSeek API. The API is compatible with the OpenAI ChatCompletions format and the Anthropic Messages format — change two lines of code to migrate from either. Access powerful reasoning with V4-Pro and high-speed inference with V4-Flash.
Authentication
All API requests require an API key passed as a Bearer token in the Authorization header. API keys are created in the DeepSeek Platform console.
Never expose your API key in client-side code, version control, or logs. Use environment variables. If compromised, immediately revoke in the Platform console — keys cannot be recovered once deleted.
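A minimal sketch of the recommended pattern in Python — the `DEEPSEEK_API_KEY` variable name is a convention assumed here, not a requirement:

```python
import os

# Read the key from the environment; never hardcode it.
# DEEPSEEK_API_KEY is an assumed variable name; use whatever your deployment defines.
api_key = os.environ.get("DEEPSEEK_API_KEY", "sk-your-key-here")

# Every request carries the key as a Bearer token in the Authorization header.
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```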
Your First API Call
Make your first request in under 2 minutes. Install the OpenAI SDK — no DeepSeek-specific package needed.
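As a dependency-free sketch, the same request can be made with the standard library alone; with the OpenAI SDK you would instead pass `base_url="https://api.deepseek.com"` to the `OpenAI()` client:

```python
import json
import os
import urllib.request

# Request body in the ChatCompletions format documented below.
payload = {
    "model": "deepseek-v4-flash",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

def send(payload: dict) -> dict:
    req = urllib.request.Request(
        "https://api.deepseek.com/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# reply = send(payload)                                # needs a valid key
# print(reply["choices"][0]["message"]["content"])
```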
Base URLs
Four base URLs are available — all route to the same models. The /v1 suffix is a compatibility alias for OpenAI SDK users whose tooling hardcodes the version segment.
| Base URL | Format | Notes |
|---|---|---|
| https://api.deepseek.com | OpenAI & Anthropic | Recommended — no version suffix needed |
| https://api.deepseek.com/v1 | OpenAI only | Compatibility alias for OpenAI SDK defaults |
| https://api.deepseek.com/beta | OpenAI only | Required for FIM & Chat Prefix Completion |
| https://api.deepseek.com/anthropic | Anthropic | Anthropic Messages API format |
Available Models
Two production V4 models and two legacy aliases pending retirement. Always use deepseek-v4-flash or deepseek-v4-pro for new integrations.
deepseek-v4-flash: High-speed, cost-efficient frontier model. 284B MoE (13B active). 79% SWE-bench. 83 tok/s. Recommended for chat, coding, summaries, and high-volume pipelines.
deepseek-v4-pro: Frontier reasoning model. 1.6T MoE (49B active). 80.6% SWE-bench. Codeforces #1 (rating 3206). Use Think Max for hard math and agentic tasks.
deepseek-chat: Legacy alias. Currently routes to V4-Flash non-thinking mode. Retires July 24, 2026 15:59 UTC and will error with no fallback after that date.
deepseek-reasoner: Legacy alias. Currently routes to V4-Flash thinking mode. Retires July 24, 2026 15:59 UTC. Migrate to deepseek-v4-flash, or to deepseek-v4-pro with thinking enabled.
July 24, 2026 15:59 UTC — deepseek-chat and deepseek-reasoner will return errors with no fallback. Migration is a single line change: update the model parameter. Everything else stays identical. Search your entire codebase including retry handlers, config files, and fallback logic.
List Models (API)
Retrieve all currently available model IDs programmatically.
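A stdlib sketch of the listing call, assuming the OpenAI-compatible GET /models path (with the OpenAI SDK this is `client.models.list()`):

```python
import json
import os
import urllib.request

def list_model_ids() -> list:
    # GET /models is assumed to follow the OpenAI-compatible listing shape,
    # returning {"data": [{"id": ...}, ...]}.
    req = urllib.request.Request(
        "https://api.deepseek.com/models",
        headers={"Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return [m["id"] for m in json.load(resp)["data"]]

# print(list_model_ids())  # requires a valid key
```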
POST /chat/completions
The primary API endpoint. Creates a model response for the given conversation. Returns a completion object or a stream of SSE chunks when stream: true.
Request Parameters
Core Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Required | Model ID. Use deepseek-v4-flash or deepseek-v4-pro. Legacy: deepseek-chat, deepseek-reasoner (retiring Jul 24, 2026). |
| messages | array | Required | Conversation history. Each message has role (system, user, assistant, or tool) and content. Max context: 1M tokens. |
| stream | boolean | Optional | If true, returns Server-Sent Events (SSE) chunks. Each chunk is a data: line with a partial completion object. Terminated with data: [DONE]. Default: false. |
| max_tokens | integer | Optional | Maximum output tokens. Default varies by model. For Think Max mode, set to at least 32768. Hard maximum: 384000 (384K) for V4 models. |
| temperature | number | Optional | Sampling temperature 0–2. Higher = more random. Default: 1.0. Recommended: 0.6–0.7 for thinking mode. Note: ignored in pure reasoning modes. |
| top_p | number | Optional | Nucleus sampling. Considers tokens with cumulative probability up to top_p. Default: 1.0. Recommended: 0.95 for reasoning models. |
| response_format | object | Optional | Set {"type": "json_object"} to force valid JSON output. Set {"type": "text"} (default) for natural language. See JSON Mode. |
| tools | array | Optional | Array of tool definitions in OpenAI function-calling format. Each tool has type: "function" and a function object with name, description, parameters. |
| tool_choice | string or object | Optional | Controls tool selection. "auto" (model decides), "none" (no tools), "required" (must call a tool), or {"type": "function", "function": {"name": "..."}} to force a specific function. |
| stop | string or array | Optional | Up to 4 stop sequences. Generation halts when any sequence is produced. Default: null. |
| frequency_penalty | number | Optional | Range −2.0 to 2.0. Positive values penalize tokens proportional to their frequency, reducing repetition. Default: 0. |
| presence_penalty | number | Optional | Range −2.0 to 2.0. Positive values penalize tokens that have appeared at all, encouraging new topics. Default: 0. |
| logprobs | boolean | Optional | If true, returns log probabilities for each output token. Default: false. |
| top_logprobs | integer | Optional | Number of top-k token log probability alternatives to return per token. Requires logprobs: true. Range: 0–20. |
Thinking Mode Parameters
Control reasoning depth via extra_body in the OpenAI SDK, or directly in the request body when using cURL. V4 models support three reasoning effort levels.
| Parameter | Type | Values | Description |
|---|---|---|---|
| thinking.type | string | enabled · disabled | Enables or disables chain-of-thought reasoning. When enabled, responses include a reasoning_content field with the full thinking chain. Pass via extra_body={"thinking": {"type": "enabled"}}. |
| reasoning_effort | string | low · high · none | Controls reasoning depth when thinking is enabled. high = Think Max (uses up to 50K+ tokens internally). low = faster CoT. none = no thinking (same as thinking.type: disabled). |
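A request-body sketch for Think Max. With the OpenAI SDK the two thinking fields travel in `extra_body`, since they are not part of the standard ChatCompletions schema:

```python
# Raw request body for a Think Max call (fields as documented in the table above).
body = {
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    "max_tokens": 32768,                 # Think Max needs headroom (>= 32768)
    "thinking": {"type": "enabled"},
    "reasoning_effort": "high",          # Think Max
}

# Equivalent with the OpenAI SDK:
# client.chat.completions.create(model=..., messages=...,
#     extra_body={"thinking": {"type": "enabled"}, "reasoning_effort": "high"})
```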
Response Object
The API returns a completion object on success. In streaming mode, multiple SSE chunks with partial deltas are emitted, terminated by data: [DONE].
finish_reason Values
| Value | Meaning |
|---|---|
| stop | Generation completed naturally or reached a stop sequence. |
| length | Generation stopped at the max_tokens limit. Increase max_tokens to get a complete response. |
| tool_calls | Model decided to call one or more tools. Process the tool calls and continue the conversation with tool results. |
| content_filter | Generation stopped by the content safety filter. The response may be partial or empty. |
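The table above translates directly into a dispatch helper — a sketch of the follow-up action for each value:

```python
def handle_finish(choice: dict) -> str:
    """Map a choice's finish_reason to the recommended follow-up action."""
    reason = choice["finish_reason"]
    if reason == "length":
        return "truncated: raise max_tokens and retry"
    if reason == "tool_calls":
        return "run the requested tools, append results, call again"
    if reason == "content_filter":
        return "filtered: response may be partial or empty"
    return "complete"  # "stop"
```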
Function Calling / Tool Use
Define tools in the same JSON schema format as OpenAI. The model decides which tools to call and with what arguments. Both V4-Flash and V4-Pro support function calling in all thinking modes.
Enable thinking mode together with tools to let the model reason about which tools to call before making the calls. This significantly improves accuracy on complex multi-step agentic tasks. Set extra_body={"thinking": {"type": "enabled"}, "reasoning_effort": "high"}.
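A sketch of a tool definition in the OpenAI function-calling schema — `get_weather` and its parameters are hypothetical, purely for illustration:

```python
# One tool in the format described above: type "function" plus a function
# object with name, description, and a JSON Schema for parameters.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

request_body = {
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "What's the weather in Hangzhou?"}],
    "tools": tools,
    "tool_choice": "auto",   # let the model decide whether to call the tool
}
```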
JSON Mode
Force the model to produce a valid JSON object by setting response_format. The model returns parseable JSON with no markdown fences to strip. One caveat: a response truncated at the max_tokens limit (finish_reason: length) can still be cut off mid-object, so set max_tokens generously and check finish_reason before parsing.
The system prompt or user message should describe the expected JSON structure. The model uses this description to format its output. JSON mode cannot be combined with FIM completion.
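A sketch tying both points together: the structure description lives in the prompt, and response_format enforces validity (the example output below is illustrative, not captured from the API):

```python
import json

body = {
    "model": "deepseek-v4-flash",
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system",
         "content": 'Extract the person. Reply as JSON: {"name": string, "age": number}'},
        {"role": "user", "content": "Alice is 30 years old."},
    ],
}

# A JSON-mode reply parses directly (content below is illustrative):
content = '{"name": "Alice", "age": 30}'
person = json.loads(content)
```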
Streaming (SSE)
Set stream: true to receive Server-Sent Events. Each chunk delivers a delta of new content. V4-Flash delivers ~83 tokens/second. Use streaming for chat UIs and real-time applications.
Raw SSE Format
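The wire format can be sketched and parsed by hand. The chunk payloads below are illustrative and trimmed to the delta fields; real chunks carry the full partial completion object:

```python
import json

# Each event is a "data: " line with a partial completion object;
# the stream terminates with "data: [DONE]".
raw = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n\n'
    "data: [DONE]\n\n"
)

text = ""
for line in raw.splitlines():
    if not line.startswith("data: "):
        continue
    data = line[len("data: "):]
    if data == "[DONE]":
        break
    chunk = json.loads(data)
    text += chunk["choices"][0]["delta"].get("content", "")
```

With the OpenAI SDK, `stream=True` does this parsing for you and yields the same delta objects.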
Context Caching
Context caching reduces costs by 90% for repeated prompt prefixes. It is fully automatic — no code changes, no explicit cache keys. Repeated content at the beginning of your prompt is served at cache-hit pricing.
Structure prompts to maximize cache hits: put static content first (system prompts, tool definitions, reference documents) and variable content last (user messages, query-specific context). Cache persists for ~30 minutes of inactivity. Content below 64 tokens is never cached.
Reading Cache Usage
Every response includes precise cache hit/miss breakdown in the usage object for billing analysis.
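A sketch of computing a hit rate from the usage object. The cache field names below (`prompt_cache_hit_tokens` / `prompt_cache_miss_tokens`) are assumed; confirm them against a live response:

```python
# Illustrative usage object from a response with a warm cache.
usage = {
    "prompt_tokens": 1200,
    "prompt_cache_hit_tokens": 1024,    # billed at cache-hit pricing
    "prompt_cache_miss_tokens": 176,    # billed at cache-miss pricing
    "completion_tokens": 80,
}

hit_rate = usage["prompt_cache_hit_tokens"] / usage["prompt_tokens"]
```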
FIM Completion Beta
Fill-In-Middle (FIM) completion predicts a missing segment given both prefix and suffix context. Used for IDE code completion, story continuation, and document infilling. Requires base_url="https://api.deepseek.com/beta".
Only supported in non-thinking mode.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Required | deepseek-v4-pro (only model supporting FIM via beta) |
| prompt | string | Required | Code/text prefix — content before the gap to fill. |
| suffix | string | Optional | Code/text suffix — content after the gap. If omitted, model generates until completion. |
| max_tokens | integer | Optional | Maximum tokens to generate for the middle segment. Default: 4096. |
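A request-body sketch for filling a gap in a function body; the request goes to the completions route on the beta base URL:

```python
# FIM body: the model fills the gap between prompt and suffix.
body = {
    "model": "deepseek-v4-pro",
    "prompt": "def fib(n):\n    a, b = 0, 1\n    ",   # content before the gap
    "suffix": "\n    return a",                        # content after the gap
    "max_tokens": 128,
}
# POST to https://api.deepseek.com/beta (non-thinking mode only).
```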
Chat Prefix Completion Beta
Force the assistant to begin its response with a specific prefix. Useful for output format control, structured completions, and guided generation. Set the last message role to assistant with prefix: true. Requires base_url="https://api.deepseek.com/beta".
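A message-array sketch — the final assistant message with `prefix: true` pins the start of the reply:

```python
# The model continues from "Crimson leaves drift down" rather than starting fresh.
messages = [
    {"role": "user", "content": "Write a haiku about autumn."},
    {"role": "assistant", "content": "Crimson leaves drift down", "prefix": True},
]
# POST this to /chat/completions on the beta base URL.
```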
Anthropic API Format
The DeepSeek API also supports the Anthropic Messages API format at https://api.deepseek.com/anthropic. This allows existing Anthropic SDK code to work with DeepSeek models by changing the base URL and API key only.
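A payload sketch in the Anthropic Messages wire format. The `/v1/messages` path and the `x-api-key` / `anthropic-version` headers follow Anthropic's own convention, assumed here to apply unchanged:

```python
url = "https://api.deepseek.com/anthropic/v1/messages"

# Anthropic format: max_tokens is required, system prompt is a top-level field.
payload = {
    "model": "deepseek-v4-flash",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}],
}

# With the Anthropic SDK, only two lines change from an Anthropic integration:
# client = Anthropic(base_url="https://api.deepseek.com/anthropic",
#                    api_key=os.environ["DEEPSEEK_API_KEY"])
```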
Multi-Turn Conversations
The DeepSeek API is stateless — each call is independent. To maintain conversation context, include the full message history on every request. This is identical to the OpenAI pattern.
When using thinking mode, the response contains both content and reasoning_content. When building conversation history, only include content — never include reasoning_content. Including the thinking chain in history significantly degrades subsequent response quality.
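A sketch of the history-building rule: copy only `role` and `content` from each assistant turn, dropping `reasoning_content` (the assistant message below is illustrative):

```python
history = [{"role": "user", "content": "What is 2 + 2?"}]

# Illustrative thinking-mode response message:
assistant_msg = {
    "role": "assistant",
    "content": "4",
    "reasoning_content": "The user wants 2 + 2, which is 4.",
}

# Append only role and content; reasoning_content must never re-enter history.
history.append({"role": assistant_msg["role"], "content": assistant_msg["content"]})
history.append({"role": "user", "content": "Now multiply that by 3."})
```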
Agent & IDE Integrations
The DeepSeek API works directly with popular agent frameworks and coding tools — no wrapper code required. Switch the backend model in a config file.
| Tool | Integration Type | Config |
|---|---|---|
| Claude Code | Agentic CLI coding assistant | ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic |
| GitHub Copilot | IDE code completion (VS Code) | Set custom model endpoint in Copilot settings |
| OpenCode | Terminal AI coding assistant | OpenAI-compatible — set baseUrl and apiKey |
| Continue.dev | VS Code / JetBrains plugin | Add DeepSeek provider with base_url and key in config.json |
| Cursor | AI-first IDE | Add custom OpenAI-compatible endpoint in Models settings |
| LangChain | LLM orchestration framework | ChatOpenAI(base_url="https://api.deepseek.com", model="deepseek-v4-flash") |
GET /user/balance
Returns your current account balance. Useful for pre-flight checks before starting long-running jobs. The API returns both topped-up balance (purchased) and granted balance (free credits). Granted balance is consumed first.
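A stdlib sketch of the pre-flight check; exact response field names (topped-up vs. granted balance) should be confirmed against a live call:

```python
import json
import os
import urllib.request

def get_balance() -> dict:
    req = urllib.request.Request(
        "https://api.deepseek.com/user/balance",
        headers={"Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# balance = get_balance()  # skip the batch job if funds are insufficient
```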
Token Pricing
Pay-as-you-go per token. No monthly subscription. No minimum spend. Granted balance (free credits) is consumed before topped-up balance. All prices in USD per 1M tokens.
V4-Pro is at 75% discount until May 31, 2026: $0.435/1M input (regular $1.74/1M), $0.87/1M output (regular $3.48/1M). After May 31, regular pricing applies. Cache hit prices are 1/10 of cache miss prices across all models (effective 2026-04-26).
| Model | Context | Max Out | Input (cache hit) | Input (cache miss) | Output | Notes |
|---|---|---|---|---|---|---|
| deepseek-v4-flash | 1M | 384K | $0.014/1M | $0.14/1M | $0.28/1M | Default — recommended |
| deepseek-v4-pro | 1M | 384K | $0.044/1M | $0.435/1M (75% off) | $0.87/1M (promo) | Promo until May 31, 2026 |
| deepseek-v4-pro | 1M | 384K | $0.174/1M | $1.74/1M | $3.48/1M | Regular (after May 31) |
| deepseek-chat (deprecated) | 128K | 8K | $0.014/1M | $0.14/1M | $0.28/1M | Retires Jul 24, 2026 |
| deepseek-reasoner (deprecated) | 128K | 64K CoT | $0.014/1M | $0.14/1M | $0.28/1M | Retires Jul 24, 2026 |
Cache hit price = 1/10 of cache miss price. New accounts: $5 in free API credits (≈ 35M input tokens on V4-Flash). Always verify at api-docs.deepseek.com.
Error Codes
HTTP status codes and error response format. All errors include a JSON body with error.message and error.type fields.
400 Bad Request: Malformed request — invalid JSON, missing required field, or unsupported parameter combination. Check that messages is a non-empty array, model is a valid model ID, and parameter types are correct. FIM + JSON mode is not supported simultaneously.
401 Unauthorized: Invalid, missing, or revoked API key. Verify the Authorization: Bearer sk-... header. Check that the API key is active in the Platform console. Do not wrap the key in extra quotes.
402 Payment Required: Account balance is insufficient to complete the request. Add credits at platform.deepseek.com/billing. Use GET /user/balance to pre-check balance before long-running jobs.
422 Unprocessable Entity: Request structure is valid but contains logical errors — e.g., FIM mode with thinking enabled, prefix completion without prefix: true, or incompatible parameter combinations. Check the error message body for the specific violation.
429 Too Many Requests: Rate limited. Implement exponential backoff with jitter. Check the Retry-After header if present. For production systems requiring guaranteed capacity, consider routing through AWS Bedrock or Azure AI for dedicated throughput.
500 Internal Server Error: Unexpected error on the DeepSeek server. Retry with exponential backoff. If the error persists across multiple retries, check status.deepseek.com for ongoing incidents.
503 Service Unavailable: Server is temporarily overloaded. Implement retry logic with at least 30 seconds of delay. For mission-critical workloads requiring 99.9%+ SLA, route through a cloud provider (AWS Bedrock, Azure AI, Together AI, Fireworks) for dedicated capacity.
Error Response Format
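A sketch of the documented error shape and how to read it; the message and type strings below are illustrative, not captured output:

```python
import json

# Every error body carries error.message and error.type.
raw = '{"error": {"message": "Authentication Fails", "type": "authentication_error"}}'
err = json.loads(raw)["error"]
# err["type"] is the stable field to branch on; err["message"] is human-readable.
```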
Rate Limits
DeepSeek does not publish fixed public rate limits — limits are applied dynamically based on account tier and server load. Free-tier accounts (using granted balance) are subject to stricter limits than paid accounts.
Implement exponential backoff on 429 and 503 responses: start at 1s, double on each retry, cap at 60s, add 0–1s jitter. Check the Retry-After header first. For high-volume production workloads with SLA requirements, DeepSeek recommends routing through cloud provider endpoints (AWS Bedrock, Azure AI) which offer dedicated capacity.
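The recommended backoff policy can be sketched as a single helper:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based): 1s, 2s, 4s, ...,
    capped at 60s, plus 0-1s of jitter."""
    return min(cap, base * (2 ** attempt)) + random.uniform(0.0, 1.0)

# In a retry loop on 429/503: honor Retry-After when present,
# otherwise time.sleep(backoff_delay(attempt)).
```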
Migration Guide
From OpenAI
Migrating from GPT to DeepSeek requires two line changes. All request/response structure, streaming, tool calling, and JSON mode work identically.
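A sketch of the two changed lines — client configuration only; every request and response stays in the same shape:

```python
import os

# The only two changes from an OpenAI integration: base_url and api_key
# (plus the model ID in each request).
client_config = {
    "base_url": "https://api.deepseek.com",
    "api_key": os.environ.get("DEEPSEEK_API_KEY", ""),
}
model = "deepseek-v4-flash"   # was e.g. "gpt-4o"

# client = OpenAI(**client_config)   # with the OpenAI SDK installed
```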
From Legacy Aliases
If your code uses deepseek-chat or deepseek-reasoner, update before July 24, 2026:
API Changelog
New: deepseek-v4-pro and deepseek-v4-flash. 1M context window. Available via OpenAI and Anthropic format. Legacy aliases (deepseek-chat, deepseek-reasoner) deprecated — retirement date July 24, 2026. Cache hit price reduced to 1/10 of launch price.
Both aliases upgraded to DeepSeek-V3.2. deepseek-chat → V3.2 non-thinking. deepseek-reasoner → V3.2 thinking. V3.2-Speciale served via temporary endpoint (expired Dec 15, 2025).
deepseek-reasoner upgraded to R1-0528. Added function calling and JSON output support to R1. System prompts now supported. AIME 2025 accuracy: 70% → 87.5%.
deepseek-reasoner launched as the reasoning alias, serving DeepSeek-R1. 97.3% MATH-500. Introduced reasoning_content field and chain-of-thought API surface.
Launched automatic context caching on disk. Launched FIM Completion Beta and Chat Prefix Completion Beta via api.deepseek.com/beta.