v2026-05 Updated May 3, 2026 · V4 models

DeepSeek API Reference

Complete developer reference for the DeepSeek API. The API is compatible with the OpenAI ChatCompletions format and the Anthropic Messages format — change two lines of code to migrate from either. Access powerful reasoning with V4-Pro and high-speed inference with V4-Flash.

https://api.deepseek.com
Base URL
$0.14/1M
V4-Flash input
1M tokens
Context window
5M free
New accounts
Jul 24, 2026
Legacy alias retirement

Authentication

All API requests require an API key passed as a Bearer token in the Authorization header. API keys are created in the DeepSeek Platform console.

⚠ Security

Never expose your API key in client-side code, version control, or logs. Use environment variables. If compromised, immediately revoke in the Platform console — keys cannot be recovered once deleted.

HTTP Header
Authorization: Bearer $DEEPSEEK_API_KEY
Content-Type: application/json
Set API Key — Shell
# Add to ~/.bashrc or ~/.zshrc
export DEEPSEEK_API_KEY="sk-your-key-here"

# Verify it's set
echo $DEEPSEEK_API_KEY

Your First API Call

Make your first request in under 2 minutes. Install the OpenAI SDK — no DeepSeek-specific package needed.

# pip install openai
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    stream=False,
)

print(response.choices[0].message.content)
// npm install openai
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: 'https://api.deepseek.com',
});

const response = await client.chat.completions.create({
  model: 'deepseek-v4-flash',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' },
  ],
  stream: false,
});

console.log(response.choices[0].message.content);
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "stream": false
  }'

Base URLs

Two base URL options — both route to the same models. The /v1 suffix is a compatibility alias for OpenAI SDK users who hardcode the version segment.

Base URL | Format | Notes
https://api.deepseek.com | OpenAI & Anthropic | Recommended — no version suffix needed
https://api.deepseek.com/v1 | OpenAI only | Compatibility alias for OpenAI SDK defaults
https://api.deepseek.com/beta | OpenAI only | Required for FIM & Chat Prefix Completion
https://api.deepseek.com/anthropic | Anthropic | Anthropic Messages API format

Available Models

Two production V4 models and two legacy aliases pending retirement. Always use deepseek-v4-flash or deepseek-v4-pro for new integrations.

⚡ DEFAULT · FAST
DeepSeek-V4-Flash
deepseek-v4-flash

High-speed, cost-efficient frontier model. 284B MoE (13B active). 79% SWE-bench. 83 tok/s. Recommended for chat, coding, summaries, high-volume pipelines.

1M tokens
Context
384K
Max output
$0.14/1M
Input
$0.28/1M
Output
🧠 FLAGSHIP
DeepSeek-V4-Pro
deepseek-v4-pro

Frontier reasoning model. 1.6T MoE (49B active). 80.6% SWE-bench. Codeforces #1 (3206 rating). Use Think Max for hard math and agentic tasks.

1M tokens
Context
384K
Max output
$0.435/1M
Input (promo)
$0.87/1M
Output (promo)
⚠ DEPRECATED
deepseek-chat
deepseek-chat → deepseek-v4-flash (non-thinking)

Legacy alias. Currently routes to V4-Flash non-thinking mode. Retires July 24, 2026 15:59 UTC. Will error with no fallback after that date.

Jul 24, 2026
Retirement
V4-Flash
Routes to
⚠ DEPRECATED
deepseek-reasoner
deepseek-reasoner → deepseek-v4-flash (thinking)

Legacy alias. Currently routes to V4-Flash thinking mode. Retires July 24, 2026 15:59 UTC. Migrate to deepseek-v4-flash or deepseek-v4-pro with thinking enabled.

Jul 24, 2026
Retirement
V4-Flash
Routes to
🚨 Migration Deadline

On July 24, 2026 at 15:59 UTC, deepseek-chat and deepseek-reasoner will start returning errors with no fallback. Migration is a single-line change: update the model parameter; everything else stays identical. Search your entire codebase, including retry handlers, config files, and fallback logic.

List Models (API)

Retrieve all currently available model IDs programmatically.

GET /models
Returns a list of available models with their IDs and metadata. Useful for verifying model availability before sending requests.
cURL
curl https://api.deepseek.com/models \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY"
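For programmatic model selection, the ID list returned by GET /models can feed a small guard before sending requests. A sketch — the helper name and fallback policy are illustrative, not part of the API; with the OpenAI SDK the ID list would come from `[m.id for m in client.models.list()]`:

```python
# Hypothetical helper (not part of the API): choose a servable model ID
# from the list returned by GET /models, skipping the retiring aliases.
DEPRECATED = {"deepseek-chat", "deepseek-reasoner"}

def pick_model(available_ids, preferred="deepseek-v4-flash"):
    """Return `preferred` if it is served, else the first non-deprecated ID."""
    if preferred in available_ids:
        return preferred
    for model_id in available_ids:
        if model_id not in DEPRECATED:
            return model_id
    raise RuntimeError("no supported model available")

# In practice: ids = [m.id for m in client.models.list()]
ids = ["deepseek-chat", "deepseek-v4-pro", "deepseek-v4-flash"]
print(pick_model(ids))  # deepseek-v4-flash
```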

POST /chat/completions

The primary API endpoint. Creates a model response for the given conversation. Returns a completion object or a stream of SSE chunks when stream: true.

POST https://api.deepseek.com/chat/completions
Creates a model response for the given chat conversation. Supports streaming, tool calling, JSON mode, thinking modes, and multi-turn conversation.

Request Parameters

Core Parameters

Parameter | Type | Required | Description
model | string | Required | Model ID. Use deepseek-v4-flash or deepseek-v4-pro. Legacy: deepseek-chat, deepseek-reasoner (retiring Jul 24, 2026).
messages | array | Required | Conversation history. Each message has role (system, user, assistant, or tool) and content. Max context: 1M tokens.
stream | boolean | Optional | If true, returns Server-Sent Events (SSE) chunks. Each chunk is a data: line with a partial completion object. Terminated with data: [DONE]. Default: false.
max_tokens | integer | Optional | Maximum output tokens. Default varies by model. For Think Max mode, set to at least 32768. Hard maximum: 384000 (384K) for V4 models.
temperature | number | Optional | Sampling temperature 0–2. Higher = more random. Default: 1.0. Recommended: 0.6–0.7 for thinking mode. Note: ignored in pure reasoning modes.
top_p | number | Optional | Nucleus sampling. Considers tokens with cumulative probability up to top_p. Default: 1.0. Recommended: 0.95 for reasoning models.
response_format | object | Optional | Set {"type": "json_object"} to force valid JSON output. Set {"type": "text"} (default) for natural language. See JSON Mode.
tools | array | Optional | Array of tool definitions in OpenAI function-calling format. Each tool has type: "function" and a function object with name, description, parameters.
tool_choice | string or object | Optional | Controls tool selection. "auto" (model decides), "none" (no tools), "required" (must call a tool), or {"type": "function", "function": {"name": "..."}} to force a specific function.
stop | string or array | Optional | Up to 4 stop sequences. Generation halts when any sequence is produced. Default: null.
frequency_penalty | number | Optional | Range −2.0 to 2.0. Positive values penalize tokens proportional to their frequency, reducing repetition. Default: 0.
presence_penalty | number | Optional | Range −2.0 to 2.0. Positive values penalize tokens that have appeared at all, encouraging new topics. Default: 0.
logprobs | boolean | Optional | If true, returns log probabilities for each output token. Default: false.
top_logprobs | integer | Optional | Number of top-k token log probability alternatives to return per token. Requires logprobs: true. Range: 0–20.
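The optional parameters above compose freely in one request body. A sketch of a combined payload — the specific values are illustrative, not recommendations beyond those in the table; the same dict can be passed as `client.chat.completions.create(**request_body)` or sent as the JSON body of a raw POST to /chat/completions:

```python
import json

# Illustrative request body combining several optional parameters.
request_body = {
    "model": "deepseek-v4-flash",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize SSE in one sentence."},
    ],
    "temperature": 0.7,        # below default for more focused output
    "top_p": 0.95,
    "max_tokens": 512,
    "stop": ["\n\n"],          # halt at the first blank line
    "frequency_penalty": 0.5,  # discourage repetition
    "stream": False,
}

# Serializes cleanly for a raw HTTP POST as well.
payload = json.dumps(request_body)
```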

Thinking Mode Parameters

Control reasoning depth via extra_body in the OpenAI SDK, or directly in the request body when using cURL. V4 models support three reasoning effort levels.

Parameter | Type | Values | Description
thinking.type | string | enabled · disabled | Enables or disables chain-of-thought reasoning. When enabled, responses include a reasoning_content field with the full thinking chain. Pass via extra_body={"thinking": {"type": "enabled"}}.
reasoning_effort | string | low · high · none | Controls reasoning depth when thinking is enabled. high = Think Max (uses up to 50K+ tokens internally). low = faster CoT. none = no thinking (same as thinking.type: disabled).
Python — Think Max
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Prove √2 is irrational."}],
    max_tokens=65536,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)

# reasoning_content = full chain-of-thought
chain = response.choices[0].message.reasoning_content
answer = response.choices[0].message.content

# ⚠ CRITICAL for multi-turn: never include reasoning_content
# in conversation history — only include content

Response Object

The API returns a completion object on success. In streaming mode, multiple SSE chunks with partial deltas are emitted, terminated by data: [DONE].

Response · 200 OK
{
  "id": "chatcmpl-930c60df-bf64-41c9-a88e-3ec75f81e00e",
  "object": "chat.completion",
  "created": 1746259200,
  "model": "deepseek-v4-flash",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?",
      "reasoning_content": null      // populated in thinking mode
    },
    "finish_reason": "stop"          // stop | length | tool_calls | content_filter
  }],
  "usage": {
    "prompt_tokens": 16,
    "completion_tokens": 10,
    "total_tokens": 26,
    "prompt_cache_hit_tokens": 12,   // billed at 10% (cache hit)
    "prompt_cache_miss_tokens": 4    // billed at 100% (cache miss)
  }
}

finish_reason Values

Value | Meaning
stop | Generation completed naturally or reached a stop sequence.
length | Generation stopped due to the max_tokens limit. Increase max_tokens to get a complete response.
tool_calls | Model decided to call one or more tools. Process the tool calls and continue the conversation with tool results.
content_filter | Generation stopped by the content safety filter. Response may be partial or empty.
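Production code should branch on finish_reason before trusting a response. A minimal dispatch sketch — the function name and return strings are illustrative, not part of the API:

```python
# Hypothetical dispatch: map each finish_reason to the action the
# table above prescribes, and fail loudly on anything unrecognized.
def check_finish(finish_reason):
    if finish_reason == "stop":
        return "complete"
    if finish_reason == "length":
        return "truncated: raise max_tokens and retry"
    if finish_reason == "tool_calls":
        return "execute tools, then continue the conversation"
    if finish_reason == "content_filter":
        return "filtered: response may be partial or empty"
    raise ValueError(f"unknown finish_reason: {finish_reason}")

print(check_finish("length"))  # truncated: raise max_tokens and retry
```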

Function Calling / Tool Use

Define tools in the same JSON schema format as OpenAI. The model decides which tools to call and with what arguments. Both V4-Flash and V4-Pro support function calling in all thinking modes.

Python — Tool Definition & Response
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    }
}]

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",
)

# If finish_reason == "tool_calls", handle the call
msg = response.choices[0].message
if msg.tool_calls:
    for call in msg.tool_calls:
        print(call.function.name, call.function.arguments)
💡 Think + Tools

Enable thinking mode with tools to let the model reason about which tools to call before making the calls. This significantly improves accuracy on complex multi-step agentic tasks. Set extra_body={"thinking": {"type": "enabled"}} together with reasoning_effort="high".
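The example above stops once the model requests a tool; completing the loop means appending the assistant turn plus one "tool" message per call, then requesting again. A sketch of that second half — `build_tool_followup`, the call ID, and the weather result are illustrative stand-ins, and the message shapes follow the OpenAI function-calling format:

```python
import json

# Sketch: after finish_reason == "tool_calls", append the assistant turn
# and one "tool" message per call, then send the list as messages= again.
def build_tool_followup(messages, assistant_msg, results_by_call_id):
    """Return the message list for the follow-up request."""
    followup = list(messages)
    followup.append({
        "role": "assistant",
        "content": assistant_msg.get("content"),
        "tool_calls": assistant_msg["tool_calls"],
    })
    for call in assistant_msg["tool_calls"]:
        followup.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(results_by_call_id[call["id"]]),
        })
    return followup

msgs = [{"role": "user", "content": "Weather in Tokyo?"}]
assistant = {
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": '{"location": "Tokyo"}'},
    }],
}
followup = build_tool_followup(msgs, assistant, {"call_1": {"temp_c": 21}})
# Pass `followup` as messages= in the next create() call.
```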

JSON Mode

Force the model to produce a valid JSON object by setting response_format. The model will always return parseable JSON — no need to strip markdown fences or handle parse errors.

Python — JSON Mode
import json

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{
        "role": "user",
        "content": "Return a JSON object with fields: name (string), age (number), city (string). Example person: Alice, 30, Tokyo.",
    }],
    response_format={"type": "json_object"},
)

data = json.loads(response.choices[0].message.content)
print(data["name"])  # "Alice"
ℹ Note

The system prompt or user message should describe the expected JSON structure. The model uses this description to format its output. JSON mode cannot be combined with FIM completion.

Streaming (SSE)

Set stream: true to receive Server-Sent Events. Each chunk delivers a delta of new content. V4-Flash delivers ~83 tokens/second. Use streaming for chat UIs and real-time applications.

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
const stream = await client.chat.completions.create({
  model: 'deepseek-v4-flash',
  messages: [{ role: 'user', content: 'Tell me a joke.' }],
  stream: true,
});

for await (const chunk of stream) {
  const text = chunk.choices[0]?.delta?.content ?? '';
  process.stdout.write(text);
}

Raw SSE Format

SSE Stream
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":""},"index":0}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"!"},"index":0}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop","index":0}]}

data: [DONE]
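If you consume the raw stream without an SDK, each data: line must be parsed individually. A minimal parser sketch for the format above (the official SDKs handle this for you; the function name is illustrative):

```python
import json

# Minimal SSE-line parser for the stream format above.
# Returns the text delta, or None for [DONE], empty, and non-data lines.
def parse_sse_line(line):
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

line = 'data: {"choices":[{"delta":{"content":"Hello"},"index":0}]}'
print(parse_sse_line(line))  # Hello
```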

Context Caching

Context caching reduces costs by 90% for repeated prompt prefixes. It is fully automatic — no code changes, no explicit cache keys. Repeated content at the beginning of your prompt is served at cache-hit pricing.

💡 Cache Optimization

Structure prompts to maximize cache hits: put static content first (system prompts, tool definitions, reference documents) and variable content last (user messages, query-specific context). Cache persists for ~30 minutes of inactivity. Content below 64 tokens is never cached.
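The static-first ordering can be enforced with a tiny message builder; a sketch under the assumption that the system prompt and reference document are byte-identical across requests (SYSTEM_PROMPT and REFERENCE_DOC are illustrative placeholders):

```python
# Sketch: keep the static prefix identical across requests so it stays
# cache-eligible, and append only the variable user query at the end.
SYSTEM_PROMPT = "You are a support agent for ExampleCorp."
REFERENCE_DOC = "<product manual text, identical on every request>"

def build_messages(user_query):
    return [
        {"role": "system", "content": SYSTEM_PROMPT + "\n\n" + REFERENCE_DOC},
        {"role": "user", "content": user_query},  # variable part goes last
    ]

a = build_messages("How do I reset my password?")
b = build_messages("What is the warranty period?")
# Identical static prefix across calls → served at cache-hit pricing.
assert a[0] == b[0]
```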

Reading Cache Usage

Every response includes precise cache hit/miss breakdown in the usage object for billing analysis.

Usage object with cache fields
"usage": {
  "prompt_tokens": 1024,
  "completion_tokens": 256,
  "total_tokens": 1280,
  "prompt_cache_hit_tokens": 900,   // billed at $0.014/1M (V4-Flash)
  "prompt_cache_miss_tokens": 124   // billed at $0.14/1M (V4-Flash)
}

FIM Completion Beta

Fill-In-Middle (FIM) completion predicts a missing segment given both prefix and suffix context. Used for IDE code completion, story continuation, and document infilling. Requires base_url="https://api.deepseek.com/beta".

POST https://api.deepseek.com/beta/completions Beta
FIM completion API. Compatible with OpenAI FIM API format. Only available with base_url="https://api.deepseek.com/beta". Only supported in non-thinking mode.
Parameter | Type | Required | Description
model | string | Required | deepseek-v4-pro (only model supporting FIM via beta)
prompt | string | Required | Code/text prefix — content before the gap to fill.
suffix | string | Optional | Code/text suffix — content after the gap. If omitted, the model generates until completion.
max_tokens | integer | Optional | Maximum tokens to generate for the middle segment. Default: 4096.
Python — FIM Code Completion
# Must use beta base URL for FIM
beta_client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com/beta",
)

response = beta_client.completions.create(
    model="deepseek-v4-pro",
    prompt="def fibonacci(n):\n    if n <= 1:\n        return n\n",
    suffix="\n    return fibonacci(n-1) + fibonacci(n-2)",
    max_tokens=256,
)

print(response.choices[0].text)  # filled middle segment

Chat Prefix Completion Beta

Force the assistant to begin its response with a specific prefix. Useful for output format control, structured completions, and guided generation. Set the last message role to assistant with prefix: true. Requires base_url="https://api.deepseek.com/beta".

Python — Prefix Completion
response = beta_client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Write a haiku about programming."},
        {"role": "assistant", "content": "Fingers on keyboard,", "prefix": True},
    ],
)

# Model completes the haiku starting from "Fingers on keyboard,"

Anthropic API Format

The DeepSeek API also supports the Anthropic Messages API format at https://api.deepseek.com/anthropic. This allows existing Anthropic SDK code to work with DeepSeek models by changing the base URL and API key only.

Python — Anthropic SDK
# pip install anthropic
import anthropic

client = anthropic.Anthropic(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com/anthropic",
)

message = client.messages.create(
    model="deepseek-v4-flash",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)

print(message.content[0].text)

Multi-Turn Conversations

The DeepSeek API is stateless — each call is independent. To maintain conversation context, include the full message history on every request. This is identical to the OpenAI pattern.

🚨 Thinking Mode Multi-Turn

When using thinking mode, the response contains both content and reasoning_content. When building conversation history, only include content — never include reasoning_content. Including the thinking chain in history significantly degrades subsequent response quality.

Python — Stateless Multi-Turn
history = [
    {"role": "system", "content": "You are a helpful assistant."}
]

def chat(user_msg):
    history.append({"role": "user", "content": user_msg})
    resp = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=history,
    )
    assistant_content = resp.choices[0].message.content
    # Only append .content — NOT .reasoning_content
    history.append({"role": "assistant", "content": assistant_content})
    return assistant_content

chat("What is the capital of France?")
chat("What language do they speak there?")  # context is preserved

Agent & IDE Integrations

The DeepSeek API works directly with popular agent frameworks and coding tools — no wrapper code required. Switch the backend model in a config file.

Tool | Integration Type | Config
Claude Code | Agentic CLI coding assistant | ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
GitHub Copilot | IDE code completion (VS Code) | Set custom model endpoint in Copilot settings
OpenCode | Terminal AI coding assistant | OpenAI-compatible — set baseUrl and apiKey
Continue.dev | VS Code / JetBrains plugin | Add DeepSeek provider with base_url and key in config.json
Cursor | AI-first IDE | Add custom OpenAI-compatible endpoint in Models settings
LangChain | LLM orchestration framework | ChatOpenAI(base_url="https://api.deepseek.com", model="deepseek-v4-flash")

GET /user/balance

Returns your current account balance. Useful for pre-flight checks before starting long-running jobs. The API returns both topped-up balance (purchased) and granted balance (free credits). Granted balance is consumed first.

GET https://api.deepseek.com/user/balance
cURL
curl https://api.deepseek.com/user/balance \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY"
Response · 200 OK
{
  "is_available": true,
  "balance_infos": [
    {
      "currency": "CNY",
      "total_balance": "0.98",
      "granted_balance": "0.00",
      "topped_up_balance": "0.98"
    }
  ]
}
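A pre-flight check is just a parse of this response before launching a job. A sketch — the helper name and the minimum-balance threshold are illustrative, and summing across currencies is a simplification:

```python
# Hypothetical pre-flight check: parse the /user/balance response shown
# above and fail fast before starting a long-running job.
def has_sufficient_balance(balance_response, minimum=1.0):
    if not balance_response["is_available"]:
        return False
    total = sum(float(info["total_balance"])
                for info in balance_response["balance_infos"])
    return total >= minimum

resp = {
    "is_available": True,
    "balance_infos": [{"currency": "CNY", "total_balance": "0.98",
                       "granted_balance": "0.00",
                       "topped_up_balance": "0.98"}],
}
print(has_sufficient_balance(resp, minimum=0.5))  # True
```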

Token Pricing

Pay-as-you-go per token. No monthly subscription. No minimum spend. Granted balance (free credits) is consumed before topped-up balance. All prices in USD per 1M tokens.

💡 V4-Pro Promo

V4-Pro is at 75% discount until May 31, 2026: $0.435/1M input (regular $1.74/1M), $0.87/1M output (regular $3.48/1M). After May 31, regular pricing applies. Cache hit prices are 1/10 of cache miss prices across all models (effective 2026-04-26).

Model | Context | Max Out | Input (cache hit) | Input (cache miss) | Output | Notes
deepseek-v4-flash | 1M | 384K | $0.014/1M | $0.14/1M | $0.28/1M | Default — recommended
deepseek-v4-pro | 1M | 384K | $0.044/1M | $0.435/1M (75% off) | $0.87/1M | Promo until May 31
deepseek-v4-pro | 1M | 384K | $0.174/1M | $1.74/1M | $3.48/1M | Regular (after May 31)
deepseek-chat (deprecated) | 128K | 8K | $0.014/1M | $0.14/1M | $0.28/1M | Retires Jul 24, 2026
deepseek-reasoner (deprecated) | 128K | 64K (incl. CoT) | $0.014/1M | $0.14/1M | $0.28/1M | Retires Jul 24, 2026

Cache hit price = 1/10 of cache miss price. New accounts: $5 in free API credits (≈ 35M input tokens on V4-Flash). Always verify at api-docs.deepseek.com.
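Per-request cost follows directly from the usage object and the table above. A sketch using the V4-Flash prices (the helper name and price-table structure are mine, not part of the API):

```python
# Sketch: estimate request cost in USD from the usage object, using the
# V4-Flash prices in the table above (USD per 1M tokens).
PRICES = {  # (cache_hit_input, cache_miss_input, output) per 1M tokens
    "deepseek-v4-flash": (0.014, 0.14, 0.28),
}

def estimate_cost(model, usage):
    hit_p, miss_p, out_p = PRICES[model]
    return (usage["prompt_cache_hit_tokens"] * hit_p
            + usage["prompt_cache_miss_tokens"] * miss_p
            + usage["completion_tokens"] * out_p) / 1_000_000

usage = {"prompt_cache_hit_tokens": 900, "prompt_cache_miss_tokens": 124,
         "completion_tokens": 256}
print(f"${estimate_cost('deepseek-v4-flash', usage):.8f}")
```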

Error Codes

HTTP status codes and error response format. All errors include a JSON body with error.message and error.type fields.

400
Bad Request / Invalid Format

Malformed request — invalid JSON, missing required field, or unsupported parameter combination. Check that messages is a non-empty array, model is a valid model ID, and parameter types are correct. FIM + JSON mode is not supported simultaneously.

401
Authentication Failure

Invalid, missing, or revoked API key. Verify Authorization: Bearer sk-... header. Check the API key is active in the Platform console. Do not wrap the key in extra quotes.

402
Insufficient Balance

Account balance is insufficient to complete the request. Add credits at platform.deepseek.com/billing. Use GET /user/balance to pre-check balance before long-running jobs.

422
Unprocessable Entity

Request structure is valid but contains logical errors — e.g., FIM mode with thinking enabled, prefix completion without prefix: true, or incompatible parameter combinations. Check the error message body for the specific violation.

429
Rate Limit Exceeded

Too many requests. Implement exponential backoff with jitter. Check the Retry-After header if present. For production systems requiring guaranteed capacity, consider routing through AWS Bedrock or Azure AI for dedicated throughput.

500
Internal Server Error

Unexpected error on the DeepSeek server. Retry with exponential backoff. If the error persists across multiple retries, check status.deepseek.com for ongoing incidents.

503
Service Unavailable / Overloaded

Server is temporarily overloaded. Implement retry logic with at least 30 seconds delay. For mission-critical workloads requiring 99.9%+ SLA, route through a cloud provider (AWS Bedrock, Azure AI, Together AI, Fireworks) for dedicated capacity.

Error Response Format

Error Response · 4xx
{
  "error": {
    "message": "Invalid API key provided: sk-xxx...xxx",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
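Client-side handling usually reduces to three buckets: retry, top up, or fix the request. A classification sketch over the error body above — the status-code groupings follow the Error Codes section, while the function and bucket names are illustrative:

```python
# Sketch: classify an error response into an action bucket.
RETRYABLE = {429, 500, 503}

def classify_error(status_code, body):
    msg = body["error"]["message"]
    if status_code in RETRYABLE:
        return ("retry", msg)      # back off and retry
    if status_code == 402:
        return ("top_up", msg)     # add credits; retrying won't help
    return ("fix_request", msg)    # 400/401/422: correct and resend

body = {"error": {"message": "Invalid API key provided: sk-xxx...xxx",
                  "type": "invalid_request_error",
                  "code": "invalid_api_key"}}
print(classify_error(401, body)[0])  # fix_request
```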

Rate Limits

DeepSeek does not publish fixed public rate limits — limits are applied dynamically based on account tier and server load. Free-tier accounts (using granted balance) are subject to stricter limits than paid accounts.

ℹ Retry Strategy

Implement exponential backoff on 429 and 503 responses: start at 1s, double on each retry, cap at 60s, add 0–1s jitter. Check the Retry-After header first. For high-volume production workloads with SLA requirements, DeepSeek recommends routing through cloud provider endpoints (AWS Bedrock, Azure AI) which offer dedicated capacity.

Python — Retry with Backoff
import time, random

def call_with_retry(client, **kwargs):
    max_retries, delay = 5, 1.0
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except Exception as e:
            status = getattr(e, "status_code", None)
            if status in (429, 500, 503) and attempt < max_retries - 1:
                jitter = random.uniform(0, 1)
                time.sleep(min(delay + jitter, 60))
                delay *= 2
            else:
                raise

Migration Guide

From OpenAI

Migrating from GPT to DeepSeek requires two line changes. All request/response structure, streaming, tool calling, and JSON mode work identically.

Python — OpenAI → DeepSeek
# BEFORE (OpenAI)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
model = "gpt-4o"

# AFTER (DeepSeek) — only these two lines change
client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com",  # ← add this
)
model = "deepseek-v4-flash"  # ← change model name

From Legacy Aliases

If your code uses deepseek-chat or deepseek-reasoner, update before July 24, 2026:

Migration Map
# deepseek-chat     → deepseek-v4-flash (no thinking, same pricing)
# deepseek-reasoner → deepseek-v4-flash (with thinking enabled)
#                OR → deepseek-v4-pro   (if you need stronger reasoning)

# For deepseek-chat replacement:
model = "deepseek-v4-flash"  # drop-in, same price, better context

# For deepseek-reasoner replacement:
model = "deepseek-v4-flash"
extra_body = {"thinking": {"type": "enabled"}}

# Or use V4-Pro for stronger reasoning:
model = "deepseek-v4-pro"
reasoning_effort = "high"

API Changelog

2026-04-24
DeepSeek-V4 — V4-Pro & V4-Flash

New: deepseek-v4-pro and deepseek-v4-flash. 1M context window. Available via OpenAI and Anthropic format. Legacy aliases (deepseek-chat, deepseek-reasoner) deprecated — retirement date July 24, 2026. Cache hit price reduced to 1/10 of launch price.

2025-12-01
DeepSeek-V3.2 — deepseek-chat & deepseek-reasoner upgraded

Both aliases upgraded to DeepSeek-V3.2. deepseek-chat → V3.2 non-thinking. deepseek-reasoner → V3.2 thinking. V3.2-Speciale served via temporary endpoint (expired Dec 15, 2025).

2025-05-28
R1-0528 — deepseek-reasoner upgraded + function calling

deepseek-reasoner upgraded to R1-0528. Added function calling and JSON output support to R1. System prompts now supported. AIME 2025 accuracy: 70% → 87.5%.

2025-01-20
DeepSeek-R1 — deepseek-reasoner launched

deepseek-reasoner launched as the reasoning alias, serving DeepSeek-R1. 97.3% MATH-500. Introduced reasoning_content field and chain-of-thought API surface.

2024-08-02
Context Caching & FIM Beta

Launched automatic context caching on disk. Launched FIM Completion Beta and Chat Prefix Completion Beta via api.deepseek.com/beta.