OpenAI-compatible. Frontier intelligence. Starting at $0.14 per million tokens, up to 100× cheaper than GPT-5, with automatic context caching and no monthly fees.
V4 is the current generation. Legacy aliases retire July 24, 2026 — migrate to explicit model IDs now.
1.6 trillion parameter MoE model. 80.6% SWE-bench, #1 Codeforces (3,206), gold medal IMO 2025. Best open-weights model available. Three reasoning modes: Non-Think, Think High, Think Max.
Input: $1.74/1M · Output: $3.48/1M (75% off until May 31)
Use in API →
284B parameter MoE, 13B active. Within 1–2 points of Pro on most benchmarks at 12× lower cost. The recommended default for production workloads and high-volume pipelines.
Input: $0.14/1M · Output: $0.28/1M
Use in API →
deepseek-chat → V4-Flash (non-think)
deepseek-reasoner → V4-Flash (think)
Migrate before July 24, 2026 to avoid service disruption. No other changes required.
⚠️ Update model name only — no other changes needed
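The alias swap above can be sketched as a one-line lookup (the helper name and mapping table are illustrative, restating the alias guidance on this page):

```python
# Legacy aliases retire July 24, 2026 -- map them to explicit V4 model IDs.
LEGACY_ALIASES = {
    "deepseek-chat": "deepseek-v4-flash",      # non-think mode
    "deepseek-reasoner": "deepseek-v4-flash",  # re-enable thinking per-request
}

def migrate_model(model: str) -> str:
    """Return the explicit V4 ID for a legacy alias; pass through anything else."""
    return LEGACY_ALIASES.get(model, model)
```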
Migration Guide →
671B parameter MoE (37B active). The previous generation flagship — still an excellent value at lower cost than V4. 128K context window. Gold medal on IMO and IOI 2025. Strong for general chat, RAG, and summarization where V4's improvements aren't needed.
Input: $0.28/1M · Cache hit: $0.028/1M · Output: $0.42/1M
Use in API →
Purpose-built for software engineering. 82.6% HumanEval — beats GPT-4o. Understands full repositories, not just snippets. 340+ languages with expert-level Python, Java, C++, and TypeScript support.
Check current pricing at api-docs.deepseek.com
View on GitHub →
Pure RL-trained reasoning model. 97.3% on MATH-500. Develops chain-of-thought organically. Best for competition math, logical proofs, and multi-step scientific reasoning.
Input: $0.55/1M · Output: $2.19/1M
View Paper →
Vision-language model for image understanding, OCR, chart analysis, and document parsing. Process screenshots, diagrams, and photos alongside text for multimodal workflows.
Check current pricing at api-docs.deepseek.com
View on GitHub →
A complete developer platform — not just a model endpoint.
Drop-in replacement for the OpenAI API. Change base_url and api_key — your existing code, SDKs, and integrations work without any other modification.
Automatic prompt prefix caching — no configuration needed. Repeated system prompts, documents, and conversation history serve from cache at 90% discount. Saves thousands of dollars at scale.
90% cost reduction
Server-Sent Events streaming for real-time token delivery. Essential for chat UIs and interactive applications. Set stream=True in any request — no other changes.
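A minimal streaming sketch with the OpenAI SDK; the client is assumed to be constructed with the DeepSeek base URL as described elsewhere on this page:

```python
def stream_reply(client, prompt: str) -> str:
    """Print tokens as they arrive over SSE and return the full reply."""
    stream = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # the only change versus a non-streaming request
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        parts.append(delta)
        print(delta, end="", flush=True)
    return "".join(parts)
```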
Native tool use with OpenAI-compatible JSON schema definitions. Both V4-Pro and V4-Flash support multi-tool calls, parallel execution, and structured function responses for AI agent workflows.
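As a sketch, a tool definition in the OpenAI JSON-schema style — the get_weather function and its fields are made up for illustration:

```python
# Illustrative OpenAI-style tool definition (function name and fields are made up).
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def tool_request(prompt: str) -> dict:
    """Request body for a tool-calling chat completion (sketch)."""
    return {
        "model": "deepseek-v4-flash",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }
```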
Agent-ready
Force structured JSON output with guaranteed schema adherence. Ideal for data extraction, classification pipelines, and any application requiring machine-readable responses.
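A request-body sketch for JSON mode using the OpenAI-style response_format field; the extraction schema spelled out in the system prompt is illustrative:

```python
def extraction_request(text: str) -> dict:
    """Chat-completion body forcing a JSON object response (sketch)."""
    return {
        "model": "deepseek-v4-flash",
        "response_format": {"type": "json_object"},  # OpenAI-compatible JSON mode
        "messages": [
            {"role": "system",
             "content": 'Return JSON with keys "name" and "email".'},
            {"role": "user", "content": text},
        ],
    }
```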
Structured output
Three reasoning effort levels: Non-Think (instant), Think High (analytical), Think Max (maximum effort). Control latency vs accuracy dynamically per-request without changing models.
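Per-request control might look like the sketch below. The page only says thinking mode is enabled via extra_body; the "thinking" field name and mode strings are assumptions, so check the API docs for the real parameter:

```python
REASONING_MODES = ("non-think", "think-high", "think-max")

def with_reasoning(body: dict, mode: str) -> dict:
    """Attach a reasoning-effort hint to a request body (field name assumed)."""
    if mode not in REASONING_MODES:
        raise ValueError(f"unknown mode: {mode}")
    return {**body, "extra_body": {"thinking": mode}}  # "thinking" is illustrative
```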
Adaptive reasoning
Process entire codebases, books, legal contracts, or conversation histories in a single request. CSA+HCA architecture makes 1M-token inference economically viable — not just a paper spec.
Longest context
In addition to OpenAI-compatible endpoints, DeepSeek supports the Anthropic Messages API format. Teams already using Claude can migrate with minimal code changes.
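For teams coming from Claude, an Anthropic Messages-format body looks like this sketch (max_tokens is required in that schema; the endpoint path for DeepSeek's Anthropic-compatible mode isn't given on this page):

```python
def anthropic_style_body(prompt: str, system: str = "") -> dict:
    """Anthropic Messages-format request body (sketch)."""
    body = {
        "model": "deepseek-v4-flash",
        "max_tokens": 1024,  # required by the Anthropic schema
        "messages": [{"role": "user", "content": prompt}],
    }
    if system:
        body["system"] = system  # system prompt is a top-level field, not a message
    return body
```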
Multi-format
Real-time dashboard tracking token consumption, cache hit rates, cost per model, and request latency. Response objects include usage.prompt_cache_hit_tokens for precise billing.
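Those usage fields make per-request cost accounting straightforward. A sketch using the V4-Flash prices from this page, assuming cache-hit tokens are a subset of prompt tokens and bill at 10% of the input rate:

```python
def input_cost_usd(prompt_tokens: int, cache_hit_tokens: int,
                   price_per_million: float = 0.14) -> float:
    """Input cost from usage.prompt_tokens and usage.prompt_cache_hit_tokens.

    Assumes cache hits are counted inside prompt_tokens and bill at 10%
    of the normal input price ($0.014/1M for V4-Flash).
    """
    missed = prompt_tokens - cache_hit_tokens
    return (missed * price_per_million
            + cache_hit_tokens * price_per_million * 0.10) / 1e6
```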
From zero to your first API response — works with any OpenAI-compatible SDK.
Go to platform.deepseek.com and sign up. You receive 5 million free tokens with no credit card required.
Navigate to API Keys in your dashboard. Your key looks like sk-xxxxxxxx.... Store it in your environment — never commit it to source control.
Run pip install openai (Python) or npm install openai (Node.js). No DeepSeek-specific library needed — the official OpenAI SDK works.
Set base_url="https://api.deepseek.com/v1" and your key. Use model deepseek-v4-flash or deepseek-v4-pro. That's the entire migration.
Place your system prompt first and keep it consistent across requests. After the first call, it's cached at $0.014/1M — a 90% reduction. No code changes needed.
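Putting the steps above together in one call — requires pip install openai; the environment variable name is illustrative:

```python
import os

def first_request() -> str:
    """One complete call following the quickstart above."""
    from openai import OpenAI  # official OpenAI SDK; no DeepSeek library needed
    client = OpenAI(
        base_url="https://api.deepseek.com/v1",
        api_key=os.environ["DEEPSEEK_API_KEY"],  # illustrative env var name
    )
    resp = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[
            # Keep the system prompt first and identical across requests:
            # after the first call it is served from cache at 90% off.
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Say hello in one sentence."},
        ],
    )
    return resp.choices[0].message.content
```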
No monthly subscription. No seat fees. New accounts get 5M tokens free.
Prices verified April 28, 2026 · Always confirm at api-docs.deepseek.com/quick_start/pricing
| Model | Input /1M | Output /1M | Context | Open? | Cache? |
|---|---|---|---|---|---|
| V4-Flash | $0.14 | $0.28 | 1M | ✓ MIT | Auto |
| V4-Pro | $1.74 | $3.48 | 1M | ✓ MIT | Auto |
| V3.2 | $0.28 | $0.42 | 128K | ✓ MIT | Auto |
| GPT-5.5 | $5.00 | $20.00 | 128K | ✗ | Manual |
| Claude Opus 4.7 | $5.00 | $25.00 | 200K | ✗ | Manual |
| Gemini 3 Pro | $1.25 | $5.00 | 1M | ✗ | Manual |
| GPT-5.4 | $10.00 | $30.00 | 128K | ✗ | Manual |
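To make the table concrete, a small cost helper with prices copied from the rows above (list prices only; no cache discount applied):

```python
PRICES_PER_MILLION = {  # (input, output) in USD, from the table above
    "deepseek-v4-flash": (0.14, 0.28),
    "deepseek-v4-pro": (1.74, 3.48),
    "deepseek-v3.2": (0.28, 0.42),
    "gpt-5.5": (5.00, 20.00),
    "claude-opus-4.7": (5.00, 25.00),
    "gemini-3-pro": (1.25, 5.00),
}

def job_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a job at list prices for the given token counts."""
    p_in, p_out = PRICES_PER_MILLION[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1e6
```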
DeepSeek caches repeated prompt prefixes for free. No configuration, no TTL settings, no code changes.
Read response.usage.prompt_cache_hit_tokens in every response to track your savings.
All OpenAI-compatible tools work out-of-the-box. Just change the base URL.
Dynamic concurrency limits and standard HTTP error codes. Here's how to handle them.
API key is missing, invalid, or expired. Verify the key in your environment and check it hasn't been revoked from the dashboard.
Account balance too low. Add credits at platform.deepseek.com. Minimum top-up is $2.00.
Check the model name string. Use deepseek-v4-flash or deepseek-v4-pro. Legacy aliases retire July 24, 2026.
Concurrency limit reached. Implement exponential backoff. Check the Retry-After header for the wait duration.
Internal error on DeepSeek's side. Retry with backoff. Check service status at status.deepseek.com.
Peak load or maintenance. Most common during peak hours (UTC 08:00–16:00). Consider off-peak scheduling (UTC 16:30–00:30) for batch jobs.
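A retry sketch matching the guidance above: exponential backoff with jitter for 429/5xx responses, honoring Retry-After when present. The retry_after attribute lookup is illustrative; adapt the except clause to your HTTP client's error types:

```python
import random
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Jittered exponential backoff: base * 2^attempt seconds, capped."""
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)

def call_with_retries(make_request, max_attempts: int = 5, base: float = 1.0):
    """Retry transient failures (429/5xx); honor Retry-After when sent."""
    for attempt in range(max_attempts):
        try:
            return make_request()
        except Exception as err:  # narrow to your SDK's RateLimitError/APIError
            if attempt == max_attempts - 1:
                raise
            retry_after = getattr(err, "retry_after", None)  # illustrative attr
            time.sleep(retry_after if retry_after else backoff_delay(attempt, base))
```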
Yes — fully. The API uses the same endpoint structure (/v1/chat/completions), the same request/response schema, the same streaming format (SSE), and supports both the OpenAI Python/Node SDKs without modification. Change only base_url to https://api.deepseek.com/v1 and swap your API key. Function calling, JSON mode, system prompts, and multi-turn conversations all work identically. The Anthropic Messages format is also supported as an alternative.
Replace model="deepseek-chat" with model="deepseek-v4-flash" for standard mode, and model="deepseek-reasoner" with model="deepseek-v4-flash" (then enable thinking mode via extra_body if needed). No other changes required. The base URL, authentication, and request format remain identical. Deadline: July 24, 2026, 15:59 UTC — after this, legacy aliases will error with no fallback.
No — caching is completely automatic. When a request shares the same prefix as a recent request, DeepSeek automatically serves those tokens from cache at 10% of the normal input price (e.g., $0.014/1M instead of $0.14/1M for V4-Flash). There's no setup, no cache keys, and no TTL configuration. To maximize savings: keep your system prompt identical across requests and place static documents before dynamic user content. Monitor cache hits via response.usage.prompt_cache_hit_tokens.
V4-Flash (284B total, 13B active) is 12× cheaper than V4-Pro and within 1–2 benchmark points on most coding and analysis tasks. V4-Pro (1.6T total, 49B active) is the flagship — it has the highest Codeforces score (3,206), 80.6% SWE-bench, and stronger reasoning on the most complex tasks. Start with Flash; only upgrade to Pro if your evaluation shows a meaningful quality gap for your specific workload. Both support 1M context, 384K output, and all three thinking modes.
Yes. Every new account at platform.deepseek.com receives 5 million free tokens with no credit card required. This is enough for ~3,500 typical API calls. After the free tier, the minimum top-up is $2.00. There are no monthly fees or seat licenses — you pay only for tokens consumed. Granted credits are used first before topped-up balance.
DeepSeek uses dynamic concurrency-based rate limiting rather than fixed token-per-minute caps. When you reach the limit, you receive an HTTP 429 with a Retry-After header. Implement exponential backoff in your retry logic. The API doesn't enforce strict per-account request limits, but during peak demand periods (roughly UTC 08:00–16:00) you may see higher latency or temporary 503 errors. For batch jobs, scheduling during off-peak hours (UTC 16:30–00:30) can reduce both cost and latency.
Yes — any framework that supports the OpenAI API format works. For LangChain: use ChatOpenAI with openai_api_base="https://api.deepseek.com/v1". For LlamaIndex: configure OpenAI with the same base URL override. Claude Code, OpenClaw, and OpenCode are officially integrated with DeepSeek V4 as well.
DeepSeek is a Chinese company and data processed through the public API may be stored on servers subject to Chinese law. For non-sensitive workloads (code review, public data analysis, marketing copy) this is generally fine. For sensitive data (healthcare, financial, legal, PII), recommended approaches are: (1) self-host the open-weight models on your own infrastructure — the MIT license allows this at no cost, or (2) access DeepSeek models via compliant cloud providers (AWS Bedrock, Azure AI) that offer HIPAA/SOC2 compliance and data residency guarantees.
5 million free tokens. No credit card. OpenAI-compatible. The most affordable frontier AI API, period.