The complete guide to prompting DeepSeek V4 — CO-STAR framework, XML tags, copy-ready templates for coding, writing, math, and research. Works in chat and API.
The wrong mode wastes tokens and degrades results. Mode selection is the most important decision before writing a single word of your prompt.
V4-Flash (Instant Mode): Fast, high-volume tasks. Everyday chat, quick Q&A, summaries, translations, and code snippets. 83 tok/s output speed, the fastest in its class.
V4-Pro (Expert Mode): Flagship intelligence. Complex coding, deep analysis, advanced reasoning, and multi-step agentic tasks. State the goal and let the model find the path. 80.6% on SWE-bench.
DeepThink (Think Max): Chain-of-thought reasoning where the model shows every step. 97.3% on MATH-500; IMO Gold Medal 2025. Use it for hard math, formal proofs, and logic puzzles.
On July 24, 2026, deepseek-chat and deepseek-reasoner retire permanently. Migrate to deepseek-v4-flash or deepseek-v4-pro before then.
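For API users, the migration is a one-line model-name change in an OpenAI-compatible client. A minimal sketch (the base URL follows DeepSeek's published API convention; key and prompt are placeholders):

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # was "deepseek-chat"; reasoning workloads move to "deepseek-v4-pro"
    messages=[{"role": "user", "content": "Summarize this changelog: ..."}],
)
print(resp.choices[0].message.content)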
DeepSeek V4 responds best to structured prompts. CO-STAR gives every prompt six dimensions of precision. Use all six for complex tasks — fewer for simple ones.
Context: Your background, who you are, and what you're working on. Prevents wrong assumptions.
Objective: One clear deliverable. "Help me" is not an objective. Be specific.
Style: How the response should be written: a doc, bullets, code-heavy, or Socratic.
Tone: The emotional register: direct and terse, warm, or critical and rigorous.
Audience: Who reads the output. Adjusts vocabulary and depth of explanation.
Response format: The exact output format: JSON, markdown, sections, word count.
Wrap these elements in XML tags such as <context>, <objective>, and <response_format>. V4 uses the tags as unambiguous boundaries between your instructions and the content you're analyzing.
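A minimal sketch of the combination. The <style>, <tone>, and <audience> tag names are illustrative assumptions; the guide only names the other three explicitly:

```python
# CO-STAR prompt with XML boundaries. The <style>, <tone>, and <audience>
# tag names are illustrative; the guide names only the other three.
prompt = """<context>I'm a startup founder reviewing a SaaS vendor contract.</context>
<objective>Summarize the liability and termination clauses.</objective>
<style>Plain-language bullet points.</style>
<tone>Neutral and precise.</tone>
<audience>A non-lawyer founder.</audience>
<response_format>Two sections, max 150 words each.</response_format>

{contract_text}"""  # substitute the document via prompt.format(contract_text=...)
```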
Tested and structured. Every template uses principles validated across the DeepSeek developer community in 2026.
Why it works: XML tags give DeepSeek unambiguous section boundaries. Severity levels force prioritization. VERDICT prevents vague conclusions. (A sketch of this pattern appears after these notes.)
Why it works: Chain-of-Draft reduces token usage ~80% vs full chain-of-thought while keeping reasoning quality. Fixed output structure prevents rambling.
Why it works: Each stage produces a checkable artifact — wrong assumptions surface in Stage 1, not after writing 200 lines. Mirrors professional engineering workflow.
Why it works: Persona layering gives DeepSeek a specific lens. "Push back" instruction prevents generic answers. Works best in chat mode — keep lighter in DeepThink.
Why it works: XML tags prevent mixing instructions with content. Explicit word limit stops bloat. Audience framing completely changes vocabulary calibration.
Why it works: Numbered tasks create structured output. "Flag uncertainty" prevents hallucination. V4-Pro's 1M context processes entire books in one pass.
Why it works: DeepThink prompts should be short and precise. "Prove complete" forces exhaustive reasoning. The model's internal CoT handles multi-step work — no need to spell out steps.
Why it works: Explicit self-check breaks a documented failure mode where models reinforce shaky intermediate steps. Forces assumptions to be stated, not buried.
Why it works: CONTESTED section forces genuine disagreement over false consensus. Hedging language instruction reduces overconfident claims on uncertain topics.
Why it works: "Do not hedge" breaks DeepSeek's tendency to list pros/cons without committing. Scoring forces explicit comparison. The escape valve prevents false confidence.
Why it works: Starting with the misconception activates curiosity and contrast — deepening retention. The test at the end forces actionable, not just theoretical, knowledge.
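To make the first pattern concrete, here is a minimal sketch of a code-review prompt with severity levels and a VERDICT line. The tag names, severity labels, and file name are illustrative assumptions, not the canonical template:

```python
# Sketch of the code-review pattern: XML boundaries, severity levels, VERDICT.
# All tag and label names are illustrative, not the canonical template.
CODE_REVIEW_PROMPT = """<context>Python service handling payment webhooks.</context>
<code>
{diff}
</code>
<task>
Review the code above. Group findings by severity:
- CRITICAL: security holes, data loss
- MAJOR: bugs, performance problems
- MINOR: style, naming
End with exactly one line: "VERDICT: APPROVE" or "VERDICT: REQUEST CHANGES",
plus a one-sentence reason.
</task>"""

prompt = CODE_REVIEW_PROMPT.format(diff=open("changes.diff").read())
```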
Distilled from official DeepSeek documentation, developer community testing, and API behavior in 2026.
V4 recommended: Wrap sections in <context>, <code>, and <constraints> tags. This prevents DeepSeek from confusing your instructions with the content being analyzed.
Common mistake: R1 mode already reasons internally via chain-of-thought. Adding a step-by-step instruction is redundant and consistently interferes with native reasoning.
Cost saver: Cached tokens cost 90% less ($0.014 vs $0.14/1M). Put stable system prompts and examples at the top; repeated prefixes hit the cache automatically.
High impact: "Help me with my code" → "Return: a JSON diff of changed lines, one-line justification each." Named deliverables produce dramatically better results.
API param: DeepSeek's official recommendation is temperature 0.5–0.7 (0.6 default) and top-p 0.95. Above 0.7 risks incoherent reasoning. R1 ignores most other sampling params.
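In API terms, those settings are two parameters on the request. A sketch, with client setup following DeepSeek's OpenAI-compatible convention:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Draft a migration plan for ..."}],
    temperature=0.6,  # official default; keep within 0.5-0.7
    top_p=0.95,       # official recommendation
)
```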
R1 antipattern: Few-shot examples consistently degrade R1 performance, a point well documented in official DeepSeek guides. In chat mode they help; in thinking mode, they distract.
Cost efficiency: Flash for simple tasks, Pro for complex ones, Think Max for hard reasoning only. Flash costs 12× less with no quality loss on most everyday tasks.
API architecture: The API is stateless; every call re-sends the full history. Put stable parts (system prompt, examples) first for caching, and variable parts (the user query) at the end.
Error reducer: Append "Before finalizing, check: does this satisfy all constraints? List assumptions made." This catches ~70% of errors before they reach you.
API gotcha: In API thinking mode, responses carry reasoning_content (the chain-of-thought) and content (the answer). In multi-turn history, include only content, never the reasoning chain.
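A sketch of that multi-turn handling. Field access follows DeepSeek's documented response shape, and the thinking flag mirrors the one quoted later in this guide; the arithmetic task is a placeholder:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")
messages = [{"role": "user", "content": "What is 17 * 24?"}]

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=messages,
    extra_body={"thinking": {"type": "enabled", "budget": "max"}},
)
msg = resp.choices[0].message
# msg.reasoning_content holds the chain-of-thought; msg.content holds the answer.
# Append ONLY the answer to the history for the next turn:
messages.append({"role": "assistant", "content": msg.content})
messages.append({"role": "user", "content": "Now divide that by 3."})
```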
Production safety: DeepSeek has lighter built-in safety layers. For agentic tasks with tool access, explicitly state what the agent must never do and what success looks like.
DeepThink fix: DeepThink occasionally skips its reasoning phase. Add "Please start your response with the <think> tag" to re-activate the chain-of-thought.
Copy these directly into your API system prompt field. They work with V4-Flash and V4-Pro; use a minimal system prompt, or none, in DeepThink mode.
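The wiring is the same for any of them. A sketch; the system prompt text here is a stand-in, not one of the templates:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

SYSTEM_PROMPT = "You are a rigorous technical editor. ..."  # paste a full template here

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # or "deepseek-v4-pro"; omit the system prompt for DeepThink
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Edit this paragraph for clarity: ..."},
    ],
)
```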
From zero to great AI output in minutes — whether you're using the chat interface or the API.
Step 1: Use Instant Mode (V4-Flash) for speed and cost, Expert Mode (V4-Pro) for complex tasks, and toggle DeepThink only for hard math, proofs, and logic.
Step 2: Add Context, Objective, Style, Tone, Audience, and Response format. Use all six for complex tasks; even two or three elements significantly improve output.
Step 3: Use <context>, <task>, <code>, and <response_format> tags to create unambiguous boundaries for V4-Pro and V4-Flash.
Step 4: Replace "help me" with a named deliverable: "Return a JSON diff", "Write a numbered migration plan", "Produce a 3-row comparison table."
Step 5: End with "Before finalizing, check: does this satisfy all constraints? List any assumptions." This catches most errors without a follow-up turn.
Step 6: Put stable system prompts at the top of every request. Cache hits save 90% on input costs ($0.014/1M vs $0.14/1M). Huge savings at scale. A worked example combining all six steps follows.
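Here is a minimal sketch pulling the six steps into one request. The model name and parameters come from this guide; the key, task, and prompt content are illustrative:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

# Step 6: stable parts first, so repeat calls hit the context cache.
SYSTEM = "You are a senior Python reviewer. Be direct and terse."

# Steps 2-5: CO-STAR elements inside XML boundaries, a named deliverable,
# and a closing self-check.
user_prompt = """<context>Flask API, 50k requests/day, migrating auth to FastAPI.</context>
<objective>Return a numbered migration plan, max 10 steps.</objective>
<audience>Engineers fluent in Python, new to FastAPI.</audience>
<response_format>Markdown, one short paragraph per step.</response_format>
Before finalizing, check: does this satisfy all constraints? List any assumptions."""

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # Step 1: Instant Mode for an everyday task
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": user_prompt},
    ],
    temperature=0.6,
)
print(resp.choices[0].message.content)
```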
Both models support the same prompting techniques — XML tags, CO-STAR framework, system prompts. V4-Pro benefits more from elaborate structured prompts because it has deeper reasoning capacity (49B active params). V4-Flash is faster and cheaper — great for simpler tasks where a concise direct prompt is all you need. Start with Flash; upgrade to Pro only if output quality falls short on your specific task.
Keep it minimal or skip it. DeepSeek's official docs recommend placing all instructions in the user prompt for R1/DeepThink mode. The original R1 model doesn't support system prompts at all. V4's DeepThink mode supports them, but benchmarks show heavy system prompts may slightly reduce reasoning performance. Rule of thumb: if using DeepThink for hard math or logic, write a clean direct user prompt. Save elaborate system prompts for chat/Expert mode.
This is well-documented in DeepSeek's own prompting guide. Reasoning models like R1 have chain-of-thought built in — providing examples forces them to follow a specific reasoning path, which conflicts with their internal process. Instead of examples, describe the problem and output format clearly. If you must use examples, ensure they align very precisely with your instructions — even small mismatches degrade output.
In the API, use "response_format": {"type": "json_object"} — this forces valid JSON output. In chat, use a system prompt that says "Output ONLY valid JSON. No prose. No markdown fences." and add your schema. Note: JSON mode works best in V4-Flash and V4-Pro. R1/DeepThink is not optimized for structured outputs — use chat/Expert mode when you need JSON.
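A minimal sketch of JSON mode in the API; the schema and extraction task are illustrative:

```python
from openai import OpenAI
import json

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    response_format={"type": "json_object"},  # forces valid JSON output
    messages=[
        {"role": "system", "content":
            'Output ONLY valid JSON matching {"title": string, "tags": [string]}. '
            "No prose. No markdown fences."},
        {"role": "user", "content": "Extract title and tags from: <article>...</article>"},
    ],
)
data = json.loads(resp.choices[0].message.content)
```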
Context caching in the DeepSeek API is automatic. The key is prompt design: put stable content (system prompt, instructions, few-shot examples) at the beginning of every request — where it gets cached. Variable content (user query) goes at the end. Cached tokens cost $0.014/1M vs $0.14/1M for uncached — a 90% discount. For production apps with consistent system prompts, this reduces input costs 60–80%.
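A sketch of that layout; the triage task and example ticket are illustrative:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

# Byte-identical across calls, so it lands in the automatic prefix cache.
STABLE_PREFIX = [
    {"role": "system", "content": "You triage support tickets into bug/billing/other."},
    {"role": "user", "content": "Example: 'Charged twice this month' -> billing"},
]

def triage(ticket: str) -> str:
    resp = client.chat.completions.create(
        model="deepseek-v4-flash",
        # Variable content goes last; only the shared prefix is cache-eligible.
        messages=STABLE_PREFIX + [{"role": "user", "content": ticket}],
    )
    return resp.choices[0].message.content
```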
This is a known occasional behavior. DeepThink sometimes skips the reasoning phase, producing a direct answer without the <think> block. This reduces quality on hard problems. Fix: add "Please start your response with the <think> tag" to your prompt. If using the API, verify extra_body={"thinking": {"type": "enabled", "budget": "max"}} is set correctly.
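Both fixes in one sketch; the proof task is a placeholder, and the extra_body flag is quoted from this guide:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content":
        "Please start your response with the <think> tag.\n"
        "Prove that the product of two odd integers is odd."}],
    # Per this guide: confirm thinking is enabled with max budget.
    extra_body={"thinking": {"type": "enabled", "budget": "max"}},
)
print(resp.choices[0].message.reasoning_content)  # chain-of-thought
print(resp.choices[0].message.content)            # final answer
```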
Yes. This guide covers DeepSeek-V4-Pro and V4-Flash released April 24, 2026. Key V4 changes: both models support 1M token context windows (up from 128K in V3), three reasoning effort modes (Non-Think, Think High, Think Max), and the new model names deepseek-v4-pro and deepseek-v4-flash. The legacy aliases deepseek-chat and deepseek-reasoner retire July 24, 2026.
Join millions of users and developers building with frontier AI that's open, affordable, and remarkably capable. Start prompting DeepSeek free today.