The complete guide to prompting DeepSeek V4 — CO-STAR framework, XML tags, copy-ready templates for coding, writing, math, and research. Works in chat and API.
The wrong mode wastes tokens and degrades results. Mode selection is the most important decision before writing a single word of your prompt.
V4-Flash (Instant Mode): Fast, high-volume tasks. Everyday chat, quick Q&A, summaries, translations, and code snippets. 83 tok/s output speed, the fastest in its class.
V4-Pro (Expert Mode): Flagship intelligence. Complex coding, deep analysis, advanced reasoning, and multi-step agentic tasks. State the goal and let the model find the path. 80.6% on SWE-bench.
DeepThink (Think Max): Chain-of-thought reasoning where the model shows every step. 97.3% on MATH-500; IMO Gold Medal 2025. Use it for hard math, formal proofs, and logic puzzles.
On July 24, 2026, deepseek-chat and deepseek-reasoner retire permanently. Migrate to deepseek-v4-flash or deepseek-v4-pro before then.
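For API users, the migration is a one-line model-name change in an OpenAI-compatible client. A minimal sketch (the base URL follows DeepSeek's published API convention; key and prompt are placeholders):

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # was "deepseek-chat"; reasoning workloads move to "deepseek-v4-pro"
    messages=[{"role": "user", "content": "Summarize this changelog: ..."}],
)
print(resp.choices[0].message.content)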
DeepSeek V4 responds best to structured prompts. CO-STAR gives every prompt six dimensions of precision. Use all six for complex tasks — fewer for simple ones.
Context: Your background, who you are, and what you're working on. Prevents wrong assumptions.
Objective: One clear deliverable. "Help me" is not an objective. Be specific.
Style: How the response should be written: a doc, bullets, code-heavy, or Socratic.
Tone: The emotional register: direct and terse, warm, or critical and rigorous.
Audience: Who reads the output. Adjusts vocabulary and depth of explanation.
Response format: The exact output format: JSON, markdown, sections, word count.
Wrap these elements in XML tags such as <context>, <objective>, and <response_format>. V4 uses the tags as unambiguous boundaries between your instructions and the content you're analyzing.
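A minimal sketch of the combination. The <style>, <tone>, and <audience> tag names are illustrative assumptions; the guide only names the other three explicitly:

```python
# CO-STAR prompt with XML boundaries. The <style>, <tone>, and <audience>
# tag names are illustrative; the guide names only the other three.
prompt = """<context>I'm a startup founder reviewing a SaaS vendor contract.</context>
<objective>Summarize the liability and termination clauses.</objective>
<style>Plain-language bullet points.</style>
<tone>Neutral and precise.</tone>
<audience>A non-lawyer founder.</audience>
<response_format>Two sections, max 150 words each.</response_format>

{contract_text}"""  # substitute the document via prompt.format(contract_text=...)
```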
Tested and structured. Every template uses principles validated across the DeepSeek developer community in 2026.
Why it works: XML tags give DeepSeek unambiguous section boundaries. Severity levels force prioritization. VERDICT prevents vague conclusions. (A sketch of this pattern appears after these notes.)
Why it works: Chain-of-Draft reduces token usage ~80% vs full chain-of-thought while keeping reasoning quality. Fixed output structure prevents rambling.
Why it works: Each stage produces a checkable artifact — wrong assumptions surface in Stage 1, not after writing 200 lines. Mirrors professional engineering workflow.
Why it works: Persona layering gives DeepSeek a specific lens. "Push back" instruction prevents generic answers. Works best in chat mode — keep lighter in DeepThink.
Why it works: XML tags prevent mixing instructions with content. Explicit word limit stops bloat. Audience framing completely changes vocabulary calibration.
Why it works: Numbered tasks create structured output. "Flag uncertainty" prevents hallucination. V4-Pro's 1M context processes entire books in one pass.
Why it works: DeepThink prompts should be short and precise. "Prove complete" forces exhaustive reasoning. The model's internal CoT handles multi-step work — no need to spell out steps.
Why it works: Explicit self-check breaks a documented failure mode where models reinforce shaky intermediate steps. Forces assumptions to be stated, not buried.
Why it works: CONTESTED section forces genuine disagreement over false consensus. Hedging language instruction reduces overconfident claims on uncertain topics.
Why it works: "Do not hedge" breaks DeepSeek's tendency to list pros/cons without committing. Scoring forces explicit comparison. The escape valve prevents false confidence.
Why it works: Starting with the misconception activates curiosity and contrast — deepening retention. The test at the end forces actionable, not just theoretical, knowledge.
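To make the first pattern concrete, here is a minimal sketch of a code-review prompt with severity levels and a VERDICT line. The tag names, severity labels, and file name are illustrative assumptions, not the canonical template:

```python
# Sketch of the code-review pattern: XML boundaries, severity levels, VERDICT.
# All tag and label names are illustrative, not the canonical template.
CODE_REVIEW_PROMPT = """<context>Python service handling payment webhooks.</context>
<code>
{diff}
</code>
<task>
Review the code above. Group findings by severity:
- CRITICAL: security holes, data loss
- MAJOR: bugs, performance problems
- MINOR: style, naming
End with exactly one line: "VERDICT: APPROVE" or "VERDICT: REQUEST CHANGES",
plus a one-sentence reason.
</task>"""

prompt = CODE_REVIEW_PROMPT.format(diff=open("changes.diff").read())
```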
Distilled from official DeepSeek documentation, developer community testing, and API behavior in 2026.
V4 recommended: Wrap sections in <context>, <code>, and <constraints> tags. This prevents DeepSeek from confusing your instructions with the content being analyzed.
Common mistake: R1 mode already reasons internally via chain-of-thought. Adding a step-by-step instruction is redundant and consistently interferes with native reasoning.
Cost saver: Cached tokens cost 90% less ($0.014 vs $0.14/1M). Put stable system prompts and examples at the top; repeated prefixes hit the cache automatically.
High impact: "Help me with my code" → "Return: a JSON diff of changed lines, one-line justification each." Named deliverables produce dramatically better results.
API param: DeepSeek's official recommendation is temperature 0.5–0.7 (0.6 default) and top-p 0.95. Above 0.7 risks incoherent reasoning. R1 ignores most other sampling params.
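In API terms, those settings are two parameters on the request. A sketch, with client setup following DeepSeek's OpenAI-compatible convention:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Draft a migration plan for ..."}],
    temperature=0.6,  # official default; keep within 0.5-0.7
    top_p=0.95,       # official recommendation
)
```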
R1 antipattern: Few-shot examples consistently degrade R1 performance, a point well documented in official DeepSeek guides. In chat mode they help; in thinking mode, they distract.
Cost efficiency: Flash for simple tasks, Pro for complex ones, Think Max for hard reasoning only. Flash costs 12× less with no quality loss on most everyday tasks.
API architecture: The API is stateless; every call re-sends the full history. Put stable parts (system prompt, examples) first for caching, and variable parts (the user query) at the end.
Error reducer: Append "Before finalizing, check: does this satisfy all constraints? List assumptions made." This catches ~70% of errors before they reach you.
API gotcha: In API thinking mode, responses carry reasoning_content (the chain-of-thought) and content (the answer). In multi-turn history, include only content, never the reasoning chain.
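A sketch of that multi-turn handling. Field access follows DeepSeek's documented response shape, and the thinking flag mirrors the one quoted later in this guide; the arithmetic task is a placeholder:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")
messages = [{"role": "user", "content": "What is 17 * 24?"}]

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=messages,
    extra_body={"thinking": {"type": "enabled", "budget": "max"}},
)
msg = resp.choices[0].message
# msg.reasoning_content holds the chain-of-thought; msg.content holds the answer.
# Append ONLY the answer to the history for the next turn:
messages.append({"role": "assistant", "content": msg.content})
messages.append({"role": "user", "content": "Now divide that by 3."})
```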
Production safety: DeepSeek has lighter built-in safety layers. For agentic tasks with tool access, explicitly state what the agent must never do and what success looks like.
DeepThink fix: DeepThink occasionally skips its reasoning phase. Add "Please start your response with the <think> tag" to re-activate the chain-of-thought.
Copy these directly into your API system prompt field. They work with V4-Flash and V4-Pro; use a minimal system prompt, or none, in DeepThink mode.
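The wiring is the same for any of them. A sketch; the system prompt text here is a stand-in, not one of the templates:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

SYSTEM_PROMPT = "You are a rigorous technical editor. ..."  # paste a full template here

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # or "deepseek-v4-pro"; omit the system prompt for DeepThink
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Edit this paragraph for clarity: ..."},
    ],
)
```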
From zero to great AI output in minutes — whether you're using the chat interface or the API.
Step 1: Use Instant Mode (V4-Flash) for speed and cost, Expert Mode (V4-Pro) for complex tasks, and toggle DeepThink only for hard math, proofs, and logic.
Step 2: Add Context, Objective, Style, Tone, Audience, and Response format. Use all six for complex tasks; even two or three elements significantly improve output.
Step 3: Use <context>, <task>, <code>, and <response_format> tags to create unambiguous boundaries for V4-Pro and V4-Flash.
Step 4: Replace "help me" with a named deliverable: "Return a JSON diff", "Write a numbered migration plan", "Produce a 3-row comparison table."
Step 5: End with "Before finalizing, check: does this satisfy all constraints? List any assumptions." This catches most errors without a follow-up turn.
Step 6: Put stable system prompts at the top of every request. Cache hits save 90% on input costs ($0.014/1M vs $0.14/1M). Huge savings at scale. A worked example combining all six steps follows.
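Here is a minimal sketch pulling the six steps into one request. The model name and parameters come from this guide; the key, task, and prompt content are illustrative:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

# Step 6: stable parts first, so repeat calls hit the context cache.
SYSTEM = "You are a senior Python reviewer. Be direct and terse."

# Steps 2-5: CO-STAR elements inside XML boundaries, a named deliverable,
# and a closing self-check.
user_prompt = """<context>Flask API, 50k requests/day, migrating auth to FastAPI.</context>
<objective>Return a numbered migration plan, max 10 steps.</objective>
<audience>Engineers fluent in Python, new to FastAPI.</audience>
<response_format>Markdown, one short paragraph per step.</response_format>
Before finalizing, check: does this satisfy all constraints? List any assumptions."""

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # Step 1: Instant Mode for an everyday task
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": user_prompt},
    ],
    temperature=0.6,
)
print(resp.choices[0].message.content)
```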
Both models support the same prompting techniques — XML tags, CO-STAR framework, system prompts. V4-Pro benefits more from elaborate structured prompts because it has deeper reasoning capacity (49B active params). V4-Flash is faster and cheaper — great for simpler tasks where a concise direct prompt is all you need. Start with Flash; upgrade to Pro only if output quality falls short on your specific task.
Keep it minimal or skip it. DeepSeek's official docs recommend placing all instructions in the user prompt for R1/DeepThink mode. The original R1 model doesn't support system prompts at all. V4's DeepThink mode supports them, but benchmarks show heavy system prompts may slightly reduce reasoning performance. Rule of thumb: if using DeepThink for hard math or logic, write a clean direct user prompt. Save elaborate system prompts for chat/Expert mode.
This is well-documented in DeepSeek's own prompting guide. Reasoning models like R1 have chain-of-thought built in — providing examples forces them to follow a specific reasoning path, which conflicts with their internal process. Instead of examples, describe the problem and output format clearly. If you must use examples, ensure they align very precisely with your instructions — even small mismatches degrade output.
In the API, use "response_format": {"type": "json_object"} — this forces valid JSON output. In chat, use a system prompt that says "Output ONLY valid JSON. No prose. No markdown fences." and add your schema. Note: JSON mode works best in V4-Flash and V4-Pro. R1/DeepThink is not optimized for structured outputs — use chat/Expert mode when you need JSON.
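A minimal sketch of JSON mode in the API; the schema and extraction task are illustrative:

```python
from openai import OpenAI
import json

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    response_format={"type": "json_object"},  # forces valid JSON output
    messages=[
        {"role": "system", "content":
            'Output ONLY valid JSON matching {"title": string, "tags": [string]}. '
            "No prose. No markdown fences."},
        {"role": "user", "content": "Extract title and tags from: <article>...</article>"},
    ],
)
data = json.loads(resp.choices[0].message.content)
```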
Context caching in the DeepSeek API is automatic. The key is prompt design: put stable content (system prompt, instructions, few-shot examples) at the beginning of every request — where it gets cached. Variable content (user query) goes at the end. Cached tokens cost $0.014/1M vs $0.14/1M for uncached — a 90% discount. For production apps with consistent system prompts, this reduces input costs 60–80%.
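A sketch of that layout; the triage task and example ticket are illustrative:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

# Byte-identical across calls, so it lands in the automatic prefix cache.
STABLE_PREFIX = [
    {"role": "system", "content": "You triage support tickets into bug/billing/other."},
    {"role": "user", "content": "Example: 'Charged twice this month' -> billing"},
]

def triage(ticket: str) -> str:
    resp = client.chat.completions.create(
        model="deepseek-v4-flash",
        # Variable content goes last; only the shared prefix is cache-eligible.
        messages=STABLE_PREFIX + [{"role": "user", "content": ticket}],
    )
    return resp.choices[0].message.content
```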
This is a known occasional behavior. DeepThink sometimes skips the reasoning phase, producing a direct answer without the <think> block. This reduces quality on hard problems. Fix: add "Please start your response with the <think> tag" to your prompt. If using the API, verify extra_body={"thinking": {"type": "enabled", "budget": "max"}} is set correctly.
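Both fixes in one sketch; the proof task is a placeholder, and the extra_body flag is quoted from this guide:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content":
        "Please start your response with the <think> tag.\n"
        "Prove that the product of two odd integers is odd."}],
    # Per this guide: confirm thinking is enabled with max budget.
    extra_body={"thinking": {"type": "enabled", "budget": "max"}},
)
print(resp.choices[0].message.reasoning_content)  # chain-of-thought
print(resp.choices[0].message.content)            # final answer
```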
Yes. This guide covers DeepSeek-V4-Pro and V4-Flash released April 24, 2026. Key V4 changes: both models support 1M token context windows (up from 128K in V3), three reasoning effort modes (Non-Think, Think High, Think Max), and the new model names deepseek-v4-pro and deepseek-v4-flash. The legacy aliases deepseek-chat and deepseek-reasoner retire July 24, 2026.
Join millions of users and developers building with frontier AI that's open, affordable, and remarkably capable. Start prompting DeepSeek free today.