deepseek-ai/DeepSeek-R1-0528

Hugging Face
Text Generation · Concurrency Cost: 4 · Model Size: 685B · Quant: FP8 · Ctx Length: 32K · Published: May 28, 2025 · License: MIT · Architecture: Transformer · 2.4K · Open Weights · Warm

DeepSeek-R1-0528 is a 685 billion parameter language model developed by DeepSeek AI, featuring a 32K token context length. This updated version significantly enhances reasoning and inference capabilities through algorithmic optimizations and increased computational resources. It demonstrates strong performance across mathematics, programming, and general logic benchmarks, with notable improvements in complex reasoning tasks and reduced hallucination rates. The model is designed for advanced applications requiring deep reasoning and robust problem-solving.


DeepSeek-R1-0528: Enhanced Reasoning and Inference

DeepSeek-R1-0528 is an upgraded version of the DeepSeek R1 model by DeepSeek AI, focusing on significantly improved reasoning and inference capabilities. This 685 billion parameter model leverages increased computational resources and algorithmic optimizations during post-training to achieve performance approaching leading models such as OpenAI's o3 and Gemini 2.5 Pro.

Key Capabilities and Improvements

  • Enhanced Reasoning Depth: Demonstrates substantial improvements in handling complex reasoning tasks, evidenced by an increase in AIME 2025 test accuracy from 70% to 87.5%. The model now uses an average of 23K tokens per question for deeper thought processes, up from 12K.
  • Reduced Hallucination: Offers a lower hallucination rate compared to its previous version.
  • Improved Function Calling: Provides enhanced support for function calling.
  • Vibe Coding Experience: Delivers a better experience for "vibe coding."
  • Benchmark Performance: Shows strong performance across various benchmarks, including MMLU-Redux (93.4), GPQA-Diamond (81.0), LiveCodeBench (73.3), and AIME 2025 (87.5).
  • Distillation for Smaller Models: The chain-of-thought from DeepSeek-R1-0528 has been used to post-train DeepSeek-R1-0528-Qwen3-8B, achieving state-of-the-art performance among open-source models on AIME 2024.

Usage Notes

  • Supports system prompts.
  • No longer requires a special prefix at the start of the output to activate the thinking pattern.
  • Maximum generation length is 64K tokens for evaluations.
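The usage notes above can be sketched as an OpenAI-compatible chat-completion request. This is a minimal illustration, not an official client: the model identifier string and the helper function `build_chat_request` are assumptions for the example, and providers may cap `max_tokens` differently.

```python
# Minimal sketch of an OpenAI-compatible chat request for DeepSeek-R1-0528.
# The model identifier and helper name are illustrative assumptions;
# substitute the values used by your provider.

def build_chat_request(user_prompt: str,
                       system_prompt: str = "You are a helpful assistant.") -> dict:
    """Build a chat-completion payload.

    System prompts are supported, and no special prefix is needed in the
    output to trigger the thinking pattern.
    """
    return {
        "model": "deepseek-ai/DeepSeek-R1-0528",  # assumed identifier
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        # 64K-token generation cap, matching the evaluation setting above
        "max_tokens": 65536,
    }

payload = build_chat_request("Prove that sqrt(2) is irrational.")
```

The payload dict can then be POSTed to any OpenAI-compatible `/chat/completions` endpoint that serves this model.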

This model is suitable for applications demanding advanced logical reasoning, mathematical problem-solving, and robust code generation.

Popular Sampler Settings

The three most popular parameter combinations used by Featherless users for this model vary across the following sampler parameters:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
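As a sketch, the sampler parameters listed above can be attached to a request body like so. The specific values here are placeholders for illustration, not the community-popular configurations, and the `with_sampling` helper is an assumption of this example.

```python
# Illustrative sketch: merging sampler settings into a chat-completion
# request body. The values below are placeholders, not recommended settings.

sampler_settings = {
    "temperature": 0.6,        # placeholder values only
    "top_p": 0.95,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.0,
    "min_p": 0.0,
}

def with_sampling(payload: dict, settings: dict) -> dict:
    """Return a copy of the request payload with sampler settings merged in."""
    return {**payload, **settings}

request = with_sampling(
    {"model": "deepseek-ai/DeepSeek-R1-0528",  # assumed identifier
     "messages": [{"role": "user", "content": "Hello"}]},
    sampler_settings,
)
```

Note that not every serving backend accepts all seven parameters (for example, `repetition_penalty` and `min_p` are common in open-source inference servers but absent from some hosted APIs), so unsupported keys may need to be dropped.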