Thinking-capable models emit a thinking field that separates their reasoning trace from the final answer.
Use this capability to audit model steps, animate the model thinking in a UI, or hide the trace entirely when you only need the final response.
Supported models
- Qwen 3
- GPT-OSS (uses think levels low, medium, and high; the trace cannot be fully disabled)
- DeepSeek-V3.1
- DeepSeek-R1
- Browse the latest additions under thinking models
Enable thinking in API calls
Set the think field on chat or generate requests. Most models accept booleans (true/false).
GPT-OSS instead expects one of low, medium, or high to tune the trace length.
The message.thinking (chat endpoint) or thinking (generate endpoint) field contains the reasoning trace while message.content / response holds the final answer.
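As a sketch of the request shape described above, the following Python uses only the standard library to call the chat endpoint at the default local address (http://localhost:11434, an assumption about your setup) with think enabled, then reads message.thinking and message.content from the response:

```python
import json
import urllib.request

def build_chat_request(model, prompt, think=True):
    """Build the JSON body for a non-streaming chat request.

    `think` is a boolean for most models; GPT-OSS instead expects
    one of "low", "medium", or "high".
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": think,
        "stream": False,
    }

def chat(payload, host="http://localhost:11434"):
    """POST the payload and split the trace from the final answer."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        msg = json.load(resp)["message"]
    # message.thinking holds the trace; message.content the answer.
    return msg.get("thinking", ""), msg.get("content", "")

# Usage (requires a running Ollama server with the model pulled):
# trace, answer = chat(build_chat_request("deepseek-r1", "Why is the sky blue?"))
# For GPT-OSS, pass a level instead of a boolean:
# chat(build_chat_request("gpt-oss", "Draft a headline", think="high"))
```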
GPT-OSS requires think to be set to "low", "medium", or "high"; passing true/false is ignored for that model.
Stream the reasoning trace
Thinking streams interleave reasoning tokens before answer tokens. Detect the first thinking chunk to render a “thinking” section, then switch to the final reply once message.content arrives.
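A minimal Python sketch of that routing logic, assuming the streamed response is newline-delimited JSON chunks with the message.thinking and message.content fields described above (the sample chunks below are fabricated for illustration):

```python
import json

def route_stream(lines):
    """Split a streamed chat response into trace text and answer text.

    Reasoning tokens arrive in message.thinking before answer tokens
    appear in message.content; the first content chunk marks the
    point where a UI would switch from the "thinking" section to
    the final reply.
    """
    thinking_parts, content_parts = [], []
    for line in lines:
        chunk = json.loads(line)
        msg = chunk.get("message", {})
        if msg.get("thinking"):
            thinking_parts.append(msg["thinking"])   # still reasoning
        if msg.get("content"):
            content_parts.append(msg["content"])     # final reply began
        if chunk.get("done"):
            break
    return "".join(thinking_parts), "".join(content_parts)

# Simulated stream: two reasoning chunks, then the final answer.
sample = [
    '{"message": {"thinking": "The user greets"}}',
    '{"message": {"thinking": " me."}}',
    '{"message": {"content": "Hello!"}, "done": true}',
]
trace, answer = route_stream(sample)
print(trace)   # The user greets me.
print(answer)  # Hello!
```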
CLI quick reference
- Enable thinking for a single run: ollama run deepseek-r1 --think "Where should I visit in Lisbon?"
- Disable thinking: ollama run deepseek-r1 --think=false "Summarize this article"
- Hide the trace while still using a thinking model: ollama run deepseek-r1 --hidethinking "Is 9.9 bigger or 9.11?"
- Inside interactive sessions, toggle with /set think or /set nothink.
- GPT-OSS only accepts levels: ollama run gpt-oss --think=low "Draft a headline" (replace low with medium or high as needed).
Thinking is enabled by default in the CLI and API for supported models.

