v0.12.7
New models
- Qwen3-VL: now available in all parameter sizes, ranging from 2B to 235B
- MiniMax-M2: a 230-billion-parameter model built for coding and agentic workflows, available on Ollama's cloud
Add files and adjust thinking levels in Ollama's new app
Ollama's new app now includes a way to add one or many files when prompting a model.
For better responses, thinking levels can now be adjusted for the gpt-oss models.
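Outside the app, a similar control is exposed through Ollama's HTTP API via the `think` field on `/api/generate`. The sketch below only builds the request body; the `"low"`/`"medium"`/`"high"` level names and the example model tag are assumptions to verify against https://docs.ollama.com/api.

```python
import json

# Sketch of an /api/generate request body that sets a thinking level for a
# gpt-oss model. The "think" field and the "low"/"medium"/"high" level names
# are assumed from Ollama's API docs; the model tag is illustrative.
request_body = {
    "model": "gpt-oss:20b",
    "prompt": "Summarize the history of Roman aqueducts.",
    "think": "high",   # or "low" / "medium" for faster, shallower answers
    "stream": False,
}

# POST this JSON to http://localhost:11434/api/generate on a running server.
print(json.dumps(request_body, indent=2))
```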
New API documentation
New API documentation is available for Ollama's API: https://docs.ollama.com/api
What's Changed
- Model load failures now include more information on Windows
- Fixed embedding results being incorrect when running `embeddinggemma`
- Fixed `gemma3n` on the Vulkan backend
- Increased time allocated for ROCm to discover devices
- Fixed truncation error when generating embeddings
- Fixed request status code when running cloud models
- The OpenAI-compatible `/v1/embeddings` endpoint now supports the `encoding_format` parameter
- Ollama will now parse tool calls that don't conform to `{"name": name, "arguments": args}` (thanks @rick-github!)
- Fixed prompt processing reporting in the llama runner
- Increased speed when scheduling models
- Fixed an issue where `FROM <model>` would not inherit `RENDERER` or `PARSER` commands
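For the `encoding_format` change above: following the OpenAI convention, `"base64"` asks the server for a compact base64 blob of little-endian float32 values instead of a JSON array of numbers. A minimal client-side sketch, decoding such a blob locally (no server needed; the model name in the payload is illustrative):

```python
import base64
import struct

# Example request body for Ollama's OpenAI-compatible /v1/embeddings
# endpoint; "float" is the usual default for encoding_format.
payload = {
    "model": "embeddinggemma",
    "input": "Why is the sky blue?",
    "encoding_format": "base64",
}

def decode_embedding(b64_blob: str) -> list[float]:
    """Decode a base64-encoded little-endian float32 embedding vector."""
    raw = base64.b64decode(b64_blob)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip demo with a tiny fake 3-dimensional embedding.
fake = base64.b64encode(struct.pack("<3f", 0.1, -0.2, 0.3)).decode()
print(decode_embedding(fake))
```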
New Contributors
Full Changelog: v0.12.6...v0.12.7