Problem
When using Xiaomi MiMo models (e.g., mimo-v2.5-pro) via the xiaomi provider, Hermes does not pass the thinking parameter in the API request.
According to MiMo official documentation, thinking mode is enabled by default, which significantly increases token consumption. Even when the user configures thinking: hidden or reasoning_effort: none in config.yaml, these settings are not forwarded to the MiMo API endpoint.
Expected Behavior
Hermes should respect the user's thinking / reasoning_effort configuration and pass the appropriate parameter (e.g., thinking: { type: "disabled" } or equivalent) to the MiMo API when the user explicitly disables thinking.
Current Behavior
- The
thinking parameter is not included in API requests to MiMo
- MiMo defaults to thinking=enabled, consuming extra tokens
- Config settings like
thinking: hidden and reasoning_effort: none have no effect on the actual API call
Config Example
model:
provider: xiaomi
default: mimo-v2.5-pro
api_key: tp-xxx
agent:
reasoning_effort: none
display:
sections:
thinking: hidden
Impact
Unnecessary token consumption on every request when using MiMo models, especially costly for users on limited token plans.
Environment
Problem
When using Xiaomi MiMo models (e.g.,
mimo-v2.5-pro) via thexiaomiprovider, Hermes does not pass thethinkingparameter in the API request.According to MiMo official documentation, thinking mode is enabled by default, which significantly increases token consumption. Even when the user configures
thinking: hiddenorreasoning_effort: noneinconfig.yaml, these settings are not forwarded to the MiMo API endpoint.Expected Behavior
Hermes should respect the user's
thinking/reasoning_effortconfiguration and pass the appropriate parameter (e.g.,thinking: { type: "disabled" }or equivalent) to the MiMo API when the user explicitly disables thinking.Current Behavior
thinkingparameter is not included in API requests to MiMothinking: hiddenandreasoning_effort: nonehave no effect on the actual API callConfig Example
Impact
Unnecessary token consumption on every request when using MiMo models, especially costly for users on limited token plans.
Environment