Description
The reasoning: -> effort: parameter (low | medium | high) does not appear to work, especially in custom agents. Regardless of value, results remain the same in terms of time, speed, and number of thinking tokens.
Tested Models
gtp-oss-20
gtp-oss-120
grok-code-fast-1
Expected Behavior
- Clear differences between
low, medium, and high reasoning.
gtp-oss family normally shows large variation in thinking tokens:
- low → ~58 tokens
- medium → ~1500 tokens
- high → 5000+ tokens
Actual Behavior
- No significant change between reasoning levels.
- Output corresponds roughly to
medium level regardless of setting.
- OpenRouter activity logs show no increase in thinking tokens when switching from
low to high (should be ~10x difference).
Steps to Reproduce
- Run the same prompt with agents with different
reasoning effort: low | medium | high.
- Compare runtime, speed, and thinking tokens.
- Observe logs for token usage.
Evidence
- Local runs of
gtp-oss models confirm expected differences.
- Cloud service runs show no variation (all behave like
medium).
Impact
- Impossible to control reasoning depth in custom agents.
- Misleads users expecting higher reasoning at higher effort levels.
Description
The
reasoning:->effort:parameter (low | medium | high) does not appear to work, especially in custom agents. Regardless of value, results remain the same in terms of time, speed, and number of thinking tokens.Tested Models
gtp-oss-20gtp-oss-120grok-code-fast-1Expected Behavior
low,medium, andhighreasoning.gtp-ossfamily normally shows large variation in thinking tokens:Actual Behavior
mediumlevel regardless of setting.lowtohigh(should be ~10x difference).Steps to Reproduce
reasoningeffort: low | medium | high.Evidence
gtp-ossmodels confirm expected differences.medium).Impact