feat: forward reasoning_effort to the backend so jinja models honor it #10184

Merged
mudler merged 2 commits into master from feat/reasoning-effort-passthrough on Jun 5, 2026

@localai-bot (Collaborator)

What

reasoning_effort was only mapped to the binary enable_thinking toggle and otherwise reached Go-side templates — it was never forwarded to the backend. So jinja-templated models whose chat template keys on reasoning_effort (gpt-oss / Harmony, LFM2.5) could not be driven by it. In particular LFM2.5 ignores enable_thinking and keeps emitting <think>, but honors reasoning_effort.

This PR forwards the effective reasoning_effort to the backend as a chat_template_kwarg (mirroring how enable_thinking is already forwarded), and adds a config-level default so it can be pinned per model / per realtime pipeline.
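The forwarding step can be pictured with a small Go sketch. The real change lives in grpc-server.cpp (C++); this is only an illustration of the mapping, and `buildChatTemplateKwargs` as well as the `"true"`/`"false"` string encoding of `enable_thinking` in the metadata are assumptions, not code from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildChatTemplateKwargs is a hypothetical helper (not from the PR). It copies
// the thinking-related metadata keys into the map that would be sent to the
// backend as chat_template_kwargs. Empty values are omitted.
func buildChatTemplateKwargs(metadata map[string]string) map[string]any {
	kwargs := map[string]any{}
	// Assumption: enable_thinking travels as the strings "true" / "false".
	if v, ok := metadata["enable_thinking"]; ok && v != "" {
		kwargs["enable_thinking"] = v == "true"
	}
	// reasoning_effort is passed through as-is so the jinja template can key on it.
	if v, ok := metadata["reasoning_effort"]; ok && v != "" {
		kwargs["reasoning_effort"] = v
	}
	return kwargs
}

func main() {
	meta := map[string]string{"enable_thinking": "false", "reasoning_effort": "none"}
	body := map[string]any{"chat_template_kwargs": buildChatTemplateKwargs(meta)}
	out, err := json.Marshal(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Both keys sit side by side in the same kwargs map, which is the sense in which the new value mirrors `enable_thinking`.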

Changes

  • C++ (grpc-server.cpp): forward metadata["reasoning_effort"] → body_json["chat_template_kwargs"]["reasoning_effort"] in both the streaming and non-streaming predict paths, next to enable_thinking.
  • Go (options.go): put the effective reasoning_effort into PredictOptions.Metadata.
  • Config: ModelConfig.reasoning_effort (model default) and Pipeline.reasoning_effort (realtime), resolved by ModelConfig.ApplyReasoningEffort — a per-request value overrides the config default; none→disable, level→enable, and an operator's reasoning.disable: true still wins. request.go now uses that shared helper (no behavior change for the existing per-request path).
  • Realtime: apply Pipeline.ReasoningEffort to the pipeline's LLM (per-session copy) and surface it on the template input.
  • Registry entries + docs.

# model default
name: lfm2.5
reasoning_effort: none

# realtime pipeline (overrides the LLM's own)
name: gpt-realtime
pipeline:
  llm: lfm2.5
  reasoning_effort: none
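
The precedence rules described for ModelConfig.ApplyReasoningEffort can be sketched roughly as follows. This is not the PR's implementation: `resolveReasoningEffort` and `thinkingState` are hypothetical names, and what is forwarded when the operator has disabled reasoning is an assumption (the description only says the disable still wins).

```go
package main

import "fmt"

// thinkingState is a hypothetical tri-state used only in this sketch.
type thinkingState int

const (
	thinkingUnchanged thinkingState = iota // no effort given, leave the toggle alone
	thinkingDisabled
	thinkingEnabled
)

// resolveReasoningEffort is an illustrative stand-in for the resolution rules:
// a per-request value overrides the config default, "none" disables thinking,
// any other level enables it, and an operator-level disable always wins.
func resolveReasoningEffort(requestEffort, configDefault string, operatorDisable bool) (string, thinkingState) {
	effort := configDefault
	if requestEffort != "" {
		effort = requestEffort // request beats config default
	}
	switch {
	case operatorDisable:
		return effort, thinkingDisabled // reasoning.disable: true still wins
	case effort == "":
		return effort, thinkingUnchanged
	case effort == "none":
		return effort, thinkingDisabled
	default:
		return effort, thinkingEnabled
	}
}

func main() {
	// Config pins "none", but the request asks for "high".
	effort, state := resolveReasoningEffort("high", "none", false)
	fmt.Println(effort, state == thinkingEnabled)
}
```

With the `lfm2.5` config above and no per-request value, this sketch would resolve to `none` with thinking disabled.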

Test plan

  • ApplyReasoningEffort unit tests (request>config precedence, none/level mapping, operator-disable wins).
  • gRPCPredictOpts forwards reasoning_effort into metadata (and omits when empty).
  • request.go refactor — middleware tests green.
  • realtime applyPipelineReasoning test.
  • gofmt / golangci-lint clean on changed packages.
  • Live check against LFM2.5: reasoning_effort: none suppresses <think> (this exercises the C++ path, so it is verified on a CI build or a live instance).
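
As a rough picture of what the realtime item in the test plan checks, the per-session override might look like the sketch below. `modelConfig` and `withPipelineEffort` are hypothetical, trimmed-down names for illustration; the PR's own function is applyPipelineReasoning, whose signature is not shown here.

```go
package main

import "fmt"

// modelConfig is a hypothetical, minimal stand-in for the real model config.
type modelConfig struct {
	Name            string
	ReasoningEffort string
}

// withPipelineEffort returns a per-session copy of the LLM config with the
// pipeline's reasoning_effort applied, leaving the shared config untouched.
func withPipelineEffort(llm *modelConfig, pipelineEffort string) *modelConfig {
	sessionCfg := *llm // a shallow copy is enough for this flat struct
	if pipelineEffort != "" {
		sessionCfg.ReasoningEffort = pipelineEffort // pipeline overrides the LLM's own value
	}
	return &sessionCfg
}

func main() {
	shared := &modelConfig{Name: "lfm2.5", ReasoningEffort: "medium"}
	session := withPipelineEffort(shared, "none")
	fmt.Println(shared.ReasoningEffort, session.ReasoningEffort) // prints "medium none"
}
```

Copying before overriding means one realtime session cannot change the effort seen by other users of the same model.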

Assisted-by: Claude:claude-opus-4-8 go test, golangci-lint

🤖 Generated with Claude Code

mudler added 2 commits June 5, 2026 13:24
reasoning_effort was only mapped to the binary enable_thinking toggle and
otherwise reached Go-side templates — it was never sent to the backend. So
jinja-templated models whose chat template keys on reasoning_effort (gpt-oss /
Harmony, LFM2.5) could not be driven by it: LFM2.5 ignores enable_thinking and
kept emitting <think>.

Forward the effective reasoning_effort to the backend as a chat_template_kwarg
(mirroring enable_thinking) in grpc-server.cpp, and put it in PredictOptions
metadata (gRPCPredictOpts). Add a config-level default: ModelConfig.reasoning_effort
and Pipeline.reasoning_effort, resolved by ModelConfig.ApplyReasoningEffort
(request value overrides config default, none->disable / level->enable, an
operator's reasoning.disable wins). request.go now uses that helper.

Assisted-by: Claude:claude-opus-4-8 go test, golangci-lint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

Apply Pipeline.ReasoningEffort to the pipeline's LLM config when the realtime
model is built (per-session copy, overrides the LLM's own reasoning_effort),
and surface the resolved effort on the template input so Go-templated models
get it too. jinja models receive it via the backend metadata. This lets a
realtime pipeline disable thinking on models that only honor reasoning_effort
(e.g. LFM2.5), which enable_thinking can't.

Assisted-by: Claude:claude-opus-4-8 go test, golangci-lint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
mudler enabled auto-merge (squash) June 5, 2026 13:36
mudler merged commit e837921 into master Jun 5, 2026
67 checks passed
mudler deleted the feat/reasoning-effort-passthrough branch June 5, 2026 13:45
localai-bot added the enhancement label Jun 10, 2026