feat: forward reasoning_effort to the backend so jinja models honor it by localai-bot · Pull Request #10184 · mudler/LocalAI

localai-bot · 2026-06-05T13:25:03Z

What

reasoning_effort was only mapped to the binary enable_thinking toggle and otherwise reached only Go-side templates; it was never forwarded to the backend. As a result, jinja-templated models whose chat template keys on reasoning_effort (gpt-oss / Harmony, LFM2.5) could not be driven by it. In particular, LFM2.5 ignores enable_thinking and keeps emitting <think>, but honors reasoning_effort.

This PR forwards the effective reasoning_effort to the backend as a chat_template_kwarg (mirroring how enable_thinking is already forwarded), and adds a config-level default so it can be pinned per model / per realtime pipeline.

Changes

C++ (grpc-server.cpp): forward metadata["reasoning_effort"] → body_json["chat_template_kwargs"]["reasoning_effort"] in both the streaming and non-streaming predict paths, next to enable_thinking.
Go (options.go): put the effective reasoning_effort into PredictOptions.Metadata.
Config: ModelConfig.reasoning_effort (model default) and Pipeline.reasoning_effort (realtime), both resolved by ModelConfig.ApplyReasoningEffort. A per-request value overrides the config default; none disables thinking, any other level enables it, and an operator's reasoning.disable: true still wins. request.go now uses that shared helper (no behavior change for the existing per-request path).
Realtime: apply Pipeline.ReasoningEffort to the pipeline's LLM (per-session copy) and surface it on the template input.
Registry entries + docs.

# model default
name: lfm2.5
reasoning_effort: none

# realtime pipeline (overrides the LLM's own)
name: gpt-realtime
pipeline:
  llm: lfm2.5
  reasoning_effort: none
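
As a rough sketch of how the pipeline-level value could override the LLM's own setting on a per-session copy: the struct shapes and `withPipelineReasoning` below are hypothetical simplifications for illustration, not the types or helper from this PR.

```go
package main

import "fmt"

// ModelConfig and Pipeline are trimmed-down, hypothetical stand-ins that keep
// only the fields needed to show the override.
type ModelConfig struct {
	Name            string
	ReasoningEffort string
}

type Pipeline struct {
	LLM             string
	ReasoningEffort string
}

// withPipelineReasoning takes the LLM config by value, so the caller's shared
// config is never mutated. A real config holding maps or pointers would need
// a deep copy instead of this plain value copy.
func withPipelineReasoning(llm ModelConfig, p Pipeline) ModelConfig {
	if p.ReasoningEffort != "" {
		llm.ReasoningEffort = p.ReasoningEffort // the pipeline value overrides the LLM's own
	}
	return llm
}

func main() {
	shared := ModelConfig{Name: "lfm2.5", ReasoningEffort: "high"}
	pipe := Pipeline{LLM: "lfm2.5", ReasoningEffort: "none"}

	session := withPipelineReasoning(shared, pipe)
	fmt.Println("session effort:", session.ReasoningEffort)
	fmt.Println("shared effort:", shared.ReasoningEffort)
}
```

Working on a copy means one realtime session can pin reasoning_effort without changing what other users of the same model see.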

Test plan

ApplyReasoningEffort unit tests (request>config precedence, none/level mapping, operator-disable wins).
gRPCPredictOpts forwards reasoning_effort into metadata (and omits when empty).
request.go refactor — middleware tests green.
realtime applyPipelineReasoning test.
gofmt / golangci-lint clean on changed packages.
Live check against LFM2.5: reasoning_effort: none suppresses <think> (C++ path — verifies on CI build / live).
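
The forwarding behavior these tests cover can be illustrated with a small sketch. It assumes the request metadata is a string-to-string map; `buildMetadata` and `chatTemplateKwargs` are hypothetical helpers, and the second one is only a Go rendition of the step the C++ server is described as performing.

```go
package main

import (
	"fmt"
	"strconv"
)

// buildMetadata is a hypothetical helper: it adds reasoning_effort only when
// it is non-empty, next to the existing enable_thinking toggle.
func buildMetadata(enableThinking *bool, reasoningEffort string) map[string]string {
	md := map[string]string{}
	if enableThinking != nil {
		md["enable_thinking"] = strconv.FormatBool(*enableThinking)
	}
	if reasoningEffort != "" {
		md["reasoning_effort"] = reasoningEffort
	}
	return md
}

// chatTemplateKwargs mirrors, in Go, the backend-side step of copying
// metadata["reasoning_effort"] into the template kwargs (illustration only).
func chatTemplateKwargs(md map[string]string) map[string]any {
	kwargs := map[string]any{}
	if v, ok := md["reasoning_effort"]; ok && v != "" {
		kwargs["reasoning_effort"] = v
	}
	return kwargs
}

func main() {
	off := false
	md := buildMetadata(&off, "none")
	fmt.Println("metadata:", md)
	fmt.Println("chat_template_kwargs:", chatTemplateKwargs(md))

	// With no effort set, the key is omitted rather than sent as "".
	_, present := buildMetadata(nil, "")["reasoning_effort"]
	fmt.Println("reasoning_effort present when empty:", present)
}
```

Omitting the key when the value is empty lets a chat template fall back to its own default instead of seeing an empty string.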

Assisted-by: Claude:claude-opus-4-8 go test, golangci-lint

🤖 Generated with Claude Code

reasoning_effort was only mapped to the binary enable_thinking toggle and otherwise reached Go-side templates; it was never sent to the backend. So jinja-templated models whose chat template keys on reasoning_effort (gpt-oss Harmony, LFM2.5) could not be driven by it: LFM2.5 ignores enable_thinking and kept emitting <think>.

Forward the effective reasoning_effort to the backend as a chat_template_kwarg (mirroring enable_thinking) in grpc-server.cpp, and put it in PredictOptions metadata (gRPCPredictOpts).

Add a config-level default: ModelConfig.reasoning_effort and Pipeline.reasoning_effort, resolved by ModelConfig.ApplyReasoningEffort (request value overrides config default, none->disable / level->enable, an operator's reasoning.disable wins). request.go now uses that helper.

Assisted-by: Claude:claude-opus-4-8 go test, golangci-lint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

Apply Pipeline.ReasoningEffort to the pipeline's LLM config when the realtime model is built (per-session copy, overrides the LLM's own reasoning_effort), and surface the resolved effort on the template input so Go-templated models get it too. jinja models receive it via the backend metadata.

This lets a realtime pipeline disable thinking on models that only honor reasoning_effort (e.g. LFM2.5), which enable_thinking can't.

Assisted-by: Claude:claude-opus-4-8 go test, golangci-lint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

mudler added 2 commits June 5, 2026 13:24

mudler approved these changes Jun 5, 2026

mudler enabled auto-merge (squash) June 5, 2026 13:36

mudler merged commit e837921 into master Jun 5, 2026
67 checks passed

mudler deleted the feat/reasoning-effort-passthrough branch June 5, 2026 13:45

localai-bot mentioned this pull request Jun 5, 2026

Model overlays: base model + merged overrides (generalize per-pipeline overrides) #10185

Open

localai-bot added the enhancement New feature or request label Jun 10, 2026

BrewTestBot mentioned this pull request Jun 10, 2026

localai 4.4.0 Homebrew/homebrew-core#287347

Merged
