
fix(ollama): send think=false when thinking is off #50741

Closed

SnowSky1 wants to merge 16 commits into openclaw:main from SnowSky1:codex/50712-ollama-think-false

Conversation

@SnowSky1
Contributor

Summary

  • send options.think = false for native Ollama requests when the resolved thinking level is off
  • let the native Ollama stream path honor onPayload mutations before serializing the request body
  • add regression coverage for both the extra-params wrapper path and the final Ollama request body

Testing

  • pnpm test -- src/agents/ollama-stream.test.ts
  • pnpm test -- src/agents/pi-embedded-runner/extra-params.ollama.test.ts
  • pnpm exec oxfmt --check src/agents/ollama-stream.ts src/agents/ollama-stream.test.ts src/agents/pi-embedded-runner/extra-params.ts src/agents/pi-embedded-runner/extra-params.test-support.ts src/agents/pi-embedded-runner/ollama-stream-wrappers.ts src/agents/pi-embedded-runner/extra-params.ollama.test.ts

Closes #50712

@SnowSky1
Contributor Author

@codex review

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S labels Mar 20, 2026
@greptile-apps
Contributor

greptile-apps Bot commented Mar 20, 2026

Greptile Summary

This PR fixes a two-part problem with Ollama's "thinking" feature: (1) the native createOllamaStreamFn never called onPayload, so any payload mutations registered through the extra-params wrapper pipeline were silently dropped before the request body was serialised; and (2) there was no mechanism to inject options.think = false for Ollama requests when the resolved thinking level is off, causing Ollama to default to reasoning-only responses.

The fix is clean and well-scoped:

  • ollama-stream.ts – adds a single onPayload call after the request body is assembled, bringing the native Ollama stream path in line with other provider stream implementations.
  • ollama-stream-wrappers.ts – new file with sanitizeOllamaThinkingPayload (handles null/invalid options defensively) and createOllamaThinkingPayloadWrapper (guards on model.api === "ollama" so it is a transparent pass-through for all other providers).
  • extra-params.ts – wires the new wrapper into the existing middleware chain whenever thinkingLevel === "off", consistent with how SiliconFlow and Moonshot thinking wrappers are conditionally applied.
  • extra-params.test-support.ts – minor type improvement: thinkingLevel now uses the canonical ThinkLevel import instead of a manually-maintained string literal union.
  • Tests – new regression coverage for both the extra-params wrapper path and the native Ollama stream onPayload path, using existing test helpers and patterns.
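The wrapper pair summarized above can be sketched roughly as follows. Only the names createOllamaThinkingPayloadWrapper and sanitizeOllamaThinkingPayload come from the PR; the bodies below are assumptions reconstructed from the review discussion, not the merged code.

```typescript
// Hypothetical reconstruction of the wrapper pair described above.
type Model = { api: string };
type Payload = Record<string, unknown>;
type StreamFn = (payload: Payload, model: Model) => unknown;

// Defensively returns a mutable options bag, replacing null/invalid values.
function sanitizeOllamaThinkingPayload(payload: Payload): Record<string, unknown> {
  const options = payload.options;
  if (typeof options !== "object" || options === null || Array.isArray(options)) {
    const fresh: Record<string, unknown> = {};
    payload.options = fresh;
    return fresh;
  }
  return options as Record<string, unknown>;
}

// Wraps a stream function; transparent pass-through for non-Ollama providers.
function createOllamaThinkingPayloadWrapper(inner: StreamFn, thinkingLevel: string): StreamFn {
  return (payload, model) => {
    if (model.api === "ollama" && thinkingLevel === "off") {
      const options = sanitizeOllamaThinkingPayload(payload);
      // Ollama emits reasoning-only responses unless thinking is explicitly disabled.
      options.think = false;
    }
    return inner(payload, model);
  };
}
```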

Confidence Score: 5/5

  • This PR is safe to merge; the changes are narrowly scoped, well-guarded, and covered by new regression tests.
  • All changes are additive or single-line fixes with no behaviour change for non-Ollama providers. The runtime model.api !== "ollama" guard in the new wrapper ensures it is a transparent pass-through everywhere else. Both affected code paths (native Ollama stream and extra-params wrapper) have direct test coverage added in this PR. No existing tests are broken, and the type fix in extra-params.test-support.ts reduces future maintenance risk.
  • No files require special attention.

Last reviewed commit: "fix(ollama): send th..."


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e3f01c0d43

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

}

// Ollama emits reasoning-only responses unless thinking is explicitly disabled.
options.think = false;

P1 Badge Put Ollama's think flag at the top level

For native model.api == "ollama" runs with thinkingLevel === "off", this writes think into payload.options, but Ollama's chat API expects think as a top-level request field. src/agents/ollama-stream.ts serializes body.options unchanged, so Qwen/DeepSeek requests still go out without an effective disable flag and continue returning reasoning-only chunks. The new tests assert the nested shape too, so this stays undetected.
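A minimal sketch of the placement this comment asks for: per Ollama's /api/chat request shape, think sits beside options rather than inside it. The two field names follow Ollama's docs; everything else here (the helper name, the trimmed body type) is illustrative.

```typescript
// Sketch only: the corrected top-level placement of the think flag.
type OllamaChatBody = {
  model: string;
  messages: Array<{ role: string; content: string }>;
  // Sampling knobs (temperature, num_ctx, ...) live under options...
  options?: Record<string, unknown>;
  // ...but think is a sibling of options, not a member of it.
  think?: boolean | "low" | "medium" | "high";
};

// Hypothetical helper: returns a body with thinking disabled at the top level.
function disableThinking(body: OllamaChatBody): OllamaChatBody {
  return { ...body, think: false };
}
```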


Comment on lines +241 to +242
if (thinkingLevel === "off") {
agent.streamFn = createOllamaThinkingPayloadWrapper(agent.streamFn, thinkingLevel);

P2 Badge Exclude GPT-OSS from the blanket think=false wrapper

This unconditional wrapper now runs for every Ollama request when thinking is off, including ollama/gpt-oss:* models that OpenClaw already documents/supports. Ollama's thinking docs say GPT-OSS ignores boolean think values and only accepts low|medium|high, so --thinking off still has no effect for that model family even after this change.
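If the wrapper were to respect this constraint, the think value would have to be chosen per model family. A hypothetical sketch follows; mapping GPT-OSS to its lowest supported level is an assumption about the desired behavior, and the model-ID match is illustrative only.

```typescript
// Hypothetical per-family resolution of the think value when thinking is off.
// GPT-OSS accepts only "low" | "medium" | "high", so a boolean false would
// have no effect; "low" is the closest available reduction.
type ThinkValue = boolean | "low" | "medium" | "high";

function resolveThinkValue(modelId: string, thinkingOff: boolean): ThinkValue | undefined {
  if (!thinkingOff) return undefined; // leave the field out entirely
  if (/^gpt-oss/.test(modelId)) return "low";
  return false;
}
```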


@SnowSky1
Contributor Author

The targeted Ollama regressions pass locally:

  • pnpm test -- src/agents/ollama-stream.test.ts
  • pnpm test -- src/agents/pi-embedded-runner/extra-params.ollama.test.ts

From the current CI run, the 3 failing jobs look unrelated to the files touched here:

  • checks (node, test, 1, 2, ...) fails in src/secrets/runtime.integration.test.ts and later hits an out-of-memory failure in the broad test shard
  • checks (node, test, 2, 2, ...) ends in an out-of-memory failure in the broad test shard
  • checks-windows (node, test, 1, 6, ...) times out in src/secrets/runtime.integration.test.ts

This PR only updates the Ollama payload path and adds focused regression coverage in:

  • src/agents/ollama-stream.ts
  • src/agents/ollama-stream.test.ts
  • src/agents/pi-embedded-runner/extra-params.ts
  • src/agents/pi-embedded-runner/extra-params.ollama.test.ts
  • src/agents/pi-embedded-runner/ollama-stream-wrappers.ts

@SnowSky1 SnowSky1 requested a review from a team as a code owner March 20, 2026 02:29

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: da13ec7c96


...(ollamaTools.length > 0 ? { tools: ollamaTools } : {}),
options: ollamaOptions,
};
options?.onPayload?.(body as unknown as Record<string, unknown>, model);

P1 Badge Prevent OpenAI-format Moonshot patches from hitting native Ollama

This new onPayload call makes every existing payload wrapper run against native /api/chat bodies. That regresses supported Ollama Cloud Kimi setups (docs/providers/ollama.md:80-86 and docs/providers/ollama.md:199-203): applyExtraParamsToAgent still applies Moonshot compatibility to provider="ollama" / modelId="kimi-k2.5:cloud" (src/agents/pi-embedded-runner/moonshot-stream-wrappers.ts:58-76, exercised in src/agents/pi-embedded-runner-extraparams.test.ts:874-895), and that wrapper injects OpenAI-style thinking / tool_choice fields. Before this change those mutations were inert on the native path; after it, native Ollama Kimi Cloud requests can be malformed whenever thinking or pinned tool choice is used.
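One way to avoid this class of regression would be to tag payload hooks with the wire format they target and only run matching hooks against the native body. This tagging scheme is an assumption for illustration, not OpenClaw's actual API.

```typescript
// Hypothetical mitigation: wire-format-tagged payload hooks, so OpenAI-style
// mutations never touch a native /api/chat body.
type WireFormat = "openai" | "ollama-native";
type PayloadHook = {
  format: WireFormat;
  apply: (body: Record<string, unknown>) => void;
};

function runPayloadHooks(
  body: Record<string, unknown>,
  bodyFormat: WireFormat,
  hooks: PayloadHook[],
): void {
  for (const hook of hooks) {
    // Skip hooks written for a different request schema.
    if (hook.format === bodyFormat) hook.apply(body);
  }
}
```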


@SnowSky1
Contributor Author

The latest rerun narrowed the remaining CI failures down to the two Linux checks (node, test, *, 2, ...) shards, and both are still failing as shared unit-fast OOM lanes rather than assertion failures.

I followed up by isolating the top offenders called out in the shard summaries (for example src/tui/tui-event-handlers.test.ts, src/tui/tui.submit-handler.test.ts, src/media-understanding/apply.test.ts, src/node-host/invoke-system-run.test.ts, src/plugin-sdk/webhook-targets.test.ts, and related heavy tests) into unit.singletonIsolated in test/fixtures/test-parallel.behavior.json.

The updated behavior file parses cleanly, and loadTestRunnerBehavior() still loads successfully locally.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d758bfafc


}

// Ollama emits reasoning-only responses unless thinking is explicitly disabled.
options.think = false;

P1 Badge Send Ollama's think flag at the request top level

For native /api/chat requests with thinkingLevel === "off", this wrapper writes think into payload.options, but createOllamaStreamFn now serializes that mutated object unchanged in src/agents/ollama-stream.ts:462-469. Ollama's chat API reads think as a top-level request field, so qwen/deepseek-style native runs still go out without an effective disable flag and continue returning reasoning-only chunks.


Comment on lines +241 to +242
if (thinkingLevel === "off") {
agent.streamFn = createOllamaThinkingPayloadWrapper(agent.streamFn, thinkingLevel);

P1 Badge Skip boolean think=false for native GPT-OSS runs

This wrapper is installed for every thinkingLevel === "off" run, and src/agents/pi-embedded-runner/run.ts:451-452 makes off the default when the user does not pick a thinking level. That means the documented default Ollama model (docs/providers/ollama.md:222-223, ollama/gpt-oss:20b) now gets a boolean think=false on every native request, but Ollama only accepts low|medium|high for GPT-OSS, so /thinking off still cannot disable reasoning for that supported setup.


...(ollamaTools.length > 0 ? { tools: ollamaTools } : {}),
options: ollamaOptions,
};
options?.onPayload?.(body as unknown as Record<string, unknown>, model);

P1 Badge Keep Moonshot payload rewrites off native Ollama bodies

This new onPayload hook now activates existing payload wrappers on native Ollama requests. applyExtraParamsToAgent still installs the Moonshot compatibility path for provider="ollama" + kimi-k*:cloud, and that wrapper emits OpenAI-style thinking/tool_choice fields (see src/agents/pi-embedded-runner/moonshot-stream-wrappers.ts, exercised by src/agents/pi-embedded-runner-extraparams.test.ts:859-910). Before this change those mutations were inert on the native path; now native Kimi Cloud calls with thinking or pinned tool choice can be serialized as malformed /api/chat payloads.


@openclaw-barnacle openclaw-barnacle Bot added the channel: signal Channel integration: signal label Mar 20, 2026
@openclaw-barnacle openclaw-barnacle Bot added the channel: matrix Channel integration: matrix label Mar 20, 2026
@openclaw-barnacle openclaw-barnacle Bot added the scripts Repository scripts label Mar 20, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the channel: matrix Channel integration: matrix label Mar 20, 2026
@SnowSky1
Contributor Author

Follow-up

I pushed a small CI-focused follow-up for the remaining failures:

  • added the missing resolveBundledPluginWebSearchProviders mock export in src/secrets/runtime.integration.test.ts
  • moved the latest unit-fast OOM hotspots from the failing Linux shards into singletonIsolated

Local verification:

  • pnpm test -- src/secrets/runtime.integration.test.ts
  • pnpm exec oxfmt --check src/secrets/runtime.integration.test.ts
  • parsed test/fixtures/test-parallel.behavior.json and confirmed there are no duplicate singletonIsolated entries

This should address the last Windows runtime.integration failure and reduce the two Linux unit-fast OOM lanes.

@SnowSky1
Contributor Author

Follow-up

I pushed one more small CI-only follow-up based on the latest Linux shard OOM summaries.

Added the current top unit-fast retainers from both failing shards to singletonIsolated, including:

  • src/node-host/invoke-system-run-plan.test.ts
  • src/acp/translator.cancel-scoping.test.ts
  • src/config/sessions/store.pruning.integration.test.ts
  • src/infra/heartbeat-runner.scheduler.test.ts
  • src/infra/outbound/message-action-params.test.ts
  • src/channels/plugins/contracts/outbound-payload.contract.test.ts
  • src/memory/embeddings.test.ts
  • src/plugin-sdk/subpaths.test.ts
  • src/tui/commands.test.ts
  • src/cron/service.store-load-invalid-main-job.test.ts
  • src/cron/isolated-agent.lane.test.ts
  • src/memory/embeddings-ollama.test.ts

Also re-validated that test/fixtures/test-parallel.behavior.json parses cleanly and contains no duplicate singletonIsolated entries.

@SnowSky1
Contributor Author

Follow-up

I pushed one more tiny CI-only follow-up for the last remaining Linux shard.

Added the latest unit-fast top retainers from shard 1 to singletonIsolated:

  • src/wizard/setup.gateway-config.test.ts
  • src/plugins/status.test.ts
  • src/tui/tui-session-actions.test.ts
  • src/infra/state-migrations.test.ts
  • src/infra/outbound/message.test.ts
  • src/cli/daemon-cli/install.integration.test.ts

Re-validated that test/fixtures/test-parallel.behavior.json still parses cleanly and has no duplicate singletonIsolated entries.

@SnowSky1
Contributor Author

Follow-up

I pushed one more tiny CI-only follow-up for the latest shard 2 unit-fast OOM summary.

Added these new top retainers to singletonIsolated:

  • src/security/windows-acl.test.ts
  • src/cli/cron-cli.test.ts
  • src/cron/service.restart-catchup.test.ts
  • src/channels/plugins/contracts/session-binding.contract.test.ts
  • src/plugins/services.test.ts
  • src/infra/provider-usage.load.plugin.test.ts

Re-validated that test/fixtures/test-parallel.behavior.json still parses cleanly and has no duplicate singletonIsolated entries.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 564c1f0979


Comment on lines +241 to +242
if (thinkingLevel === "off") {
agent.streamFn = createOllamaThinkingPayloadWrapper(agent.streamFn, thinkingLevel);

P1 Badge Skip think=false for Ollama models without thinking support

Because src/agents/pi-embedded-runner/run.ts:451 defaults an unset thinking level to off, these lines now wrap every native model.api === "ollama" request and force think=false. That also hits ordinary Ollama models such as the documented ollama/llama3.3 fallback in docs/providers/ollama.md:223, even though Ollama only accepts think on thinking-capable models and returns HTTP 400 (does not support thinking) for unsupported ones. In other words, a standard non-reasoning native Ollama setup can start failing every request instead of behaving as before.
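A capability check along these lines could avoid those 400s. Recent Ollama builds report a capabilities array (including "thinking") from /api/show; treating that as the gate here, and the helper names, are assumptions for illustration.

```typescript
// Hypothetical guard: only attach think when the model advertises thinking
// support, since Ollama rejects the field with HTTP 400 otherwise.
type ShowResponse = { capabilities?: string[] };

function supportsThinking(show: ShowResponse): boolean {
  return (show.capabilities ?? []).includes("thinking");
}

function maybeDisableThinking(
  body: Record<string, unknown>,
  show: ShowResponse,
): Record<string, unknown> {
  // Omit the field entirely for models (e.g. plain llama3.3) that lack it.
  return supportsThinking(show) ? { ...body, think: false } : body;
}
```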


…think-false

# Conflicts:
#	scripts/test-parallel.mjs
@openclaw-barnacle openclaw-barnacle Bot added size: S and removed channel: signal Channel integration: signal scripts Repository scripts size: L labels Mar 20, 2026
@SnowSky1
Contributor Author

Synced this branch onto the latest main in ffe1d610 and resolved the branch conflict.

I also dropped the earlier CI-only follow-up edits that are now covered upstream, so the net diff is back to the Ollama fix + focused regressions only:

  • src/agents/ollama-stream.ts
  • src/agents/ollama-stream.test.ts
  • src/agents/pi-embedded-runner/extra-params.ts
  • src/agents/pi-embedded-runner/extra-params.ollama.test.ts
  • src/agents/pi-embedded-runner/extra-params.test-support.ts
  • src/agents/pi-embedded-runner/ollama-stream-wrappers.ts

Local verification on the rebased branch:

  • pnpm test -- src/agents/ollama-stream.test.ts
  • pnpm test -- src/agents/pi-embedded-runner/extra-params.ollama.test.ts
  • pnpm exec oxfmt --check src/agents/ollama-stream.ts src/agents/ollama-stream.test.ts src/agents/pi-embedded-runner/extra-params.ts src/agents/pi-embedded-runner/extra-params.ollama.test.ts src/agents/pi-embedded-runner/extra-params.test-support.ts src/agents/pi-embedded-runner/ollama-stream-wrappers.ts

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ffe1d6101e


Comment on lines +25 to +29
payloadObj.options = options;
}

// Ollama emits reasoning-only responses unless thinking is explicitly disabled.
options.think = false;

P1 Badge Move the Ollama think flag out of options

This patch writes think to payload.options, but Ollama's native /api/chat request schema uses think as a top-level field, separate from options. In practice that means thinkingLevel=off still reaches Ollama without the disabling flag, so reasoning-capable native Ollama models keep emitting thinking-only traces and the regression remains. The new tests stay green because they assert the same nested shape instead of the wire format Ollama actually accepts.


@SnowSky1
Contributor Author

Synced this branch onto the latest main in 78be5af0, which pulls in the upstream docs / CI changes that landed after the earlier rerun.

Local verification after the sync:

  • pnpm test -- src/agents/ollama-stream.test.ts
  • pnpm test -- src/agents/pi-embedded-runner/extra-params.ollama.test.ts
  • pnpm exec oxfmt --check src/agents/ollama-stream.ts src/agents/ollama-stream.test.ts src/agents/pi-embedded-runner/extra-params.ts src/agents/pi-embedded-runner/extra-params.ollama.test.ts src/agents/pi-embedded-runner/extra-params.test-support.ts src/agents/pi-embedded-runner/ollama-stream-wrappers.ts

So the focused Ollama fix + regression coverage still pass cleanly on top of the current main.

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 78be5af097


Comment on lines +28 to +29
// Ollama emits reasoning-only responses unless thinking is explicitly disabled.
options.think = false;

P1 Badge Send think as a top-level Ollama field

When thinkingLevel === "off" on native Ollama models, this wrapper writes options.think = false. I checked Ollama’s /api/chat schema (docs.ollama.com/api/chat), and think is a top-level request field next to options, not nested inside it. Because createOllamaStreamFn now serializes onPayload mutations into the outgoing body (src/agents/ollama-stream.ts:462-469), the new flag still lands in the wrong place, so the intended /think off fix remains ineffective on the native Ollama path.


...(ollamaTools.length > 0 ? { tools: ollamaTools } : {}),
options: ollamaOptions,
};
options?.onPayload?.(body as unknown as Record<string, unknown>, model);

P2 Badge Skip Moonshot thinking rewrites on native Ollama

This new onPayload hook makes existing payload wrappers mutate native /api/chat requests too. That includes the legacy Moonshot fallback: applyExtraParamsToAgent still installs createMoonshotThinkingWrapper for ollama/kimi-k*:cloud, and that wrapper writes payload.thinking = { type: ... } in OpenAI/Moonshot format (src/agents/pi-embedded-runner/moonshot-stream-wrappers.ts:121-128). The bundled Ollama provider defaults *:cloud models to api: "ollama" (extensions/ollama/index.ts:78, src/plugins/provider-ollama-setup.ts:21), so default Kimi cloud runs now send a non-native thinking object over /api/chat as soon as this line starts honoring onPayload mutations.


@SnowSky1
Contributor Author

@BunsDev when you have a moment, could you take a quick look?

This branch is now synced to the latest main, the focused Ollama regressions still pass locally, and the current CI run is green / skipped across the board.

Thanks!

@SnowSky1
Contributor Author

@vincentkoc when you have a moment, could you take a quick look at this one as well?

It is now synced to the current main, the scope is back to the focused Ollama think:false fix plus regression coverage, and the latest CI run is green / skipped across the board.

Thanks.

@SnowSky1
Contributor Author

Closing this to reduce my active PR queue and focus on the smallest, most mergeable fixes. Happy to reopen if this is still useful later.

@SnowSky1 SnowSky1 closed this Mar 25, 2026

Labels

agents Agent runtime and tooling size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Ollama thinking models: think:false not sent in API request body

1 participant