Commit 7559845

fix(ollama): avoid implicit native num_ctx override
1 parent c4194b8 commit 7559845

4 files changed

Lines changed: 19 additions & 9 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ Docs: https://docs.openclaw.ai
 - Image tool/media: honor `tools.media.image.timeoutSeconds` and matching per-model image timeouts in explicit image analysis, including the MiniMax VLM fallback path, so slow local vision models are not capped by hardcoded 30s/60s aborts. Fixes #67889; supersedes #67929. Thanks @AllenT22 and @alchip.
 - Providers/Ollama: read larger custom Modelfile `PARAMETER num_ctx` values from `/api/show` so auto-discovered Ollama models with expanded context no longer stay pinned to the base model context. Fixes #68344. Thanks @neeravmakwana.
 - Providers/Ollama: honor configured model `params.num_ctx` in native and OpenAI-compatible Ollama requests so local models can cap runtime context without rebuilding Modelfiles. Fixes #44550 and #52206; supersedes #69464. Thanks @taitruong, @armi0024, and @LokiCode404.
+- Providers/Ollama: stop forcing native Ollama requests to use the full configured `contextWindow` as `options.num_ctx` unless `params.num_ctx` is explicit, so local models can keep Ollama's VRAM/env default instead of looking hung on first turns. Fixes #49684 and #68662. Thanks @zhouZcong and @dshenster-byte.
 - Providers/Ollama: forward whitelisted native Ollama model params such as `temperature`, `top_p`, and top-level `think` so users can disable API-level thinking or tune local models from config without proxy shims. Fixes #48010. Thanks @tangzhi, @pandego, @maweibin, @Adam-Researchh, and @EmpireCreator.
 - Providers/Ollama: expose native Ollama thinking effort levels so `/think max` is accepted for reasoning-capable Ollama models and maps to Ollama's highest supported `think` effort. Fixes #71584. Thanks @g0st1n.
 - Providers/Ollama: strip the active custom Ollama provider prefix before native chat and embedding requests, so custom provider ids like `ollama-spark/qwen3:32b` reach Ollama as the real model name. Fixes #72353. Thanks @maximus-dss and @hclsys.

docs/providers/ollama.md

Lines changed: 3 additions & 3 deletions
@@ -758,7 +758,7 @@ For the full setup and behavior details, see [Ollama Web Search](/tools/ollama-s
 <Accordion title="Context windows">
 For auto-discovered models, OpenClaw uses the context window reported by Ollama when available, including larger `PARAMETER num_ctx` values from custom Modelfiles. Otherwise it falls back to the default Ollama context window used by OpenClaw.

-You can set provider-level `contextWindow`, `contextTokens`, and `maxTokens` defaults for every model under that Ollama provider, then override them per model when needed. To cap Ollama's per-request runtime context without rebuilding a Modelfile, set `params.num_ctx`; OpenClaw sends it as `options.num_ctx` for both native Ollama and the OpenAI-compatible Ollama adapter. Invalid, zero, negative, and non-finite values are ignored and fall back to `contextWindow`.
+You can set provider-level `contextWindow`, `contextTokens`, and `maxTokens` defaults for every model under that Ollama provider, then override them per model when needed. `contextWindow` is OpenClaw's prompt and compaction budget. Native Ollama requests leave `options.num_ctx` unset unless you explicitly configure `params.num_ctx`, so Ollama can apply its own model, `OLLAMA_CONTEXT_LENGTH`, or VRAM-based default. To cap or force Ollama's per-request runtime context without rebuilding a Modelfile, set `params.num_ctx`; invalid, zero, negative, and non-finite values are ignored. The OpenAI-compatible Ollama adapter still injects `options.num_ctx` by default from the configured `params.num_ctx` or `contextWindow`; disable that with `injectNumCtxForOpenAICompat: false` if your upstream rejects `options`.

 Native Ollama model entries also accept the common Ollama runtime options under `params`, including `temperature`, `top_p`, `top_k`, `min_p`, `num_predict`, `stop`, `repeat_penalty`, `num_batch`, `num_thread`, and `use_mmap`. OpenClaw forwards only Ollama request keys, so OpenClaw runtime params such as `streaming` are not leaked to Ollama. Use `params.think` or `params.thinking` to send top-level Ollama `think`; `false` disables API-level thinking for Qwen-style thinking models.

@@ -999,7 +999,7 @@ For the full setup and behavior details, see [Ollama Web Search](/tools/ollama-s
 </Accordion>

 <Accordion title="Large-context model is too slow or runs out of memory">
-Many Ollama models advertise contexts that are larger than your hardware can run comfortably. Cap both OpenClaw's budget and Ollama's request context:
+Many Ollama models advertise contexts that are larger than your hardware can run comfortably. Native Ollama uses Ollama's own runtime context default unless you set `params.num_ctx`. Cap both OpenClaw's budget and Ollama's request context when you want predictable first-token latency:

 ```json5
 {
@@ -1021,7 +1021,7 @@ For the full setup and behavior details, see [Ollama Web Search](/tools/ollama-s
 }
 ```

-Lower `contextWindow` first if the prompt ingestion phase is slow. Lower `maxTokens` if generation runs too long.
+Lower `contextWindow` first if OpenClaw is sending too much prompt. Lower `params.num_ctx` if Ollama is loading a runtime context that is too large for the machine. Lower `maxTokens` if generation runs too long.

 </Accordion>
 </AccordionGroup>
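
Reviewer note: the documented split between the native path and the OpenAI-compatible adapter can be sketched as below. This is a minimal illustration of the behavior the updated docs describe, not the code in this commit. `SketchModelConfig`, `isUsableNumCtx`, `nativeNumCtx`, and `compatNumCtx` are hypothetical names, and passing `injectNumCtxForOpenAICompat` as a plain boolean argument is an assumption, since the diff does not show where that setting lives in config.

```typescript
// Hypothetical, simplified model config; the real OpenClaw types are not shown in this diff.
type SketchModelConfig = {
  contextWindow?: number;
  params?: { num_ctx?: unknown };
};

// Per the docs: invalid, zero, negative, and non-finite values are ignored.
function isUsableNumCtx(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value) && value > 0;
}

// Native Ollama: send num_ctx only when params.num_ctx is explicitly configured.
function nativeNumCtx(model: SketchModelConfig): number | undefined {
  const explicit = model.params?.num_ctx;
  return isUsableNumCtx(explicit) ? explicit : undefined;
}

// OpenAI-compatible adapter: params.num_ctx first, then contextWindow,
// unless injection is turned off (assumed here to be a boolean flag).
function compatNumCtx(
  model: SketchModelConfig,
  injectNumCtxForOpenAICompat = true,
): number | undefined {
  if (!injectNumCtxForOpenAICompat) {
    return undefined;
  }
  const explicit = nativeNumCtx(model);
  if (explicit !== undefined) {
    return explicit;
  }
  const fallback = model.contextWindow;
  return isUsableNumCtx(fallback) ? fallback : undefined;
}
```

With only `contextWindow: 131072` configured, the native sketch returns `undefined` (Ollama picks its own default) while the compat sketch returns `131072`; adding `params.num_ctx` makes both paths agree on the explicit value.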

extensions/ollama/src/stream-runtime.test.ts

Lines changed: 8 additions & 5 deletions
@@ -207,7 +207,7 @@ describe("createConfiguredOllamaCompatStreamWrapper", () => {
         };
         expect(requestBody.think).toBe(false);
         expect(requestBody.options?.think).toBeUndefined();
-        expect(requestBody.options?.num_ctx).toBe(131072);
+        expect(requestBody.options?.num_ctx).toBeUndefined();
       },
     );
   });
@@ -260,7 +260,7 @@ describe("createConfiguredOllamaCompatStreamWrapper", () => {
         };
         expect(requestBody.think).toBe("low");
         expect(requestBody.options?.think).toBeUndefined();
-        expect(requestBody.options?.num_ctx).toBe(131072);
+        expect(requestBody.options?.num_ctx).toBeUndefined();
       },
     );
   });
@@ -332,7 +332,7 @@ describe("createConfiguredOllamaCompatStreamWrapper", () => {
         };
         expect(requestBody.think).toBe("high");
         expect(requestBody.options?.think).toBeUndefined();
-        expect(requestBody.options?.num_ctx).toBe(131072);
+        expect(requestBody.options?.num_ctx).toBeUndefined();
       },
     );
   });
@@ -1296,9 +1296,12 @@ describe("createOllamaStreamFn", () => {
         }

         const requestBody = JSON.parse(requestInit.body) as {
-          options: { num_ctx?: number; num_predict?: number };
+          options?: { num_ctx?: number; num_predict?: number };
         };
-        expect(requestBody.options.num_ctx).toBe(131072);
+        if (!requestBody.options) {
+          throw new Error("Expected Ollama request options");
+        }
+        expect(requestBody.options?.num_ctx).toBeUndefined();
         expect(requestBody.options.num_predict).toBe(123);
       },
     );

extensions/ollama/src/stream.ts

Lines changed: 7 additions & 1 deletion
@@ -247,12 +247,18 @@ function resolveOllamaModelOptions(model: ProviderRuntimeModel): Record<string,
   const params = model.params;
   if (params && typeof params === "object" && !Array.isArray(params)) {
     for (const [key, value] of Object.entries(params)) {
+      if (key === "num_ctx") {
+        continue;
+      }
       if (value !== undefined && OLLAMA_OPTION_PARAM_KEYS.has(key)) {
         options[key] = value;
       }
     }
   }
-  options.num_ctx = resolveOllamaNumCtx(model);
+  const numCtx = resolveOllamaConfiguredNumCtx(model);
+  if (numCtx !== undefined) {
+    options.num_ctx = numCtx;
+  }
   return options;
 }
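
Reviewer note: the hunk above does not show `resolveOllamaConfiguredNumCtx`, `ProviderRuntimeModel`, or `OLLAMA_OPTION_PARAM_KEYS`, so the following is a self-contained sketch of the resulting behavior under stated assumptions rather than the actual implementation. `SketchModel`, `SKETCH_OPTION_KEYS`, `sketchConfiguredNumCtx`, and `sketchModelOptions` are hypothetical stand-ins. It assumes the configured-value resolver returns `undefined` for anything that is not a finite positive number, which matches the documented "invalid, zero, negative, and non-finite values are ignored", and it assumes `num_ctx` is among the whitelisted option keys, which would explain why the loop now skips it.

```typescript
// Hypothetical stand-in for ProviderRuntimeModel (only the fields this sketch needs).
type SketchModel = {
  contextWindow?: number;
  params?: Record<string, unknown>;
};

// Hypothetical subset of the whitelist; assumed to include num_ctx.
const SKETCH_OPTION_KEYS = new Set(["temperature", "top_p", "num_predict", "num_ctx"]);

// Assumed behavior: only a finite, positive params.num_ctx counts as explicit.
function sketchConfiguredNumCtx(model: SketchModel): number | undefined {
  const raw = model.params?.num_ctx;
  if (typeof raw !== "number" || !Number.isFinite(raw) || raw <= 0) {
    return undefined;
  }
  return raw;
}

function sketchModelOptions(model: SketchModel): Record<string, unknown> {
  const options: Record<string, unknown> = {};
  const params = model.params;
  if (params && typeof params === "object" && !Array.isArray(params)) {
    for (const [key, value] of Object.entries(params)) {
      // Skip num_ctx here so an unvalidated value is never copied through the whitelist.
      if (key === "num_ctx") {
        continue;
      }
      if (value !== undefined && SKETCH_OPTION_KEYS.has(key)) {
        options[key] = value;
      }
    }
  }
  // contextWindow is deliberately not consulted: without an explicit value,
  // num_ctx stays unset and Ollama applies its own default.
  const numCtx = sketchConfiguredNumCtx(model);
  if (numCtx !== undefined) {
    options.num_ctx = numCtx;
  }
  return options;
}
```

Under these assumptions, a model with `contextWindow: 131072` and `params: { num_predict: 123 }` yields options containing `num_predict` but no `num_ctx`, which is the shape the updated `createOllamaStreamFn` test asserts.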
