
fix(agents): inject num_ctx for Ollama OpenAI-compat API to prevent 4096 token cap #27292

Closed
Sid-Qin wants to merge 1 commit into openclaw:main from Sid-Qin:fix/27278-ollama-context-window-override

Conversation

Contributor

Sid-Qin commented Feb 26, 2026

Summary

  • Problem: Ollama models always cap input at 4096 tokens when configured with api: "openai-completions", causing conversation history to be lost after a few messages. The native api: "ollama" path sends num_ctx but the OpenAI-compat path does not.
  • Why it matters: Users lose all conversation context — the model forgets everything said earlier in the same conversation.
  • What changed: (1) Detect Ollama providers using the OpenAI-compat API (by provider name or baseUrl pattern) and inject num_ctx into the request payload via onPayload. (2) Fix fallback model resolution to match by model ID instead of blindly using models[0].
  • What did NOT change (scope boundary): The native api: "ollama" path, non-Ollama providers, and the model registry lookup are unaffected.
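The payload injection described above can be pictured with a small wrapper. This is a hedged sketch: the `Payload` and `OnPayload` types, the `withNumCtx` name, and the placement of `num_ctx` under an `options` field are illustrative assumptions, not openclaw's actual internals.

```typescript
type Payload = { model: string; options?: Record<string, unknown>; [key: string]: unknown };
type OnPayload = (payload: Payload) => void;

// Wrap any previously registered onPayload callback so num_ctx is injected
// first and the original callback still runs afterwards (chained, not replaced).
function withNumCtx(contextWindow: number, prev?: OnPayload): OnPayload {
  return (payload) => {
    // Without num_ctx, the Ollama server falls back to its 4096-token
    // default context and silently truncates the prompt.
    payload.options = { ...payload.options, num_ctx: contextWindow };
    prev?.(payload);
  };
}
```

Whether `num_ctx` ends up under `options` or at the top level of the JSON body is an assumption here; the essential behavior is that the wrapper mutates the payload before the request is sent while preserving any existing callback.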

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

Ollama models configured with api: "openai-completions" now respect the configured contextWindow value. Conversation history is maintained up to the configured limit instead of being silently truncated at 4096 tokens.

Security Impact (required)

  • New permissions/capabilities: None — only adds a parameter to the existing HTTP request body
  • Auth/token changes: None
  • Data exposure risk: None

Testing

  • npx vitest run src/agents/pi-embedded-runner/run/attempt — 9 tests ✓
  • npx vitest run src/agents/pi-embedded-runner/model — 20 tests ✓

Rollback Plan

Revert the single commit. Non-Ollama providers are unaffected. Ollama reverts to the 4096 default.

fix(agents): inject num_ctx for Ollama OpenAI-compat API to prevent 4096 token cap

Ollama defaults to num_ctx=4096, so conversations lose history after a
few messages.  The native "ollama" API already sends num_ctx via the
options field, but when users configure api: "openai-completions" the
parameter is never sent and the server silently truncates input.

1. Detect Ollama providers using the OpenAI-compat API (by provider name
   or baseUrl pattern) and wrap the stream function to inject num_ctx
   into the request payload via onPayload.

2. Fix fallback model resolution to match models by ID instead of
   blindly using models[0], so the correct contextWindow/maxTokens
   values are used when the model isn't in the registry.

Closes openclaw#27278
Contributor

greptile-apps bot commented Feb 26, 2026

Greptile Summary

Fixes two related bugs affecting Ollama models configured with api: "openai-completions":

Root Cause 1: Missing num_ctx parameter - When using Ollama via OpenAI-compatible API, the num_ctx parameter was never sent, causing Ollama's server to default to 4096 tokens regardless of the configured contextWindow. The fix detects Ollama providers (by name or baseUrl heuristics) and injects num_ctx into the request payload via the onPayload callback, matching the behavior of the native Ollama API path.

Root Cause 2: Incorrect fallback model resolution - When a model wasn't found in the registry, the fallback code blindly used models[0] config instead of matching by model ID, potentially using the wrong contextWindow value. The fix now attempts to find the correct model by ID before falling back to models[0].
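The corrected fallback resolution can be sketched as follows. The `ModelConfig` shape and the `resolveModelConfig` name are assumptions for illustration; the actual registry types live in openclaw.

```typescript
interface ModelConfig {
  id: string;
  contextWindow: number;
  maxTokens: number;
}

// Match by model ID first; only fall back to models[0] when nothing
// matches, preserving the old behavior as a last resort.
function resolveModelConfig(models: ModelConfig[], modelId: string): ModelConfig | undefined {
  return models.find((m) => m.id === modelId) ?? models[0];
}
```

The key change is the `find` by ID: previously `models[0]` was used unconditionally, so a registry-miss could pick up the `contextWindow` of an unrelated model.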

The implementation is backward-compatible, follows existing patterns (consistent with ollama-stream.ts:431), and properly chains existing onPayload callbacks. The detection logic uses provider name and baseUrl heuristics (port :11434 or path /ollama) which could theoretically false-positive, but the injected num_ctx parameter is harmless to non-Ollama providers.
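The detection heuristic the review describes (provider name, port :11434, or an /ollama path) might look like the sketch below; the function and parameter names are illustrative, not the PR's actual identifiers.

```typescript
function looksLikeOllama(providerName: string, baseUrl: string): boolean {
  if (providerName.toLowerCase().includes("ollama")) return true;
  // URL heuristics: Ollama's default port or an /ollama path segment.
  let pathname = "";
  try {
    pathname = new URL(baseUrl).pathname;
  } catch {
    pathname = baseUrl; // baseUrl may not be absolute; inspect the raw string
  }
  return baseUrl.includes(":11434") || pathname.includes("/ollama");
}
```

As the review notes, a heuristic like this can false-positive on non-Ollama gateways that happen to match, but it judges the injected `num_ctx` parameter harmless to non-Ollama providers, so the worst case is an extra ignored field.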

Confidence Score: 5/5

  • Safe to merge - well-structured bug fix following existing patterns with minimal risk
  • The changes are narrowly scoped, backward-compatible, and consistent with existing codebase patterns. The Ollama detection logic is defensive, the payload injection properly chains callbacks, and the model resolution fix is a clear improvement. No new dependencies, security issues, or breaking changes.
  • No files require special attention

Last reviewed commit: d2cd136

@vincentkoc
Member

Superseded by #29205, which carries forward the num_ctx OpenAI-compatible Ollama fix together with fallback model-ID token-limit resolution and additional tests.

Credit: @Sid-Qin for the core num_ctx direction and problem report captured in this PR.


Labels

agents (Agent runtime and tooling), size: XS


Development

Successfully merging this pull request may close these issues.

[Bug]: 4096 token hard cap on input when using Ollama local models - conversation history never passed to model

2 participants