fix(ollama): unify context window handling across discovery, merge, and OpenAI-compat transport#29205
🔒 Aisle Security Analysis

We found 4 potential security issue(s) in this PR:

1. 🟡 Potential denial-of-service via synchronous Ollama model discovery (/api/show) on every agent run

Description

The new Ollama discovery logic performs per-model `/api/show` lookups synchronously on the agent-run path.
Vulnerable code (sequential batched inspection):

```ts
for (let index = 0; index < modelsToInspect.length; index += OLLAMA_SHOW_CONCURRENCY) {
  const batch = modelsToInspect.slice(index, index + OLLAMA_SHOW_CONCURRENCY);
  const batchDiscovered = await Promise.all(
    batch.map(async (model) => {
      const modelId = model.name;
      const contextWindow = await queryOllamaContextWindow(apiBase, modelId);
      // ...
      return { /* ... */ };
    }),
  );
  discovered.push(...batchDiscovered);
}
```

Recommendation

Reduce DoS risk by moving discovery off the per-run critical path and bounding total time. Recommended changes (one or more):
Example: global deadline + early exit:

```ts
const deadlineMs = 5000; // total budget for /api/show inspection
const deadlineAt = Date.now() + deadlineMs;
for (let index = 0; index < modelsToInspect.length; index += OLLAMA_SHOW_CONCURRENCY) {
  if (Date.now() > deadlineAt) break;
  const batch = modelsToInspect.slice(index, index + OLLAMA_SHOW_CONCURRENCY);
  const batchDiscovered = await Promise.all(
    batch.map(async (model) => {
      const remaining = Math.max(0, deadlineAt - Date.now());
      const signal = AbortSignal.timeout(Math.min(3000, remaining));
      return queryOllamaContextWindow(apiBase, model.name, signal);
    }),
  );
  // ...
}
```

Also consider making …

2. 🟡 Unbounded Ollama OpenAI-compat num_ctx injection

| Property | Value |
|---|---|
| Severity | Medium |
| CWE | CWE-400 |
| Location | src/agents/pi-embedded-runner/run/attempt.ts:874-893 |
Description

`runEmbeddedAttempt()` now injects `options.num_ctx` into every OpenAI-compatible request when the provider is detected as Ollama-compatible.

Security impact:

- `num_ctx` is derived directly from `model.contextWindow`/`model.maxTokens` (or a default) without any upper bound. `contextWindow` can come from user-controlled configuration (and for Ollama discovery is sourced from `/api/show`), so it can be set extremely large.
- For Ollama, `num_ctx` influences the context size the server attempts to allocate for inference; setting it very large can significantly increase CPU/RAM usage per request.
- In deployments where untrusted users can trigger model runs (e.g., a hosted OpenClaw Gateway / multi-tenant environment), an attacker can repeatedly invoke the model to amplify resource consumption and cause denial of service.
Vulnerable code:

```ts
const numCtx = Math.max(
  1,
  Math.floor(
    params.model.contextWindow ?? params.model.maxTokens ?? DEFAULT_CONTEXT_TOKENS,
  ),
);
activeSession.agent.streamFn = wrapOllamaCompatNumCtx(activeSession.agent.streamFn, numCtx);
```

and the wrapper that writes it into the outgoing payload:

```ts
(payloadRecord.options as Record<string, unknown>).num_ctx = numCtx;
```

Recommendation
Add a hard upper bound (and ideally a configurable bound) for injected num_ctx, and validate/discard absurd contextWindow values coming from config or discovery.
Example mitigation:

```ts
const MAX_OLLAMA_NUM_CTX = 262_144; // choose a safe cap for your deployment
const raw = params.model.contextWindow ?? params.model.maxTokens ?? DEFAULT_CONTEXT_TOKENS;
const numCtx = Math.min(
  MAX_OLLAMA_NUM_CTX,
  Math.max(1, Math.floor(raw)),
);
```

Additional hardening options:

- Only inject `num_ctx` when `baseUrl` is localhost/loopback unless explicitly opted-in for remote hosts.
- Consider defaulting `injectNumCtxForOpenAICompat` to `false` for safety, or auto-disable when the upstream is not clearly Ollama.
- Add a `zod` max constraint for `contextWindow`/`maxTokens` in configuration schemas (or clamp at runtime) to prevent misconfiguration-driven outages.
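The loopback-gating option mentioned above could be sketched as follows. This is illustrative only: the names `shouldInjectNumCtx` and `allowRemote` are hypothetical, not from the PR, and the gate would sit in front of the injection call.

```typescript
// Sketch: only inject num_ctx when the endpoint is local, unless the
// operator explicitly opts in for remote hosts. Fails closed on an
// unparseable baseUrl. WHATWG URL keeps brackets on IPv6 hostnames,
// so "[::1]" is listed alongside "::1".
const LOOPBACK_HOSTS = new Set(["127.0.0.1", "localhost", "::1", "[::1]"]);

function shouldInjectNumCtx(baseUrl: string, allowRemote = false): boolean {
  let host: string;
  try {
    host = new URL(baseUrl).hostname;
  } catch {
    return false; // unparseable baseUrl: do not inject
  }
  return allowRemote || LOOPBACK_HOSTS.has(host);
}
```

With this gate, a default deployment injects `num_ctx` only for local Ollama, and remote endpoints require an explicit opt-in flag.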
3. 🔵 SSRF via configurable Ollama baseUrl used for automatic model discovery (/api/tags, /api/show)
| Property | Value |
|---|---|
| Severity | Low |
| CWE | CWE-918 |
| Location | src/agents/models-config.providers.ts:246-251 |
Description
discoverOllamaModels() performs HTTP requests to an Ollama native API base derived directly from models.providers.ollama.baseUrl (config), without validation or network allowlisting.
With this change, model discovery now performs:

- `GET ${apiBase}/api/tags`
- Up to 200 additional `POST ${apiBase}/api/show` requests (batched with concurrency 8)
If an attacker can influence models.providers.ollama.baseUrl (e.g., via a config injection path, runtime overrides, or multi-tenant user-supplied config), this becomes a server-side request primitive that can target internal services (including link-local/cloud metadata IPs) as long as the target host exposes the expected paths. Node's fetch() follows redirects by default, which can also enable pivoting to other hosts.
Vulnerable code:
const response = await fetch(`${apiBase}/api/show`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ name: modelName }),
signal: AbortSignal.timeout(3000),
});Recommendation
Harden Ollama discovery so a configurable base URL cannot be abused as an SSRF gadget:

- Validate and constrain `baseUrl`/`apiBase` before use (scheme, hostname, port).
- Consider defaulting discovery to localhost-only and requiring an explicit opt-in flag for remote discovery.
- Disable redirects (or only allow same-origin redirects) to prevent cross-host pivoting.
Example (sketch):

```ts
function validateOllamaBaseUrl(raw: string): string {
  const u = new URL(raw);
  if (u.protocol !== "http:" && u.protocol !== "https:") {
    throw new Error("Invalid Ollama baseUrl scheme");
  }
  // safest default: only allow loopback unless explicitly opted-in
  if (!["127.0.0.1", "localhost", "::1"].includes(u.hostname)) {
    throw new Error("Remote Ollama discovery is not allowed");
  }
  u.username = "";
  u.password = "";
  return u.toString().replace(/\/+$/, "");
}

await fetch(`${apiBase}/api/show`, {
  method: "POST",
  redirect: "error", // prevent pivot
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ name: modelName }),
  signal: AbortSignal.timeout(3000),
});
```

If remote Ollama is a supported feature, replace the loopback-only check with an explicit allowlist (exact hostnames) or IP-range blocking (e.g., deny 169.254.0.0/16, RFC1918, etc.) depending on your threat model.
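The IP-range blocking alternative might look like the sketch below. The denied ranges follow the review's examples (RFC1918, 169.254.0.0/16, loopback) and should be adjusted to your threat model; hostname resolution is out of scope here, and the function name is illustrative.

```typescript
// Deny requests to private / link-local / loopback IPv4 literals.
const FORBIDDEN_IPV4_RANGES: Array<[string, number]> = [
  ["127.0.0.0", 8],    // loopback
  ["10.0.0.0", 8],     // RFC1918
  ["172.16.0.0", 12],  // RFC1918
  ["192.168.0.0", 16], // RFC1918
  ["169.254.0.0", 16], // link-local (includes cloud metadata IPs)
];

function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    const octet = Number(part);
    if (!Number.isInteger(octet) || octet < 0 || octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

function isForbiddenIPv4(host: string): boolean {
  const ip = ipv4ToInt(host);
  if (ip === null) return false; // not an IPv4 literal; resolve and re-check elsewhere
  return FORBIDDEN_IPV4_RANGES.some(([base, bits]) => {
    const baseInt = ipv4ToInt(base)!;
    const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
    return ((ip & mask) >>> 0) === ((baseInt & mask) >>> 0);
  });
}
```

Note that a check like this only covers IPv4 literals: a hostname that resolves to a private address still needs a post-resolution check (or a custom DNS lookup hook) to close the rebinding gap.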
4. 🔵 Unbounded contextWindow/maxTokens preserved during provider merge can enable resource-exhaustion (DoS) via large token limits
| Property | Value |
|---|---|
| Severity | Low |
| CWE | CWE-400 |
| Location | src/agents/models-config.ts:18-69 |
Description
mergeProviderModels() now prefers the higher of the explicit (user-configured) and implicit (catalog) token limits for contextWindow/maxTokens.
Because there is no upper bound enforced on these values (config schema only requires .positive()), a config or generated models.json entry can set extremely large token limits, which then propagate into runtime behavior that uses these numbers to size budgets and inject provider parameters:
- The explicit larger values are preserved during merge (changed behavior), making it easier for very large values to reach runtime.
- Downstream, these values are used to derive `num_ctx` for Ollama OpenAI-compat requests (`payload.options.num_ctx`) and to set context-guard budgets. Excessively large values can effectively disable truncation/guardrails and/or cause providers (and potentially the client) to consume excessive CPU/memory, resulting in denial of service.
Vulnerable code (merge preserves larger explicit limits):

```ts
function resolvePreferredTokenLimit(explicitValue: number, implicitValue: number): number {
  return explicitValue > implicitValue ? explicitValue : implicitValue;
}
// ...
contextWindow: resolvePreferredTokenLimit(
  explicitModel.contextWindow,
  implicitModel.contextWindow,
),
maxTokens: resolvePreferredTokenLimit(explicitModel.maxTokens, implicitModel.maxTokens),
```

Why this is security-relevant:

- If the configuration (or model metadata used to produce it) is influenced by an untrusted party (e.g., in a hosted/multi-tenant setup, or when pointing at an untrusted Ollama endpoint that reports a huge `context_length`), the process may send requests with huge context settings and/or retain extremely large tool/model outputs in memory.
Recommendation
Add hard upper bounds (and integer/finite checks) for contextWindow and maxTokens, and clamp at merge-time and/or normalization-time.
Suggested approach:

- Enforce bounds in schema and normalization (recommended):

```ts
const MAX_CONTEXT_WINDOW = 1_000_000; // choose a safe global cap
const MAX_MAX_TOKENS = 262_144; // choose a safe global cap

const TokenLimitSchema = z.number().int().positive().finite();
contextWindow: TokenLimitSchema.max(MAX_CONTEXT_WINDOW).optional(),
maxTokens: TokenLimitSchema.max(MAX_MAX_TOKENS).optional(),
```

- Also clamp after merge (defense in depth):

```ts
function clampTokenLimit(value: number, cap: number): number {
  return Math.min(Math.max(1, Math.floor(value)), cap);
}

const contextWindow = clampTokenLimit(
  resolvePreferredTokenLimit(explicitModel.contextWindow, implicitModel.contextWindow),
  MAX_CONTEXT_WINDOW,
);
const maxTokens = clampTokenLimit(
  resolvePreferredTokenLimit(explicitModel.maxTokens, implicitModel.maxTokens),
  Math.min(MAX_MAX_TOKENS, contextWindow),
);
```

This prevents extreme values from disabling truncation safeguards or being propagated into provider-specific knobs like `num_ctx`.
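As a quick sanity check of the clamp-after-merge behavior, the helpers behave as intended on both a stale-low explicit value and an absurdly large one (a self-contained sketch restating the example's helpers; the cap value is the one chosen above for illustration):

```typescript
const MAX_CONTEXT_WINDOW = 1_000_000; // illustrative cap, per the example above

function resolvePreferredTokenLimit(explicitValue: number, implicitValue: number): number {
  return explicitValue > implicitValue ? explicitValue : implicitValue;
}

function clampTokenLimit(value: number, cap: number): number {
  return Math.min(Math.max(1, Math.floor(value)), cap);
}

// A stale low explicit value (8k) is refreshed by the larger catalog value (128k)...
const refreshed = clampTokenLimit(
  resolvePreferredTokenLimit(8_192, 131_072),
  MAX_CONTEXT_WINDOW,
);

// ...while an absurd explicit value is capped instead of propagating into num_ctx.
const capped = clampTokenLimit(
  resolvePreferredTokenLimit(Number.MAX_SAFE_INTEGER, 131_072),
  MAX_CONTEXT_WINDOW,
);
```

Here `refreshed` is 131,072 and `capped` is 1,000,000: the merge still prefers the larger legitimate value, but the cap bounds how far an attacker-controlled value can travel.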
Analyzed PR: #29205 at commit 108bd73
Greptile Summary

Unifies Ollama context window handling across discovery, merging, and OpenAI-compatible transport to prevent silent low-context fallback behavior. All changes are backward-compatible with proper fallbacks and sensible defaults.

Confidence Score: 5/5
Last reviewed commit: 108bd73
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f342fb9d7f
fix(ollama): unify context window handling across discovery, merge, and OpenAI-compat transport (openclaw#29205)

* fix(ollama): inject num_ctx for OpenAI-compatible transport
* fix(ollama): discover per-model context and preserve higher limits
* fix(agents): prefer matching provider model for fallback limits
* fix(types): require numeric token limits in provider model merge
* fix(types): accept unknown payload in ollama num_ctx wrapper
* fix(types): simplify ollama settled-result extraction
* config(models): add provider flag for Ollama OpenAI num_ctx injection
* config(schema): allow provider num_ctx injection flag
* config(labels): label provider num_ctx injection flag
* config(help): document provider num_ctx injection flag
* agents(ollama): gate OpenAI num_ctx injection with provider config
* tests(ollama): cover provider num_ctx injection flag behavior
* docs(config): list provider num_ctx injection option
* docs(ollama): document OpenAI num_ctx injection toggle
* docs(config): clarify merge token-limit precedence
* config(help): note merge uses higher model token limits
* fix(ollama): cap /api/show discovery concurrency
* fix(ollama): restrict num_ctx injection to OpenAI compat
* tests(ollama): cover ipv6 and compat num_ctx gating
* fix(ollama): detect remote compat endpoints for ollama-labeled providers
* fix(ollama): cap per-model /api/show lookups to bound discovery load

* Changelog: add LanceDB custom baseUrl + dimensions entry (openclaw#17874)
* Changelog: add Ollama autodiscovery hardening entry (openclaw#29201)
* Changelog: add Ollama context-window unification entry (openclaw#29205)
* Changelog: add compaction audit injection removal entry (openclaw#28507)
* Changelog: add browser url alias entry (openclaw#29260)
* Changelog: add codex weekly usage label entry (openclaw#26267)
* Changelog: add LanceDB custom baseUrl + dimensions entry (openclaw#17874)
* Changelog: add Ollama autodiscovery hardening entry (openclaw#29201)
* Changelog: add Ollama context-window unification entry (openclaw#29205)
* Changelog: add compaction audit injection removal entry (openclaw#28507)
* Changelog: add browser url alias entry (openclaw#29260)
* Changelog: add codex weekly usage label entry (openclaw#26267)

(cherry picked from commit 8090cb4)

# Conflicts:
#	CHANGELOG.md
Summary
- Combines prior fixes: querying the per-model context window via `/api/show` (fix(ollama): query per-model context window via /api/show #24146), preserving larger configured limits (fix(ollama): preserve configured context window #26475), and injecting `num_ctx` on the OpenAI-compatible path (fix(agents): inject num_ctx for Ollama OpenAI-compat API to prevent 4096 token cap #27292).
- Queries `/api/show` during provider discovery, with timeout and safe fallback to 128k.
- Preserves explicit `contextWindow`/`maxTokens` only when they are larger than implicit catalog values, so stale low explicit values are still refreshed.
- Sets `models[0].payload.options.num_ctx` for Ollama OpenAI-compatible transport so the context window is respected.
- Native `api: "ollama"` transport behavior and non-Ollama providers are unchanged.

Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
- Ollama OpenAI-compatible requests now include `num_ctx`, avoiding silent fallback to 4096.

Security Impact (required)
No / No / Yes (`POST /api/show` during Ollama model discovery) / No / No / Yes, explain risk + mitigation: `/api/show` calls are local/provider-scoped metadata fetches with a 3s timeout and safe fallback.

Repro + Verification
Environment
Steps
`num_ctx` for Ollama OpenAI-compat.

Expected
`num_ctx`.

Actual
Evidence
Human Verification (required)
What you personally verified (not just CI), and how:
- src/agents/models-config.providers.ollama.test.ts
- src/agents/models-config.fills-missing-provider-apikey-from-env-var.test.ts
- src/agents/pi-embedded-runner/model.test.ts
- src/agents/pi-embedded-runner/run/attempt.test.ts
- `/api/show` failure fallback.
Yes / No / No

Failure Recovery (if this breaks)
- src/agents/models-config.providers.ts
- src/agents/models-config.ts
- src/agents/pi-embedded-runner/model.ts
- src/agents/pi-embedded-runner/run/attempt.ts

Risks and Mitigations