[Brev][Inference] OpenClaw agent immediately hits context overflow after Ollama onboard — runtime context window 4096 #4813

@hulynn

Description

After a successful Ollama onboard (provider=ollama-local) in which NemoClaw reports "OpenClaw is ready", the first openclaw agent turn fails with "Context overflow: prompt too large for the model". NemoClaw auto-detects the Ollama runtime context length (num_ctx=4096, Ollama's default) and adopts it as the model context window, but the agent's own base prompt (system prompt + 28 cataloged tools) is ~7378 tokens, which is larger than 4096. The wizard prints a warning (low context window ctx=4096 (warn<8000)) yet still completes onboard and declares the sandbox ready, leaving an agent that cannot service a single request. Raising the context window to 16384 (by setting both OLLAMA_CONTEXT_LENGTH and the modelsConfig contextWindow) resolves the overflow, and a response is produced via the auth-proxy path.

Environment

Device:        Brev VM brev-hv34p39z2 (g6.xlarge), NVIDIA L4 23034 MiB, 4 vCPU, 15 GiB RAM
OS:            Ubuntu 22.04.5 LTS
Architecture:  x86_64
Node.js:       v22.22.3
npm:           10.9.8
Docker:        29.5.2
OpenShell CLI: 0.0.44
NemoClaw:      v0.0.59
OpenClaw:      2026.5.27 (27ae826)

Steps to Reproduce

  1. nemoclaw onboard --fresh --name brev-ollama
  2. Select Local Ollama; select any model loaded at Ollama's default num_ctx=4096 (e.g. llama3.2:3b). Wizard prints Using Ollama runtime context length: 4096 tokens and completes onboard ("OpenClaw is ready").
  3. nemoclaw brev-ollama connect
  4. openclaw agent --agent main -m "hello" --session-id brev-ollama-e2e

Expected Result

The agent returns a non-empty response. If the auto-detected context window is too small for the agent to function, onboard should do one of the following rather than complete and leave an unusable agent: raise num_ctx (the model supports far more; llama3.2:3b has a max ctx of 131072), block the model, or not report "ready".
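
A minimal sketch of the decision the expected behavior implies. It is written in Python for illustration only (NemoClaw itself is TypeScript). The function name, the `OnboardDecision` type, the 2x headroom factor, and the round-up-to-power-of-two rule are all assumptions, not NemoClaw code; only the 7378-token base prompt and the 131072 max ctx come from this report.

```python
from dataclasses import dataclass

# Base prompt size measured in this report (system prompt + 28 tools).
BASE_PROMPT_TOKENS = 7378


@dataclass
class OnboardDecision:
    action: str          # "ready", "raise_num_ctx", or "block"
    context_window: int  # window the sandbox would be configured with


def decide_onboard(runtime_ctx: int, model_max_ctx: int,
                   required_tokens: int = BASE_PROMPT_TOKENS,
                   headroom: float = 2.0) -> OnboardDecision:
    """Pick an onboard outcome from the detected and maximum context sizes.

    ASSUMPTION: the window must be at least `headroom` times the base prompt.
    The factor 2.0 is hypothetical and chosen only for illustration.
    """
    needed = int(required_tokens * headroom)
    if runtime_ctx >= needed:
        return OnboardDecision("ready", runtime_ctx)
    if model_max_ctx >= needed:
        # Round up to the next power of two, capped at the model's maximum.
        target = 1 << (needed - 1).bit_length()
        return OnboardDecision("raise_num_ctx", min(target, model_max_ctx))
    # The model cannot hold the base prompt even at its maximum window.
    return OnboardDecision("block", runtime_ctx)


print(decide_onboard(4096, 131072))
```

With the values from this report (runtime 4096, model max 131072) the sketch chooses to raise the window to 16384, the same value used in the verified workaround.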

Actual Result

[agent/embedded] low context window: inference/llama3.2:3b ctx=4096 (warn<8000) source=modelsConfig
[agent/embedded] tool-search: cataloged 28 tools behind compact prompt surface
[context-overflow-precheck] estimatedPromptTokens=7378 promptBudgetBeforeReserve=2048 overflowTokens=5330
[agent/embedded] auto-compaction failed: no real conversation messages
[context-overflow-recovery] exhausted provider overflow recovery; livenessState=blocked
Context overflow: prompt too large for the model. Try /reset (or /new) to start a fresh session, or use a larger-context model.

Reproduced on both the embedded path and, after auto-pair scope approval, the gateway path.
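
The numbers in the context-overflow-precheck line are self-consistent, which the short Python sketch below reproduces. The logged budget of 2048 equals half of the 4096 window; whether OpenClaw actually derives the budget that way is an assumption made here, not something confirmed from its source.

```python
def overflow_tokens(estimated_prompt_tokens: int, prompt_budget: int) -> int:
    """Tokens by which the prompt exceeds the budget (0 if it fits)."""
    return max(0, estimated_prompt_tokens - prompt_budget)


def assumed_budget(context_window: int) -> int:
    # ASSUMPTION: the budget before reserve is half the context window.
    # This matches the logged 2048 for ctx=4096 but is unverified.
    return context_window // 2


# Values taken from the log line above.
print(overflow_tokens(7378, 2048))                   # 5330
# Under the half-window assumption, a 16384 window leaves room for the prompt.
print(overflow_tokens(7378, assumed_budget(16384)))  # 0
```

If the half-window assumption holds, a 16384 window gives a budget of 8192, which is consistent with the workaround clearing the overflow.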

Logs

nemoclaw brev-ollama status reported routing healthy throughout (so this is the context budget, not routing):
  Inference (ollama backend): healthy (http://127.0.0.1:11434/api/tags)
  Inference (auth proxy):     healthy (http://127.0.0.1:11435/api/tags)

Root cause (source, NemoClaw v0.0.59):
- nemoclaw/src/lib/inference/ollama-runtime-context.ts (applyOllamaRuntimeContextWindow):
  when NEMOCLAW_CONTEXT_WINDOW is unset, NemoClaw adopts runtimeStatus.contextLength
  (=4096) as the context window and logs the "low context window" warning but does not
  raise it or block onboard.
- nemoclaw/src/lib/onboard/dockerfile-patch.ts:127 — NEMOCLAW_CONTEXT_WINDOW is baked
  into the sandbox image, so 4096 lands in /sandbox/.openclaw/openclaw.json
  models[].contextWindow; the agent overflow precheck uses it (source=modelsConfig).
Ollama's default num_ctx is 4096 regardless of the model's true max, so the agent's
~7378-token base prompt cannot fit and every turn overflows before any conversation
exists (compaction has nothing to compact -> hard block).
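
The resolution behavior described above can be modeled roughly as follows. This is a Python illustration of the described behavior, not the TypeScript in applyOllamaRuntimeContextWindow. The function name is hypothetical, and the precedence of NEMOCLAW_CONTEXT_WINDOW when it is set is inferred from the "when unset" wording rather than read from the source.

```python
import os
from typing import Mapping, Optional, Tuple

LOW_CONTEXT_WARN = 8000  # threshold from the "warn<8000" log line


def resolve_context_window(runtime_context_length: int,
                           env: Optional[Mapping[str, str]] = None
                           ) -> Tuple[int, bool]:
    """Return (context_window, low_context_warning).

    ASSUMPTION: an explicit NEMOCLAW_CONTEXT_WINDOW wins when set;
    otherwise the Ollama runtime context length is adopted unchanged.
    """
    env = os.environ if env is None else env
    override = env.get("NEMOCLAW_CONTEXT_WINDOW")
    window = int(override) if override else runtime_context_length
    # The warning is informational only: nothing raises or blocks here,
    # which is the gap this issue reports.
    return window, window < LOW_CONTEXT_WARN


print(resolve_context_window(4096, env={}))
print(resolve_context_window(4096, env={"NEMOCLAW_CONTEXT_WINDOW": "16384"}))
```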

Workaround verified: set OLLAMA_CONTEXT_LENGTH=16384 (Ollama serves 16384) AND set
modelsConfig contextWindow=16384 -> overflow gone, agent returns a non-empty response.
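
The config half of the workaround can be sketched as below. The layout (a top-level "models" array with a contextWindow field per entry) is inferred from the models[].contextWindow path quoted above and may not match the real openclaw.json. The function name and the "id" key in the example are hypothetical. OLLAMA_CONTEXT_LENGTH still has to be set separately on the Ollama server.

```python
import json


def set_context_window(config: dict, context_window: int) -> dict:
    """Return a copy of config with contextWindow set on every models[] entry.

    ASSUMPTION: models live in a top-level "models" list.
    """
    patched = json.loads(json.dumps(config))  # deep copy of JSON-like data
    for model in patched.get("models", []):
        model["contextWindow"] = context_window
    return patched


# Hypothetical config fragment; only contextWindow is named in this report.
cfg = {"models": [{"id": "inference/llama3.2:3b", "contextWindow": 4096}]}
print(json.dumps(set_context_window(cfg, 16384), indent=2))
```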

Interplay: because the default model (nemotron-3-nano:30b) is blocked on a standard L4 by NVB 6272260 (Ollama model-probe dead-loop), users are pushed to smaller models that load at Ollama's default 4096 num_ctx, making this the next wall.


NVB#6272262

Metadata

Labels

- NV QA: Bugs found by the NVIDIA QA Team
- area: inference: Inference routing, serving, model selection, or outputs
- area: local-models: Local model providers, downloads, launch, or connectivity
- platform: brev: Affects Brev hosted development environments
- provider: ollama: Ollama local model provider behavior
