Description
After a successful Ollama onboard (provider=ollama-local) where NemoClaw reports "OpenClaw is ready", the first openclaw agent turn fails with "Context overflow: prompt too large for the model". NemoClaw auto-detects the Ollama runtime context length (num_ctx=4096, Ollama's default) and adopts it as the model context window, but the agent's own base prompt (system prompt + 28 cataloged tools) is ~7378 tokens — larger than 4096. The wizard prints a warning (low context window ctx=4096 (warn<8000)) yet still completes onboard and declares the sandbox ready, leaving an agent that cannot service a single request. Raising the context window to 16384 (OLLAMA_CONTEXT_LENGTH + modelsConfig contextWindow) resolves the overflow and a response is produced via the auth-proxy path.
Environment
Device: Brev VM brev-hv34p39z2 (g6.xlarge), NVIDIA L4 23034 MiB, 4 vCPU, 15 GiB RAM
OS: Ubuntu 22.04.5 LTS
Architecture: x86_64
Node.js: v22.22.3
npm: 10.9.8
Docker: 29.5.2
OpenShell CLI: 0.0.44
NemoClaw: v0.0.59
OpenClaw: 2026.5.27 (27ae826)
Steps to Reproduce
nemoclaw onboard --fresh --name brev-ollama
- Select Local Ollama; select any model loaded at Ollama's default num_ctx=4096 (e.g. llama3.2:3b). Wizard prints "Using Ollama runtime context length: 4096 tokens" and completes onboard ("OpenClaw is ready").
nemoclaw brev-ollama connect
openclaw agent --agent main -m "hello" --session-id brev-ollama-e2e
Expected Result
The agent returns a non-empty response. If the auto-detected context window is too small for the agent to function, onboard should raise num_ctx (the model supports far more — llama3.2:3b max ctx 131072), block the model, or not report "ready" — rather than completing onboard and leaving an unusable agent.
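The three outcomes proposed above (raise num_ctx, block the model, or withhold "ready") can be sketched as a single decision function. This is a minimal illustration, not NemoClaw's actual logic: `decide_context_window`, `Decision`, and the 16384 target are hypothetical names and values chosen here; only the 8000 warning threshold, the 4096 default, the 131072 model maximum, and the ~7378-token base prompt come from this report.

```python
from dataclasses import dataclass

WARN_BELOW = 8000  # threshold taken from the "warn<8000" log line


@dataclass
class Decision:
    action: str          # "ok", "raise", or "block"
    context_window: int  # window onboard should configure (0 if blocked)


def decide_context_window(runtime_ctx: int, model_max_ctx: int,
                          base_prompt_tokens: int,
                          target_ctx: int = 16384) -> Decision:
    """Hypothetical onboard guard (assumption, not NemoClaw source)."""
    # The detected window is usable only if it clears both the agent's
    # base prompt and the warning threshold.
    if runtime_ctx > base_prompt_tokens and runtime_ctx >= WARN_BELOW:
        return Decision("ok", runtime_ctx)
    # Otherwise try to raise the window, if the model supports it.
    wanted = max(target_ctx, WARN_BELOW)
    if model_max_ctx >= wanted and wanted > base_prompt_tokens:
        return Decision("raise", wanted)
    # No workable window: do not report the sandbox as ready.
    return Decision("block", 0)


decision = decide_context_window(4096, 131072, 7378)
print(decision)
```

With the values from this report (runtime 4096, model max 131072, base prompt 7378), the sketch chooses to raise the window instead of completing onboard at 4096.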
Actual Result
[agent/embedded] low context window: inference/llama3.2:3b ctx=4096 (warn<8000) source=modelsConfig
[agent/embedded] tool-search: cataloged 28 tools behind compact prompt surface
[context-overflow-precheck] estimatedPromptTokens=7378 promptBudgetBeforeReserve=2048 overflowTokens=5330
[agent/embedded] auto-compaction failed: no real conversation messages
[context-overflow-recovery] exhausted provider overflow recovery; livenessState=blocked
Context overflow: prompt too large for the model. Try /reset (or /new) to start a fresh session, or use a larger-context model.
Reproduced on both the embedded path and, after auto-pair scope approval, the gateway path.
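The numbers in the precheck line are internally consistent, which a short calculation confirms. This sketch only reproduces the subtraction implied by the log; `overflow_tokens` is a hypothetical helper, and how the agent derives promptBudgetBeforeReserve=2048 from the 4096 window is not shown in the logs and is not assumed here.

```python
ESTIMATED_PROMPT_TOKENS = 7378  # estimatedPromptTokens from the log
PROMPT_BUDGET = 2048            # promptBudgetBeforeReserve from the log


def overflow_tokens(prompt_tokens: int, budget: int) -> int:
    """Tokens by which the prompt exceeds the budget (0 if it fits)."""
    return max(0, prompt_tokens - budget)


# Matches overflowTokens=5330 in the log.
print(overflow_tokens(ESTIMATED_PROMPT_TOKENS, PROMPT_BUDGET))

# Even the full 4096 window, with nothing reserved, would be too small.
print(overflow_tokens(ESTIMATED_PROMPT_TOKENS, 4096))

# The raw 16384 window from the workaround exceeds the base prompt;
# the budget actually left after reserves is not shown in the logs.
print(overflow_tokens(ESTIMATED_PROMPT_TOKENS, 16384))
```

The second case shows why the failure is not an artifact of the reserve alone: the base prompt exceeds the entire default window by 3282 tokens.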
Logs
nemoclaw brev-ollama status reported routing healthy throughout (so this is the context budget, not routing):
Inference (ollama backend): healthy (http://127.0.0.1:11434/api/tags)
Inference (auth proxy): healthy (http://127.0.0.1:11435/api/tags)
Root cause (source, NemoClaw v0.0.59):
- nemoclaw/src/lib/inference/ollama-runtime-context.ts (applyOllamaRuntimeContextWindow):
when NEMOCLAW_CONTEXT_WINDOW is unset, NemoClaw adopts runtimeStatus.contextLength
(=4096) as the context window and logs the "low context window" warning but does not
raise it or block onboard.
- nemoclaw/src/lib/onboard/dockerfile-patch.ts:127 — NEMOCLAW_CONTEXT_WINDOW is baked
into the sandbox image, so 4096 lands in /sandbox/.openclaw/openclaw.json
models[].contextWindow; the agent overflow precheck uses it (source=modelsConfig).
Ollama's default num_ctx is 4096 regardless of the model's true max, so the agent's
~7378-token base prompt cannot fit and every turn overflows before any conversation
exists (compaction has nothing to compact -> hard block).
Workaround verified: set OLLAMA_CONTEXT_LENGTH=16384 (Ollama serves 16384) AND set
modelsConfig contextWindow=16384 -> overflow gone, agent returns a non-empty response.
Interplay: because the default model (nemotron-3-nano:30b) is blocked on a standard L4 by NVB 6272260 (Ollama model-probe dead-loop), users are pushed to smaller models that load at Ollama's default 4096 num_ctx, making this the next wall.
NVB#6272262