[Brev][Inference] OpenClaw agent immediately hits context overflow after Ollama onboard — runtime context window 4096

## Description

After a successful Ollama onboard (`provider=ollama-local`) in which NemoClaw reports "OpenClaw is ready", the first `openclaw agent` turn fails with **"Context overflow: prompt too large for the model"**. NemoClaw auto-detects the Ollama runtime context length (`num_ctx=4096`, Ollama's default) and adopts it as the model context window, but the agent's own base prompt (system prompt + 28 cataloged tools) is ~7378 tokens, which is larger than 4096. The wizard prints a warning (`low context window ctx=4096 (warn<8000)`) yet still completes onboard and declares the sandbox ready, leaving an agent that cannot service a single request. Raising the context window to 16384 (by setting both `OLLAMA_CONTEXT_LENGTH` and the modelsConfig `contextWindow`) resolves the overflow, and a response is produced via the auth-proxy path.

## Environment

```text
Device:        Brev VM brev-hv34p39z2 (g6.xlarge), NVIDIA L4 23034 MiB, 4 vCPU, 15 GiB RAM
OS:            Ubuntu 22.04.5 LTS
Architecture:  x86_64
Node.js:       v22.22.3
npm:           10.9.8
Docker:        29.5.2
OpenShell CLI: 0.0.44
NemoClaw:      v0.0.59
OpenClaw:      2026.5.27 (27ae826)
```

## Steps to Reproduce

1. `nemoclaw onboard --fresh --name brev-ollama`
2. Select Local Ollama; select any model loaded at Ollama's default `num_ctx=4096` (e.g. `llama3.2:3b`). Wizard prints `Using Ollama runtime context length: 4096 tokens` and completes onboard ("OpenClaw is ready").
3. `nemoclaw brev-ollama connect`
4. `openclaw agent --agent main -m "hello" --session-id brev-ollama-e2e`

## Expected Result

The agent returns a non-empty response. If the auto-detected context window is too small for the agent to function, onboard should do one of three things: raise `num_ctx` (the model supports far more: `llama3.2:3b` has a maximum context of 131072), block the model, or withhold the "ready" status. It should not complete onboard and leave an unusable agent.
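
The three acceptable outcomes can be sketched as a small decision function. This is only an illustration of the expected behavior, not NemoClaw's code: `decide_context_window`, `ContextDecision`, and the `required_ctx` parameter are hypothetical names, and 16384 is used as the required window only because that is the value the workaround in this report was verified with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextDecision:
    action: str          # "ready", "raise", or "block" (hypothetical labels)
    context_window: int  # window to configure; 0 when the model is blocked


def decide_context_window(detected_ctx: int, model_max_ctx: int, required_ctx: int) -> ContextDecision:
    """Pick an onboard outcome instead of silently adopting a too-small window."""
    if detected_ctx >= required_ctx:
        # The runtime window already fits the agent's base prompt.
        return ContextDecision("ready", detected_ctx)
    if model_max_ctx >= required_ctx:
        # The model supports more than the runtime default: raise num_ctx.
        return ContextDecision("raise", required_ctx)
    # The model can never fit the base prompt: do not report "ready".
    return ContextDecision("block", 0)


# Values from this report: Ollama default 4096, llama3.2:3b max 131072.
# required_ctx=16384 is an assumption taken from the verified workaround.
decision = decide_context_window(detected_ctx=4096, model_max_ctx=131072, required_ctx=16384)
print(decision.action, decision.context_window)
```

With the values from this report the sketch chooses to raise the window rather than report readiness at 4096.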

## Actual Result

```text
[agent/embedded] low context window: inference/llama3.2:3b ctx=4096 (warn<8000) source=modelsConfig
[agent/embedded] tool-search: cataloged 28 tools behind compact prompt surface
[context-overflow-precheck] estimatedPromptTokens=7378 promptBudgetBeforeReserve=2048 overflowTokens=5330
[agent/embedded] auto-compaction failed: no real conversation messages
[context-overflow-recovery] exhausted provider overflow recovery; livenessState=blocked
Context overflow: prompt too large for the model. Try /reset (or /new) to start a fresh session, or use a larger-context model.
```

Reproduced on both the embedded path and, after auto-pair scope approval, the gateway path.
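
The numbers in the precheck log line are internally consistent, which a short calculation confirms. The sketch below is not OpenClaw's precheck; `overflow_tokens` and `assumed_budget` are hypothetical helpers. The 0.5 ratio in `assumed_budget` is a guess based on a single data point (a budget of 2048 at a window of 4096), since the real budget formula does not appear in the logs.

```python
def overflow_tokens(estimated_prompt_tokens: int, prompt_budget: int) -> int:
    """Tokens by which the prompt exceeds the budget (0 if it fits)."""
    return max(0, estimated_prompt_tokens - prompt_budget)


def assumed_budget(context_window: int, ratio: float = 0.5) -> int:
    """ASSUMPTION: budget is a fixed fraction of the window (2048 / 4096 = 0.5)."""
    return int(context_window * ratio)


# Values copied from the precheck log line: 7378 estimated, 2048 budget.
print(overflow_tokens(7378, 2048))  # 5330, matching overflowTokens in the log

# Under the assumed ratio, a 16384 window would leave room for the base prompt.
print(overflow_tokens(7378, assumed_budget(16384)))  # 0
```

If the assumed ratio holds, it would also explain why compaction cannot help: the base prompt alone exceeds the budget before any conversation exists.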

## Logs

```text
nemoclaw brev-ollama status reported routing healthy throughout (so this is the context budget, not routing):
  Inference (ollama backend): healthy (http://127.0.0.1:11434/api/tags)
  Inference (auth proxy):     healthy (http://127.0.0.1:11435/api/tags)

Root cause (source, NemoClaw v0.0.59):
- nemoclaw/src/lib/inference/ollama-runtime-context.ts (applyOllamaRuntimeContextWindow):
  when NEMOCLAW_CONTEXT_WINDOW is unset, NemoClaw adopts runtimeStatus.contextLength
  (=4096) as the context window and logs the "low context window" warning but does not
  raise it or block onboard.
- nemoclaw/src/lib/onboard/dockerfile-patch.ts:127 — NEMOCLAW_CONTEXT_WINDOW is baked
  into the sandbox image, so 4096 lands in /sandbox/.openclaw/openclaw.json
  models[].contextWindow; the agent overflow precheck uses it (source=modelsConfig).
Ollama's default num_ctx is 4096 regardless of the model's true max, so the agent's
~7378-token base prompt cannot fit and every turn overflows before any conversation
exists (compaction has nothing to compact -> hard block).

Workaround verified: set OLLAMA_CONTEXT_LENGTH=16384 (Ollama serves 16384) AND set
modelsConfig contextWindow=16384 -> overflow gone, agent returns a non-empty response.
```
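
The config half of the workaround amounts to overwriting `contextWindow` for the affected model. A minimal sketch follows, assuming the config is a JSON object with a top-level `models` list whose entries carry `contextWindow`; the report only gives the path `models[].contextWindow`, so the `id` key used to select the model and the helper name `set_context_window` are assumptions, and the real file may be nested differently. The Ollama side (`OLLAMA_CONTEXT_LENGTH=16384`) still has to be set separately.

```python
import copy


def set_context_window(config: dict, model_id: str, context_window: int) -> dict:
    """Return a copy of config with contextWindow updated for one model.

    ASSUMPTION: models are listed under config["models"] and identified by "id".
    """
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    matched = False
    for model in updated.get("models", []):
        if model.get("id") == model_id:
            model["contextWindow"] = context_window
            matched = True
    if not matched:
        raise KeyError(f"model {model_id!r} not found in config")
    return updated


# Hypothetical in-memory stand-in for the sandbox config.
example = {"models": [{"id": "inference/llama3.2:3b", "contextWindow": 4096}]}
patched = set_context_window(example, "inference/llama3.2:3b", 16384)
print(patched["models"][0]["contextWindow"])  # 16384
```

The helper works on a copy so the original value remains available for comparison or rollback.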

**Interplay:** because the default model (nemotron-3-nano:30b) is blocked on a standard L4 by NVB 6272260 (Ollama model-probe dead-loop), users are pushed to smaller models that load at Ollama's default 4096 `num_ctx`, making this the next wall.

---
[NVB#6272262](https://nvbugspro.nvidia.com/bug/6272262)

