[Brev][Onboard] nemoclaw onboard dead-loops on Ollama model selection after the default model probe times out

## Description

On a standard Brev g6.xlarge (NVIDIA L4, 23 GB VRAM / 15 GB RAM), `nemoclaw onboard` with the Local Ollama provider offers `nemotron-3-nano:30b` (24 GB) as the default starter model. The model pulls successfully, but the wizard's post-load probe times out ("did not answer the local probe in time") and the wizard **dead-loops**: it re-offers the same too-large model plus "Other", with no smaller suggestion and no way to abort, so onboard never completes. The model itself is healthy: once warm, a direct Ollama request returns in ~4.5s. The rejection is a false negative caused by cold-load latency exceeding the 120s probe window, which is extended (to 300s) only on DGX Spark hosts.

**Related (not a duplicate):** NVB 6216882 / #4178 fixed the same dead-loop *symptom* for a different trigger (a pre-installed old Ollama 0.6.2 crashing on model load). The trigger here is distinct: current Ollama, a healthy model, an oversized default, and a 120s probe timeout with no retry on non-Spark hosts. The dead-loop still occurs on v0.0.59, so the loop-escape/diagnostic added for #4178 does not cover this path.

## Environment

```text
Device:        Brev VM brev-hv34p39z2 (g6.xlarge), NVIDIA L4 23034 MiB, 4 vCPU, 15 GiB RAM
OS:            Ubuntu 22.04.5 LTS
Architecture:  x86_64
Node.js:       v22.22.3
npm:           10.9.8
Docker:        29.5.2
OpenShell CLI: 0.0.44
NemoClaw:      v0.0.59
OpenClaw:      N/A (onboard does not complete with the default model)
```

## Steps to Reproduce

1. On a fresh Brev g6.xlarge (L4) with no Ollama models installed, run `nemoclaw onboard --fresh --name brev-ollama`.
2. At `[3/8]` inference, select Local Ollama (Install Ollama on first run).
3. At "Ollama starter models", accept the default `[2] nemotron-3-nano:30b`.
4. Confirm the 22.61 GB download (`y`) and wait for the pull to finish. The wizard then loads the model and runs its local probe.

## Expected Result

After the model loads, the wizard verifies it and proceeds. If a model cannot be loaded on this host, the wizard should not offer it as the default and/or should fall back to a host-fitting model, and must never loop indefinitely with no way forward.
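
The expected behaviour above can be sketched as a selection step that prefers host-fitting models and gives up after a bounded number of attempts. This is a minimal TypeScript sketch, not NemoClaw's implementation: `StarterModel`, `HostLimits`, `fitsHost`, `orderStarters`, and `selectModel` are hypothetical names, and the fit rule (model size against VRAM, or against RAM when there is no GPU) is an assumption that may differ from the real `modelFitsAvailableMemory()`.

```typescript
// Hypothetical types; none of these are NemoClaw's actual API.
interface StarterModel {
  name: string;
  sizeGb: number; // approximate in-memory size
}

interface HostLimits {
  vramGb: number; // 0 when no GPU is present
  ramGb: number;
}

// Assumed fit rule: the model must fit in VRAM when a GPU exists, otherwise in RAM.
function fitsHost(model: StarterModel, host: HostLimits): boolean {
  const budgetGb = host.vramGb > 0 ? host.vramGb : host.ramGb;
  return model.sizeGb <= budgetGb;
}

// Host-fitting models come first (largest first), so the default is loadable.
// Oversized models stay in the list but are never the default.
function orderStarters(models: StarterModel[], host: HostLimits): StarterModel[] {
  const fitting = models
    .filter((m) => fitsHost(m, host))
    .sort((a, b) => b.sizeGb - a.sizeGb);
  const oversized = models.filter((m) => !fitsHost(m, host));
  return [...fitting, ...oversized];
}

type Outcome =
  | { kind: "selected"; model: string }
  | { kind: "aborted"; tried: string[] };

// Probe candidates in order, each at most once, and stop after maxAttempts
// so the wizard always terminates with either a model or an explicit abort.
function selectModel(
  candidates: StarterModel[],
  probe: (name: string) => boolean,
  maxAttempts: number = 3,
): Outcome {
  const tried: string[] = [];
  for (const candidate of candidates.slice(0, maxAttempts)) {
    tried.push(candidate.name);
    if (probe(candidate.name)) {
      return { kind: "selected", model: candidate.name };
    }
  }
  return { kind: "aborted", tried };
}
```

With a host of roughly 22.5 GB VRAM and a starter list holding one 24 GB and one 2 GB model (illustrative sizes), `orderStarters` puts the 2 GB model first. If every probe fails, `selectModel` returns `aborted` with the list of tried models instead of re-offering the same one.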

## Actual Result

```text
Loading Ollama model: nemotron-3-nano:30b
Selected Ollama model 'nemotron-3-nano:30b' did not answer the local probe in time.
It may still be loading, too large for the host, or otherwise unhealthy.
Choose a different Ollama model or select Other.

  Ollama models:
    1) nemotron-3-nano:30b
    2) Other...
  Choose model [1]: 1
Loading Ollama model: nemotron-3-nano:30b
Selected Ollama model 'nemotron-3-nano:30b' did not answer the local probe in time.
... (loops indefinitely; onboard never completes)
```

Evidence the model is healthy (warm direct probe, ~4.5s):

```text
curl -s --max-time 90 http://localhost:11434/api/generate -d '{"model":"nemotron-3-nano:30b","prompt":"say hi","stream":false}'
 -> {"response":"Hi there! ..."}  total_duration 4493715275 ns  (real 0m4.502s)
ollama ps  -> nemotron-3-nano:30b  25 GB  15%/85% CPU/GPU  CONTEXT 4096
nvidia-smi -> 23034 MiB total, 21332 used, 1232 free  (model spills GPU to CPU)
```
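
The `total_duration` value above is reported in nanoseconds. Ollama's non-streaming `/api/generate` response also carries a `load_duration` field, which can help separate model-load time from generation time when comparing cold and warm probes. A small TypeScript sketch of that arithmetic follows; `summarizeTimings` and its return shape are hypothetical helpers, not part of NemoClaw or Ollama.

```typescript
// Subset of the timing fields in Ollama's non-streaming /api/generate response.
// All durations are in nanoseconds.
interface OllamaTimings {
  total_duration: number;
  load_duration?: number;
}

interface TimingSummary {
  totalSec: number;
  loadSec: number;
  withinWindow: boolean;
}

const NS_PER_SEC = 1e9;

// Hypothetical helper: convert to seconds and compare against a probe window.
function summarizeTimings(t: OllamaTimings, probeWindowSec: number): TimingSummary {
  const totalSec = t.total_duration / NS_PER_SEC;
  const loadSec = (t.load_duration ?? 0) / NS_PER_SEC;
  return { totalSec, loadSec, withinWindow: totalSec <= probeWindowSec };
}

// Warm probe from this report: 4493715275 ns is about 4.49 s, inside a 120 s window.
const warm = summarizeTimings({ total_duration: 4493715275 }, 120);
```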

## Logs

```text
Onboard preflight detected the host limits BEFORE offering the 24GB default:
  NVIDIA GPU detected (NVIDIA L4, 23034 MB)
  Memory OK: 15368 MB RAM + 0 MB swap

Root cause (source, NemoClaw v0.0.59):
- nemoclaw/src/lib/inference/local.ts:929 getOllamaProbeCommand — probe is
  `curl -sS --max-time 120 http://localhost:11434/api/generate` (direct :11434).
- nemoclaw/src/lib/inference/local.ts:954+ validateOllamaModel — extended 300s retry
  only fires when sparkHost is true; a non-Spark host (Brev L4) gets only 120s, no retry,
  so a cold load that exceeds 120s reports "did not answer the local probe in time".
- nemoclaw/src/lib/inference/local.ts:85 DEFAULT_OLLAMA_MODEL = nemotron-3-nano:30b (~24GB).
- nemoclaw/src/lib/inference/ollama/proxy.ts (starter list ~line 394) — the starter-model
  list is NOT filtered by modelFitsAvailableMemory() (that filter only applies to the
  installed-models list); after a probe failure the wizard re-offers the same model with
  no host-fitting fallback and no abort.

Workaround verified: pre-pull a host-fitting model (llama3.2:3b) and select it — onboard
completes, and the installed-models list correctly filters out the 24GB model.
```
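
One possible shape for a fix to the probe path described above is to make the extended retry unconditional rather than gated on `sparkHost`. The TypeScript sketch below only illustrates that idea: `ProbeFn`, `ProbeReport`, and `probeWithEscalation` are hypothetical names, the probe itself is injected as a callback, and only the 120s and 300s values come from this report.

```typescript
// Hypothetical probe signature: returns true if the model answered within timeoutSec.
type ProbeFn = (model: string, timeoutSec: number) => boolean;

interface ProbeReport {
  ok: boolean;
  timeoutsTriedSec: number[];
}

// 120s first attempt, then a 300s retry. These are the values this report cites
// for the current probe and for the Spark-only extension.
const DEFAULT_TIMEOUTS_SEC: readonly number[] = [120, 300];

// Run the probe with each timeout in turn, on every host, and stop at the
// first success. The report records which timeouts were actually tried.
function probeWithEscalation(
  model: string,
  probe: ProbeFn,
  timeoutsSec: readonly number[] = DEFAULT_TIMEOUTS_SEC,
): ProbeReport {
  const timeoutsTriedSec: number[] = [];
  for (const timeoutSec of timeoutsSec) {
    timeoutsTriedSec.push(timeoutSec);
    if (probe(model, timeoutSec)) {
      return { ok: true, timeoutsTriedSec };
    }
  }
  return { ok: false, timeoutsTriedSec };
}
```

A probe callback that only succeeds when given at least 200s, simulating a slow cold load, fails the 120s attempt and passes the 300s retry, so the model is accepted instead of being rejected as a false negative.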

---
[NVB#6272260](https://nvbugspro.nvidia.com/bug/6272260)


[Brev][Onboard] nemoclaw onboard dead-loops on Ollama model selection after the default model probe times out #4812
