Description
On a standard Brev g6.xlarge (NVIDIA L4, 23 GB VRAM / 15 GB RAM), `nemoclaw onboard` with the Local Ollama provider offers `nemotron-3-nano:30b` (24 GB) as the default starter model. The model pulls successfully, but the wizard's post-load probe times out ("did not answer the local probe in time") and the wizard dead-loops: it re-offers the same too-large model plus "Other", with no smaller suggestion and no abort. Onboard never completes. The model itself is healthy (a warm direct Ollama request returns in ~4.5s); the rejection is a false negative caused by cold-load latency exceeding the 120s probe window, which is only extended (to 300s) on DGX Spark hosts.
Related (not a duplicate): NVB 6216882 / #4178 fixed the same dead-loop symptom for a different trigger (a pre-installed old Ollama 0.6.2 crashing on model load). The trigger here is distinct (current Ollama, a healthy model, an oversized default, and a 120s probe timeout with no non-Spark retry), and the dead-loop still occurs on v0.0.59, so the loop-escape/diagnostic added for #4178 does not cover this path.
Environment
Device: Brev VM brev-hv34p39z2 (g6.xlarge), NVIDIA L4 23034 MiB, 4 vCPU, 15 GiB RAM
OS: Ubuntu 22.04.5 LTS
Architecture: x86_64
Node.js: v22.22.3
npm: 10.9.8
Docker: 29.5.2
OpenShell CLI: 0.0.44
NemoClaw: v0.0.59
OpenClaw: N/A (onboard does not complete with the default model)
Steps to Reproduce
- On a fresh Brev g6.xlarge (L4) with no Ollama models installed, run `nemoclaw onboard --fresh --name brev-ollama`.
- At `[3/8]` inference, select Local Ollama (Install Ollama on first run).
- At "Ollama starter models", accept the default `[2] nemotron-3-nano:30b`.
- Confirm the 22.61 GB download (`y`). Wait for the pull to finish.
Expected Result
After the model loads, the wizard verifies it and proceeds. If a model cannot be loaded on this host, the wizard should not offer it as the default and/or should fall back to a host-fitting model, and must never loop indefinitely with no way forward.
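The "never loop indefinitely" requirement can be illustrated with a bounded selection loop. This is only a sketch under stated assumptions: `selectModelBounded`, `ProbeResult`, and `WizardOutcome` are hypothetical names, not taken from the NemoClaw source, and the real wizard is interactive rather than a pure function.

```typescript
// Hypothetical sketch: none of these names come from the NemoClaw source.
type ProbeResult = "ok" | "timeout";

interface WizardOutcome {
  status: "verified" | "aborted";
  model?: string;
  attempts: number;
}

// Tries each candidate at most once, in order, and stops instead of
// re-offering a model that has already failed its probe.
function selectModelBounded(
  candidates: string[],
  probe: (model: string) => ProbeResult,
): WizardOutcome {
  let attempts = 0;
  for (const model of candidates) {
    attempts += 1;
    if (probe(model) === "ok") {
      return { status: "verified", model, attempts };
    }
  }
  // Every candidate failed: hand control back to the user rather than looping.
  return { status: "aborted", attempts };
}
```

The point of the design is that a failed model is consumed from the candidate list, so the loop always terminates in either a verified model or an explicit abort.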
Actual Result
Loading Ollama model: nemotron-3-nano:30b
Selected Ollama model 'nemotron-3-nano:30b' did not answer the local probe in time.
It may still be loading, too large for the host, or otherwise unhealthy.
Choose a different Ollama model or select Other.
Ollama models:
1) nemotron-3-nano:30b
2) Other...
Choose model [1]: 1
Loading Ollama model: nemotron-3-nano:30b
Selected Ollama model 'nemotron-3-nano:30b' did not answer the local probe in time.
... (loops indefinitely; onboard never completes)
Evidence the model is healthy (warm direct probe, ~4.5s):
curl -s --max-time 90 http://localhost:11434/api/generate -d '{"model":"nemotron-3-nano:30b","prompt":"say hi","stream":false}'
-> {"response":"Hi there! ..."} total_duration 4493715275 ns (real 0m4.502s)
ollama ps -> nemotron-3-nano:30b 25 GB 15%/85% CPU/GPU CONTEXT 4096
nvidia-smi -> 23034 MiB total, 21332 used, 1232 free (model spills GPU to CPU)
Logs
Onboard preflight detected the host limits BEFORE offering the 24GB default:
NVIDIA GPU detected (NVIDIA L4, 23034 MB)
Memory OK: 15368 MB RAM + 0 MB swap
Root cause (source, NemoClaw v0.0.59):
- nemoclaw/src/lib/inference/local.ts:929 getOllamaProbeCommand — probe is
`curl -sS --max-time 120 http://localhost:11434/api/generate` (direct :11434).
- nemoclaw/src/lib/inference/local.ts:954+ validateOllamaModel — extended 300s retry
only fires when sparkHost is true; a non-Spark host (Brev L4) gets only 120s, no retry,
so a cold load that exceeds 120s reports "did not answer the local probe in time".
- nemoclaw/src/lib/inference/local.ts:85 DEFAULT_OLLAMA_MODEL = nemotron-3-nano:30b (~24GB).
- nemoclaw/src/lib/inference/ollama/proxy.ts (starter list ~line 394) — the starter-model
list is NOT filtered by modelFitsAvailableMemory() (that filter only applies to the
installed-models list); after a probe failure the wizard re-offers the same model with
no host-fitting fallback and no abort.
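One possible direction for the starter-list gap above is to apply a host-fit filter to the starter list as well. The sketch below is an assumption-laden illustration, not the actual `modelFitsAvailableMemory()` logic: the types, the function names, the VRAM-only check, and the 10% headroom factor are all hypothetical, and the `llama3.2:3b` size is an assumed value that does not come from this report.

```typescript
// Hypothetical sketch: types, names, and the headroom factor are assumptions.
interface StarterModel {
  name: string;
  sizeGb: number;
}

interface HostMemory {
  vramMb: number;
  ramMb: number;
}

// Assumed heuristic: a model fits if its size plus ~10% headroom is within VRAM.
function fitsInVram(model: StarterModel, host: HostMemory): boolean {
  const neededMb = model.sizeGb * 1024 * 1.1;
  return neededMb <= host.vramMb;
}

// Keeps only the starter models that pass the fit check for this host.
function starterModelsForHost(
  models: StarterModel[],
  host: HostMemory,
): StarterModel[] {
  return models.filter((m) => fitsInVram(m, host));
}

// Host values from the preflight log in this report (L4, 23034 MB; 15368 MB RAM).
const brevL4: HostMemory = { vramMb: 23034, ramMb: 15368 };

const starters: StarterModel[] = [
  { name: "nemotron-3-nano:30b", sizeGb: 24 },
  { name: "llama3.2:3b", sizeGb: 2 }, // assumed size, not from the report
];

const offered = starterModelsForHost(starters, brevL4);
```

Under these assumptions the 24 GB default would be dropped from the starter list on the L4 host, leaving only the smaller model to offer.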
Workaround (verified): pre-pull a host-fitting model (llama3.2:3b) and select it. Onboard then completes, and the installed-models list correctly filters out the 24 GB model.
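Separately from the workaround, the retry gap described in the root cause could be closed by giving every host the extended probe window after a first timeout. The following is a minimal sketch: the function names and the callback shape are hypothetical, and only the 120s and 300s values come from this report.

```typescript
// Hypothetical sketch: function names and the callback shape are assumptions.
// The 120 s and 300 s values are the ones cited in this report.
const FIRST_PROBE_TIMEOUT_S = 120;
const EXTENDED_PROBE_TIMEOUT_S = 300;

// Behavior as described in this report: only Spark hosts get the extended retry.
function reportedProbeSchedule(sparkHost: boolean): number[] {
  return sparkHost
    ? [FIRST_PROBE_TIMEOUT_S, EXTENDED_PROBE_TIMEOUT_S]
    : [FIRST_PROBE_TIMEOUT_S];
}

// Possible change: every host gets the extended retry after a first timeout.
function proposedProbeSchedule(): number[] {
  return [FIRST_PROBE_TIMEOUT_S, EXTENDED_PROBE_TIMEOUT_S];
}

// Runs the probe once per timeout in the schedule; stops at the first success.
function probeWithSchedule(
  schedule: number[],
  probe: (timeoutS: number) => boolean,
): boolean {
  for (const timeoutS of schedule) {
    if (probe(timeoutS)) return true;
  }
  return false;
}
```

For example, a simulated cold load that needs 200s to answer fails under the reported non-Spark schedule but passes under the proposed one.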
NVB#6272260