[macOS][Onboard] Ollama model pull hardcoded 10-min wall-clock timeout — fails on any network below ~8 MB/s sustained, default 7B model unable to complete #2610

@hulynn

Description

During `nemoclaw onboard` with Local Ollama provider, the Ollama model pull is wrapped with a hardcoded 10-minute wall-clock timeout that is NOT progress-aware. Any network slower than ~8 MB/s sustained (4.7 GB / 600 s) will hit the timeout before the pull completes, even though the underlying `ollama pull` supports incremental resume.
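The ~8 MB/s threshold is just model size divided by the timeout. A quick check of that arithmetic, assuming decimal units (1 GB = 1000 MB); the function name is illustrative and not part of NemoClaw:

```javascript
// Minimum sustained throughput (MB/s) needed to finish a download of
// `sizeGB` gigabytes before a wall-clock timeout of `timeoutSeconds`.
function minThroughputMBps(sizeGB, timeoutSeconds) {
  const sizeMB = sizeGB * 1000; // decimal units: 1 GB = 1000 MB
  return sizeMB / timeoutSeconds;
}

// qwen2.5:7b (4.7 GB) against the 600 s limit
console.log(minThroughputMBps(4.7, 600).toFixed(2)); // prints 7.83
```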

Source (NemoClaw v0.0.28, dist/lib/onboard-ollama-proxy.js:204-215):

function pullOllamaModel(model) {
  const result = spawnSync("bash", ["-c", `ollama pull ${shellQuote(model)}`], {
    cwd: ROOT,
    encoding: "utf8",
    stdio: "inherit",
    timeout: 600_000, // <-- 10-minute HARD wall-clock limit
    env: { ...process.env },
  });
  // spawnSync kills the child with SIGTERM (the default killSignal) on timeout
  if (result.signal === "SIGTERM") {
    console.error("  Model pull timed out after 10 minutes. Try a smaller model or check your network connection.");
    return false;
  }
  return result.status === 0;
}

There is no environment-variable override (no NEMOCLAW_OLLAMA_PULL_TIMEOUT or similar; verified by grep across dist/) and no progress-aware "no new bytes for N seconds" mode. The user has to retry manually; each retry resumes from the last checkpoint, but each retry also wastes a full 10 minutes before the wrapper SIGTERMs the in-progress download.

User impact: developers in mainland China, behind a corporate VPN, on public WiFi, or on a remote Brev-style Mac are all likely to hit this. The default Ollama menu recommendation is qwen2.5:7b (4.7 GB), the very first option in the "local Ollama starter models" list, so this affects the recommended path, not an edge case.

Environment

Device:        MacBook (Apple M4)
OS:            macOS (recent)
Architecture:  arm64
Node.js:       Not captured
npm:           Not captured
Docker:        Not captured
OpenShell CLI: openshell 0.0.36
NemoClaw:      v0.0.28
OpenClaw:      Not reached (onboard fails before sandbox creation)

Source ref: dist/lib/onboard-ollama-proxy.js:204-215 (v0.0.28 release).

Steps to Reproduce

1. On a macOS host where sustained download bandwidth from registry.ollama.ai
   is below ~8 MB/s (e.g., behind a corporate VPN, on public WiFi, or on a
   remote-developer machine in a region far from the Ollama CDN edge):

     curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash

2. Onboard with Local Ollama → first menu option (qwen2.5:7b, 4.7 GB):

     nemoclaw onboard
     # Provider [1]: 7  (Install Ollama macOS)  [or 7) Local Ollama if already installed]
     # Choose model [1]: 1   (qwen2.5:7b)

3. Watch the pull progress. Around 10 minutes wall-clock, the wrapper kills
   the pull with SIGTERM and prints:

     Model pull timed out after 10 minutes. Try a smaller model or check
     your network connection.

4. Re-run `nemoclaw onboard`, pick the same model. Pull resumes from where
   it stopped (Ollama supports incremental download), but again gets killed
   at the next 10-minute boundary.

Observed in real run: 4 retries reached 21% / 43% / 67% / 91% before each SIGTERM at ~600 s.

Expected Result

Either:

(a) **No-progress timeout** — only kill the pull if no new bytes have been
    received for N seconds (e.g., 60 s). A slow but progressing download
    should be allowed to complete.

(b) **User-configurable wall-clock timeout** via env var, e.g.
    `NEMOCLAW_OLLAMA_PULL_TIMEOUT=3600`. Document this in the onboard help
    or error message.

(c) **At minimum, a longer default timeout** (30+ minutes) and an error
    message that suggests both retry-to-resume AND the env var override —
    not just "Try a smaller model" (the menu's first option IS the model
    that hits this).
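
Option (b) could look roughly like the sketch below. `NEMOCLAW_OLLAMA_PULL_TIMEOUT` is the hypothetical variable proposed above (it does not exist in v0.0.28), and the 30-minute fallback and the function name are likewise only illustrative:

```javascript
// Illustrative default of 30 minutes, matching option (c).
const DEFAULT_PULL_TIMEOUT_SECONDS = 1800;

// Hypothetical helper: read the timeout (in seconds) from the environment
// and return milliseconds, falling back to the default when the variable
// is unset, empty, non-numeric, or not positive.
function resolvePullTimeoutMs(env = process.env) {
  const raw = env.NEMOCLAW_OLLAMA_PULL_TIMEOUT;
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_PULL_TIMEOUT_SECONDS * 1000;
  }
  const seconds = Number(raw);
  if (!Number.isFinite(seconds) || seconds <= 0) {
    return DEFAULT_PULL_TIMEOUT_SECONDS * 1000;
  }
  return Math.round(seconds * 1000);
}

console.log(resolvePullTimeoutMs({ NEMOCLAW_OLLAMA_PULL_TIMEOUT: "3600" })); // prints 3600000
```

The returned value would take the place of the literal `600_000` in the `spawnSync` options, and the timeout error message could then name the variable.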

Actual Result

Hardcoded 10-minute SIGTERM. Error message:

  Model pull timed out after 10 minutes. Try a smaller model or check
  your network connection.

The "smaller model" advice is misleading: qwen2.5:7b is the menu's
DEFAULT recommendation. The available smaller alternative (1.5b at
~986 MB) is not surfaced in the wizard's starter-model list as a
fallback option. The user has to know to type a custom model name, or
manually retry and resume several times (four attempts were not enough
in the run below).

Real observed run (4 retries, each killed at 10 min, all incremental):
  retry 1: 21%   ← SIGTERM at 10:00
  retry 2: 43%   ← SIGTERM at 10:00
  retry 3: 67%   ← SIGTERM at 10:00
  retry 4: 91%   ← SIGTERM at 10:00 (last 0.5 GB still pending)

Total wall-clock spent on a single 4.7 GB pull: ~40 minutes across four
killed attempts, with the pull still incomplete, vs. an ollama-direct
pull that simply runs until it finishes.
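
For a rough sense of scale, the observed percentages imply the following effective rate and attempt count. This is a back-of-the-envelope sketch that assumes a constant download rate, decimal units, and no progress lost on resume:

```javascript
// Cumulative progress (%) at each SIGTERM, taken from the run above.
const progressAtKill = [21, 43, 67, 91];
const attemptSeconds = 600;
const modelSizeMB = 4700; // 4.7 GB, decimal units

// Average progress gained per 10-minute attempt.
const avgGainPct = progressAtKill[progressAtKill.length - 1] / progressAtKill.length;

// Attempts needed to reach 100% at that pace.
const attemptsNeeded = Math.ceil(100 / avgGainPct);

// Effective throughput, and how long one uninterrupted pull would take.
const effectiveMBps = (modelSizeMB * avgGainPct) / 100 / attemptSeconds;
const uninterruptedMinutes = modelSizeMB / effectiveMBps / 60;

console.log(avgGainPct);                      // prints 22.75
console.log(attemptsNeeded);                  // prints 5
console.log(effectiveMBps.toFixed(2));        // prints 1.78
console.log(uninterruptedMinutes.toFixed(0)); // prints 44
```

At roughly 1.8 MB/s, a single uninterrupted pull would need about 44 minutes, so under these assumptions even a 30-minute fixed timeout would not have let this run finish in one attempt.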

Logs

Source ref: NemoClaw v0.0.28 release.
File:       dist/lib/onboard-ollama-proxy.js
Function:   pullOllamaModel(model)
Line:       208 (timeout: 600_000)

Searched dist/ for "NEMOCLAW_OLLAMA_PULL_TIMEOUT" / "PULL_TIMEOUT" / any
env-var-driven override of the 600_000 constant — none found. So the
10-minute limit is the only path; the user has no escape hatch.


Suggested fix (better):
  Replace wall-clock timer with a no-progress timer — track stdout/stderr
  byte count and SIGTERM only if no new bytes for N seconds.
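
A minimal sketch of such a no-progress timer is shown below. It is not NemoClaw code: `runWithIdleTimeout` is a hypothetical helper, and it treats any stdout/stderr output as progress. It also assumes `ollama pull` keeps emitting progress output when its stdio is piped rather than attached to a TTY, which has not been verified here.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical helper: run a command and SIGTERM it only when it has
// produced no output at all for `idleMs` milliseconds.
function runWithIdleTimeout(command, args, idleMs) {
  return new Promise((resolve) => {
    const child = spawn(command, args, { stdio: ["ignore", "pipe", "pipe"] });
    let timedOut = false;
    let timer = null;

    // (Re)arm the watchdog; every chunk of output counts as progress.
    const arm = () => {
      clearTimeout(timer);
      timer = setTimeout(() => {
        timedOut = true;
        child.kill("SIGTERM");
      }, idleMs);
    };

    // Forward output to the terminal so the user still sees the progress.
    child.stdout.on("data", (chunk) => { process.stdout.write(chunk); arm(); });
    child.stderr.on("data", (chunk) => { process.stderr.write(chunk); arm(); });

    // Spawn failure (e.g. command not found).
    child.on("error", () => {
      clearTimeout(timer);
      resolve({ ok: false, timedOut: false });
    });

    child.on("close", (code) => {
      clearTimeout(timer);
      resolve({ ok: code === 0 && !timedOut, timedOut });
    });

    arm();
  });
}

// Possible usage (assumption): allow 60 s of silence before giving up.
// runWithIdleTimeout("ollama", ["pull", "qwen2.5:7b"], 60_000).then(console.log);
```

Output activity is only a proxy for bytes received: a progress bar can keep redrawing while the transfer itself is stalled. Ollama's HTTP API also has a streaming pull endpoint that reports completed byte counts, which may be a more direct progress signal if the wrapper can talk to the local server instead of the CLI.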

Bug Details

| Field       | Value |
|-------------|-------|
| Priority    | Unprioritized |
| Action      | Dev - Open - To fix |
| Disposition | Open issue |
| Module      | Machine Learning - NemoClaw |
| Keyword     | NemoClaw, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Onboard, NemoClaw-SWQA-RelBlckr-Recommended, NemoClaw-SWQA-VDR |

[NVB#6122412]

Labels

NV QA (Bugs found by the NVIDIA QA Team)
UAT (Issues flagged for User Acceptance Testing.)
area: cli (Command line interface, flags, terminal UX, or output)
platform: macos (Affects macOS, including Apple Silicon)
provider: ollama (Ollama local model provider behavior)
