Description
On ARM64/aarch64 (Snapdragon X + NVIDIA JMJWOA-Generic-GPU, **64 GB iGPU**, upgraded from the earlier 8 GB SKU), the guide-recommended Ollama model qwen3.6:35b is still functionally unusable under the OpenClaw agent loop. No sanity prompt returns a single token within OpenClaw's 300s wait window, after which the openclaw-gateway crashes with a 1006 abnormal closure and drops into the EMBEDDED FALLBACK path. Three different prompts (file create / quicksort exec / URL fetch) all hit the same silent-timeout pattern: EXIT=124 from `timeout 300`, with the workspace (`~/openclaw-sanity/`) left completely empty because no tool ever fired.
The 8 GB-SKU variant of this problem was tracked as NVBug 6129886 (filed by me) and is CLOSED as Bug-Fixed, since the public DGX Spark guide was the immediate concern and Windows ARM reference hardware was being upgraded to a larger iGPU. After the hardware upgrade to 64 GB the symptom persists, so this is a separate, post-hardware-upgrade regression that the closed bug does not cover. Filing fresh per QA's prior guidance — see comments below for cross-references.
Contrast (same box, same OpenClaw build, same prompts, only model differs):
| Model | p1 file-create | p2 quicksort | p3 URL-fetch | Workspace |
|--------------------|----------------|--------------|--------------|-----------|
| qwen3.6:35b | EXIT=124 @300s | EXIT=124 @300s | EXIT=124 @300s | empty |
| qwen3.6:35b (retry)| EXIT=124 @410s | (not reached) | (not reached) | empty |
| qwen3.5 (default ~8B) | EXIT=0 @78s | EXIT=0 @17s | EXIT=0 @26s | empty * |
* qwen3.5 returns a coherent natural-language reply each time but the
embedded-agent tool plumbing does not persist filesystem side effects
in this run — that is a separate openclaw-gateway issue (1006 closure
+ EMBEDDED FALLBACK) and is tracked alongside NVBug 6168039.
The 35B vs 8B-class delta is the operative finding: with the same OpenClaw build, same gateway state, same prompt strings, and same Ollama daemon, and only the model name changing, the 8B-class model returns a model-side answer in tens of seconds while 35B returns zero bytes within 5 minutes. This isolates the failure to the model rather than the gateway or the daemon, and points to a 35B-on-this-iGPU compute/bandwidth ceiling under the agentic request shape (long system prompt + tools array + streaming).
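To put numbers behind the compute/bandwidth reading, the model can be queried directly through Ollama with OpenClaw and the gateway out of the path. The following is a minimal sketch, not something that was run for this report. It assumes the default Ollama endpoint on 127.0.0.1:11434 and that `curl` and `jq` are installed. The helper names `tokens_per_sec` and `probe_model` and the 900s cap are illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: talk to Ollama directly, with no OpenClaw or gateway in the path.
# Assumptions: default Ollama endpoint on 127.0.0.1:11434; curl and jq installed.
OLLAMA_URL="${OLLAMA_URL:-http://127.0.0.1:11434}"

# tokens_per_sec COUNT DURATION_NS -> rate with two decimals, or "n/a" if the duration is zero.
tokens_per_sec() {
  awk -v n="$1" -v ns="$2" 'BEGIN {
    if (ns > 0) printf "%.2f\n", n / (ns / 1e9); else print "n/a"
  }'
}

# probe_model MODEL PROMPT -> one non-streaming /api/generate call, then load time and rates.
probe_model() {
  local model="$1" prompt="$2" body resp
  body=$(jq -n --arg m "$model" --arg p "$prompt" '{model: $m, prompt: $p, stream: false}')
  # 900s is an arbitrary cap chosen for this sketch, well above the 300s agent window.
  resp=$(curl -sS --max-time 900 "$OLLAMA_URL/api/generate" -d "$body") || return 1
  # Durations in the Ollama response are reported in nanoseconds; missing fields become 0.
  set -- $(printf '%s' "$resp" | jq -r '[.load_duration, .prompt_eval_count,
    .prompt_eval_duration, .eval_count, .eval_duration] | map(. // 0 | tostring) | join(" ")')
  echo "model=$model load_s=$(awk -v ns="$1" 'BEGIN { printf "%.1f", ns / 1e9 }')"
  echo "prefill_tok_per_s=$(tokens_per_sec "$2" "$3")"
  echo "decode_tok_per_s=$(tokens_per_sec "$4" "$5")"
}

# Example: compare the two models from the table with the same short prompt.
# probe_model qwen3.6:35b "Reply with the single word: ready"
# probe_model qwen3.5     "Reply with the single word: ready"
```

A short prompt does not reproduce the agentic request shape, so the prefill rate is the more useful figure here: it indicates how long a multi-thousand-token system prompt plus tools array would take before the first output token.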
Cross-reference: NVBug 6174615 (Arhan Banerjee, Open) tracks the same model (Qwen 3.6-35B Q4_K_M) showing significantly lower performance on Windows ARM reference host versus M5 Pro on the SoC perf team's Llama.cpp + AIPerf benchmark suite. That is a perf-comparison bug; mine is a functional-blocker bug under OpenClaw's actual agent-loop request shape. They corroborate each other (same model, same platform, both report the model is too slow to be useful) and should likely be linked.
Cross-reference: NVBug 6162831 (text-JSON pseudo-tool-call under agentic clients) is a DIFFERENT degradation mode on the same Windows ARM platform — there the model emits output that the agentic client cannot route to a tool; here the model emits no output at all within the agent-loop timeout. Both reduce to "agentic workload on local Ollama on Windows ARM reference host is currently not viable", but the failure points are distinct and worth tracking separately.
Environment
Device: ARM64 reference (DESKTOP-FBOH36I), accessed via two-hop SSH
OS: Windows 11 + WSL2 Ubuntu 24.04.4 LTS (kernel 6.6.87.2-microsoft-standard-WSL2)
Architecture: aarch64
Node.js: v22.22.3
npm: 10.9.8
Docker: N/A (standalone OpenClaw path, not NemoClaw sandbox)
OpenShell CLI: N/A
NemoClaw: N/A
OpenClaw: 2026.5.7 (eeef486)
Ollama: 0.23.4
GPU: NVIDIA JMJWOA-Generic-GPU, 65471 MiB VRAM (~64 GB) — POST hardware upgrade
GPU driver: 592.60, CUDA 13.1
Host RAM: 63.9 GiB
Steps to Reproduce
1. On ARM64/aarch64 WSL2 Ubuntu 24.04 with 64 GB iGPU, install Ollama 0.23.4 and OpenClaw 2026.5.7. Pre-pull qwen3.6:35b.
2. Confirm hardware: `nvidia-smi` shows JMJWOA-Generic-GPU with 65471 MiB. Confirm model: `ollama list | grep qwen3.6:35b` shows the Q4_K_M weights are present.
3. Set the OpenClaw env: `export OLLAMA_API_KEY=dummy`
4. Create a clean workspace: `mkdir -p ~/openclaw-sanity-35b && cd ~/openclaw-sanity-35b`
5. Run prompt 1 (file create) non-interactively:
    timeout 300 /home/lab/.hermes/node/bin/openclaw agent --agent main \
      --model ollama/qwen3.6:35b \
      --message "Create a text file named test.txt in the current directory containing the text 'hello from OpenClaw WSL2'"
6. Observe: process exits with code 124 at 300s wall clock; no tokens streamed; workspace empty; gateway log shows 1006 abnormal closure + EMBEDDED FALLBACK message.
7. Repeat with prompts 2 (quicksort exec) and 3 (URL fetch + author identification) — same EXIT=124 / empty workspace / no model output.
8. Swap model to `ollama/qwen3.5` (default ~8B variant), same script, same gateway — model returns a coherent reply in 17-78s for each prompt.
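The steps above can be wrapped in a small harness so each run records wall time and exit code in the same `WALL=... EXIT=...` shape as the session logs. This is a sketch under assumptions, not the script that produced the logs in this report: `run_case` and `OUT_DIR` are hypothetical names, and it relies on GNU `date` and coreutils `timeout` (which exits with 124 when the limit is hit).

```shell
#!/usr/bin/env bash
# Sketch of a timing harness; run_case and OUT_DIR are illustrative names.

# run_case LABEL LIMIT_SECONDS COMMAND [ARGS...]
# Runs COMMAND under `timeout`, saves its output to LABEL.out, prints one summary line.
run_case() {
  local label="$1" limit="$2" start end rc
  shift 2
  start=$(date +%s.%N)
  timeout "$limit" "$@" >"${OUT_DIR:-.}/$label.out" 2>&1
  rc=$?   # 124 means timeout killed the command at LIMIT_SECONDS
  end=$(date +%s.%N)
  awk -v l="$label" -v s="$start" -v e="$end" -v rc="$rc" \
    'BEGIN { printf "--- %s WALL=%.3fs EXIT=%d ---\n", l, e - s, rc }'
}

# Example (prompt 1 from the steps above; the other prompts follow the same shape):
# run_case p1-create-test-txt 300 /home/lab/.hermes/node/bin/openclaw agent --agent main \
#   --model ollama/qwen3.6:35b \
#   --message "Create a text file named test.txt in the current directory containing the text 'hello from OpenClaw WSL2'"
```

Keeping the model name as the only changed argument between invocations preserves the single-variable comparison described in step 8.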
Expected Result
qwen3.6:35b on a 64 GB iGPU should return tokens within OpenClaw's standard wait window (NEMOCLAW_LOCAL_INFERENCE_TIMEOUT default 180s; OpenClaw's 300s agent-step timeout). For the canonical sanity prompts in the DGX Spark + Windows ARM reference host guide, `test.txt` should land on disk with the expected content, quicksort should run and print `[1,2,3,6,7,8]`, and the URL-fetch prompt should return the article's author. This is the behavior observed for 8B-class models (qwen3.5) on the same box.
Actual Result
Six consecutive runs over two sessions on the upgraded 64 GB box:
Session A (qwen3.6:35b direct):
--- 11:47:35 p1-create-test-txt ---
--- p1-create-test-txt WALL=717.185s EXIT=124 ---
EMBEDDED FALLBACK: Gateway agent failed; running embedded agent:
GatewayTransportError: gateway closed (1006 abnormal closure (no close frame))
workspace: empty
Session B (qwen3.6:35b retry after gateway restart):
--- 12:15:43 p1-q8b ---
--- p1-q8b WALL=410.609s EXIT=124 ---
EMBEDDED FALLBACK: ... 1006 abnormal closure ...
workspace: empty
Contrast (qwen3.5 default ~8B, same box, same gateway, immediately after):
--- 12:25:29 p1-q35 ---
--- p1-q35 WALL=78.1795s EXIT=0 ---
model reply: "Done! Created `test.txt` with the content ..."
--- p2-q35 WALL=17.2206s EXIT=0 ---
model reply: "Sorted: [1, 2, 3, 6, 7, 8] ..."
--- p3-q35 WALL=25.5321s EXIT=0 ---
model reply: "... I can't find an author listed on this page ..."
The only delta between the failing and passing runs is the model name (qwen3.6:35b vs qwen3.5). All other variables (host, kernel, Ollama daemon, OpenClaw build, gateway state, prompts, env vars) are held constant.
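For triage, the `WALL=... EXIT=...` summary lines can be condensed into one verdict per run. The filter below is an illustrative sketch that assumes the log keeps the exact line shape shown above. `summarize_runs` is a hypothetical helper, and the "timeout" label simply maps exit code 124 from `timeout`.

```shell
#!/usr/bin/env bash
# Sketch: reduce "--- <label> WALL=<sec>s EXIT=<code> ---" lines to one verdict per run.
summarize_runs() {
  awk '
    /WALL=[0-9.]+s EXIT=[0-9]+/ {
      wall = $3; sub(/^WALL=/, "", wall); sub(/s$/, "", wall)
      code = $4; sub(/^EXIT=/, "", code); code += 0
      if (code == 0) verdict = "ok"
      else if (code == 124) verdict = "timeout"   # exit status used by coreutils timeout
      else verdict = "error"
      printf "%s wall=%.1fs exit=%d %s\n", $2, wall, code, verdict
    }'
}

# Example, using the session log path listed under Logs:
# summarize_runs < /home/lab/day0-automation/20260514/T5987925.log
```

Header lines that carry only a timestamp and label do not match the pattern and are skipped.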
Logs
Full session logs on the lab dev host:
/home/lab/day0-automation/20260514/T5987925.log (~520 lines, both sessions)
/home/lab/day0-automation/20260514/_t5987925_q35.out (qwen3.5 contrast run stdout)
Representative gateway error block reproduced on every 35B attempt:
EMBEDDED FALLBACK: Gateway agent failed; running embedded agent:
GatewayTransportError: gateway closed (1006 abnormal closure (no close frame)): no close reason
Gateway target: ws://127.0.0.1:18789
Source: local loopback
Config: /home/lab/.openclaw/openclaw.json
Bind: loopback
[plugins][ollama] WSL2 crash-loop risk: ollama.service is enabled with
Restart=always and CUDA is visible. (...std mitigation suggestions...)
NOT captured: per-token streaming logs from inside Ollama for 35B (the runner stays "warming up" with no tokens produced before the 300s OpenClaw timeout fires).
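One way to close that gap in a follow-up run is to measure time to first streamed line directly against Ollama, which streams newline-delimited JSON from `/api/generate` by default. The helper below is a hedged sketch: `time_to_first_line` is a hypothetical name, the endpoint is assumed to be the Ollama default, and GNU `date` is assumed for sub-second timestamps.

```shell
#!/usr/bin/env bash
# Sketch: print seconds elapsed until the first non-empty line arrives on stdin.
time_to_first_line() {
  local start now line
  start=$(date +%s.%N)
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    now=$(date +%s.%N)
    awk -v s="$start" -v e="$now" 'BEGIN { printf "%.2f\n", e - s }'
    return 0
  done
  # Input ended without any content, e.g. the request was cut off before a token arrived.
  echo "no-output"
  return 1
}

# Example (assumes the default Ollama endpoint; -N disables curl output buffering):
# curl -sN --max-time 900 http://127.0.0.1:11434/api/generate \
#   -d '{"model": "qwen3.6:35b", "prompt": "Reply with the single word: ready"}' \
#   | time_to_first_line
```

Running `ollama ps` while the request is pending would additionally show whether the model is resident and how it is split between GPU and CPU.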
Suggested Fix
Short term:
1. Change the recommended default Ollama model on the Windows ARM reference host (64 GB iGPU) from qwen3.6:35b to an 8B-class model (qwen3.5 default, qwen2.5:7b, llama3.1:8b). The DGX Spark guide's model recommendation still does not carry over to the Windows ARM reference host, even after the iGPU upgrade.
2. In OpenClaw onboarding / preflight, detect Windows ARM reference host (JMJWOA-Generic-GPU) and warn / refuse 35B+ class models with an actionable error pointing at the 8B-class recommendation, similar to how the existing preflight detects insufficient RAM.
Longer term:
3. Coordinate with the SoC perf team owning NVBug 6174615 — the underlying issue is that qwen3.6:35b Q4_K_M is significantly slower on Windows ARM reference host than on competitor SoCs, which makes it incompatible with OpenClaw's 300s agent-step timeout. Either the model needs SoC-specific tuning (offload split, attention impl) OR the agent-step timeout for known-slow models needs to be configurable per-model.
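The preflight idea in item 2 could look roughly like the sketch below. It is not OpenClaw's actual preflight code: `model_size_b`, `preflight_check`, and `SLOW_MODEL_MIN_B` are hypothetical names, the 35B threshold comes from the "35B+ class" wording above, and the size is inferred from the model tag only, so untagged names such as `qwen3.5` pass through unchecked.

```shell
#!/usr/bin/env bash
# Sketch of a model/GPU preflight; all names here are illustrative, not OpenClaw APIs.
SLOW_MODEL_MIN_B="${SLOW_MODEL_MIN_B:-35}"   # warn at or above this many billion parameters

# model_size_b TAG -> parameter count in billions parsed from e.g. "qwen3.6:35b"; empty if absent.
model_size_b() {
  printf '%s\n' "$1" | sed -n 's/^.*:\([0-9][0-9.]*\)[bB].*$/\1/p'
}

# preflight_check GPU_NAME MODEL_TAG -> prints a verdict; returns 1 when the combination is flagged.
preflight_check() {
  local gpu="$1" model="$2" size
  size=$(model_size_b "$model")
  case "$gpu" in
    *JMJWOA-Generic-GPU*)
      if [ -n "$size" ] && awk -v s="$size" -v m="$SLOW_MODEL_MIN_B" 'BEGIN { exit !(s >= m) }'; then
        echo "WARN: $model (${size}B) is unlikely to answer within the agent-step timeout on $gpu; use an 8B-class model"
        return 1
      fi
      ;;
  esac
  echo "OK: $model"
}

# Example: read the GPU name from nvidia-smi, then check the configured model.
# gpu=$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)
# preflight_check "$gpu" "qwen3.6:35b"
```

A real implementation would more likely read the parameter count from model metadata than from the tag, and could emit a per-model timeout override instead of a refusal.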
Bug Details
| Field | Value |
|-------|-------|
| Priority | Unprioritized |
| Action | Dev - Open - To fix |
| Disposition | Open issue |
| Module | Machine Learning - NemoClaw |
| Engineer | Aaron Erickson |
| Requester | Eric Wang (SW-GPU - SH) |
| Keyword | ARM64_Day0, ARM64_SW_Tracking, NemoClaw, SWQA_N1, WinAI-Windows ARM reference host |
| Days Open | 4 |
[NVB#6178932]