[WSL2][Inference] qwen3.6:35b yields no token within 300s under OpenClaw agent loop on ARM64/aarch64 (64 GB iGPU)

## Description

Description
<pre>On ARM64/aarch64 (Snapdragon X + NVIDIA JMJWOA-Generic-GPU, **64 GB iGPU**; hardware upgraded from the earlier 8 GB SKU), the guide-recommended Ollama model qwen3.6:35b is still functionally unusable under the OpenClaw agent loop. Every sanity prompt returns no token within OpenClaw's 300s wait window, after which the openclaw-gateway crashes with a 1006 abnormal closure and the run drops into the EMBEDDED FALLBACK path. Three different prompts (file create / quicksort exec / URL fetch) all hit the same silent-timeout pattern: EXIT=124 from `timeout 300`, with `~/openclaw-sanity/` left completely empty (no tool actually fired).

The 8 GB-SKU variant of this problem was tracked as NVBug 6129886 (filed by me) and is CLOSED as Bug-Fixed, since the public DGX Spark guide was the immediate concern and Windows ARM reference hardware was being upgraded to a larger iGPU. After the hardware upgrade to 64 GB the symptom persists, so this is a separate, post-hardware-upgrade regression that the closed bug does not cover. Filing fresh per QA's prior guidance — see comments below for cross-references.

Contrast (same box, same OpenClaw build, same prompts, only model differs):

 | Model                 | p1 file-create | p2 quicksort   | p3 URL-fetch   | Workspace |
 |-----------------------|----------------|----------------|----------------|-----------|
 | qwen3.6:35b           | EXIT=124 @300s | EXIT=124 @300s | EXIT=124 @300s | empty     |
 | qwen3.6:35b (retry)   | EXIT=124 @410s | (not reached)  | (not reached)  | empty     |
 | qwen3.5 (default ~8B) | EXIT=0 @78s    | EXIT=0 @17s    | EXIT=0 @26s    | empty *   |

 * qwen3.5 returns a coherent natural-language reply each time but the
 embedded-agent tool plumbing does not persist filesystem side effects
 in this run — that is a separate openclaw-gateway issue (1006 closure
 + EMBEDDED FALLBACK) and is tracked alongside NVBug 6168039.

The 35B vs 8B-class delta is the operative finding. With the same OpenClaw build, gateway state, prompt strings, and Ollama daemon, and only the model name changing, the 8B-class model returns a model-side answer in tens of seconds while 35B returns zero bytes within 5 minutes. This isolates the failure to the model choice and points to a 35B-on-this-iGPU compute/bandwidth ceiling under the agentic request shape (long system prompt + tools array + streaming).

Cross-reference: NVBug 6174615 (Arhan Banerjee, Open) tracks the same model (Qwen 3.6-35B Q4_K_M) showing significantly lower performance on Windows ARM reference host versus M5 Pro on the SoC perf team's Llama.cpp + AIPerf benchmark suite. That is a perf-comparison bug; mine is a functional-blocker bug under OpenClaw's actual agent-loop request shape. They corroborate each other (same model, same platform, both report the model is too slow to be useful) and should likely be linked.

Cross-reference: NVBug 6162831 (text-JSON pseudo-tool-call under agentic clients) is a DIFFERENT degradation mode on the same Windows ARM platform — there the model emits output that the agentic client cannot route to a tool; here the model emits no output at all within the agent-loop timeout. Both reduce to "agentic workload on local Ollama on Windows ARM reference host is currently not viable", but the failure points are distinct and worth tracking separately.

</pre>

Environment
<pre>Device: ARM64 reference (DESKTOP-FBOH36I), accessed via two-hop SSH
OS: Windows 11 + WSL2 Ubuntu 24.04.4 LTS (kernel 6.6.87.2-microsoft-standard-WSL2)
Architecture: aarch64
Node.js: v22.22.3
npm: 10.9.8
Docker: N/A (standalone OpenClaw path, not NemoClaw sandbox)
OpenShell CLI: N/A
NemoClaw: N/A
OpenClaw: 2026.5.7 (eeef486)
Ollama: 0.23.4
GPU: NVIDIA JMJWOA-Generic-GPU, 65471 MiB VRAM (~64 GB) — POST hardware upgrade
GPU driver: 592.60, CUDA 13.1
Host RAM: 63.9 GiB
</pre>

Steps to Reproduce
<pre>1. On ARM64/aarch64 WSL2 Ubuntu 24.04 with 64 GB iGPU, install Ollama 0.23.4 and OpenClaw 2026.5.7. Pre-pull qwen3.6:35b.
2. Confirm hardware: `nvidia-smi` shows JMJWOA-Generic-GPU with 65471 MiB. Confirm model: `ollama list | grep qwen3.6:35b` shows the Q4_K_M weights are present.
3. Set the OpenClaw env: `export OLLAMA_API_KEY=dummy`
4. Create a clean workspace: `mkdir -p ~/openclaw-sanity-35b && cd ~/openclaw-sanity-35b`
5. Run prompt 1 (file create) non-interactively:
 timeout 300 /home/lab/.hermes/node/bin/openclaw agent --agent main \
 --model ollama/qwen3.6:35b \
 --message "Create a text file named test.txt in the current directory containing the text 'hello from OpenClaw WSL2'"
6. Observe: process exits with code 124 at 300s wall clock; no tokens streamed; workspace empty; gateway log shows 1006 abnormal closure + EMBEDDED FALLBACK message.
7. Repeat with prompts 2 (quicksort exec) and 3 (URL fetch + author identification) — same EXIT=124 / empty workspace / no model output.
8. Swap model to `ollama/qwen3.5` (default ~8B variant), same script, same gateway — model returns a coherent reply in 17-78s for each prompt.
</pre>

Expected Result
<pre>qwen3.6:35b on a 64 GB iGPU should return tokens within OpenClaw's standard wait window (NEMOCLAW_LOCAL_INFERENCE_TIMEOUT default 180s; OpenClaw's 300s agent-step timeout). For the canonical sanity prompts in the DGX Spark + Windows ARM reference host guide, `test.txt` should land on disk with the expected content, quicksort should run and print `[1,2,3,6,7,8]`, and the URL-fetch prompt should return the article's author. This is the behavior observed for 8B-class models (qwen3.5) on the same box.
</pre>

Actual Result
<pre>Six consecutive runs over two sessions on the upgraded 64 GB box:

Session A (qwen3.6:35b direct):
 --- 11:47:35 p1-create-test-txt ---
 --- p1-create-test-txt WALL=717.185s EXIT=124 ---
 EMBEDDED FALLBACK: Gateway agent failed; running embedded agent:
 GatewayTransportError: gateway closed (1006 abnormal closure (no close frame))
 workspace: empty

Session B (qwen3.6:35b retry after gateway restart):
 --- 12:15:43 p1-q8b ---
 --- p1-q8b WALL=410.609s EXIT=124 ---
 EMBEDDED FALLBACK: ... 1006 abnormal closure ...
 workspace: empty

Contrast (qwen3.5 default ~8B, same box, same gateway, immediately after):
 --- 12:25:29 p1-q35 ---
 --- p1-q35 WALL=78.1795s EXIT=0 ---
 model reply: "Done! Created `test.txt` with the content ..."
 --- p2-q35 WALL=17.2206s EXIT=0 ---
 model reply: "Sorted: [1, 2, 3, 6, 7, 8] ..."
 --- p3-q35 WALL=25.5321s EXIT=0 ---
 model reply: "... I can't find an author listed on this page ..."

The only delta between the failing and passing runs is the model name (qwen3.6:35b vs qwen3.5). All other variables (host, kernel, Ollama daemon, OpenClaw build, gateway state, prompts, env vars) are held constant.
</pre>

Logs
<pre>Full session logs on the lab dev host:
 /home/lab/day0-automation/20260514/T5987925.log (~520 lines, both sessions)
 /home/lab/day0-automation/20260514/_t5987925_q35.out (qwen3.5 contrast run stdout)

Representative gateway error block reproduced on every 35B attempt:
 EMBEDDED FALLBACK: Gateway agent failed; running embedded agent:
 GatewayTransportError: gateway closed (1006 abnormal closure (no close frame)): no close reason
 Gateway target: ws://127.0.0.1:18789
 Source: local loopback
 Config: /home/lab/.openclaw/openclaw.json
 Bind: loopback
 [plugins][ollama] WSL2 crash-loop risk: ollama.service is enabled with
 Restart=always and CUDA is visible. (...std mitigation suggestions...)

NOT captured: per-token streaming logs from inside Ollama for 35B (the runner stays "warming up" with no tokens produced before the 300s OpenClaw timeout fires).
</pre>

Suggested Fix
<pre>Short term:
 1. Change the recommended default Ollama model on the Windows ARM reference host (64 GB iGPU) from qwen3.6:35b to an 8B-class model (qwen3.5 default, qwen2.5:7b, llama3.1:8b). The DGX Spark guide's model recommendation still does not carry over to the Windows ARM reference host, even after the iGPU upgrade.
 2. In OpenClaw onboarding / preflight, detect Windows ARM reference host (JMJWOA-Generic-GPU) and warn / refuse 35B+ class models with an actionable error pointing at the 8B-class recommendation, similar to how the existing preflight detects insufficient RAM.

Longer term:
 3. Coordinate with the SoC perf team owning NVBug 6174615 — the underlying issue is that qwen3.6:35b Q4_K_M is significantly slower on Windows ARM reference host than on competitor SoCs, which makes it incompatible with OpenClaw's 300s agent-step timeout. Either the model needs SoC-specific tuning (offload split, attention impl) OR the agent-step timeout for known-slow models needs to be configurable per-model.
</pre>
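
For discussion, here is a minimal shell sketch of what the preflight warning (item 2) and a per-model step timeout (item 3) could look like. It is not OpenClaw code: the function names `model_size_b`, `preflight_model_check`, and `step_timeout_for` are hypothetical, the 35B threshold is taken from this report's wording ("35B+ class"), and the 900s override is an illustrative value, not a measured one. The size is parsed from the Ollama tag only, so untagged defaults such as `qwen3.5` are treated as unknown and allowed through.

```sh
#!/bin/sh
# Hypothetical preflight sketch. None of these function names exist in
# OpenClaw; they only illustrate the checks proposed in this report.
# Uses `local`, which dash and bash support but strict POSIX sh does not require.

# Print the integer parameter count (in billions) encoded in an Ollama tag
# such as "ollama/qwen3.6:35b". Prints nothing when the tag carries no
# "<N>b" suffix (for example "ollama/qwen3.5").
model_size_b() {
  local tag num int
  case "$1" in
    *:*) tag="${1##*:}" ;;      # text after the last ':'
    *)   return 0 ;;
  esac
  case "$tag" in
    *[bB]) num="${tag%[bB]}" ;; # drop the trailing b/B
    *)     return 0 ;;
  esac
  int="${num%%.*}"              # keep the integer part of e.g. "1.5"
  case "$int" in
    ''|*[!0-9]*) return 0 ;;    # not a plain number: treat as unknown
  esac
  printf '%s\n' "$int"
}

# Return 1 (with a warning on stderr) for a 35B+ model on the Windows ARM
# reference GPU named in this report; return 0 otherwise.
preflight_model_check() {
  local gpu_name model size
  gpu_name="$1"; model="$2"
  size="$(model_size_b "$model")"
  case "$gpu_name" in
    *JMJWOA-Generic-GPU*)
      if [ -n "$size" ] && [ "$size" -ge 35 ]; then
        echo "warning: $model is not recommended on $gpu_name; use an 8B-class model" >&2
        return 1
      fi ;;
  esac
  return 0
}

# Per-model agent-step timeout in seconds. 300 mirrors the agent-step
# timeout described in this report; 900 is an illustrative override only.
step_timeout_for() {
  local size
  size="$(model_size_b "$1")"
  if [ -n "$size" ] && [ "$size" -ge 35 ]; then
    echo 900
  else
    echo 300
  fi
}

# Example wiring (assumes nvidia-smi is on PATH):
#   gpu="$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)"
#   preflight_model_check "$gpu" "ollama/qwen3.6:35b" || echo "choose an 8B-class model"
```

Keying the check on the GPU name string reported by `nvidia-smi` keeps it independent of VRAM size, which matters here because the failure persists after the upgrade to 64 GB.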

## Bug Details

| Field | Value |
|-------|-------|
| Priority | Unprioritized |
| Action | Dev - Open - To fix |
| Disposition | Open issue |
| Module | Machine Learning - NemoClaw |
| Engineer | Aaron Erickson |
| Requester | Eric Wang (SW-GPU - SH) |
| Keyword | ARM64_Day0, ARM64_SW_Tracking, NemoClaw, SWQA_N1, WinAI-Windows ARM reference host |
| Days Open | 4 |

---
[NVB#6178932]

[WSL2][Inference] qwen3.6:35b yields no token within 300s under OpenClaw agent loop on ARM64/aarch64 (64 GB iGPU) #3707