compatible-endpoint provider does not honour NEMOCLAW_LOCAL_INFERENCE_TIMEOUT (vllm-local and ollama-local do); 60s default leaks through to reasoning-model streams #2403
Summary
NemoClaw supports three local-inference provider paths: ollama-local, vllm-local, and compatible-endpoint. compatible-endpoint is the path an operator picks when inference is served by an OpenAI-compatible endpoint running outside the sandbox (for example a user-owned Ollama, LM Studio, or vLLM server reachable on the LAN).
Observed behaviour: setting NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=600 before nemoclaw onboard propagates the value into the gateway config when the chosen provider is ollama-local or vllm-local, but not when the chosen provider is compatible-endpoint. openshell inference get after onboard reports Timeout: 60s (default) for compatible-endpoint regardless of the exported env var.
Impact: reasoning / thinking models commonly pause longer than 60 seconds before first token (Qwen 3.6-35B during its reasoning phase, DeepSeek-R1, and similar). With the 60-second timeout in effect, those streams are cut mid-output. The client-side symptom matches the pattern reported in openclaw/openclaw#64432 and the TUI-hang pattern in NemoClaw #2099.
Reproduction
1. Point NemoClaw at an external OpenAI-compatible endpoint (provider compatible-endpoint, baseUrl: http://<host>:11434/v1).
2. Export NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=600 in the shell that will run onboard.
3. Run nemoclaw onboard.
4. After the sandbox is up, run openshell inference get.
Expected: Timeout: 600s.
Actual: Timeout: 60s (default).
For contrast, repeating the same sequence with provider: ollama-local (Ollama running inside the sandbox container) produces Timeout: 600s as expected.
Observed evidence
The finding is observational at the runtime layer. What I can attest to:
- openshell inference get output after NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=600 nemoclaw onboard with provider compatible-endpoint: timeout shows 60s (default).
- Same environment variable, same sequence, provider ollama-local: timeout shows 600s.
- The difference is consistent across repeated onboards on v0.0.23 and was also present on v0.0.20 before upgrade.
Live repro, 2026-04-23
Host: Intel NUC (i7-10710U, 64 GB) running DietPi on Debian 13 (Trixie). External Ollama served from a separate LAN host (AMD Ryzen AI 9 HX 370, Radeon 890M iGPU via ROCm, Qwen 3.6-35B quantised Q4_K_M). Provider: compatible-endpoint.
Pre-onboard:
export NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=600
nemoclaw onboard # choose compatible-endpoint, base URL above, model qwen3.6:35b
Post-onboard: openshell inference get reports Timeout: 60s (default).
Manual overrides
Running openshell inference set --timeout 600 after onboard does write Timeout: 600s into the gateway config. However, the override is not durable across sessions: the value reverts to 60s (default) on a subsequent nemoclaw <sandbox> connect (the connect subcommand re-applies blueprint defaults). I will report that reversion behaviour in a separate filing; for this issue the relevant consequence is that the post-onboard manual override is single-session only.
The other in-sandbox path, openclaw config set provider.compatible-endpoint.timeoutSeconds 600, is currently blocked by the validator behaviour reported in NVIDIA/NemoClaw#2400: the validator rejects the path on a sandbox where the key has not yet been written.
Net effect for compatible-endpoint + reasoning-model operators on v0.0.23:
- Run openshell inference set --timeout 600 after every connect, or
- Use ollama-local instead (which changes the operational topology: Ollama then runs inside the sandbox rather than as a pre-existing service), or
- Edit /sandbox/.openclaw/openclaw.json directly via kubectl exec (works, unsupported, bypasses both the validator in #2400 and the gateway reconcile).
What would fix this
- Plumb NEMOCLAW_LOCAL_INFERENCE_TIMEOUT into the compatible-endpoint provider so the env var's effect matches ollama-local and vllm-local.
- Document the three providers' env-var contracts side by side in the NemoClaw blueprint reference so future providers are added with consistent env handling.
Environment
- Model: qwen3.6:35b quantised Q4_K_M
- ... (b3d832b596...)
- Provider: compatible-endpoint
Supporting artifacts available on request
- nemoclaw onboard transcript with NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=600 exported
- openshell inference get output after onboard (60s (default)) and after manual openshell inference set --timeout 600 (600s)
- ollama-local onboard transcript showing 600s correctly propagated
Why this matters
Operators who configure NemoClaw with a pre-existing external inference service (LAN Ollama, internal vLLM, LM Studio) use the compatible-endpoint provider to reach it. That path is also the one where the model on the far end is often larger or more reasoning-heavy than the sandbox-embedded runtime would run locally. Reasoning models in that class can pause longer than 60 seconds before first token. With the 60-second timeout in effect, those streams are cut before the first content chunk reaches the client.
The fix is local (one provider's config-construction path); the cost of leaving it as-is is that compatible-endpoint silently applies a 60-second cap that the NEMOCLAW_LOCAL_INFERENCE_TIMEOUT documentation does not mention as provider-conditional.
Cross-reference to adjacent issues
- openclaw/openclaw#64432 (LLM idle timeout kills Ollama reasoning streams): the upstream symptom. If #64432 closes at the OpenClaw layer (for example by resetting the idle timer on thinking chunks), this timeout gap on compatible-endpoint still applies to other scenarios where a 60-second ceiling is too low.
- NVIDIA/NemoClaw#2400 (openclaw config set validator rejects unset keys): blocks the in-sandbox fix path for this timeout.
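To make the expected-versus-actual check from the Reproduction section repeatable, the comparison could be scripted roughly as follows. This is a minimal sketch: parse_timeout_seconds and check_timeout are hypothetical helper names, and the only thing assumed about openshell inference get is that its output contains a line of the form "Timeout: <N>s", optionally followed by "(default)", as quoted in this report.

```shell
# Hypothetical helpers for checking the effective inference timeout.
# Assumption: `openshell inference get` prints a line such as
# "Timeout: 600s" or "Timeout: 60s (default)".

# Read command output on stdin and print the timeout in seconds.
# Prints nothing if no Timeout line is found.
parse_timeout_seconds() {
  sed -n 's/^[[:space:]]*Timeout:[[:space:]]*\([0-9][0-9]*\)s.*$/\1/p' | head -n 1
}

# Compare the reported timeout with the expected value.
# $1 = expected seconds; stdin = output of `openshell inference get`.
check_timeout() {
  local expected="$1" actual
  actual="$(parse_timeout_seconds)"
  if [ "$actual" = "$expected" ]; then
    echo "ok: timeout is ${actual}s"
  else
    echo "mismatch: expected '${expected}', got '${actual}'"
    return 1
  fi
}
```

After an onboard with NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=600 exported, piping the real output through the helper (openshell inference get | check_timeout 600) should succeed for ollama-local and, per the behaviour described here, fail for compatible-endpoint.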
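The first proposed fix, giving compatible-endpoint the same env-var contract as ollama-local and vllm-local, amounts to resolving the timeout in one place that does not branch on the provider. The sketch below illustrates that contract only; it is not NemoClaw's actual config-construction code. resolve_inference_timeout is a hypothetical name, and falling back to the 60-second default for an unset or non-numeric value is an assumption.

```shell
# Illustrative only: not NemoClaw's implementation.
# The 60s default is taken from this report; the fallback behaviour
# for invalid values is an assumption.
DEFAULT_INFERENCE_TIMEOUT=60

# $1 = provider name. Prints the timeout (seconds) that should be
# written to the gateway config for that provider.
resolve_inference_timeout() {
  local provider="$1"
  local raw="${NEMOCLAW_LOCAL_INFERENCE_TIMEOUT:-}"

  # All three local-inference providers share one contract.
  case "$provider" in
    ollama-local|vllm-local|compatible-endpoint) ;;
    *) echo "unknown provider: $provider" >&2; return 2 ;;
  esac

  # Use the env var only when it is a non-empty string of digits.
  case "$raw" in
    ''|*[!0-9]*) echo "$DEFAULT_INFERENCE_TIMEOUT" ;;
    *) echo "$raw" ;;
  esac
}
```

Under this contract, the provider name is validated but never changes the result, so a provider added later inherits the same env handling by default.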