Description
On the Jetson Orin platform, the OpenClaw TUI fails with repeated "HTTP 503: inference service unavailable" errors followed by "LLM request timed out" when using NVIDIA cloud inference (nemotron-3-super-120b-a12b via integrate.api.nvidia.com).
The gateway proxy routes the request correctly (NET:OPEN ALLOWED) but the inference call fails with NET:FAIL on inference.local:443 approximately every 4 seconds, suggesting the request times out before the remote LLM can respond.
[Environment]
Device: NVIDIA IGX Orin Development Kit
OS: Ubuntu 22.04.5 LTS (Jetson L4T R36.4.6)
Architecture: aarch64
Node.js: v22.22.2
npm: 10.9.7
Docker: Docker version 29.1.2
OpenShell CLI: 0.0.26
NemoClaw: v0.0.16
OpenClaw: TUI shows openclaw-tui (version not retrievable due to 503 error)
[Steps to Reproduce]
- On Jetson Orin, install NemoClaw and complete onboarding with NVIDIA Endpoints provider
- Verify sandbox is created:
nemoclaw list
Output: test2 — model: nvidia/nemotron-3-super-120b-a12b, provider: nvidia-prod
- Connect to sandbox and launch TUI:
nemoclaw test2 connect
openclaw tui
- Send any prompt (e.g. "what's your name")
- Observe repeated 503 errors and eventual timeout
[Expected Result]
The TUI should connect to integrate.api.nvidia.com via the gateway inference proxy and return a response from nemotron-3-super-120b-a12b within the configured timeout.
[Actual Result]
TUI shows repeated errors:
HTTP 503: inference service unavailable
HTTP 503: inference service unavailable
HTTP 503: inference service unavailable
HTTP 503: inference service unavailable
run error: LLM request timed out.
connected | error
Gateway logs show a repeating cycle (~4s interval):
[sandbox] [OCSF] NET:FAIL [LOW] inference.local:443
[sandbox] [OCSF] NET:OPEN [INFO] ALLOWED inference.local:443
[sandbox] [INFO] [openshell_router] routing proxy inference request (streaming)
endpoint=https://integrate.api.nvidia.com/v1
path=/v1/chat/completions
protocols=openai_chat_completions,openai_completions,openai_responses,model_discovery
The request is routed and allowed by policy, but fails repeatedly before the LLM can complete inference.
Root Cause:
The remote NVIDIA cloud inference provider (nvidia-prod) does not have a timeout configured, unlike local providers (vllm, ollama) which set timeout_secs: 180.
- Missing timeout in blueprint profile:
  - File: nemoclaw-blueprint/blueprint.yaml
  - The default nvidia inference profile does NOT define timeout_secs
  - Local profiles (nim-local, vllm) correctly set timeout_secs: 180
- Missing --timeout flag for remote providers in onboard:
  - File: src/lib/onboard.ts (lines 3585-3614)
  - When running openshell inference set for the nvidia-prod provider, no --timeout flag is passed
  - For local providers (vllm-local, ollama-local), the code correctly adds:
    "--timeout", String(LOCAL_INFERENCE_TIMEOUT_SECS) // 180 seconds
  - For remote providers (nvidia-prod, openai-api, anthropic-prod), NO timeout is added
- Blueprint runner respects timeout when defined but default profile omits it:
  - File: nemoclaw/src/blueprint/runner.ts (lines 311-313)
  - Code: if (inferenceCfg.timeout_secs !== undefined) { inferenceArgs.push("--timeout", ...) }
  - Since the default profile has no timeout_secs, this branch is never taken
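One possible shape for the onboard fix is sketched below. It is only an illustration under stated assumptions: `REMOTE_INFERENCE_TIMEOUT_SECS`, `timeoutForProvider`, and `withInferenceTimeout` are hypothetical names, and reusing 180 seconds for remote providers is an assumed value, not something taken from the codebase. Only the `--timeout` flag, `LOCAL_INFERENCE_TIMEOUT_SECS`, and the provider names come from this report.

```typescript
// Hypothetical sketch, not the actual src/lib/onboard.ts code.
const LOCAL_INFERENCE_TIMEOUT_SECS = 180; // value quoted in this report
const REMOTE_INFERENCE_TIMEOUT_SECS = 180; // assumption: same budget for remote providers

// Local provider names as listed in this report.
const LOCAL_PROVIDERS = new Set<string>(["vllm-local", "ollama-local"]);

// Pick a timeout for every provider, so remote ones no longer fall through.
function timeoutForProvider(provider: string): number {
  return LOCAL_PROVIDERS.has(provider)
    ? LOCAL_INFERENCE_TIMEOUT_SECS
    : REMOTE_INFERENCE_TIMEOUT_SECS;
}

// Append "--timeout <secs>" to an existing argument list for
// `openshell inference set`, unless the caller already supplied one.
function withInferenceTimeout(
  baseArgs: readonly string[],
  provider: string,
): string[] {
  if (baseArgs.includes("--timeout")) {
    return [...baseArgs];
  }
  return [...baseArgs, "--timeout", String(timeoutForProvider(provider))];
}
```

The helper takes the existing argument list as input rather than building it, so it makes no assumption about the other flags that onboard passes to openshell inference set.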
Without an explicit timeout, OpenShell's gateway uses a very short default HTTP idle timeout. For remote cloud LLM inference (which can take 10-30+ seconds for large models), this default is insufficient, causing the gateway to return 503 before the LLM response arrives.
This is likely exacerbated on Jetson Orin due to:
- ARM64 platform may have different network stack behavior
- Possible additional latency from NVIDIA network path on Jetson devices
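Because the blueprint runner only passes --timeout when the profile defines timeout_secs, a fallback default in the runner would keep any profile from silently inheriting the gateway default. The sketch below assumes a hypothetical `DEFAULT_INFERENCE_TIMEOUT_SECS` of 180 seconds and a hypothetical `timeoutArgs` helper. The `InferenceProfile` type models only the `timeout_secs` field mentioned in this report and is not the real runner.ts type.

```typescript
// Hypothetical sketch, not the actual nemoclaw/src/blueprint/runner.ts code.
interface InferenceProfile {
  timeout_secs?: number; // the only profile field referenced in this report
}

// Assumption: fall back to the same 180 s the local profiles use.
const DEFAULT_INFERENCE_TIMEOUT_SECS = 180;

// Always produce a --timeout argument pair: use the profile's value when
// it is defined, otherwise fall back to the default.
function timeoutArgs(cfg: InferenceProfile): string[] {
  const secs = cfg.timeout_secs ?? DEFAULT_INFERENCE_TIMEOUT_SECS;
  if (!Number.isFinite(secs) || secs <= 0) {
    throw new RangeError(`timeout_secs must be a positive number, got ${secs}`);
  }
  return ["--timeout", String(secs)];
}
```

With this shape, the existing call site could become inferenceArgs.push(...timeoutArgs(inferenceCfg)). Adding timeout_secs to the default nvidia profile in blueprint.yaml would address the same gap from the configuration side.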
Bug Details
| Field | Value |
| --- | --- |
| Priority | Unprioritized |
| Action | Dev - Open - To fix |
| Disposition | Open issue |
| Module | Machine Learning - NemoClaw |
| Keyword | NemoClaw, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw-SWQA-RelBlckr-Recommended |
[NVB# 6081485]