Description
When using deepseek-ai/deepseek-v4-pro as the inference provider via NVIDIA Endpoints, responses through Discord and WeChat messaging channels consistently take 2–4 minutes to arrive. This makes the messaging channel experience unusable for interactive conversation. The latency appears to be end-to-end (from user message sent to bot reply received in the channel).
Environment
Device: MacBook Pro (Apple Silicon)
OS: macOS 26.0.1 (Darwin, arm64)
Architecture: arm64
Node.js: v22.22.1
npm: 10.9.4
Docker: 29.2.1
OpenShell CLI: 0.0.39
NemoClaw: v0.0.49
OpenClaw: 2026.4.24 (cbcfdf6)
Model: deepseek-ai/deepseek-v4-pro
Provider: NVIDIA Endpoints
Steps to Reproduce
nemoclaw onboard with NVIDIA Endpoints provider, model deepseek-ai/deepseek-v4-pro
nemoclaw <name> channels add discord (or wechat) — configure channel credentials
nemoclaw <name> policy-add discord (or wechat)
- Send a simple message (e.g. "hello" or "what is 1+1") via Discord DM or WeChat to the bot
- Wait for response
Expected Result
Bot responds within a reasonable time (under 30 seconds for simple queries).
Actual Result
Bot response takes 2–4 minutes on average for both Discord and WeChat channels. Observed consistently across multiple messages and both channel types, not a one-off spike. Simple prompts like "hello" exhibit the same latency.
Logs
Not captured — latency observed from user-facing Discord and WeChat clients.
NVB#6205544
Description
When using deepseek-ai/deepseek-v4-pro as the inference provider via NVIDIA Endpoints, responses through Discord and WeChat messaging channels consistently take 2–4 minutes to arrive. This makes the messaging channel experience unusable for interactive conversation. The latency appears to be end-to-end (from user message sent to bot reply received in the channel).
Environment
Steps to Reproduce
nemoclaw onboardwith NVIDIA Endpoints provider, modeldeepseek-ai/deepseek-v4-pronemoclaw <name> channels add discord(orwechat) — configure channel credentialsnemoclaw <name> policy-add discord(orwechat)Expected Result
Bot responds within a reasonable time (under 30 seconds for simple queries).
Actual Result
Bot response takes 2–4 minutes on average for both Discord and WeChat channels. Observed consistently across multiple messages and both channel types, not a one-off spike. Simple prompts like "hello" exhibit the same latency.
Logs
Not captured — latency observed from user-facing Discord and WeChat clients.
NVB#6205544