Skip to content

[macOS][Inference] deepseek-v4-pro responses take 2–4 minutes on average through Discord and WeChat messaging channels #4063

@mercl-lau

Description

@mercl-lau

Description

When using deepseek-ai/deepseek-v4-pro as the inference provider via NVIDIA Endpoints, responses through Discord and WeChat messaging channels consistently take 2–4 minutes to arrive. This makes the messaging channel experience unusable for interactive conversation. The latency appears to be end-to-end (from user message sent to bot reply received in the channel).

Environment

Device:        MacBook Pro (Apple Silicon)
OS:            macOS 26.0.1 (Darwin, arm64)
Architecture:  arm64
Node.js:       v22.22.1
npm:           10.9.4
Docker:        29.2.1
OpenShell CLI: 0.0.39
NemoClaw:      v0.0.49
OpenClaw:      2026.4.24 (cbcfdf6)
Model:         deepseek-ai/deepseek-v4-pro
Provider:      NVIDIA Endpoints

Steps to Reproduce

  1. nemoclaw onboard with NVIDIA Endpoints provider, model deepseek-ai/deepseek-v4-pro
  2. nemoclaw <name> channels add discord (or wechat) — configure channel credentials
  3. nemoclaw <name> policy-add discord (or wechat)
  4. Send a simple message (e.g. "hello" or "what is 1+1") via Discord DM or WeChat to the bot
  5. Wait for response

Expected Result

Bot responds within a reasonable time (under 30 seconds for simple queries).

Actual Result

Bot response takes 2–4 minutes on average for both Discord and WeChat channels. Observed consistently across multiple messages and both channel types, not a one-off spike. Simple prompts like "hello" exhibit the same latency.

Logs

Not captured — latency observed from user-facing Discord and WeChat clients.


NVB#6205544

Metadata

Metadata

Assignees

Labels

NV QABugs found by the NVIDIA QA Teamarea: inferenceInference routing, serving, model selection, or outputsarea: performanceLatency, throughput, resource use, benchmarks, or scalingplatform: macosAffects macOS, including Apple Silicon
No fields configured for Enhancement.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions