fix(dingtalk): adapter reliability — websockets proxy, card QPS throttle, inbound queue#14333

Closed
meng93 wants to merge 1 commit into NousResearch:main from PeterGuy326:fix/dingtalk-adapter-reliability

Conversation

@meng93 meng93 commented Apr 23, 2026

Contributor

Summary

Stabilise the DingTalk Stream-Mode adapter with three reliability improvements that prevent message loss and API-side rate-limit errors during sustained usage.

Motivation

  1. Websockets proxy capture – the dingtalk-stream SDK clobbers HTTPS_PROXY / HTTP_PROXY at import time, breaking corporate proxy setups.
  2. AI-Card 403 storms – DingTalk's interactive-card PUT API enforces a ~20 QPS limit; exceeding it returns 403 and drops the card update.
  3. Duplicate agent turns – when the agent is busy, a second inbound message for the same chat spawns a parallel turn, producing duplicate (or interleaved) replies.
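
The snapshot-and-restore idea behind item 1 can be sketched roughly as follows. This is a minimal illustration, not the PR's _install_dingtalk_websockets_proxy(); the helper name preserve_proxy_env and the list of variable names are assumptions made for the example.

```python
import os
from contextlib import contextmanager

# Assumed set of proxy variables; the PR text only names HTTPS_PROXY / HTTP_PROXY.
PROXY_VARS = ("HTTPS_PROXY", "HTTP_PROXY", "https_proxy", "http_proxy")


@contextmanager
def preserve_proxy_env(names=PROXY_VARS):
    """Snapshot the given env vars on entry and restore them on exit."""
    snapshot = {name: os.environ.get(name) for name in names}
    try:
        yield snapshot
    finally:
        for name, value in snapshot.items():
            if value is None:
                # Variable was unset before: remove anything set in between.
                os.environ.pop(name, None)
            else:
                # Variable was set before: put the original value back.
                os.environ[name] = value


# Hypothetical usage around the SDK import:
# with preserve_proxy_env():
#     import dingtalk_stream  # assumed import name of the dingtalk-stream SDK
```

Wrapping the import in a context manager keeps the restore in a finally block, so the original values come back even if the import raises.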

Changes

| File | What |
| --- | --- |
| gateway/platforms/dingtalk.py | _install_dingtalk_websockets_proxy(): snapshot + restore proxy env vars around SDK import |
| gateway/platforms/dingtalk.py | _CardTokenBucket: token-bucket (20 QPS) + per-message 800 ms throttle on edit_message() |
| gateway/platforms/dingtalk.py | _enqueue_inbound / _sweep_session_queues: promise-chain queue per chat; busy-ack with random phrase |
| gateway/config.py | Hydrate *_HOME_CHANNEL yaml keys → os.environ on gateway boot (survives restart) |
| hermes_cli/dingtalk_auth.py | Fix REGISTRATION_SOURCE default (openClaw → DING_DWS_CLAW) |
| gateway/run.py | Platform display-name overrides for /sethome prompt (DingTalk, WeCom, etc.) |
| scripts/gateway_guard.sh | Auto-restart supervisor with caffeinate support on macOS |
| .gitignore | Add .claude |
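
For readers unfamiliar with the card throttle, a token bucket combined with a per-message minimum interval might look like the sketch below. It is not the PR's _CardTokenBucket; the class names, the non-blocking try_acquire() / allow() API, and the injectable clock are assumptions chosen to keep the example small and testable.

```python
import time


class TokenBucket:
    """Allow up to `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def try_acquire(self):
        """Take one token if available; never blocks."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


class PerMessageThrottle:
    """Reject edits to the same message that arrive within `min_interval` seconds."""

    def __init__(self, min_interval=0.8, clock=time.monotonic):
        self.min_interval = min_interval
        self._clock = clock
        self._last_edit = {}  # message_id -> time of last accepted edit

    def allow(self, message_id):
        now = self._clock()
        last = self._last_edit.get(message_id)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_edit[message_id] = now
        return True
```

An edit would go out only when both checks pass: the per-message throttle coalesces rapid updates to one card, while the bucket caps the total rate across all cards.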

Test Plan

  • Existing tests/gateway/test_dingtalk.py suite passes.
  • Manual testing with DingTalk Stream-Mode behind corporate proxy.
  • Sustained card-edit stress test confirms throttle holds at ~18 QPS.

Risk Assessment

Low. Changes are additive (new classes / functions) and the queue serialisation is opt-in per chat. The proxy capture only activates when env vars are pre-set.
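
To illustrate what per-chat serialisation means here, the sketch below uses one asyncio.Lock per chat as a stand-in for the promise-chain queue. It is not the PR's _enqueue_inbound implementation; ChatSerializer, submit, send_ack, and the busy phrases are hypothetical names invented for the example.

```python
import asyncio
import random

# Placeholder phrases; the PR's actual busy acknowledgements are not shown in this thread.
BUSY_PHRASES = (
    "Still working on your previous message.",
    "Queued behind the current turn.",
)


class ChatSerializer:
    """Run at most one handler per chat at a time; other chats are unaffected."""

    def __init__(self, send_ack):
        self._locks = {}           # chat_id -> asyncio.Lock
        self._send_ack = send_ack  # async callable (chat_id, text)

    async def submit(self, chat_id, handler):
        lock = self._locks.setdefault(chat_id, asyncio.Lock())
        if lock.locked():
            # A turn is already running for this chat: acknowledge, then wait.
            await self._send_ack(chat_id, random.choice(BUSY_PHRASES))
        async with lock:
            return await handler()
```

Because the acknowledgement is awaited before the lock is requested, two messages queued at nearly the same moment could in principle swap order if their acks complete out of order; a real queue would record arrival order first.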

@alt-glitch

Collaborator

Card throttle and reliability fixes overlap with #12769. Base of stacked series: #14333 → #14334 → #14335 → #14336.

…tle, inbound queue serialization

- Capture websockets proxy env vars before dingtalk-stream SDK can
  clobber them; restore after import (_install_dingtalk_websockets_proxy).
- Add _CardTokenBucket (token-bucket, 20 QPS) and per-message 800 ms
  edit_message() throttle to stay under DingTalk AI-Card rate limits.
  403 responses trigger an automatic 2 s exponential backoff.
- Inbound message queue (_enqueue_inbound / _sweep_session_queues)
  serialises same-chat messages so long-running agent turns are not
  duplicated; a random 'busy' acknowledgement is sent for queued msgs.
- Hydrate *_HOME_CHANNEL yaml keys into os.environ on gateway boot so
  /sethome survives process restarts (gateway/config.py).
- Fix REGISTRATION_SOURCE default (openClaw → DING_DWS_CLAW) in
  hermes_cli/dingtalk_auth.py.
- Platform display-name overrides for /sethome prompt (DingTalk, WeCom,
  etc.) in gateway/run.py.
- Add scripts/gateway_guard.sh — auto-restart supervisor for the
  gateway process with caffeinate support on macOS.
- Add .claude to .gitignore.
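
As a rough illustration of the *_HOME_CHANNEL hydration item, the sketch below copies matching keys from an already-parsed config mapping into an environment mapping. The function name, the example key names, and the choice not to overwrite variables that are already set are all assumptions; the actual logic in gateway/config.py may differ.

```python
import os


def hydrate_home_channels(config, environ=None):
    """Copy *_HOME_CHANNEL keys from a parsed config dict into the environment.

    Returns the subset of keys that were actually written.
    """
    environ = os.environ if environ is None else environ
    applied = {}
    for key, value in config.items():
        if not key.endswith("_HOME_CHANNEL") or value is None:
            continue
        if key in environ:
            # Assumption: an explicitly exported variable wins over the yaml value.
            continue
        environ[key] = str(value)
        applied[key] = str(value)
    return applied


# Hypothetical usage with made-up key names:
# hydrate_home_channels({"DINGTALK_HOME_CHANNEL": "cid-123"})
```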
@jackjin1997

Contributor

The CI failures on this stack all look like main-breakage at submit time, not anything in the DingTalk changes:

  • tests/run_agent/test_413_compression.py (9 fails) — overhauled in 897f953 (5-29)
  • tests/tools/test_browser_cdp_tool.py::test_registered_in_browser_toolset — touched in 18f3fc8 the day after this PR was opened (browser-cdp rename)
  • tests/tools/test_modal_sandbox_fixes.py (2 fails) — fixed in febc4cf (5-27)
  • tests/agent/test_minimax_provider.py::test_switch_to_minimax_does_not_resolve_anthropic_token — fixed in fa8e2f9 (6-2)

None of these touch DingTalk code. A rebase onto current main should turn the test job green — same situation I hit on #17141 back in April. That might also help move the review queue @alt-glitch flagged.

@jackjin1997

Contributor

Followup: I rebased your single commit onto current main and pushed it to my fork so you can pick it up without redoing the conflict resolution yourself.

Branch: https://github.com/jackjin1997/hermes-agent/tree/rebase/meng93-14333-onto-main (commit ca355cb2c — your author info preserved)

Two conflicts I resolved (no behavior change):

  • .gitignore — folded your .claude entry into the existing editor-tooling block (.codex/, .cursor/, .gemini/, .zed/) added on main.
  • gateway/run.py — kept your _display_overrides dict but routed through main's newer _home_target_env_var(platform_name) helper and _deliver_platform_notice(source, notice) wrapper (added on main after this PR opened, plus the Slack /hermes sethome dispatch quirk). Same display-name behavior you intended.

If you want it:

git fetch https://github.com/jackjin1997/hermes-agent.git rebase/meng93-14333-onto-main
git reset --hard FETCH_HEAD
git push origin fix/dingtalk-adapter-reliability --force-with-lease

That should turn the CI job green (the test failures were all main-breakage from 4-23, already fixed on main as noted above). Happy to do the same for #14334-#14336 if this one lands cleanly.

@meng93

meng93 commented Jun 7, 2026

Contributor Author

Superseded by #40929 (rebased onto current main, CI failures resolved).

@meng93 meng93 closed this Jun 7, 2026

Labels

comp/gateway (Gateway runner, session dispatch, delivery)
P2 (Medium: degraded but workaround exists)
platform/dingtalk (DingTalk adapter)
type/bug (Something isn't working)
