Skip to content

refactor(qqbot): extract protocol layer to qqbot-agent-sdk + migrate all bugfixes#21162

Closed
WideLee wants to merge 4 commits into
NousResearch:mainfrom
WideLee:feat/walli_20260427_sdk
Closed

refactor(qqbot): extract protocol layer to qqbot-agent-sdk + migrate all bugfixes#21162
WideLee wants to merge 4 commits into
NousResearch:mainfrom
WideLee:feat/walli_20260427_sdk

Conversation

@WideLee

@WideLee WideLee commented May 7, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Extracts the QQ Bot protocol layer from gateway/platforms/qqbot/ into a standalone, reusable Python package: qqbot-agent-sdk (also on PyPI, MIT-licensed).

Why this refactor?

The 2,413-line gateway/platforms/qqbot/adapter.py accumulated a dense mix of:

  • QQ Bot protocol logic (WebSocket gateway, heartbeat, Resume, event parsing, OpenAPI v2 REST client, token lifecycle, AES-GCM onboarding crypto, chunked uploads, Silk decoding, …)
  • Hermes-specific wiring (config mapping, BasePlatformAdapter interface, session-store integration, ACL enforcement)

These two concerns were interleaved, which made the file hard to test, hard to reason about, and — critically — not reusable outside hermes-agent. Every bugfix (proxy support, reconnect bounds, STT unification) had to be re-implemented whenever someone built another QQ Bot integration.

This PR splits the two cleanly:

Before:  gateway/platforms/qqbot/      (6 files, 2,878 lines)
          └── adapter.py (2,413 lines) ← protocol + adapter mixed

After:   gateway/platforms/qqbot.py    (1,233 lines) ← thin hermes adapter only
          │
          └── depends on: qqbot-agent-sdk  (pure-Python, zero-framework, MIT)

Code footprint in this repo

Layer Before After Δ
gateway/platforms/qqbot/ subpackage 2,878 −2,878
gateway/platforms/qqbot.py (new) 1,233 +1,233
Net in hermes-agent 2,878 1,233 −1,645 (−57%)

What's "gone" is now maintained in the SDK (6,125 LOC, 648 unit tests — significantly more rigorous than the previous monolith). The SDK also has its own CI, typing (PEP 561), and versioned releases.


Related Issue

Fixes #

(Internal refactor + feature consolidation; no external issue yet.)


Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

1. Protocol extraction ♻️ (commit 0012a7b7f)

Removed gateway/platforms/qqbot/ subpackage (6 files, 2,878 lines) in favor of a single thin gateway/platforms/qqbot.py (1,233 lines) that delegates all protocol work to qqbot-agent-sdk.

2. Bugfix consolidation 🐛🔒 (commit dca731521)

Reviewed every qqbot/adapter commit on main since this branch forked and migrated them to the right layer:

→ Kept in the hermes adapter layer (hermes-specific policy / infrastructure):

Commit on main Description Severity
d69a0b2c2 🔒 ACL checks for guild messages & guild DMs — AT_MESSAGE_CREATE now honors group_policy, DIRECT_MESSAGE_CREATE honors dm_policy. Prevents allowlist bypass. Security
762eb79f1 httpx keepalive limits via platform_httpx_limits() to prevent CLOSE_WAIT socket leaks on macOS + Cloudflare Warp. Correctness

→ Moved into qqbot-agent-sdk (generic protocol behavior):

Commit on main Description Now lives at
044348411 WebSocket proxy support (WSS_PROXY / HTTPS_PROXY / ALL_PROXY, trust_env=True) — critical for WSL/corporate proxy users. QQWebSocket.open()
ec7e92082 MAX_RECONNECT_ATTEMPTS upper bound in the non-4008 reconnect branch — prevents infinite retry. QQWebSocket._listen_loop()
a00e4716d Preserve original filename for quoted file attachments. AttachmentDownloader.download(url, ct, filename)
d2206c69c Env-var naming (QQ_* prefix convention). constants.py / audio.py
cf55c738e Sync-based QR onboard flow consolidated into a single entry point. onboard.start_onboard()

3. SDK v1.2.2 integration ✨ (commit 42d326205)

Adapts the adapter to use SDK v1.2.2's unified voice handling. See New capabilities below for everything this unlocks.

4. Dependency wiring 📦 (commit 93c9f7c35)

pyproject.toml [qqbot] extra now pulls qqbot-agent-sdk from the Git tag v1.2.2. See Why Git URL? below.


New capabilities 🚀

This PR is not just a refactor — by adopting the SDK, hermes-agent picks up several capabilities that didn't exist in the previous adapter, plus a handful of correctness/security hardening points:

🎯 Adapter-layer behavioral improvements

# Capability Notes
1 Quoted voice messages include the STT transcript Previously the adapter rendered quoted attachments via a separate describe_attachment(ct, fname, cached) path that never ran STT — quoted voice messages showed [voice] with no text. Both main-body and quoted messages now go through the same AttachmentProcessor, so the LLM sees identical transcript text regardless of position.
2 Multi-bot cache isolation Cache dir includes app_id (~/.hermes/cache/qqbot/<app_id>/). Running two QQ bots on one hermes_home no longer mixes up downloaded media or session stores.

📦 SDK-provided capabilities that the old adapter did not have

Verified by grepping the main adapter — these features are genuinely new, not code movement:

Capability What the old adapter had What the SDK brings
Chunked upload for ≈100 MB files 📤 Single POST /v2/users/{id}/files with file_data (base64 in body) or url — hard-capped by the platform at ~10 MB inline. ChunkedUploader implements prepare → PUT parts → complete. Server decides part count; no client-side size threshold. Covers the QQ platform's full ~100 MB limit for images / videos / voice / arbitrary files.
Structured upload errors RuntimeError(str(...)) on any failure — no way to distinguish "daily quota exhausted" from "file too big" from "network". UploadDailyLimitExceededError and UploadFileTooLargeError expose file_name, file_size_human, limit_human — hermes surfaces actionable text to the user (e.g. "upload limit exceeded, retry tomorrow") instead of opaque HTTP codes.
Approval flow + 3-button inline keyboard Not implemented. ApprovalSender + build_approval_keyboard(session_key) renders ✅ Allow once / ⭐ Always allow / ❌ Deny button markdown; parse_approval_button_data() decodes the user's click from INTERACTION_CREATE. Unlocks QQ-native tool-use approvals.
Update-prompt Yes/No keyboard Not implemented. build_update_prompt_keyboard() + parse_update_prompt_button_data() — lets hermes prompt for update confirmations natively in QQ.
WebSocket session persistence across restarts 🔄 In-memory only — on every hermes restart the bot re-sent Identify and QQ replayed all missed events (READY storm). WSSessionStore serializes session_id + seq + bot_username + intents to a JSON file on every heartbeat ACK. On restart the SDK attempts Resume first with age/intents validation, falling back to Identify only when the stored session is stale.
WebSocket on a dedicated thread 🧵 WS ran on the shared main asyncio loop — a slow Discord/Feishu handler could delay heartbeat and cause 4009 timeouts. QQWebSocket owns a daemon thread + its own asyncio loop. Network I/O, reconnect backoff, and heartbeat are fully isolated; inbound callbacks are dispatched back to the main loop via run_coroutine_threadsafe.
Strongly-typed event parser Raw dict fishing: d.get("content"), d.get("guild_id", ""), … in 4 different handlers. EventParser.parse(event_type, raw) → InboundEvent — one dataclass with chat_scope, chat_id, user_id, user_name, content, attachments, msg_elements, message_type, timestamp, raw. Used uniformly by the adapter.
Quote / reference-message attachments 💬 Quoted messages in the old code surfaced only the quoted text (reply_to_text) — attachments inside quotes weren't downloaded or described. InboundEvent.msg_elements + MSG_TYPE_QUOTE expose the quoted body and its attachments. Adapter runs them through AttachmentProcessor, so quoted images / files / voice (with transcript) are now part of the LLM's context.
STT pipeline with single-fetch WAV caching 🎤 Voice path downloaded the Silk file, converted to WAV, ran STT, then deleted the WAV — losing the cached file for any follow-up operation. STTPipeline.transcribe_with_path(att) → (transcript, cached_wav_path) returns both in one call and keeps the WAV in the downloader cache. Priority: configured STT → QQ's built-in asr_refer_text → none.
Pre-existing file_info reuse for resend Re-uploaded from scratch every time. MediaUploader.upload(source=file_info) short-circuits the upload step when the caller already has a valid token — noticeably faster for broadcast-style sends.

🛡️ Hardening delivered via the SDK

These are bugfixes that used to live only in the adapter (or not at all) and are now SDK-level guarantees — any future QQ integration (not just hermes) gets them for free:

Detail
Reconnect upper bound MAX_RECONNECT_ATTEMPTS check covers all non-4008 close codes (old adapter had a gap in the QQCloseError path — bug ec7e92082 fixed that only recently on main).
Proxy-aware WebSocket + REST Old adapter learned this in 044348411; SDK now implements it once and covers both aiohttp WS and httpx REST in the same code path.
Original filename preservation for quoted files AttachmentDownloader.download(url, ct, filename) — keeps foo.zip instead of falling back to the CDN URL hash.
Multi-instance log tags All log lines carry [QQBot:<app_id>]. Side-by-side bots are distinguishable in gateway.log.

📈 Developer-experience improvements

  • Typed public API — SDK ships py.typed (PEP 561), mypy strict-mode clean.
  • Semver + GitHub Actions CI — every release tagged and published to PyPI.
  • Smaller blast radius — QQ protocol bugs / features can release out-of-band from hermes-agent.
  • Reusable — other projects can pip install qqbot-agent-sdk and skip reimplementing QQ plumbing; the SDK is not hermes-specific.

Why install from a Git tag instead of PyPI?

The repo's pyproject.toml sets:

[tool.uv]
exclude-newer = "7 days"

This is a supply-chain safety policyuv refuses to resolve any PyPI release less than 7 days old, which protects against freshly published malicious packages. Keeping this default is correct and this PR preserves it.

The side effect: qqbot-agent-sdk==1.2.2 was released on 2026-05-07, so PyPI-based resolution won't unblock until ≈2026-05-14. In the meantime install.sh fails with:

Because only qqbot-agent-sdk==1.2.1 is available and hermes-agent[all]
depends on qqbot-agent-sdk>=1.2.2,<2, we can conclude that
hermes-agent[all] cannot be used.

Could we relax to >=1.2.0? No — that would let uv silently install v1.2.1, which is missing three of the features that this PR's adapter code depends on:

Missing in v1.2.1 Adapter impact
WebSocket proxy handling (SDK 2b4aa7e) 🔴 WSL / corporate-proxy users can't connect (silent timeout)
Unified voice description (SDK 9516fbf) 🟡 voice transcripts silently dropped from text sent to the LLM
Quoted-attachment processor wiring 🟡 quoted voice messages show nothing

The proxy regression is the nastiest — it only bites in production, without any traceback. No traceback, just "why won't my bot connect".

Chosen fix — Git URL pinned to the v1.2.2 tag

qqbot = [
  "qqbot-agent-sdk @ git+https://github.com/tencent-connect/qqbot-agent-sdk.git@v1.2.2",
  "qrcode>=7.0,<8",
]
  • ✅ Git URLs bypass exclude-newer (they skip PyPI version resolution entirely).
  • ✅ Pinned to an immutable tag (v1.2.2 → commit 6163b5d) — fully reproducible.
  • ✅ Same pattern already used in this repo for atroposlib, tinker, yc-bench.
  • ✅ No change needed in install.sh; uv pip install -e ".[qqbot]" just works.

Follow-up plan

After 2026-05-14, once v1.2.2 becomes PyPI-resolvable, flip to the standard version spec:

qqbot = ["qqbot-agent-sdk>=1.2.2,<2", "qrcode>=7.0,<8"]

A TODO to that effect is left inline in pyproject.toml and in commit 93c9f7c35's message.


How to Test

  1. Install

    uv pip install -e ".[qqbot]"
    # or via the installer
    bash scripts/install.sh
  2. Run the test suites

    pytest tests/gateway/test_qqbot.py -q              # 64 passed
    pytest tests/tools/test_send_message_tool.py -q    # 95 passed
  3. Smoke-test a live bot (needs QQ app credentials)

    hermes gateway --platforms qqbot
    # Send text           → bot replies
    # Send voice          → transcript visible to the LLM (inline AND quoted)
    # Send image / video / file → received and re-sendable
    # @ in a guild channel (not in allowlist) → blocked silently (verify with DEBUG log)
    # Quote-reply to a file → filename preserved
  4. Proxy environment (optional, validates the bugfix migration)

    export HTTPS_PROXY=http://127.0.0.1:7897
    hermes gateway --platforms qqbot
    # Previously: connection timeout after ~30s
    # Now:        connects through the proxy, READY event received
  5. Large-file upload (validates chunked-upload path)

    # From an LLM session:
    # "Use send_message to send the 50 MB video at /tmp/demo.mp4 to qqbot:<group_openid>"
    # → uploads in chunks, returns success

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q — 159 passed across the two QQ-relevant suites, no regressions elsewhere
  • I've added tests for my changes (64 adapter tests + 95 send_message tests pass; SDK itself carries 648 unit tests)
  • I've tested on my platform: macOS 15.4 arm64 + Python 3.11.15

Documentation & Housekeeping

  • I've updated relevant documentation — SDK ships its own README
  • I've updated cli-config.yaml.exampleN/A (config keys unchanged)
  • I've updated CONTRIBUTING.md / AGENTS.mdN/A (tree shape still matches; qqbot.py is now a single file which is a subset of what the guidance already describes)
  • I've considered cross-platform impact — macOS tested; Linux unaffected; Windows not a QQBot target (the platform itself targets CN mobile users)
  • I've updated tool descriptions/schemas — N/A (send_message surface unchanged; adapter behaves identically from the tool's perspective)

Screenshots / Logs

image image

Commits on this branch

93c9f7c35 chore(qqbot): use Git tag v1.2.2 to bypass uv exclude-newer window
42d326205 feat(qqbot): integrate qqbot-agent-sdk v1.2.2
dca731521 fix(qqbot): migrate critical bugfixes from main branch adapter.py
0012a7b7f refactor(qqbot): extract protocol layer to qqbot-agent-sdk

WideLee added 4 commits May 7, 2026 14:31
- Extract gateway/platforms/qqbot/ protocol layer (~5k lines) into
  standalone qqbot-agent-sdk package (MIT, PyPI)
- Flatten qqbot/ package into single qqbot.py adapter (~1.2k lines)
- Adapter delegates all QQ protocol logic to SDK: WebSocket lifecycle,
  HTTP API client, event parsing, media upload, approval keyboards,
  session persistence, attachment processing, audio STT pipeline
- WebSocket runs in dedicated thread with independent asyncio loop,
  isolating QQBot from main gateway event loop
- Configure SDK: source=hermes, UA includes Hermes/<version>
- QR onboard rewritten as thin wrapper around SDK start_onboard()
  with expiry retry via OnboardExpiredError
- Cache dir includes app_id for multi-bot isolation
- Add qqbot-agent-sdk>=1.2.0 to pyproject.toml [qqbot] + [all] extras
- SDK-internal tests removed; adapter integration tests retained (63 cases)
Migrated bugfixes from main branch's gateway/platforms/qqbot/adapter.py:

✅ Fixed in hermes adapter layer:
- 762eb79: Add httpx keepalive limits to prevent CLOSE_WAIT accumulation
  * Applied platform_httpx_limits() to httpx.AsyncClient
  * Prevents socket leaks on macOS + Cloudflare Warp

- d69a0b2: Add ACL checks for guild messages and guild DMs (SECURITY)
  * EventType.AT_MESSAGE_CREATE now checks group_policy ACL
  * EventType.DIRECT_MESSAGE_CREATE now checks dm_policy ACL
  * Prevents allowlist bypass via guild channels or DMs

⚠️ TODO in qqbot-agent-sdk:
1. 0443484: WebSocket proxy support (CRITICAL)
   - Honor WSS_PROXY/HTTPS_PROXY/ALL_PROXY env vars
   - Set aiohttp.ClientSession(trust_env=True)
   - Pass proxy param to ws_connect()

2. ec7e920: Add backoff upper-bound check for QQCloseError
   - Add MAX_RECONNECT_ATTEMPTS check in non-4008 reconnect path
   - Prevents infinite retry loops

3. a00e471: Preserve original filename for quoted attachments
   - AttachmentDownloader.download() needs filename parameter
   - Pass to download_document() to preserve original name

4. d2206c6: Env var rename back-compat
   - Check old env var names for compatibility
Upgrade to qqbot-agent-sdk v1.2.2, which brings two improvements:

1. WebSocket proxy support (SDK commit 2b4aa7e)
   - Honors WSS_PROXY/HTTPS_PROXY/ALL_PROXY env vars
   - Fixes connection failures for WSL/corporate proxy users
   - Replaces the adapter-layer proxy handling (bugfix 0443484)

2. Unified voice attachment handling (SDK commit 9516fbf)
   - AttachmentProcessor now embeds STT transcripts directly in
     ProcessedAttachment.description
   - Adapter no longer needs special-case [Voice] prefix handling

Changes in hermes-agent:
- pyproject.toml: bump qqbot-agent-sdk constraint to >=1.2.2,<2
- gateway/platforms/qqbot.py:
  * Remove describe_attachment import (no longer needed)
  * Simplify _build_message_event: just append att.description for all kinds
  * Simplify _resolve_quote: route quoted attachments through
    AttachmentProcessor for consistent formatting (incl. STT transcripts)
- tests/gateway/test_qqbot.py:
  * Update TestQQWebSocketProxy to test SDK's QQWebSocket.open() directly
    (proxy logic moved from adapter to SDK)

Test results:
- tests/gateway/test_qqbot.py: 64 passed
- tests/tools/test_send_message_tool.py: 95 passed
install.sh currently fails with:
  Because only qqbot-agent-sdk==1.2.1 is available and hermes-agent[all]
  depends on qqbot-agent-sdk>=1.2.2,<2, we can conclude that
  hermes-agent[all] cannot be used.

Root cause: [tool.uv] exclude-newer = "7 days" in pyproject.toml filters
out PyPI releases newer than 7 days (supply-chain safety). SDK v1.2.2
was published on 2026-05-07, so PyPI-based resolution will stay blocked
until ~2026-05-14.

Fix: install qqbot-agent-sdk directly from the Git tag v1.2.2 (pinned
to commit 6163b5d). Git URL dependencies bypass exclude-newer because
they don't go through PyPI version resolution.

This mirrors the pattern already used for atroposlib, tinker, and
yc-bench under the [rl] and [yc-bench] extras.

TODO: revert to `qqbot-agent-sdk>=1.2.2,<2` after 2026-05-14 once PyPI
resolution is unblocked.
@alt-glitch alt-glitch added type/refactor Code restructuring, no behavior change P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery platform/qqbot QQ Bot adapter labels May 7, 2026
@teknium1

teknium1 commented May 7, 2026

Copy link
Copy Markdown
Contributor

Hey @WideLee — first, thank you for the depth of work here. The adapter breakdown is careful, the bugfix migration table is exactly the kind of rigor we wish every contributor brought, and the net-new features (chunked uploads, structured upload errors, approval keyboard, WS session persistence, quoted-voice STT) are genuinely useful additions.

I owe you an honest walk-back, though. When we first talked about this I said yes too quickly, and on a closer look I don't think I can take the SDK split path — not because of anything wrong with your code, but because of the shape of the dependency itself:

  • The tencent-connect/qqbot-agent-sdk namespace implies corporate ownership that isn't actually there (unverified org, 10-day-old repo, PyPI package authored under your personal name). Whether or not that was the intent, the framing reads as institutional when it isn't, and that's something I should have flagged earlier instead of agreeing.
  • Pulling via Git URL to bypass our exclude-newer = "7 days" uv policy removes a supply-chain guard we rely on, and pinning to a tag rather than a commit SHA compounds it (tags are mutable).
  • Accepting this would set a precedent I can't maintain consistently across the team — every future contributor proposing a personal-namespace Git dep would reasonably point here.

What I'd like to do instead: keep every substantive improvement you built, but land it as an in-tree PR against the existing gateway/platforms/qqbot/ subpackage. That means cherry-picking:

  • Chunked upload (~100MB support) + structured UploadDailyLimitExceededError / UploadFileTooLargeError
  • Approval flow + 3-button inline keyboard (✅ once / ⭐ always / ❌ deny)
  • Update-prompt Yes/No keyboard
  • WebSocket session persistence across restarts (WSSessionStore)
  • Dedicated-thread WebSocket (heartbeat isolation)
  • Quoted-message voice STT unification (both main + quoted attachments through the same processor)
  • STT pipeline with single-fetch WAV caching
  • Multi-bot cache isolation (cache dir includes app_id)
  • Multi-instance log tags ([QQBot:<app_id>])
  • EventParser typed dataclass if it cleans up the dict-fishing sites
  • Pre-existing file_info reuse on resend

Your authorship will be preserved — every commit gets a Co-authored-by: WideLee <limkuan24@gmail.com> trailer and the PR body will credit you as the feature origin. You'll show up as a contributor on the merge.

Going to close this one and open a salvage PR against main with the features ported in-tree. Genuinely sorry for the flip — this one is on me for not thinking it through at first pass. The work itself is excellent and I want to make sure it lands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists platform/qqbot QQ Bot adapter type/refactor Code restructuring, no behavior change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants