QQ Bot: file attachments (PDF etc.) silently dropped when download fails #16979

@Skyline10124

Summary

When QQ users send file attachments (PDFs, documents, etc.), they are silently discarded by the bot adapter if the download from QQ's file CDN fails. The agent has no indication that a file was sent, and no warning appears in logs.

Root Cause

Four compounding issues in gateway/platforms/qqbot/adapter.py:

1. No fallback text on download failure (silent data loss)

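The else branch in _process_attachments() (which handles non-image, non-voice attachments) appends to other_attachments only when cached_path is truthy. The inner if has no else clause, and the except block logs at debug level and appends nothing, so a failed download leaves no trace in the message passed to the agent: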

# Current code
try:
    cached_path = await self._download_and_cache(url, ct)
    if cached_path:
        other_attachments.append(f"[Attachment: {filename or ct}]")
    # ← no else: download returns None → attachment disappears
except Exception as exc:
    logger.debug("[%s] Failed to cache attachment: %s", self._log_tag, exc)
    # ← no fallback text appended either

2. Debug-level logging invisible in production

Both failure paths use logger.debug():

  • _download_and_cache() download exception
  • _process_attachments() attachment cache exception

Production log level is typically INFO or WARNING, so these messages are never seen.
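
To illustrate why these messages never surface, here is a minimal, self-contained sketch using only the standard logging module. The logger name and the "HTTP 403" message are made up for the demo and are not taken from the adapter.

```python
import io
import logging
from typing import List


def emitted_lines(level: int) -> List[str]:
    """Log one DEBUG and one WARNING record and return what the handler wrote."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

    # Hypothetical logger name, isolated from the root logger for the demo.
    logger = logging.getLogger(f"qqbot_demo_{level}")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False

    # Same message at both levels; only the level decides whether it is emitted.
    logger.debug("Failed to cache attachment: %s", "HTTP 403")
    logger.warning("Failed to cache attachment: %s", "HTTP 403")
    return stream.getvalue().splitlines()


if __name__ == "__main__":
    print(emitted_lines(logging.INFO))   # only the WARNING record survives
    print(emitted_lines(logging.DEBUG))  # both records are emitted
```

With the logger at INFO, the debug record is filtered before it reaches any handler, which matches the behavior described above.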

3. QQ file CDN (grouptalk.c2c.qq.com) auth mismatch

QQ uses different CDNs with different auth requirements:

  • Image CDN (multimedia.nt.qq.com.cn): requires Authorization: QQBot {token} header — current code handles this correctly
  • File CDN (grouptalk.c2c.qq.com): uses URL-embedded signatures (sign=, sig= params); how it treats the Authorization header is unconfirmed. The current code sends the same Authorization header unconditionally

The _qq_media_headers() method does not accept the URL, so it cannot differentiate between CDNs. File CDN downloads likely fail due to an auth mismatch, but because the only failure logging is at debug level, nothing in production logs confirms this.
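
One possible way to tell the two CDNs apart is to compare the parsed hostname rather than searching the whole URL string, which avoids false matches on query parameters. The sketch below is an assumption-laden illustration: classify_cdn and has_embedded_signature are hypothetical helpers that do not exist in the adapter, the example URLs are invented, and it deliberately does not decide which headers the file CDN needs, since that is still unknown.

```python
from urllib.parse import parse_qs, urlparse

# Hostnames as named in this issue.
IMAGE_CDN_HOST = "multimedia.nt.qq.com.cn"
FILE_CDN_HOST = "grouptalk.c2c.qq.com"


def _host_matches(host: str, expected: str) -> bool:
    """True if host is the expected host or one of its subdomains."""
    return host == expected or host.endswith("." + expected)


def classify_cdn(url: str) -> str:
    """Return 'file', 'image', or 'unknown' based on the URL's hostname only."""
    host = (urlparse(url).hostname or "").lower()
    if _host_matches(host, FILE_CDN_HOST):
        return "file"
    if _host_matches(host, IMAGE_CDN_HOST):
        return "image"
    return "unknown"


def has_embedded_signature(url: str) -> bool:
    """True if the query string carries a sign= or sig= parameter."""
    query = parse_qs(urlparse(url).query)
    return "sign" in query or "sig" in query


if __name__ == "__main__":
    url = "https://grouptalk.c2c.qq.com/qqdownloadftnv5?sign=abc"  # invented example
    print(classify_cdn(url), has_embedded_signature(url))
```

Logging the classification together with the response status would give the follow-up investigation a concrete signal without changing what is sent to either CDN.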

4. Original filename lost

_download_and_cache() extracts filename from the CDN URL path (e.g., qqdownloadftnv5), which is a non-semantic CDN path segment. The actual filename from QQ message metadata (attachment.filename, e.g., 报告.pdf) is available in _process_attachments but never passed through.
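
The intended priority order (metadata filename, then URL path segment, then a fixed default) can be sketched as a small standalone function. resolve_cache_filename is a hypothetical name used only for this illustration. Reducing the metadata filename to its final path component is an extra precaution that is not part of the diff in this issue.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def _basename(value: str) -> str:
    """Return the last path component, or '' if there is no usable one."""
    name = PurePosixPath(value.replace("\\", "/")).name
    return "" if name in ("", ".", "..") else name


def resolve_cache_filename(url: str, original_filename: str = "") -> str:
    """Prefer the filename from message metadata over the CDN URL path."""
    return (
        _basename(original_filename)          # e.g. "报告.pdf" from attachment metadata
        or _basename(urlparse(url).path)      # e.g. "qqdownloadftnv5" from the CDN path
        or "qq_attachment"                    # same default as the existing code
    )


if __name__ == "__main__":
    url = "https://grouptalk.c2c.qq.com/qqdownloadftnv5"
    print(resolve_cache_filename(url, "报告.pdf"))
    print(resolve_cache_filename(url))
```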

Proposed Fix

All changes confined to gateway/platforms/qqbot/adapter.py:

  1. Add fallback text on failure — append [Attachment download failed: {filename}] in both the else and except branches so the agent can see a file was sent
  2. Upgrade logger.debug to logger.warning at both failure sites
  3. Pass original_filename through to _download_and_cache(), use it with priority over URL-path extraction
  4. Add diagnostic logging for file CDN requests — pass URL to _qq_media_headers(), log info when grouptalk.c2c.qq.com is detected
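
The fallback behavior in item 1 can be exercised in isolation with stub downloaders. This is a simplified sketch rather than the adapter code: describe_attachment, the stub coroutines, and the example URL are all hypothetical, and the real method takes more parameters.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List, Optional

logger = logging.getLogger("qqbot_fallback_demo")  # hypothetical logger name

Downloader = Callable[[str], Awaitable[Optional[str]]]


async def describe_attachment(url: str, label: str, download: Downloader) -> str:
    """Return the text the agent sees, whether or not the download worked."""
    try:
        cached_path = await download(url)
    except Exception as exc:
        logger.warning("Failed to cache attachment: %s", exc)
        cached_path = None
    if cached_path:
        return f"[Attachment: {label}]"
    # Covers both failure modes: downloader returned None, or it raised.
    return f"[Attachment download failed: {label}]"


async def _ok(url: str) -> Optional[str]:
    return "/tmp/cache/report.pdf"  # stub: pretend the file was cached


async def _returns_none(url: str) -> Optional[str]:
    return None  # stub: download failed and was handled internally


async def _raises(url: str) -> Optional[str]:
    raise RuntimeError("HTTP 403")  # stub: download raised


def demo() -> List[str]:
    async def run() -> List[str]:
        stubs = (_ok, _returns_none, _raises)
        return [
            await describe_attachment("https://example.invalid/f", "report.pdf", stub)
            for stub in stubs
        ]

    return asyncio.run(run())


if __name__ == "__main__":
    for line in demo():
        print(line)
```

In all three cases the agent receives some text for the attachment, so a failed download is no longer indistinguishable from no attachment at all.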

Diff

@@ -1190,11 +1190,14 @@ class QQAdapter(BasePlatformAdapter):
             else:
                 # Other attachments (video, file, etc.): record as text.
                 try:
-                    cached_path = await self._download_and_cache(url, ct)
+                    cached_path = await self._download_and_cache(url, ct, original_filename=filename)
                     if cached_path:
                         other_attachments.append(f"[Attachment: {filename or ct}]")
+                    else:
+                        other_attachments.append(f"[Attachment download failed: {filename or ct}]")
                 except Exception as exc:
-                    logger.debug("[%s] Failed to cache attachment: %s", self._log_tag, exc)
+                    logger.warning("[%s] Failed to cache attachment: %s", self._log_tag, exc)
+                    other_attachments.append(f"[Attachment download failed: {filename or ct}]")

@@ -1204,7 +1207,7 @@ class QQAdapter(BasePlatformAdapter):
             "attachment_info": attachment_info,
         }

-    async def _download_and_cache(self, url: str, content_type: str) -> Optional[str]:
+    async def _download_and_cache(self, url: str, content_type: str, original_filename: str = "") -> Optional[str]:
         """Download a URL and cache it locally."""

@@ -1218,12 +1221,12 @@ class QQAdapter(BasePlatformAdapter):
             resp = await self._http_client.get(
                 url,
                 timeout=30.0,
-                headers=self._qq_media_headers(),
+                headers=self._qq_media_headers(url),
             )
             resp.raise_for_status()
             data = resp.content
         except Exception as exc:
-            logger.debug(
+            logger.warning(
                 "[%s] Download failed for %s: %s", self._log_tag, url[:80], exc
             )
             return None

@@ -1236,7 +1239,7 @@ class QQAdapter(BasePlatformAdapter):
             # Convert to .wav using ffmpeg so STT engines can process it.
             return await self._convert_audio_to_wav(data, url)
         else:
-            filename = Path(urlparse(url).path).name or "qq_attachment"
+            filename = original_filename or Path(urlparse(url).path).name or "qq_attachment"
             return cache_document_from_bytes(data, filename)

@@ -1261,13 +1264,15 @@ class QQAdapter(BasePlatformAdapter):
             return True
         return False

-    def _qq_media_headers(self) -> Dict[str, str]:
+    def _qq_media_headers(self, url: str = "") -> Dict[str, str]:
         """Return Authorization headers for QQ multimedia CDN downloads.

         QQ multimedia URLs (multimedia.nt.qq.com.cn) require the bot's
         access token in an Authorization header, otherwise the download
         returns a non-200 status.
         """
+        if url and "grouptalk.c2c.qq.com" in url:
+            logger.info("[%s] File CDN download request: host=%s", self._log_tag, urlparse(url).hostname)
         if self._access_token:
             return {"Authorization": f"QQBot {self._access_token}"}
         return {}

Open Follow-up

The actual file CDN auth for grouptalk.c2c.qq.com needs investigation with a live QQ Bot instance (packet capture to determine required headers/cookies/signature handling). The diagnostic logging added here will help with that investigation.

Impact

Users sending PDFs or other files to the QQ bot receive no acknowledgment — the agent behaves as if nothing was sent. This is a data loss bug from the user's perspective.

Metadata

Labels

  • P2: Medium, degraded but workaround exists
  • comp/gateway: Gateway runner, session dispatch, delivery
  • platform/qqbot: QQ Bot adapter
  • type/bug: Something isn't working
