fix(gateway): validate Slack image downloads before caching#6971

Closed
Tranquil-Flow wants to merge 1 commit into NousResearch:main from Tranquil-Flow:fix/slack-image-html-validation

@Tranquil-Flow

Contributor

Summary

  • Prevent Slack HTML sign-in/redirect pages from being cached as image files, which caused confusing downstream vision_analyze failures
  • Two layers of defense: Content-Type check in the Slack adapter + magic-byte validation in cache_image_from_bytes
  • Add defensive ValueError handling in WeCom and Email adapters so the new validation doesn't propagate as unrelated errors

Root cause

When the bot token lacks file access or the download URL expires, Slack returns HTTP 200 with an HTML sign-in page. _download_slack_file cached these bytes as .png/.jpg files. Later, vision_analyze rejected them with a confusing "Only real image files are supported" error.
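
As an illustration of the adapter-level check (a minimal sketch, not the code in this PR; the helper name `is_html_content_type` is hypothetical), the Content-Type header can be normalised before comparing, since servers commonly append parameters such as a charset:

```python
from typing import Optional


def is_html_content_type(content_type: Optional[str]) -> bool:
    """Return True when a Content-Type header value denotes an HTML page.

    Hypothetical helper for illustration; the real adapter may structure
    this check differently.
    """
    if not content_type:
        # A missing header tells us nothing, so do not reject on it here.
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html"
```

A download path would call this on the response header and skip caching when it returns True. A missing header is deliberately not treated as HTML in this sketch, which is one reason a second check on the bytes themselves is still useful.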

Changes

  • gateway/platforms/slack.py: Reject text/html Content-Type before calling cache_image_from_bytes
  • gateway/platforms/base.py: Add _looks_like_image() magic-byte check (JPEG/PNG/GIF/BMP/WebP) in cache_image_from_bytes — raises ValueError on non-image data
  • gateway/platforms/wecom.py: Catch ValueError from cache_image_from_bytes to prevent false WebSocket reconnects
  • gateway/platforms/email.py: Catch ValueError from cache_image_from_bytes to prevent IMAP batch failures
  • tests/gateway/test_media_download_retry.py: 5 new TestCacheImageFromBytes tests + 1 Slack HTML rejection test; existing test data updated to use valid image magic bytes
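
A minimal sketch of what a magic-byte check along these lines might look like. This is not the PR's `_looks_like_image()` implementation; the function names and the exact signatures tested below are assumptions based on the well-known file headers of the five listed formats:

```python
def looks_like_image(data: bytes) -> bool:
    """Return True if data starts with a known image file signature.

    Illustrative only: covers JPEG, PNG, GIF, BMP and WebP headers.
    """
    if data.startswith(b"\xff\xd8\xff"):  # JPEG
        return True
    if data.startswith(b"\x89PNG\r\n\x1a\n"):  # PNG
        return True
    if data.startswith((b"GIF87a", b"GIF89a")):  # GIF
        return True
    if data.startswith(b"BM"):  # BMP
        return True
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return True
    return False


def ensure_image_bytes(data: bytes) -> bytes:
    """Raise ValueError when data does not look like an image.

    Hypothetical wrapper showing where a caching function could validate.
    """
    if not looks_like_image(data):
        raise ValueError("refusing to cache non-image data as an image")
    return data
```

With validation placed in the shared caching path, a caller that should keep running on bad input (as described for the WeCom and Email adapters) can wrap the call in `try` / `except ValueError` and skip the attachment instead of letting the error surface as a connection or batch failure.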

Test plan

  • pytest tests/gateway/test_media_download_retry.py — 30 tests pass
  • pytest tests/gateway/test_wecom.py — 15 tests pass
  • pytest tests/gateway/test_email.py — 69 tests pass (+ all other callers verified safe via except Exception)

Closes #6829

Slack may return an HTML sign-in/redirect page (HTTP 200) instead of
actual image bytes when the bot token lacks file access or the
download URL has expired. Previously these HTML responses were cached
with a .png/.jpg extension, causing confusing downstream failures in
vision_analyze.

Two layers of defense:
- Slack adapter: reject text/html Content-Type before caching
- cache_image_from_bytes: validate magic bytes so no caller can
  accidentally cache non-image data regardless of upstream platform

Also add defensive ValueError handling in WeCom and Email adapters
so the new validation does not propagate as an unrelated error.

Closes NousResearch#6829
@teknium1

Contributor

Merged via #7125 which cherry-picks your cross-platform approach onto current main. Your centralized magic-byte validation in base.py was the winning architecture — it protects all adapters, not just Slack. Thanks for the thorough fix, @Tranquil-Flow!

@teknium1 closed this on Apr 10, 2026

Development

Successfully merging this pull request may close these issues.

[Bug]: Slack image attachments can be cached as HTML sign-in pages, causing downstream vision failures