feat(gateway): add CSV and JSON to supported document types with text injection #4109
Open
tjp2021 wants to merge 1 commit into
… injection

CSV (.csv) and JSON (.json) files uploaded to any messaging platform are currently silently dropped because they are not in SUPPORTED_DOCUMENT_TYPES. This is inconsistent with the WhatsApp adapter, which already handles these types including text injection.

Changes:
- Add .csv (text/csv) and .json (application/json) to SUPPORTED_DOCUMENT_TYPES
- Introduce TEXT_INJECTABLE_EXTENSIONS constant in base.py to centralize the set of extensions eligible for inline content injection
- Update text injection conditions in Slack, Discord, Telegram, and Feishu adapters to use TEXT_INJECTABLE_EXTENSIONS instead of hardcoded tuples
- Add tests for CSV/JSON acceptance, text injection, oversized file handling, binary content graceful degradation, and MIME-based extension resolution

Security note: text injection carries the same prompt injection surface as existing .txt/.md support. Existing mitigations apply (100KB cap, UTF-8 validation, UnicodeDecodeError handling, path traversal protection).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
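For reviewers who want a concrete picture of the mitigations named in the security note, here is a minimal sketch of a guarded injection step. It is not the adapter code: `maybe_inject_text` is a hypothetical helper, the exact byte value of the 100KB cap is an assumption, and path traversal protection is out of scope for this snippet.

```python
from typing import Optional

# Assumption: the "100KB cap" is modeled as 100 * 1024 bytes; the real value may differ.
MAX_TEXT_INJECT_BYTES = 100 * 1024

# The four extensions this PR lists as eligible for inline injection.
TEXT_INJECTABLE_EXTENSIONS = frozenset({".md", ".txt", ".csv", ".json"})


def maybe_inject_text(ext: str, raw: bytes) -> Optional[str]:
    """Hypothetical helper: return decoded text to inline, or None to skip injection."""
    # Only extensions in the centralized set are candidates for injection.
    if ext.lower() not in TEXT_INJECTABLE_EXTENSIONS:
        return None
    # Oversized files stay cached but are never inlined.
    if len(raw) > MAX_TEXT_INJECT_BYTES:
        return None
    # Binary content mislabeled as text fails strict UTF-8 decoding and is skipped.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
```

Returning `None` instead of raising keeps the degraded case quiet: the file can still be cached even when its content is not inlined, which is the behavior the new tests describe for large and binary files.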
d5992e5 to c5fd76a
What does this PR do?

Adds `.csv` and `.json` to `SUPPORTED_DOCUMENT_TYPES` and introduces a centralized `TEXT_INJECTABLE_EXTENSIONS` constant so all gateway adapters consistently accept, cache, and optionally inline these file types, matching the behavior WhatsApp already had.

Closes #4105
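A rough sketch of how the two constants could interact, for illustration only. The `.csv` and `.json` entries and the four-element frozenset come from this PR; the `.md`, `.txt`, and `.pdf` MIME values and the `classify` helper are assumptions added to make the example self-contained, and the real mapping in `gateway/platforms/base.py` likely has more entries.

```python
from pathlib import PurePosixPath

# Illustrative subset. Only the .csv and .json entries are taken from this PR;
# the other entries and their MIME values are assumptions.
SUPPORTED_DOCUMENT_TYPES = {
    ".md": "text/markdown",
    ".txt": "text/plain",
    ".pdf": "application/pdf",
    ".csv": "text/csv",
    ".json": "application/json",
}

# Centralized set of extensions eligible for inline text injection.
TEXT_INJECTABLE_EXTENSIONS = frozenset({".md", ".txt", ".csv", ".json"})


def classify(filename: str) -> str:
    """Hypothetical helper: report how an adapter would treat an uploaded file."""
    ext = PurePosixPath(filename).suffix.lower()
    if ext not in SUPPORTED_DOCUMENT_TYPES:
        return "skip"          # unsupported types are ignored by the file handler
    if ext in TEXT_INJECTABLE_EXTENSIONS:
        return "cache+inject"  # cached, and content may be inlined
    return "cache"             # supported, but never inlined
```

With a single shared set, adding another injectable type becomes a one-line change in `base.py` rather than an edit in every adapter.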
Type of Change
Problem
CSV and JSON files uploaded on Slack, Discord, Telegram, and Feishu are silently skipped: they're not in `SUPPORTED_DOCUMENT_TYPES`, so the file handler's `if ext not in SUPPORTED_DOCUMENT_TYPES: continue` ignores them. Telegram sends an "Unsupported document type" reply; the other three give no feedback.

The WhatsApp adapter already handles both types: its text injection list at line 775 of `whatsapp.py` includes `.csv`, `.json`, and several others. The other four adapters don't.

Each adapter also hardcodes its own `if ext in (".md", ".txt")` check for text injection. Adding a new injectable type means editing every adapter file individually.

Changes Made
Core (`gateway/platforms/base.py`)
- Added `.csv: "text/csv"` and `.json: "application/json"` to `SUPPORTED_DOCUMENT_TYPES`
- Added a `TEXT_INJECTABLE_EXTENSIONS` frozenset (`{".md", ".txt", ".csv", ".json"}`) to centralize the set of extensions eligible for inline text injection

Adapters (Slack, Discord, Telegram, Feishu)
- Imported `TEXT_INJECTABLE_EXTENSIONS` from `base`
- Replaced `if ext in (".md", ".txt")` with `if ext in TEXT_INJECTABLE_EXTENSIONS`
- Added `"text/csv"` and `"application/json"` to the MIME-type fallback check in `_maybe_extract_text_document()`

Tests (20 new tests across 4 files)
- `test_document_cache.py`: added `.csv` and `.json` to the `test_expected_extensions_present` parametrize list (+2); added `test_text_injectable_is_subset_of_supported` (+1) and parametrized `test_expected_text_injectable_extensions` (+4)
- `test_slack.py`: 4 new tests (CSV cached and injected, JSON cached and injected, large CSV (>100KB) cached but not injected, binary JSON cached but not injected)
- `test_discord_document_handling.py`: 4 new tests, same coverage as Slack
- `test_telegram_documents.py`: 5 new tests, same as Discord plus a MIME→extension fallback test for JSON without a filename

Not modified
- `whatsapp.py`: already handles CSV/JSON; no changes needed

Testing
Live-tested (Slack): tested on a running Slack gateway with real file uploads:
- `@bot` mention: file cached, content injected, agent responds with summary

Unit-tested (all adapters): 145 tests pass across 4 test files (125 before this PR, 145 after). Discord, Telegram, and Feishu were verified via unit tests only; the code change is the same one-line swap in each adapter.

    ./venv/bin/python -m pytest tests/gateway/test_document_cache.py tests/gateway/test_slack.py tests/gateway/test_discord_document_handling.py tests/gateway/test_telegram_documents.py -v -o "addopts="

Usage note
On Slack, files must be attached to a message that `@mentions` the bot. Uploading files without a mention sends a `file_shared` event, which the adapter does not handle. This is a pre-existing limitation of the Slack adapter's event handling, not introduced by this PR.

Security Notes
Text injection (decoding file content into `event.text`) already exists for `.md` and `.txt`. CSV and JSON use the same code path with the same guards:
- 100KB size cap (`MAX_TEXT_INJECT_BYTES`)
- UTF-8 validation with a `UnicodeDecodeError` catch (binary files skip injection)
- Path traversal protection in `cache_document_from_bytes()`

CSV/JSON content could contain adversarial text, but this is the same risk as `.txt`/`.md`, not a new attack class.

Code Checklist