[Feature]: Add CSV and JSON to supported document types across all gateway adapters #4105

@tjp2021

Problem or Use Case

CSV and JSON files uploaded on Slack, Discord, Telegram, and Feishu are never processed. SUPPORTED_DOCUMENT_TYPES in gateway/platforms/base.py contains 6 extensions (.pdf, .md, .txt, .docx, .xlsx, .pptx). CSV and JSON are not in this allowlist, so the file handler's check (if ext not in SUPPORTED_DOCUMENT_TYPES: continue) skips them.

Telegram sends an "Unsupported document type" reply. Slack, Discord, and Feishu give no feedback — the file is dropped with no indication to the user.
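The skip behavior can be illustrated with a minimal sketch. Only the six extensions and the allowlist check come from this issue; the dict shape, the MIME strings, and the helper name filter_supported are assumptions made for illustration and are not copied from base.py.

```python
import os

# Assumed shape: extension -> MIME type. The six extensions are from the issue;
# the MIME strings are the standard values, not taken from base.py.
SUPPORTED_DOCUMENT_TYPES = {
    ".pdf": "application/pdf",
    ".md": "text/markdown",
    ".txt": "text/plain",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def filter_supported(filenames):
    """Hypothetical helper that mirrors the adapters' allowlist check."""
    accepted = []
    for name in filenames:
        ext = os.path.splitext(name)[1].lower()
        if ext not in SUPPORTED_DOCUMENT_TYPES:
            continue  # the file is dropped; no message is sent to the user
        accepted.append(name)
    return accepted


print(filter_supported(["report.pdf", "data.csv", "config.json", "notes.txt"]))
# prints ['report.pdf', 'notes.txt']
```

With this shape, data.csv and config.json never reach the document handling code, which matches what Slack, Discord, and Feishu users see.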

Inconsistency across adapters: The WhatsApp adapter already handles CSV and JSON. Its text injection list at line 775 of whatsapp.py includes .csv, .json, .xml, .yaml, .yml, and more. The other four adapters reject these same types.

Each adapter also hardcodes its own if ext in (".md", ".txt") check for text injection, rather than using a shared constant.

Proposed Solution

  1. Add .csv (text/csv) and .json (application/json) to SUPPORTED_DOCUMENT_TYPES in gateway/platforms/base.py
  2. Introduce a TEXT_INJECTABLE_EXTENSIONS constant in base.py to centralize the set of extensions eligible for inline content injection
  3. Update text injection conditions in Slack, Discord, Telegram, and Feishu adapters to use the shared constant
  4. Add tests for each adapter: acceptance, text injection, oversized file handling, binary content graceful degradation
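Steps 1 to 3 could look roughly like the sketch below. The constant names SUPPORTED_DOCUMENT_TYPES and TEXT_INJECTABLE_EXTENSIONS come from this issue; the dict shape, the exact membership of the injectable set, and the helper should_inject_text are assumptions, and the .docx, .xlsx, and .pptx entries are left out for brevity.

```python
import os

# Hypothetical excerpt of base.py after the change (Office formats omitted).
SUPPORTED_DOCUMENT_TYPES = {
    ".pdf": "application/pdf",
    ".md": "text/markdown",
    ".txt": "text/plain",
    ".csv": "text/csv",           # new in this proposal
    ".json": "application/json",  # new in this proposal
}

# Shared constant from step 2. The membership shown here is an assumption.
TEXT_INJECTABLE_EXTENSIONS = frozenset({".md", ".txt", ".csv", ".json"})


def should_inject_text(filename):
    """Hypothetical helper replacing each adapter's `ext in (".md", ".txt")` check."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in SUPPORTED_DOCUMENT_TYPES and ext in TEXT_INJECTABLE_EXTENSIONS
```

Each adapter would then call the shared check instead of repeating its own tuple, so adding another text format later means touching only base.py.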

PR: #4109

Usage note

On Slack, files must be attached to a message that @mentions the bot. Uploading files without a mention sends a file_shared event, which the adapter does not handle. This is a pre-existing limitation of the Slack adapter's event handling and is not introduced by this change.

Multi-file uploads work — all files in a single message are processed and injected in order.

Security considerations:

  • Text injection carries the same prompt injection surface as existing .txt/.md support — not a new attack class
  • Existing mitigations apply: 100 KB injection cap, UTF-8 validation with UnicodeDecodeError catch, 20 MB download limit, path traversal protection in cache_document_from_bytes()
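The size cap and UTF-8 validation listed above can be sketched as follows. This is not the gateway's code: the names MAX_INJECT_BYTES and text_for_injection are hypothetical, 100 KB is taken to mean 100 * 1024 bytes, and the sketch assumes an oversized or non-UTF-8 file is simply not injected (the real adapters may handle these cases differently, for example by truncating).

```python
MAX_INJECT_BYTES = 100 * 1024  # assumed meaning of the 100 KB injection cap


def text_for_injection(raw: bytes):
    """Return decoded text for inline injection, or None to skip injection.

    Hypothetical helper: returning None stands for "keep the file as an
    attachment but do not inline its content".
    """
    if len(raw) > MAX_INJECT_BYTES:
        return None  # over the cap: do not inline
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # binary content with a .csv or .json name degrades gracefully


print(text_for_injection(b"a,b\n1,2\n") is not None)   # prints True
print(text_for_injection(b"\xff\xfe\x00") is None)     # prints True
```

Because CSV and JSON go through the same path as .txt and .md, a mislabeled binary file fails the decode step rather than reaching the prompt.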

Related: #3487 (Matrix gateway text-file enrichment) describes the same class of issue.

Feature Type

Gateway / messaging

Alternatives Considered

  • Config-based extension mechanism: More flexible but heavier — SUPPORTED_DOCUMENT_TYPES has no config override today. Could be a follow-up.
  • Broader text injection set (.xml, .yaml, .py, .js, .ts, etc.): WhatsApp already does this. Kept scope to CSV and JSON for now; can expand later.

Are you willing to implement this?

  • I'd like to implement this myself and submit a PR

Labels: P3 (Low: cosmetic, nice to have), comp/gateway (Gateway runner, session dispatch, delivery), type/feature (New feature or request)
