
feat(gateway): accept .html / .htm in the document allowlist#12702

Open
handsdiff wants to merge 1 commit into
NousResearch:mainfrom
handsdiff:feat/html-document-support


@handsdiff

Contributor

Summary

Same pattern as #12590 (epub). SUPPORTED_DOCUMENT_TYPES in gateway/platforms/base.py is the single allowlist used by every platform handler (telegram, slack, discord, feishu, whatsapp) to decide whether to cache an uploaded document. It currently rejects .html uploads at the gateway — users trying to share saved web pages get Unsupported document type '.html' — even though the ocr-and-documents skill's marker extractor already handles HTML alongside PDF/DOCX/PPTX/XLSX/EPUB.

Add .html / .htm → text/html.
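For illustration only, the change might look roughly like the sketch below. It assumes the allowlist is a dict mapping lowercase extensions to MIME types; the entries other than `.html` / `.htm` and the `lookup_mime_type` helper are hypothetical and do not come from `gateway/platforms/base.py`.

```python
from pathlib import Path
from typing import Optional

# Illustrative stand-in for the allowlist in gateway/platforms/base.py.
# The real dict's contents are not shown in this PR; only the two HTML
# entries below are what the PR describes adding.
SUPPORTED_DOCUMENT_TYPES = {
    ".pdf": "application/pdf",        # assumed existing entry
    ".epub": "application/epub+zip",  # assumed existing entry
    ".html": "text/html",             # added by this PR
    ".htm": "text/html",              # added by this PR
}


def lookup_mime_type(filename: str) -> Optional[str]:
    """Hypothetical helper: return the MIME type for an allowed upload, else None."""
    ext = Path(filename).suffix.lower()
    return SUPPORTED_DOCUMENT_TYPES.get(ext)
```

With a lookup of this shape, a platform handler would cache `saved_page.html` as `text/html` and still reject any extension missing from the dict.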

Test plan

  • pytest tests/gateway/test_document_cache.py tests/gateway/test_telegram_documents.py — 58 pass
  • Send an .html file via Telegram and confirm the agent can read it via the ocr-and-documents skill

@handsdiff handsdiff force-pushed the feat/html-document-support branch 2 times, most recently from 1d4acd0 to ab19e84 Compare April 19, 2026 22:11
@handsdiff handsdiff force-pushed the feat/html-document-support branch from ab19e84 to 267bcd6 Compare April 21, 2026 20:25
@alt-glitch alt-glitch added type/feature New feature or request comp/gateway Gateway runner, session dispatch, delivery labels Apr 21, 2026
@handsdiff handsdiff force-pushed the feat/html-document-support branch from 267bcd6 to b105228 Compare April 22, 2026 15:53
Same shape as the epub addition (NousResearch#12590). ``SUPPORTED_DOCUMENT_TYPES``
in ``gateway/platforms/base.py`` is the single allowlist used by every
platform handler (telegram, slack, discord, feishu, whatsapp) to decide
whether to cache an uploaded document. It rejected ``.html`` uploads at
the gateway even though the ``ocr-and-documents`` skill's ``marker``
extractor already handles HTML alongside PDF/DOCX/PPTX/XLSX/EPUB.

Add both ``.html`` and ``.htm`` → ``text/html`` so users can share saved
web pages (Telegram's "share page as file" export, "Save As" HTML, email
attachments) and the agent can actually read them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@handsdiff handsdiff force-pushed the feat/html-document-support branch from b105228 to eab3752 Compare April 24, 2026 03:26
handsdiff added a commit to handsdiff/hermes-agent that referenced this pull request Apr 24, 2026
- ``fix/rehydrate-compaction-summary`` was wrong — actual branch is
  ``feat/rehydrate-compaction-summary``. Corrected everywhere.
- Add ``feat/html-document-support`` (PR NousResearch#12702) to the open-PR table,
  rebase workflow, merge loop, and per-PR notes. Note the expected
  union-merge conflict with epub when rebuilding fork main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Labels

comp/gateway Gateway runner, session dispatch, delivery type/feature New feature or request

2 participants