fix(gateway): diagnosable MEDIA delivery + canonical cache roots (#31733)#33251
fix(gateway): diagnosable MEDIA delivery + canonical cache roots (#31733)#33251GodsBoy wants to merge 2 commits into
Conversation
|
Confirmed this bug cluster from the operator side on Windows 11 / Telegram DM delivery. In my repro, the normal Hermes reply path reported success but the attachment still did not appear to the user. The decisive signal was:
That strongly supports the thesis here that there is a final-response / MEDIA-path parity problem distinct from raw Telegram/media validity. Separately, I also reproduced a second independent bug in |
…sResearch#31733) Builds on the direction in NousResearch#31764 with a broader cut at the same bug surface that surfaces in operator reports as "MEDIA tag did not attach": * MEDIA_DELIVERY_SAFE_ROOTS now lists the canonical ~/.hermes/cache/{images,audio,videos,documents,screenshots} explicitly alongside the legacy ~/.hermes/{type}_cache/ entries. get_hermes_dir returns the legacy path when it exists, so IMAGE_CACHE_DIR alone drops the canonical layout on co-existing installs (root cause of NousResearch#31733). * validate_media_delivery_path now delegates to a private _validate_media_delivery_path_with_reason helper that returns (resolved_or_None, kebab_reason). filter_media_delivery_paths and filter_local_delivery_paths log the reason plus a redacted path (_redact_path_for_log), so operators chasing rejected MEDIA tags can tell allowlist-miss from stale-mtime from denied-prefix from does-not-resolve without rebuilding the bot with debug logging. * _redact_path_for_log neutralises control characters (newline, NUL, carriage return) so a model-emitted path containing an embedded newline cannot forge a second log line. * Hoisted _FINAL_RESPONSE_IMAGE_EXTS / _FINAL_RESPONSE_VIDEO_EXTS to module scope so the regression test imports the same partition the final-response dispatch actually uses, instead of snapshotting a local copy that could drift. * Regression coverage added in tests/gateway/test_platform_base.py: - default safe-roots tuple contains both legacy and canonical layouts - legacy and canonical image caches co-exist (the strengthened failure-shape test NousResearch#31764 review asked for) - PDF deliverable from canonical ~/.hermes/cache/documents and legacy ~/.hermes/document_cache - filter_*_delivery_paths logs the redacted path and the kebab reason tag (using the imported _MEDIA_REASON_* constants, not raw strings) for stale-mtime, no-recency, does-not-resolve, denied-prefix, and newline-injection cases - validate_media_delivery_path delegation invariant: public API matches the resolved path returned by the reason-bearing helper - send_message tool and final-response paths share extract_media + filter_media_delivery_paths (asserted via source introspection so a future split surfaces here) - PDF MEDIA tag routes to send_document via the production extension partition (no image / video misroute) Scope: gateway media delivery only. Does not duplicate NousResearch#32472's extension-derivation refactor or NousResearch#31561 / NousResearch#33206's markdown attachment recognition work; if any of those land first, the diagnosable-logging contract here still applies and the canonical-root entries become a clean merge against NousResearch#31764. Deferred follow-up (out of scope for this PR): - Surfacing the rejection reason to the calling agent (send_message tool currently returns {success: true} regardless of dropped attachments) - Unifying the three dispatch-site extension sets across _process_message_background, _deliver_media_from_response, and the send_message tool path - Pre-existing residual: symlink-swap of a cache root before image_generate writes; TOCTOU between resolve() and upload; /var/tmp / /var/cache not in denylist
Review follow-ups on the diagnosable-MEDIA-delivery change: - _redact_path_for_log now neutralizes Unicode line separators (NEL U+0085, LS U+2028, PS U+2029) in addition to the C0/DEL range. The previous regex only covered control chars, so a model-emitted path with an embedded U+2028 still split into a second log line and defeated the stated 'cannot forge a fake second log line' guarantee. - _validate_media_delivery_path_with_reason wraps os.path.expanduser so a '~\x00...' path returns does-not-resolve instead of raising ValueError. The filter loops also gain a per-item guard so one unprocessable path can no longer abort the batch and silently drop every other attachment in the response. - Inlined the _FINAL_RESPONSE_*_EXTS module constants at the dispatch sites, removing the pointless local aliases, and dropped a redundant str() coercion in the validator. Tests (tests/gateway/test_platform_base.py, 116 -> 128): - Direct unit coverage for _redact_path_for_log (empty, short no-elide, long elision, C0 neutralization, Unicode line separators, unprintable fallback). - Batch-isolation regression: NUL-after-tilde path and a synthetic validator raise both drop only the offending item; the good attachment survives. - Delegation parity test now reaches the recency-trusted ACCEPT and not-a-file branches that were previously unreachable. - Reason-tag log coverage for not-absolute and not-a-file. - De-overclaimed the 'identical for both paths' parity test name and docstring (it exercises the shared surface once; structural parity is asserted separately by the import test).
06ec17d to
7e3b47a
Compare
…null-path guard
Operators can now see which MEDIA path was dropped and why, generated
artifacts under the canonical ~/.hermes/cache/{images,...} layout deliver,
and a crafted ~\x00 path no longer aborts the whole attachment batch.
- MEDIA_DELIVERY_SAFE_ROOTS: add canonical cache/{images,audio,videos,
documents,screenshots} alongside the legacy *_cache dirs (#31733).
- filter_media/local_delivery_paths: log the rejected path (was a blind
"outside allowed roots") via _log_safe_path, which strips control chars
and Unicode line separators so a model-emitted path can't forge a log line.
- validate_media_delivery_path + extract_media: guard os.path.expanduser
so a ~\x00 path returns None / is skipped instead of raising and dropping
every other attachment in the response.
Salvaged and slimmed from #33251 (780 LOC -> 35): the reason-tag taxonomy,
the parts-eliding redactor, and the extension-partition hoist are dropped in
favor of logging the path directly. All three findings were verified and
reproduced by the contributor.
Co-authored-by: wysie <wysie@users.noreply.github.com>
|
Merged via #34485 (commit e28a668 on main), with your authorship preserved via a All three of your findings were real, verified, and reproduced — and they shipped:
We landed a slimmed implementation (~35 LOC of production change vs ~780): the |
…null-path guard
Operators can now see which MEDIA path was dropped and why, generated
artifacts under the canonical ~/.hermes/cache/{images,...} layout deliver,
and a crafted ~\x00 path no longer aborts the whole attachment batch.
- MEDIA_DELIVERY_SAFE_ROOTS: add canonical cache/{images,audio,videos,
documents,screenshots} alongside the legacy *_cache dirs (NousResearch#31733).
- filter_media/local_delivery_paths: log the rejected path (was a blind
"outside allowed roots") via _log_safe_path, which strips control chars
and Unicode line separators so a model-emitted path can't forge a log line.
- validate_media_delivery_path + extract_media: guard os.path.expanduser
so a ~\x00 path returns None / is skipped instead of raising and dropping
every other attachment in the response.
Salvaged and slimmed from NousResearch#33251 (780 LOC -> 35): the reason-tag taxonomy,
the parts-eliding redactor, and the extension-partition hoist are dropped in
favor of logging the path directly. All three findings were verified and
reproduced by the contributor.
Co-authored-by: wysie <wysie@users.noreply.github.com>

Summary
Adds diagnosable MEDIA delivery, canonical cache roots, and two pieces of
untrusted-input hardening to
gateway/platforms/base.py, rebased on top ofthe denylist-only default that landed in #34022.
When #34022 flipped the default to denylist-only it restored delivery of stale
and outside-allowlist artifacts, which solved the most visible "MEDIA didn't
attach" reports. It did not, however, make rejections diagnosable, add the
canonical cache roots that strict mode still needs, or fix two latent
robustness/integrity bugs in the same code path. This PR closes those gaps and
complements #34022 rather than competing with it.
What this PR adds (on top of #34022)
Diagnosable rejection reasons.
validate_media_delivery_pathnowdelegates to a private
_validate_media_delivery_path_with_reason(path) -> (resolved_or_None, kebab_reason).filter_media_delivery_pathsandfilter_local_delivery_pathslog the reason plus a redacted path, so anoperator can tell
denylist-clearedfromoutside-allowlist-denied-prefixfrom
outside-allowlist-stale-mtimefromdoes-not-resolvewithoutrebuilding the bot with debug logging. The reason taxonomy covers both the
default denylist-only path and the opt-in strict allowlist/recency path.
Canonical cache roots.
MEDIA_DELIVERY_SAFE_ROOTSlists the canonical~/.hermes/cache/{images,audio,videos,documents,screenshots}explicitlyalongside the legacy
~/.hermes/{type}_cache/entries.get_hermes_dirreturns the legacy path when it exists, so
IMAGE_CACHE_DIRalone drops thecanonical layout on co-existing installs. This is the root cause of Generated images under cache/images may not deliver via Telegram MEDIA, while image_cache works #31733
and still bites in strict mode, where the allowlist is the gate.
Log-forging hardening.
_redact_path_for_logneutralises controlcharacters AND Unicode line separators (NEL U+0085, LS U+2028, PS U+2029) so
a model-emitted path cannot forge a second operator log line. The C0 range
alone is not enough:
str.splitlines()and most log aggregators split onU+2028/U+2029 too.
Batch-abort fix.
os.path.expanduseris now wrapped so a~\x00...path returns
does-not-resolveinstead of raisingValueError: embedded null byte, and both filter loops gain a per-item guard. Without this, asingle crafted path silently drops every other attachment in the response.
Why this is still needed after #34022
Verified against current
main:filter_media_delivery_paths/filter_local_delivery_pathsstill log theblind
Skipping unsafe MEDIA directive path outside allowed rootswith nopath and no reason.
~\x00...batch-abort is a live bug: notry/exceptaroundexpanduser, no per-item guard in the loops._redact_path_for_logdoes not exist onmain, so there is nolog-separator neutralisation at all.
fixes) are still unmerged.
Relationship to the MEDIA-delivery cluster
preserves its behavior exactly. Default mode stays denylist-only; the
allowlist + recency logic and the canonical roots serve strict mode
(
gateway.strict: true/HERMES_MEDIA_DELIVERY_STRICT=1).This PR includes equivalent canonical entries plus the diagnosable logging
and hardening.
hoists the final-response extension partition to module scope so the
regression tests import the same set production dispatches on; it does not
touch the regex.
addressed here.
_send_via_adapterdroppingmedia_files): a separate bug,out of scope and untouched.
How to test
Broader MEDIA-touching suite (
tests/gateway,tests/cron/test_scheduler.py):6127 passed, 0 failed.
New / updated regression coverage:
denylist-cleared) and denied-prefixreject, plus strict-mode allowlist/recency reasons, all asserted against
imported
_MEDIA_REASON_*constants (no raw strings)MEDIA_DELIVERY_SAFE_ROOTScontains both legacy and canonical layoutsimage_cacheand canonicalcache/imagesco-exist on disk; bothdeliverable; PDF deliverable from canonical and legacy document dirs
_redact_path_for_log(elision boundary, C0neutralisation, U+2028/U+2029/U+0085 neutralisation, unprintable fallback)
~\x00...path and a synthetic validator raise each drop only theoffending item; the good attachment survives the batch
validate_media_delivery_pathagrees with the reason helperacross both modes and all branches
not-absoluteandnot-a-fileDeferred follow-up (out of scope)
send_messagereturns{success: true}regardless of dropped attachments; the agent has noprogrammatic way to observe the drop). The new
_MEDIA_REASON_*vocabularyis the natural shape for a
media_rejected: [{path_hint, reason}]field.(
_process_message_background,_deliver_media_from_response,tools/send_message_tool). This PR only hoists the final-response partition;tools/send_message_tool._VIDEO_EXTSstill omits.webmtoday.image_generatewrites; TOCTOU between
resolve()and upload;/var/tmpand/var/cachenot in the denylist.