fix(gateway): diagnosable MEDIA rejections + canonical cache roots + null-path guard#34485
Merged
Conversation
…null-path guard
Operators can now see which MEDIA path was dropped and why, generated
artifacts under the canonical ~/.hermes/cache/{images,...} layout deliver,
and a crafted ~\x00 path no longer aborts the whole attachment batch.
- MEDIA_DELIVERY_SAFE_ROOTS: add canonical cache/{images,audio,videos,
documents,screenshots} alongside the legacy *_cache dirs (#31733).
- filter_media/local_delivery_paths: log the rejected path (was a blind
"outside allowed roots") via _log_safe_path, which strips control chars
and Unicode line separators so a model-emitted path can't forge a log line.
- validate_media_delivery_path + extract_media: guard os.path.expanduser
so a ~\x00 path returns None / is skipped instead of raising and dropping
every other attachment in the response.
Salvaged and slimmed from #33251 (780 LOC -> 35): the reason-tag taxonomy,
the parts-eliding redactor, and the extension-partition hoist are dropped in
favor of logging the path directly. All three findings were verified and
reproduced by the contributor.
Co-authored-by: wysie <wysie@users.noreply.github.com>
| safe_media.append((safe_path, bool(is_voice))) | ||
| else: | ||
| logger.warning("Skipping unsafe MEDIA directive path outside allowed roots") | ||
| logger.warning("Skipping unsafe MEDIA directive path: %s", _log_safe_path(raw)) |
| safe_paths.append(safe_path) | ||
| else: | ||
| logger.warning("Skipping unsafe local file path outside allowed roots") | ||
| logger.warning("Skipping unsafe local file path: %s", _log_safe_path(raw)) |
Contributor
Contributor
Author
|
@GodsBoy you're right, and thanks for flagging it. The original work in #33251 — all three findings — is yours (Dewaldt Huysamen, dhuysamen@gmail.com). The |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Salvages the three real fixes from #33251 in 35 LOC of production change. After this, operators can see which MEDIA path was dropped, generated artifacts under the canonical
~/.hermes/cache/{images,...}layout deliver, and a crafted~\x00path no longer aborts the whole attachment batch.Changes
gateway/platforms/base.py:MEDIA_DELIVERY_SAFE_ROOTS: add canonicalcache/{images,audio,videos,documents,screenshots}alongside the legacy*_cachedirs (Generated images under cache/images may not deliver via Telegram MEDIA, while image_cache works #31733).filter_media_delivery_paths/filter_local_delivery_paths: log the rejected path (was a blind "outside allowed roots") via_log_safe_path, which strips control chars + Unicode line separators (NEL/LS/PS) so a model-emitted path can't forge a second log line.validate_media_delivery_path+extract_media: guardos.path.expanduserso a~\x00path returns None / is skipped instead of raisingValueError("embedded null byte")and dropping every other attachment.Why slimmed from #33251
@GodsBoy's PR (780 LOC net) was correct but gold-plated. Dropped here:
_MEDIA_REASON_*taxonomy +_validate_..._with_reasonsplit — the logged path is the diagnostic; the reason vocabulary was for a deferredmedia_rejected[]return field that doesn't exist yet._redact_path_for_logparts-elision engine → one-line_log_safe_path._FINAL_RESPONSE_*_EXTSmodule hoist — unrelated test-convenience refactor.Also closes a gap the original version missed: the
extract_mediaingestion site (upstream of the filter loops) was unguarded.Validation
~\x00in a batchcache/imagesscripts/run_tests.sh tests/gateway/test_platform_base.py→ 116 passed, 0 failed (5 new behavioral tests, no reason-string snapshots).Closes #33251. Original work and all three findings by @GodsBoy (Dewaldt Huysamen). The merge commit's Co-authored-by trailer was incorrectly written as @wysie during salvage — that was our error; full credit belongs to @GodsBoy.
Infographic