Defensive: validate tar members in parakeet NeMo conversion (CWE-22 / Zip Slip)#46118
Closed
JAE0Y2N wants to merge 1 commit into
Closed
Defensive: validate tar members in parakeet NeMo conversion (CWE-22 / Zip Slip)#46118JAE0Y2N wants to merge 1 commit into
JAE0Y2N wants to merge 1 commit into
Conversation
Contributor
|
[For maintainers] Suggested jobs to run (before merge) run-slow: parakeet |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
While checking
src/transformers/models/parakeet/convert_nemo_to_hf.pyI noticedunzip(zip_path, dest_dir)at line 692 passes the .nemo archive throughtarfile.extractall(dest_dir)without any per-entry path validation. Python'starfile.extractalldoesn't reject entries with..segments or absolute paths, so a malicious.nemoarchive could write outsidedest_dir— classic Zip Slip / CWE-22, but tar-flavored.Threat model is defense-in-depth: the conversion script is meant to be run by users importing NeMo models, and the realistic-exploit precondition is a malicious
.nemodistributed through whatever channel the user trusts. That's an upstream-trust event larger than this script. But the fix is one helper and matches Python's ownPEP 706data-filter direction, so it's cheap defense-in-depth at the helper boundary.What the patch does
_safe_extractall(tar, dest_dir)runs beforeextractalland rejects any member whose normalized path would land outsidedest_dir:os.path.realpath(os.path.join(dest_real, member.name))is compared againstos.path.realpath(dest_dir)viaos.path.commonpath. If the resolved entry is not under the destination, raiseValueError.This is the same hardening I added recently to
langgraphzip-handling (#7873) — same pattern, tar instead of zip.Test plan
.tarwith one valid file and one entry named../escape.txt, point_safe_extractallat a temp dir, confirm it raisesValueError. Done locally before sending this PR.convert_nemo_to_hf.pyis unchanged for the well-formed NeMo archives that hf-hub-distributed models ship.Happy to add a unit test under the right test module if you point me at one — I don't see existing unit tests for this conversion script.