Skip to content

Defensive: validate tar members in parakeet NeMo conversion (CWE-22 / Zip Slip)#46118

Closed
JAE0Y2N wants to merge 1 commit into
huggingface:mainfrom
JAE0Y2N:fix/cwe22-zipslip-parakeet-nemo-extract
Closed

Defensive: validate tar members in parakeet NeMo conversion (CWE-22 / Zip Slip)#46118
JAE0Y2N wants to merge 1 commit into
huggingface:mainfrom
JAE0Y2N:fix/cwe22-zipslip-parakeet-nemo-extract

Conversation

@JAE0Y2N

@JAE0Y2N JAE0Y2N commented May 20, 2026

Copy link
Copy Markdown

While checking src/transformers/models/parakeet/convert_nemo_to_hf.py I noticed unzip(zip_path, dest_dir) at line 692 passes the .nemo archive through tarfile.extractall(dest_dir) without any per-entry path validation. Python's tarfile.extractall doesn't reject entries with .. segments or absolute paths, so a malicious .nemo archive could write outside dest_dir — classic Zip Slip / CWE-22, but tar-flavored.

Threat model is defense-in-depth: the conversion script is meant to be run by users importing NeMo models, and the realistic-exploit precondition is a malicious .nemo distributed through whatever channel the user trusts. That's an upstream-trust event larger than this script. But the fix is one helper and matches Python's own PEP 706 data-filter direction, so it's cheap defense-in-depth at the helper boundary.

What the patch does

_safe_extractall(tar, dest_dir) runs before extractall and rejects any member whose normalized path would land outside dest_dir:

  • Members with absolute paths or drive-prefixed paths are rejected outright.
  • For every other member, os.path.realpath(os.path.join(dest_real, member.name)) is compared against os.path.realpath(dest_dir) via os.path.commonpath. If the resolved entry is not under the destination, raise ValueError.
  • Behavior for well-formed archives: unchanged.

This is the same hardening I added recently to langgraph zip-handling (#7873) — same pattern, tar instead of zip.

Test plan

  • Manual reproducer: build a 2-entry .tar with one valid file and one entry named ../escape.txt, point _safe_extractall at a temp dir, confirm it raises ValueError. Done locally before sending this PR.
  • convert_nemo_to_hf.py is unchanged for the well-formed NeMo archives that hf-hub-distributed models ship.

Happy to add a unit test under the right test module if you point me at one — I don't see existing unit tests for this conversion script.

@github-actions

Copy link
Copy Markdown
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: parakeet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants