fix(cli): refuse to extract zip entries that escape the destination (CWE-22 / Zip Slip)#7870
Closed
Jaeyoung Yun (JAE0Y2N) wants to merge 1 commit into
Closed
Conversation
…CWE-22)
`_download_repo_with_requests` in `langgraph_cli/templates.py` passes the
downloaded template archive straight through `ZipFile.extractall(path)`.
Python's `extractall` does not validate per-entry paths, so a malicious
archive containing entries like `../../etc/passwd` or `/tmp/anywhere` will
write to whatever the running user can write to — outside the project
directory the operator selected.
Today the templates are fetched from langchain-ai/* repos over HTTPS, so
the realistic precondition is a supply-chain compromise of one of those
template repos or a TLS MITM between `request.urlopen` and github.com.
Both are unlikely in normal use — but defense-in-depth is cheap here and
mirrors the standard hardening that other Python project scaffolders
have already adopted.
This patch introduces `_safe_extract`, which:
* rejects entries with absolute paths or drive-prefixed paths outright
* resolves each entry against the destination via `os.path.realpath`
and refuses any entry whose resolved path is not within the
destination (`os.path.commonpath` check)
* surfaces a clear error and exits non-zero if a malformed archive is
seen, rather than silently writing files outside the project
The `*-main` cleanup loop is unchanged — it only runs after extraction
succeeds and only touches entries inside `path`.
No behavior change for well-formed archives (which is every archive the
helper currently sees in production); this only refuses inputs that
would have escaped the destination today.
Refs: CWE-22 (Path Traversal), specifically the Zip-Slip variant
(https://snyk.io/research/zip-slip-vulnerability).
Contributor
|
This PR has been automatically closed because you are not assigned to the linked issue. External contributors must be assigned to an issue before opening a PR for it. Please:
Maintainers: reopen this PR or remove the |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Resolves #7871.
While reviewing
langgraph_cli.templates._download_repo_with_requestsI noticed it passes the downloaded template archive straight throughZipFile.extractall(path). Python'sextractalldoesn't validate per-entry paths, so an archive containing entries like../../etc/passwd(or an absolute path) writes outside the destination the user selected — classic Zip Slip / CWE-22.In normal use the templates come from
langchain-ai/*template repos over HTTPS, so the realistic precondition for an exploit is a supply-chain compromise of one of those template repos or TLS MITM againstgithub.com. Both are unlikely, and I confirmed there's no user-controlled URL injection vector (the--templateflag is hard-rejected againstTEMPLATE_ID_TO_CONFIG, andTEMPLATESis a hardcoded dict). So this is defense-in-depth rather than an active vulnerability.That said, the fix is one small helper and matches the standard hardening other Python scaffolders have already adopted (npm
tarv7+,pnpx create-X-appflows, etc.). Refusing malformed archives at the helper boundary is cheap to add and protects future code paths that might pull from less-trusted sources (community templates, mirrors, internal proxies, test fixtures).What the patch does
_safe_extract(zip_file, path)runs beforeextractalland rejects any entry whose normalized path would land outsidepath:os.path.realpath(os.path.join(real_dest, member))and compared toos.path.realpath(path)viaos.path.commonpath. If the resolved entry is not under the destination, raise.No behavior change for well-formed archives.
Test plan
python3 -c "import ast; ast.parse(open('libs/cli/langgraph_cli/templates.py').read())"— parses cleanly.../escape.txt, point_safe_extractat a temp dir, confirm it raisesValueErrorand exits non-zero. (Happy to add a unit test underlibs/cli/tests/if you'd like — let me know which test module is the right home; I don't see existing tests fortemplates.py.)_download_repo_with_requestsis unchanged for the well-formed path the production templates always take.