fix(package): check dirtiness of symlinks source files#14981
Merged
epage merged 6 commits intorust-lang:masterfrom Dec 31, 2024
Merged
fix(package): check dirtiness of symlinks source files#14981epage merged 6 commits intorust-lang:masterfrom
epage merged 6 commits intorust-lang:masterfrom
Conversation
Collaborator
weihanglo
commented
Dec 24, 2024
src/cargo/ops/cargo_package.rs
Outdated
| // TODO: Should we warn users if there are like thousands of symlinks, | ||
| // which may hurt the performance? | ||
| if repo.status_file(&rel_path)? != git2::Status::CURRENT { | ||
| dirty_symlinks.insert(workdir.join(rel_path)); |
Member
Author
There was a problem hiding this comment.
We could potentially cache git status for each repo then. That leaves to future improvement if this one has got merged.
github-merge-queue bot
pushed a commit
that referenced
this pull request
Dec 25, 2024
### What does this PR try to resolve? Do nothing but move code around * `cargo_package.rs` -> `cargo_package/mod.rs` * Extract vcs info helpers to `cargo_package/vcs.rs` * Extract verification helpers to `cargo_package/verify.rs` Doing so because I realized how big `cargo_package.rs` has grown. Like, the vcs helpers will continue growing with #14981 and potentially other optimizations. ### How should we test and review this PR? All tests pass.
767408a to
451c06d
Compare
weihanglo
added a commit
to weihanglo/cargo
that referenced
this pull request
Dec 26, 2024
It may not be worthy for the majority. Single file `git status` is generally fast. On some slow file systems or lower-end machines, skipping file system access might be helpful. However, slow file system access usually happen on Window. And we'll only have large amount of `git status` when rust-lang#14981 merges and the repo contains lots of symlinks. Given symlink handling story is crazy in real world Windows, I doubt anybody will hit the performance issue without this.
This comment was marked as resolved.
This comment was marked as resolved.
451c06d to
19acb93
Compare
Member
Author
This has been fixed. |
19acb93 to
6baf1f5
Compare
epage
reviewed
Dec 31, 2024
epage
reviewed
Dec 31, 2024
epage
reviewed
Dec 31, 2024
epage
reviewed
Dec 31, 2024
epage
reviewed
Dec 31, 2024
73a57f4 to
95bb4ed
Compare
github-merge-queue bot
pushed a commit
that referenced
this pull request
Dec 31, 2024
### What does this PR try to resolve? `cargo package` will warn users when git `core.symlinks` is `false` and some symlinks were checked out as plain files during packaging. Git config [`core.symlinks`] defaults to true when unset. In git-for-windows (and git as well), the config should be set to false explicitly when the repo was created, if symlink support wasn't detected [^1]. We assume the config was always set at creation time and never changed. So, if it is true, we don't bother users with any warning. [^1]: https://github.com/git-for-windows/git/blob/f1241afcc7956918d5da33ef74abd9cbba369247/setup.c#L2394-L2403 [`core.symlinks`]: https://git-scm.com/docs/git-config#Documentation/git-config.txt-coresymlinks ### How should we test and review this PR? CI passes. This shares two commits 42dc4ef and c8c8223 with #14981. I didn't commit to fix all symlink issues all at once. This PR demonstrates how we could leverage metadata in `PathEntry`. Maybe later we can really follow plain-text symlinks and resolve the issue. ### Additional information cc #5664
So `path` and `base` are able to accept different types
`abs_path` and `workdir.join(rel_path)` are the same at that point.
This comment has been minimized.
This comment has been minimized.
This is helpful for VCS status check. Paths emitted by PathSource are always under package root, We lose the track of file type info of paths under symlink dirs, so we need this extra bit of information.
This show that a regular file under a symlink directory is also not tarcked by the current vcs dirtiness check.
This adds a special case for checking source files are symlinks and have been modified when under a VCS control This is required because those paths may link to a file outside the current package root, but still under the git workdir, affecting the final packaged `.crate` file. This may have potential performance issue. If a package contains thousands of symlinks, Cargo will fire `git status` for each of them.
metdata path fields may point to a dirty symlilnk and cause duplicate report. This commit combines `dirty_metadata_paths` and `dirty_symlinks` into one so is able to de-duplicate them.
95bb4ed to
24dd205
Compare
weihanglo
added a commit
to weihanglo/cargo
that referenced
this pull request
Dec 31, 2024
It may not be worthy for the majority. Single file `git status` is generally fast. On some slow file systems or lower-end machines, skipping file system access might be helpful. However, slow file system access usually happen on Window. And we'll only have large amount of `git status` when rust-lang#14981 merges and the repo contains lots of symlinks. Given symlink handling story is crazy in real world Windows, I doubt anybody will hit the performance issue without this.
epage
approved these changes
Dec 31, 2024
This was referenced Jan 1, 2025
bors
added a commit
to rust-lang-ci/rust
that referenced
this pull request
Jan 4, 2025
Update cargo 9 commits in d73d2caf9e41a39daf2a8d6ce60ec80bf354d2a7..fd784878cfa843e3e29a6654ecf564c62fae6735 2024-12-31 20:51:21 +0000 to 2025-01-03 20:06:26 +0000 - chore: bump gix-lock to remove thiserror@1 from `cargo` (rust-lang/cargo#15012) - refactor(manifest): Clean up field -> env var handling (rust-lang/cargo#15008) - chore(deps): update rust crate thiserror to v2 (rust-lang/cargo#14998) - test(git): Clean up shallow fetch tests (rust-lang/cargo#15002) - fix(schema): Correct and update the JSON Schema (rust-lang/cargo#15000) - chore(deps): update rust crate itertools to 0.14.0 (rust-lang/cargo#14996) - fix: env table config can't trigger rebuild with `rerun-if-env-changed`. (rust-lang/cargo#14756) - chore(deps): update alpine docker tag to v3.21 (rust-lang/cargo#14995) - fix(package): check dirtiness of symlinks source files (rust-lang/cargo#14981)
weihanglo
added a commit
to weihanglo/cargo
that referenced
this pull request
Apr 10, 2025
# This is the 1st commit message: test(package): show symlink to submodule failed the check This is a regression introduced by rust-lang#14981 # The commit message #2 will be skipped: # WIP
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR try to resolve?
This adds a special case for checking source files are symlinks
and have been modified when under a VCS control
This is required because those paths may link to a file outside the
current package root, but still under the git workdir, affecting the
final packaged
.cratefile.How should we test and review this PR?
Pretty similar to #14966, as a part of #14967.
This may have potential performance issue. If a package contains thousands of symlinks, Cargo will fire
git statusfor each of them. Not sure if we want to do anything proactively now.The introduction of the
PathEntrystruct gives us more room for storing file metadata to satisfiy use cases in the future. For instances,cargo package --list, for example JSON output modecargo package --list#11666includeandexcludeinformation to metadata #13331cargo package --listis not good for workspaces #13953