Vendor mode: move the external repo instead of copying #22668

Closed
meteorcloudy wants to merge 4 commits into bazelbuild:master from meteorcloudy:vendor_by_moving

Member

@meteorcloudy meteorcloudy commented Jun 7, 2024

This drastically improves the speed of vendoring external repositories.

Related: #19563

@meteorcloudy meteorcloudy requested a review from Wyverald as a code owner June 7, 2024 17:56
@github-actions github-actions bot added the team-ExternalDeps (External dependency handling, remote repositories, WORKSPACE file.) and awaiting-review (PR is awaiting review from an assigned reviewer) labels Jun 7, 2024
FileSystemUtils.moveFile(markerUnderExternal, tMarker);
// 3. Move the external repo to vendor dir. It's fine if this step fails or is interrupted, because the marker
// file under external is gone anyway.
FileSystemUtils.moveTreesBelow(repoUnderExternal, repoUnderVendor);
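
The ordering in this hunk (marker first, repo second) can be illustrated with plain `java.nio.file` calls. This is only a sketch under assumptions: `VendorMoveSketch`, `vendorByMoving`, and the marker file names are hypothetical and are not Bazel's `FileSystemUtils` API, and both directories are assumed to live on the same file system so that the directory move is a rename.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical stand-in for the hunk above; none of these names are Bazel APIs. */
final class VendorMoveSketch {
  /**
   * Moves the marker file away first, then the repo directory. If the second move is
   * interrupted, the repo left under external no longer has its marker next to it.
   */
  static void vendorByMoving(
      Path markerUnderExternal, Path tMarker, Path repoUnderExternal, Path repoUnderVendor)
      throws IOException {
    Files.move(markerUnderExternal, tMarker, StandardCopyOption.REPLACE_EXISTING);
    // Assumption: same file system, so moving a non-empty directory is a rename.
    Files.move(repoUnderExternal, repoUnderVendor);
  }

  /** Runs the sketch in a temp directory and reports whether the end state is as expected. */
  static boolean demo() {
    try {
      Path root = Files.createTempDirectory("vendor_sketch");
      Path external = Files.createDirectories(root.resolve("external"));
      Path vendor = Files.createDirectories(root.resolve("vendor_src"));
      Path repo = Files.createDirectories(external.resolve("foo"));
      Files.writeString(repo.resolve("data"), "hello");
      Path marker = Files.writeString(external.resolve("foo.marker"), "marker");

      vendorByMoving(marker, vendor.resolve("foo.marker.tmp"), repo, vendor.resolve("foo"));

      return !Files.exists(marker)
          && !Files.exists(repo)
          && Files.readString(vendor.resolve("foo").resolve("data")).equals("hello");
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

A rename avoids reading and rewriting every file, which is presumably where the speedup over copying comes from.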
Collaborator

How does this behave if a repo symlinks files from another repo and one is vendored while the other is not? It looks like it may be necessary to follow relative symlinks but not absolute symlinks.

Member Author

@meteorcloudy meteorcloudy Jun 10, 2024

moveTreesBelow doesn't follow any symlinks. Judging from the code here, it's actually impossible to create a relative symlink with the ctx.symlink API.

I tested with

ctx.symlink("/tmp/foo", "path_abs")
ctx.symlink("data", "path_rel")
ctx.symlink(ctx.path(Label("@bar//:data")), "path_bar")
ctx.symlink("../_main~ext~bar~/data", "path_bar_2")

and it resulted in

path_abs@ -> /tmp/foo
path_bar@ -> /private/var/tmp/_bazel_pcloudy/d278f827a729facdbfb1ff0fc0002042/external/_main~ext~bar/data
path_bar_2@ -> /private/var/tmp/_bazel_pcloudy/d278f827a729facdbfb1ff0fc0002042/external/_main~ext~bar~/data
path_rel@ -> /private/var/tmp/_bazel_pcloudy/d278f827a729facdbfb1ff0fc0002042/external/_main~ext~foo/data

in both the external and vendor directories.

This is fine if only foo is vendored, since eventually <output_base>/external/_main~ext~bar would exist and point to the right location. However, I noticed there is a problem if the output base is changed after vendoring.
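
The effect described above can be reproduced outside Bazel. The following is a minimal sketch, assuming a POSIX-like file system that allows symlink creation; the class and directory names are hypothetical. It moves a directory containing a link with an absolute target and shows that the stored target still names the old location.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical demo; it mirrors the path_rel case above without using Bazel. */
final class AbsoluteSymlinkMoveSketch {
  /** Moves a repo containing an absolute symlink and returns the link target after the move. */
  static Path targetAfterMove(Path root) throws IOException {
    Path external = Files.createDirectories(root.resolve("external"));
    Path vendor = Files.createDirectories(root.resolve("vendor_src"));
    Path repo = Files.createDirectories(external.resolve("foo"));
    Path data = Files.writeString(repo.resolve("data"), "hello");
    // The link stores an absolute path under external, as in the listing above.
    Files.createSymbolicLink(repo.resolve("path_rel"), data.toAbsolutePath());

    Path moved = vendor.resolve("foo");
    Files.move(repo, moved); // a rename; the link is neither followed nor rewritten
    return Files.readSymbolicLink(moved.resolve("path_rel"));
  }

  static boolean demo() {
    try {
      Path root = Files.createTempDirectory("symlink_sketch");
      Path target = targetAfterMove(root);
      // The target still names the old path. Nothing in this sketch recreates
      // external/foo, so the link dangles here.
      return target.isAbsolute()
          && target.startsWith(root.resolve("external"))
          && !Files.exists(target);
    } catch (IOException | UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

In this sketch the link only resolves again if something recreates the path under the original external directory, which is why a different output base breaks it.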

Collaborator

@fmeum fmeum Jun 10, 2024

While this is the current behavior, wouldn't we have to change it so that symlinks in vendored repos do not contain absolute paths? I think there was another issue about this filed recently.

Member Author

@meteorcloudy meteorcloudy Jun 10, 2024

To deal with a potential output base change, maybe we could

  1. Create a symlink under the vendor dir that points to the external repo root.
  2. Rewrite every symlink that points to a path under the external repo root so that it becomes a relative path through the symlink created in 1.

I have an experimental implementation in meteorcloudy@bf0ec69, which results in

$ ll vendor_src/_bazel-external
lrwxr-xr-x  1 pcloudy  primarygroup  73 Jun 10 15:17 vendor_src/_bazel-external@ -> /private/var/tmp/_bazel_pcloudy/d278f827a729facdbfb1ff0fc0002042/external
pcloudy@pcloudy-macbookpro2:~/workspace/my_tests/simple_cpp_test (master)
$ ll vendor_src/_main~ext~foo/
total 8
drwxr-xr-x  9 pcloudy  primarygroup  288 Jun 10 15:17 ./
drwxr-xr-x  7 pcloudy  primarygroup  224 Jun 10 15:17 ../
-rwxr-xr-x  1 pcloudy  wheel           0 Jun 10 15:17 BUILD*
-rwxr-xr-x  1 pcloudy  wheel           0 Jun 10 15:17 REPO.bazel*
-rwxr-xr-x  1 pcloudy  wheel          15 Jun 10 15:17 data*
lrwxr-xr-x  1 pcloudy  wheel           8 Jun 10 15:17 path_abs@ -> /tmp/foo
lrwxr-xr-x  1 pcloudy  primarygroup   37 Jun 10 15:17 path_bar@ -> ../_bazel-external/_main~ext~bar/data
lrwxr-xr-x  1 pcloudy  primarygroup   38 Jun 10 15:17 path_bar_2@ -> ../_bazel-external/_main~ext~bar/data2
lrwxr-xr-x  1 pcloudy  primarygroup   37 Jun 10 15:17 path_rel@ -> ../_bazel-external/_main~ext~foo/data

Please let me know what you think. I'd prefer to do it in another PR.
/cc @Wyverald @fmeum
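
For readers following along, the two-step proposal above can be sketched with `java.nio.file`. This is not the code in the experimental commit: `SymlinkRewriteSketch` and `rewriteLinks` are hypothetical names, the `_bazel-external` link name is taken from the listing above, and both paths passed in are assumed to be absolute.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of the proposal; not Bazel's implementation. */
final class SymlinkRewriteSketch {
  static final String EXTERNAL_LINK = "_bazel-external"; // name from the listing above

  static void rewriteLinks(Path externalRoot, Path vendorDir) throws IOException {
    // Step 1: vendorDir/_bazel-external -> external repo root (machine specific).
    Path indirection = vendorDir.resolve(EXTERNAL_LINK);
    Files.deleteIfExists(indirection);
    Files.createSymbolicLink(indirection, externalRoot);

    // Step 2: collect the links first (Files.walk does not follow symlinks by default).
    List<Path> links;
    try (Stream<Path> walk = Files.walk(vendorDir)) {
      links = walk.filter(Files::isSymbolicLink)
          .filter(p -> !p.equals(indirection))
          .collect(Collectors.toList());
    }
    for (Path link : links) {
      Path target = Files.readSymbolicLink(link);
      if (!target.isAbsolute() || !target.startsWith(externalRoot)) {
        continue; // targets outside the external root, such as /tmp/foo, stay as they are
      }
      Path viaIndirection = indirection.resolve(externalRoot.relativize(target));
      Path relative = link.getParent().relativize(viaIndirection);
      Files.delete(link);
      Files.createSymbolicLink(link, relative);
    }
  }

  static boolean demo() {
    try {
      Path root = Files.createTempDirectory("rewrite_sketch");
      Path external = Files.createDirectories(root.resolve("external"));
      Path vendor = Files.createDirectories(root.resolve("vendor_src"));
      Path barData = Files.createDirectories(external.resolve("bar")).resolve("data");
      Files.writeString(barData, "hello");
      Path repo = Files.createDirectories(vendor.resolve("foo"));
      Path outside = root.resolve("elsewhere");
      Files.createSymbolicLink(repo.resolve("path_bar"), barData);
      Files.createSymbolicLink(repo.resolve("path_abs"), outside);

      rewriteLinks(external, vendor);

      Path expected = Path.of("..", EXTERNAL_LINK, "bar", "data");
      return Files.readSymbolicLink(repo.resolve("path_bar")).equals(expected)
          && Files.readString(repo.resolve("path_bar")).equals("hello")
          && Files.readSymbolicLink(repo.resolve("path_abs")).equals(outside);
    } catch (IOException | UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

With this layout only the single `_bazel-external` link is machine specific; every rewritten link inside the vendored repos is relative.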

Member

> While this is the current behavior, wouldn't we have to change it so that symlinks in vendored repos do not contain absolute paths? I think there was another issue about this filed recently.

#22303, probably

Member

> To deal with a potential output base change, maybe we could
>
>   1. Create a symlink under the vendor dir that points to the external repo root.
>   2. Rewrite every symlink that points to a path under the external repo root so that it becomes a relative path through the symlink created in 1.

This is quite clever! But what will version-control systems do with this special symlink? People usually put the bazel-* symlinks in the workspace root in .gitignore, so presumably this new special symlink will also need to be ignored? And the symlink is generated on demand if it's not there, etc.? (I agree that this should be done in a separate PR.)

Either way, some sort of symlink rewriting will need to happen, and we'll probably need to do something similar for the true repo cache.

Member Author

@meteorcloudy meteorcloudy Jun 11, 2024

> so presumably this new special symlink will also need to be ignored? And the symlink is generated on demand if it's not there, etc.? (I agree that this should be done in a separate PR)

Yes, I also think it should be gitignored since it's machine-specific. And to keep the code simple, we can just always re-create the symlink, since that's quite cheap.
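
The "always re-create" approach can be sketched as an unconditional delete-then-create. The names `RecreateLinkSketch` and `refreshExternalLink` are hypothetical, and `_bazel-external` is the link name from the earlier listing; this only illustrates the idea and is not Bazel code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: refresh the machine-specific link on every run. */
final class RecreateLinkSketch {
  static void refreshExternalLink(Path vendorDir, Path externalRoot) throws IOException {
    Path link = vendorDir.resolve("_bazel-external");
    Files.deleteIfExists(link); // removes the link itself, never the directory it points to
    Files.createSymbolicLink(link, externalRoot);
  }

  static boolean demo() {
    try {
      Path root = Files.createTempDirectory("recreate_sketch");
      Path vendor = Files.createDirectories(root.resolve("vendor_src"));
      Path oldBase = Files.createDirectories(root.resolve("base1").resolve("external"));
      Path newBase = Files.createDirectories(root.resolve("base2").resolve("external"));

      refreshExternalLink(vendor, oldBase);
      refreshExternalLink(vendor, newBase); // e.g. after the output base changed

      return Files.readSymbolicLink(vendor.resolve("_bazel-external")).equals(newBase)
          && Files.isDirectory(oldBase);
    } catch (IOException | UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Because the link is replaced unconditionally, no staleness check is needed, and a checkout that ignores the link in version control gets a fresh one on the next run.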

Member

@Wyverald Wyverald left a comment

mostly LGTM, just nits!

@github-actions github-actions bot removed the awaiting-review (PR is awaiting review from an assigned reviewer) label Jun 11, 2024
meteorcloudy added a commit to meteorcloudy/bazel that referenced this pull request Jun 18, 2024
This drastically improves the speed of vendoring external repositories.

Related: bazelbuild#19563

Closes bazelbuild#22668.

PiperOrigin-RevId: 642338030
Change-Id: Idcba16c491711cf8fa6637d1e9c42cfc65e87599