Skip to content

manifest install: Cache compiler information for change detection#362

Open
autoantwort wants to merge 42 commits into
microsoft:mainfrom
autoantwort:check-stamp
Open

manifest install: Cache compiler information for change detection#362
autoantwort wants to merge 42 commits into
microsoft:mainfrom
autoantwort:check-stamp

Conversation

@autoantwort

@autoantwort autoantwort commented Feb 17, 2022

Copy link
Copy Markdown
Contributor

Fixes microsoft/vcpkg#20549 (comment), microsoft/vcpkg#14025 and microsoft/vcpkg#29857.

The noop case now needs 110 ms instead of 2.2 seconds on my arm mac.
And from 2.0 sec to 18 ms when build as release

Comment thread include/vcpkg/vcpkgpaths.h Outdated
Comment thread src/vcpkg/install.cpp Outdated
Comment thread src/vcpkg/install.cpp Outdated
Comment thread src/vcpkg/install.cpp Outdated
@ras0219-msft

ras0219-msft commented Feb 23, 2022

Copy link
Copy Markdown
Collaborator

A few independent thoughts:

MSBuild parity

I think we should consider changing our CMake integration to use the same algorithm as MSBuild. Currently, MSBuild considers:

  1. Timestamp check on vcpkg.json
  2. Timestamp check on vcpkg-configuration.json
  3. Triplet name
  4. Host triplet name

We should add "full vcpkg command line" to that list in both cases. My concern would be only that this will not pick up changes to overlay port contents or to the triplet files.

Locking

Right now, vcpkg generally performs a bunch of file locking in manifest mode that will cause this "lightweight" check to be globally serialized. For CMake usage this is less important since it's often called only once for the whole repo, but for more complex scenarios and other buildsystems it could be problematic.

Avoid hashing file contents

If the intent of this is to be a lightweight check, we should avoid hashing file contents altogether. Instead, I would suggest:

  1. For every file under consideration, compare their modified timestamp to the stamp's modified timestamp.
  2. If the timestamps disagree (a file is newer than the stamp), report a mismatch.
  3. For every file under consideration, hash the path. This ensures deleting files will be tracked correctly. (In practice I think this is easiest by hashing the list of paths)
  4. Read the stamp contents and compare against the paths hash. If not matching, report a mismatch.

If there's a mismatch, perform the install. Once the install completes, write the new paths hash to the stamp (which updates the timestamp).

We could potentially go even further here by instead having vcpkg actually report out the set of files considered as part of the build, a la /showIncludes/-M -MF. This would enable buildsystems to perform most of this check on their own without spawning a vcpkg process at all (very good for MSBuild!). One caveat with this would be correctly handling our "consider all files in this folder" cases.

@autoantwort

Copy link
Copy Markdown
Contributor Author

My concern would be only that this will not pick up changes to overlay port contents or to the triplet files.

Yeah I think we should also check the timestamp of the triplet files and the overlay port files.

Right now, vcpkg generally performs a bunch of file locking in manifest mode that will cause this "lightweight" check to be globally serialized.

For msbuild this is a good thing if we still use the builtin msbuild change detection. If you for example have 10 projects in your solution and a vcpkg related change is detected, vcpkg runs 10 times in parallel. If the check is done after the lock, the first vcpkg instance will install the dependencies and then all the others will detect that there are no changes and will skip installation. If we would do the check before the locking, every vcpkg instance will detect a change and tries to install everything.
If we don't want to use the builtin msbuild check, we optimally need two checks. One before the file locking and one after. But imho for now it is enough to do the check after the locking.

For every file under consideration, hash the path. This ensures deleting files will be tracked correctly. (In practice I think this is easiest by hashing the list of paths)

We can simply check the modified date of the folders. The modified date of a folder is updated if a file is deleted, added or renamed.

Otherwise sounds good :)

@autoantwort

Copy link
Copy Markdown
Contributor Author

@ras0219-msft Are you happy with the new changes? :)

@ras0219-msft ras0219-msft self-requested a review May 10, 2022 19:58
# Conflicts:
#	include/vcpkg/commands.setinstalled.h
#	locales/messages.en.json
#	locales/messages.json
#	src/vcpkg/commands.setinstalled.cpp
#	src/vcpkg/install.cpp
# Conflicts:
#	locales/messages.en.json
#	locales/messages.json
# Conflicts:
#	src/vcpkg/install.cpp
Comment thread include/vcpkg/installedpaths.h Outdated
@zig-for

zig-for commented Jul 6, 2022

Copy link
Copy Markdown

It would be really neat if this were merged, this is really slowing down my builds.

# Conflicts:
#	include/vcpkg/commands.setinstalled.h
#	locales/messages.en.json
#	locales/messages.json
#	src/vcpkg/base/strings.cpp
#	src/vcpkg/commands.setinstalled.cpp
#	src/vcpkg/install.cpp
# Conflicts:
#	include/vcpkg/commands.setinstalled.h
#	locales/messages.en.json
#	locales/messages.json
#	src/vcpkg/commands.porthistory.cpp
#	src/vcpkg/commands.setinstalled.cpp
#	src/vcpkg/install.cpp
@autoantwort

Copy link
Copy Markdown
Contributor Author

@BillyONeal Any news? :)

@BillyONeal

Copy link
Copy Markdown
Member

I think for some reason I had in my mind that someone else specific was to review this one. Sorry, I don't have updated news for you.

@autoantwort

Copy link
Copy Markdown
Contributor Author

Thank you for the update :) Maybe you can poke the person.
This change would mean a big improvement in usability for cmake users (100 times faster configure runs).

@BillyONeal

Copy link
Copy Markdown
Member

Thank you for the update :) Maybe you can poke the person.

To clarify, I had in my mind that someone was supposed to look at this but that doesn't appear to be the case

@PatrickKa

Copy link
Copy Markdown

Any update/news on this?

@BillyONeal

BillyONeal commented Sep 25, 2024

Copy link
Copy Markdown
Member

Any update/news on this?

Given that the discussion in #362 (comment) isn't resolved I don't see progress happening here.

I did some more testing and it looks like the majority of the time is not SHA1-ing the compiler binaries, but is instead CMake detecting which compiler to use. Note that I intentionally used clang in this example, which is a 116MB binary, rather than cl.exe which is like 3MB:

PS D:\vcpkg> Measure-Command { cmake.exe -DCURRENT_PORT_DIR=D:\vcpkg\scripts\detect_compiler -DCURRENT_BUILDTREES_DIR=D:\vcpkg\buildtrees -DCURRENT_PACKAGES_DIR=D:\vcpkg\packages\detect_compiler_x64-windows -D_HOST_TRIPLET=x64-windows -DCMD=BUILD -DDOWNLOADS=D:\vcpkg-downloads -DTARGET_TRIPLET=x64-windows -DTARGET_TRIPLET_FILE=D:\vcpkg\triplets\x64-windows.cmake -DVCPKG_ROOT_DIR=D:\vcpkg -DPACKAGES_DIR=D:\vcpkg\packages -DBUILDTREES_DIR=D:\vcpkg\buildtrees -D_VCPKG_INSTALLED_DIR=D:\vcpkg\installed -DVCPKG_MANIFEST_INSTALL=OFF -DVCPKG_BASE_VERSION=2999-12-12 -DDO_HASH=OFF -P .\scripts\ports.cmake }
[1/1] "C:/Program Files/CMake/bin/cmake.exe" -E chdir ".." "C:/Program Files/CMake/bin/cmake.exe" "D:/vcpkg/scripts/detect_compiler" "-G" "Ninja" "-DCMAKE_BUILD_TYPE=Release" "-DCMAKE_INSTALL_PREFIX=D:\vcpkg\packages\detect_compiler_x64-windows" "-DDO_HASH=OFF" "-DCMAKE_MAKE_PROGRAM=D:/vcpkg-downloads/tools/ninja/1.10.2-windows/ninja.exe" "-DBUILD_SHARED_LIBS=ON" "-DVCPKG_CHAINLOAD_TOOLCHAIN_FILE=D:/vcpkg/scripts/toolchains/windows.cmake" "-DVCPKG_TARGET_TRIPLET=x64-windows" "-DVCPKG_SET_CHARSET_FLAG=ON" "-DVCPKG_PLATFORM_TOOLSET=" "-DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON" "-DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON" "-DCMAKE_FIND_PACKAGE_NO_SYSTEM_PACKAGE_REGISTRY=ON" "-DCMAKE_INSTALL_SYSTEM_RUNTIME_LIBS_SKIP=TRUE" "-DCMAKE_VERBOSE_MAKEFILE=ON" "-DVCPKG_APPLOCAL_DEPS=OFF" "-DCMAKE_TOOLCHAIN_FILE=D:/vcpkg/scripts/buildsystems/vcpkg.cmake" "-DCMAKE_ERROR_ON_ABSOLUTE_INSTALL_DESTINATION=ON" "-DVCPKG_CXX_FLAGS=" "-DVCPKG_CXX_FLAGS_RELEASE=" "-DVCPKG_CXX_FLAGS_DEBUG=" "-DVCPKG_C_FLAGS=" "-DVCPKG_C_FLAGS_RELEASE=" "-DVCPKG_C_FLAGS_DEBUG=" "-DVCPKG_CRT_LINKAGE=dynamic" "-DVCPKG_LINKER_FLAGS=" "-DVCPKG_LINKER_FLAGS_RELEASE=" "-DVCPKG_LINKER_FLAGS_DEBUG=" "-DVCPKG_TARGET_ARCHITECTURE=x64" "-DCMAKE_INSTALL_LIBDIR:STRING=lib" "-DCMAKE_INSTALL_BINDIR:STRING=bin" "-D_VCPKG_ROOT_DIR=D:\vcpkg" "-DZ_VCPKG_ROOT_DIR=D:\vcpkg" "-D_VCPKG_INSTALLED_DIR=D:\vcpkg\installed" "-DVCPKG_MANIFEST_INSTALL=OFF"
-- The C compiler identification is Clang 18.1.6 with GNU-like command-line
-- The CXX compiler identification is Clang 18.1.6 with GNU-like command-line
#COMPILER_HASH#
#COMPILER_C_HASH#
#COMPILER_C_VERSION#18.1.6
#COMPILER_C_ID#Clang
#COMPILER_C_PATH#C:/Program Files/LLVM/bin/clang.exe
#COMPILER_CXX_HASH#
#COMPILER_CXX_VERSION#18.1.6
#COMPILER_CXX_ID#Clang
#COMPILER_CXX_PATH#C:/Program Files/LLVM/bin/clang++.exe
-- Configuring done (1.2s)
-- Generating done (0.0s)
CMake Warning:
  Manually-specified variables were not used by the project:

    BUILD_SHARED_LIBS
    CMAKE_INSTALL_BINDIR
    CMAKE_INSTALL_LIBDIR
    _VCPKG_ROOT_DIR


-- Build files have been written to: D:/vcpkg/buildtrees/x64-windows-rel


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 1
Milliseconds      : 332
Ticks             : 13323311
TotalDays         : 1.54204988425926E-05
TotalHours        : 0.000370091972222222
TotalMinutes      : 0.0222055183333333
TotalSeconds      : 1.3323311
TotalMilliseconds : 1332.3311

PS D:\vcpkg> Measure-Command { cmake.exe -DCURRENT_PORT_DIR=D:\vcpkg\scripts\detect_compiler -DCURRENT_BUILDTREES_DIR=D:\vcpkg\buildtrees -DCURRENT_PACKAGES_DIR=D:\vcpkg\packages\detect_compiler_x64-windows -D_HOST_TRIPLET=x64-windows -DCMD=BUILD -DDOWNLOADS=D:\vcpkg-downloads -DTARGET_TRIPLET=x64-windows -DTARGET_TRIPLET_FILE=D:\vcpkg\triplets\x64-windows.cmake -DVCPKG_ROOT_DIR=D:\vcpkg -DPACKAGES_DIR=D:\vcpkg\packages -DBUILDTREES_DIR=D:\vcpkg\buildtrees -D_VCPKG_INSTALLED_DIR=D:\vcpkg\installed -DVCPKG_MANIFEST_INSTALL=OFF -DVCPKG_BASE_VERSION=2999-12-12 -DDO_HASH=ON -P .\scripts\ports.cmake }
[1/1] "C:/Program Files/CMake/bin/cmake.exe" -E chdir ".." "C:/Program Files/CMake/bin/cmake.exe" "D:/vcpkg/scripts/detect_compiler" "-G" "Ninja" "-DCMAKE_BUILD_TYPE=Release" "-DCMAKE_INSTALL_PREFIX=D:\vcpkg\packages\detect_compiler_x64-windows" "-DDO_HASH=ON" "-DCMAKE_MAKE_PROGRAM=D:/vcpkg-downloads/tools/ninja/1.10.2-windows/ninja.exe" "-DBUILD_SHARED_LIBS=ON" "-DVCPKG_CHAINLOAD_TOOLCHAIN_FILE=D:/vcpkg/scripts/toolchains/windows.cmake" "-DVCPKG_TARGET_TRIPLET=x64-windows" "-DVCPKG_SET_CHARSET_FLAG=ON" "-DVCPKG_PLATFORM_TOOLSET=" "-DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON" "-DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON" "-DCMAKE_FIND_PACKAGE_NO_SYSTEM_PACKAGE_REGISTRY=ON" "-DCMAKE_INSTALL_SYSTEM_RUNTIME_LIBS_SKIP=TRUE" "-DCMAKE_VERBOSE_MAKEFILE=ON" "-DVCPKG_APPLOCAL_DEPS=OFF" "-DCMAKE_TOOLCHAIN_FILE=D:/vcpkg/scripts/buildsystems/vcpkg.cmake" "-DCMAKE_ERROR_ON_ABSOLUTE_INSTALL_DESTINATION=ON" "-DVCPKG_CXX_FLAGS=" "-DVCPKG_CXX_FLAGS_RELEASE=" "-DVCPKG_CXX_FLAGS_DEBUG=" "-DVCPKG_C_FLAGS=" "-DVCPKG_C_FLAGS_RELEASE=" "-DVCPKG_C_FLAGS_DEBUG=" "-DVCPKG_CRT_LINKAGE=dynamic" "-DVCPKG_LINKER_FLAGS=" "-DVCPKG_LINKER_FLAGS_RELEASE=" "-DVCPKG_LINKER_FLAGS_DEBUG=" "-DVCPKG_TARGET_ARCHITECTURE=x64" "-DCMAKE_INSTALL_LIBDIR:STRING=lib" "-DCMAKE_INSTALL_BINDIR:STRING=bin" "-D_VCPKG_ROOT_DIR=D:\vcpkg" "-DZ_VCPKG_ROOT_DIR=D:\vcpkg" "-D_VCPKG_INSTALLED_DIR=D:\vcpkg\installed" "-DVCPKG_MANIFEST_INSTALL=OFF"
-- The C compiler identification is Clang 18.1.6 with GNU-like command-line
-- The CXX compiler identification is Clang 18.1.6 with GNU-like command-line
#COMPILER_HASH#0f4a11f3a8b3c6eee0390888b2ca585daca373eb
#COMPILER_C_HASH#cf5d8e7225d21ffe029bdf032edaaf8a66ea0594
#COMPILER_C_VERSION#18.1.6
#COMPILER_C_ID#Clang
#COMPILER_C_PATH#C:/Program Files/LLVM/bin/clang.exe
#COMPILER_CXX_HASH#cf5d8e7225d21ffe029bdf032edaaf8a66ea0594
#COMPILER_CXX_VERSION#18.1.6
#COMPILER_CXX_ID#Clang
#COMPILER_CXX_PATH#C:/Program Files/LLVM/bin/clang++.exe
-- Configuring done (1.6s)
-- Generating done (0.0s)
CMake Warning:
  Manually-specified variables were not used by the project:

    BUILD_SHARED_LIBS
    CMAKE_INSTALL_BINDIR
    CMAKE_INSTALL_LIBDIR
    _VCPKG_ROOT_DIR


-- Build files have been written to: D:/vcpkg/buildtrees/x64-windows-rel


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 1
Milliseconds      : 753
Ticks             : 17538336
TotalDays         : 2.0299E-05
TotalHours        : 0.000487176
TotalMinutes      : 0.02923056
TotalSeconds      : 1.7538336
TotalMilliseconds : 1753.8336

PS D:\vcpkg>

So the hashing is approximately (1.33-1.75)/1.33 ~= -31% of the time. Not nothing, but not the earthshattering difference in the original PR description.

Given that @ras0219-msft does not want to accept any change that skips the CMake compiler detection over the potential to damage binary caches, I am unsure what changes could be made to get this PR into a state we could merge.

@BillyONeal

Copy link
Copy Markdown
Member

To clarify, the problematic scenario is:

  1. Run vcpkg with this setting turned on.
  2. Change something about the environment that causes a different compiler to be selected.
  3. Run vcpkg with this setting turned on again. It sees that the compiler that was detected before has not changed, so it skips running the compiler detection step and uses the old compiler's SHAs.
  4. Binary caching uploads bits built with a different compiler as if they were built with the old compiler.
  5. Future binary cache hits with the old compiler are now poisoned by the new compiler.

so we would have to have high confidence that anything that could possibly change which compiler CMake could pick invalidated this detection, and it seems unlikely to me that we will ever reach that level of confidence given how lots of how CMake does this is a black box.

@autoantwort

autoantwort commented Sep 25, 2024

Copy link
Copy Markdown
Contributor Author

So the hashing is approximately (1.33-1.75)/1.33 ~= -31% of the time. Not nothing, but not the earthshattering difference in the original PR description.
I have tested it on my mac:
Baseline noop: 1.4s
Baseline noop without hashing: 0.5s
This PR noop: ~0.1s

Given that @ras0219-msft does not want to accept any change that skips the CMake compiler detection over the potential to damage binary caches, I am unsure what changes could be made to get this PR into a state we could merge.

I already sketched a solution but got no real response. If this is the only thing blocking this PR I will start implementing that :)

@BillyONeal

Copy link
Copy Markdown
Member

I already sketched a solution but got no real response.

Sorry I missed that, can you point specifically?

@autoantwort

Copy link
Copy Markdown
Contributor Author

#362 (comment)
Especially:

So a solution could be to only use --cache-compiler-information to detect changes. If there are changes, rerun compiler detection and install packages. This would fix point 7.

@BillyONeal

Copy link
Copy Markdown
Member

#362 (comment) Especially:

So a solution could be to only use --cache-compiler-information to detect changes. If there are changes, rerun compiler detection and install packages. This would fix point 7.

I'm not sure how such changes are going to be possible in a way that truly sells that it's safe.

@BillyONeal

Copy link
Copy Markdown
Member

If the meat of this is saying the hash algorithm is too slow, we don't need to skip the cmake detection to cache the hash. We can have a database file saying "this path was this filesize / last mod time / sha".

Would you be willing to do just this part? It would fix your Mac test and is clearly less controversial..

# Conflicts:
#	include/vcpkg/commands.build.h
#	include/vcpkg/installedpaths.h
#	src/vcpkg/commands.build.cpp
Comment on lines +254 to +271
if (used_cached_compiler_info)
{
if (!is_action_plan_fulfilled(action_plan, status_db))
{
// we have to install something, so get fresh compiler infos
for (auto& install_action : action_plan.install_actions)
{
install_action.abi_info.clear();
}
compute_all_abis(paths, action_plan, cmake_vars, {});
used_cached_compiler_info = false;
}
}
adjust_action_plan_to_status_db(action_plan, status_db);
if (used_cached_compiler_info)
{
Checks::check_exit(VCPKG_LINE_INFO, action_plan.empty());
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the cached compiler info is only used to detect if there is something to install, if something gets installed, fresh compiler info are used.
(if you update your compiler the cache is not used)

@miss-programgamer

Copy link
Copy Markdown

At the risk of annoying everyone who's posted here before: any update on this? It's been more than three years since this pull request was created...

@BillyONeal

Copy link
Copy Markdown
Member

At the risk of annoying everyone who's posted here before: any update on this? It's been more than three years since this pull request was created...

You should see the majority of the benefit after microsoft/vcpkg#42994 and #1558

@autoantwort

autoantwort commented Mar 13, 2025

Copy link
Copy Markdown
Contributor Author

You should see the majority of the benefit after

I wouldn't say that. This PR vs latest release on windows (8x faster) and mac (14x faster):
image
image

For cmake configure from 0.850 ➜ 0.126 on my mac:

Console Output

time cmake -S . -B build -GNinja -DCMAKE_TOOLCHAIN_FILE="vcpkg/scripts/buildsystems/vcpkg.cmake" 
-- Running vcpkg install
Detecting compiler hash for triplet arm64-osx...
Compiler found: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
All requested packages are currently installed.
Total install time: 66.5 us
The package fmt provides CMake targets:

    find_package(fmt CONFIG REQUIRED)
    target_link_libraries(main PRIVATE fmt::fmt)

    # Or use the header-only version
    find_package(fmt CONFIG REQUIRED)
    target_link_libraries(main PRIVATE fmt::fmt-header-only)

-- Running vcpkg install - done
-- Configuring done (0.8s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/leanderSchulten/git_projekte/cmake-test/build
cmake -S . -B build -GNinja   0.44s user 0.30s system 87% cpu 0.850 totalcp $(which vcpkg) vcpkgtime cmake -S . -B build -GNinja -DCMAKE_TOOLCHAIN_FILE="vcpkg/scripts/buildsystems/vcpkg.cmake" 
-- Running vcpkg install
All requested packages are currently installed.
Total install time: 63.4 us
The package fmt provides CMake targets:

    find_package(fmt CONFIG REQUIRED)
    target_link_libraries(main PRIVATE fmt::fmt)

    # Or use the header-only version
    find_package(fmt CONFIG REQUIRED)
    target_link_libraries(main PRIVATE fmt::fmt-header-only)

-- Running vcpkg install - done
-- Configuring done (0.1s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/leanderSchulten/git_projekte/cmake-test/build
cmake -S . -B build -GNinja   0.06s user 0.04s system 79% cpu 0.126 total

@ras0219-msft

ras0219-msft commented Mar 14, 2025

Copy link
Copy Markdown
Collaborator

In absolute terms: saves 2s on Windows and ~0.5s on MacOS, which is the majority of dry-run operations.

@autoantwort

Copy link
Copy Markdown
Contributor Author

Is this blocked because of a technical issue or by your sparse review time?

@autoantwort

Copy link
Copy Markdown
Contributor Author

Would you consider merging it when it is hidden behind a feature flag that is disabled by default?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants