opam update: load only changed opam files #6614
Conversation
c0d8002 to
ba9b12a
Compare
2cb1720 to
3136926
Compare
|
Updated simplfying a bunch of things. |
I'm not sure to follow, pre-processing isn't done for http/local (per apply_repo_update) |
You're right, my bad. So we can reuse the diffs avoiding |
src/state/opamRepositoryState.ml
Outdated
| | Some opams -> opams | ||
| | None -> OpamPackage.Map.empty | ||
| in | ||
| let process_file acc file ~is_removal = |
There was a problem hiding this comment.
I'm not sure if it's done elsewhere or something, but this function seems to assume every files that can change are opam files. What about changes from their files directory or other non-opam file related changes?
There was a problem hiding this comment.
The function purpose is to load opam files to update their definitions in rt, all other files are ignored as it was already the case with original load_opams_from_dir function. Any other extra files are read by OpamFileTools.read_repo_opam and stored through metadata
6b19d06 to
74dbeb6
Compare
| (slog OpamFilename.to_string) f; | ||
| let preprocess = | ||
| match repo.repo_url.OpamUrl.backend with | ||
| | `http | `rsync -> false |
There was a problem hiding this comment.
i don't think we should start running the the preprocessing for local and http repositories. Could you bring the ?preprocess argument back?
There was a problem hiding this comment.
Maybe this PR depends on the completion of #6599 first
There was a problem hiding this comment.
We are not, this is naturally done by the fact that OpamSystem.parse_patch_file is only called in OpamVCS (in OpamAction.prepare_package_build as well, but that's a different story), for local and http repositories we reuse the diffs directly.
| OpamProcess.Job.with_text text @@ | ||
| OpamRepository.update r repo_root @@+ fun has_changes -> | ||
| let has_changes = if redirect then `Changes else has_changes in | ||
| let has_changes = if redirect then `Changes [] else has_changes in |
There was a problem hiding this comment.
I think it's needed to update the repo config on redirection even though there are no actual changes.
There was a problem hiding this comment.
I don't remember, I need to investigate for that
kit-ty-kate
left a comment
There was a problem hiding this comment.
lgtm overall, however it's a hard PR to review without its final clean commit structure so i'd rather wait for whenever this is finished.
I also think this PR should be queued on a PR adding a benchmark test for opam update as we currently don't have any. It would ideal to have a test for three kinds of cases:
- no diff
- small diff (just a few files)
- large diff (e.g. from ocaml/opam-repository#2025-01-before-archiving-phase1 to now)
with both local path and local git repositories
kit-ty-kate
left a comment
There was a problem hiding this comment.
lgtm, i've fixed some of the shell files, moved the last commit adding the opam-rt test up the stack, and added a "tmp" commit to show that doing an operation on an extra file does not read the opam file more than once and read it even if the opam file itself isn't modified (for debug reason that we went over together)
I left that commit as "tmp" just because in my opinion the 4 first commits can be squashed into one, i think having several commits is already redundant with the different sections that are already in the file itself, so if you're alright with that, feel free to squash it and rename the commit as you think it should be
fab1644 to
a1d9f53
Compare
d6b57b7 to
7dfc112
Compare
kit-ty-kate
left a comment
There was a problem hiding this comment.
All good to go. #6715 should be merged first as it fixes the CI failure on Windows
7dfc112 to
324f545
Compare
f26d730 to
6038872
Compare
Co-authored-by: Raja Boujbel <raja.boujbel@ocamlpro.com> Co-authored-by: Kate <kit-ty-kate@exn.st>
6038872 to
a707c01
Compare
|
Updated & rebased |
Process only changed files during opam update
a707c01 to
6159ab2
Compare
|
Thanks a lot! |
Queued on:
This PR leverages the fact that we already compute a diff/patch during update to determine exactly which files have changed and it what way. Instead of reloading the entire repository we:
Patch.operationinfo fromOpamFilename.patchOpamRepositoryState.load_opams_incrementalOn Windows the
OpamRepositoryState.load_opams_from_dirinOpamUpdate.repositorycurrently takes ~10s. This PR brings it down to ~0.01s (the time it takes to load the opam files of a usual repository change).On unix, it is a bit less dramatic, but still going from ~3.5s to 1ms
Current benchmarks ( added in #6681 ) :

So this is naturally very noticable for
opam updateprocessing small diffs.closes #5824