Skip to content

opam update: load only changed opam files #6614

Merged
kit-ty-kate merged 3 commits intoocaml:masterfrom
arozovyk:update_read_opams_incremental
Oct 3, 2025
Merged

opam update: load only changed opam files #6614
kit-ty-kate merged 3 commits intoocaml:masterfrom
arozovyk:update_read_opams_incremental

Conversation

@arozovyk
Copy link
Copy Markdown
Collaborator

@arozovyk arozovyk commented Jul 22, 2025

Queued on:

This PR leverages the fact that we already compute a diff/patch during update to determine exactly which files have changed and it what way. Instead of reloading the entire repository we:

  • Propagate Patch.operation info from OpamFilename.patch
  • Add incremental loading via OpamRepositoryState.load_opams_incremental
  • Maintain existing opam files that haven't changed, avoiding unnecessary I/O

On Windows the OpamRepositoryState.load_opams_from_dir in OpamUpdate.repository currently takes ~10s. This PR brings it down to ~0.01s (the time it takes to load the opam files of a usual repository change).
On unix, it is a bit less dramatic, but still going from ~3.5s to 1ms

Current benchmarks ( added in #6681 ) :
update_benchmarks
So this is naturally very noticable for opam update processing small diffs.

closes #5824

@kit-ty-kate kit-ty-kate added this to the 2.5.0~alpha1 milestone Jul 22, 2025
@arozovyk arozovyk force-pushed the update_read_opams_incremental branch 3 times, most recently from c0d8002 to ba9b12a Compare July 23, 2025 15:04
@arozovyk arozovyk force-pushed the update_read_opams_incremental branch 2 times, most recently from 2cb1720 to 3136926 Compare July 30, 2025 13:12
@arozovyk
Copy link
Copy Markdown
Collaborator Author

Updated simplfying a bunch of things.
We can't avoid parsing for diffs in case of Http and local since we have to first preprocess it, so I ended up reusing the result of Patch.parse from OpamSystem.patch in general for all backends.

@kit-ty-kate
Copy link
Copy Markdown
Member

We can't avoid parsing for diffs in case of Http and local since we have to first preprocess it

I'm not sure to follow, pre-processing isn't done for http/local (per apply_repo_update)

@arozovyk
Copy link
Copy Markdown
Collaborator Author

pre-processing isn't done for http/local

You're right, my bad. So we can reuse the diffs avoiding Patch.parse for http/local see f0a731c. Not sure it's worth it though, we propagte the full diffs (we need them up until applying in OpamSystem.internal_patch) that, as you said, can be quite large. So memory overhead for to-be-determined speed gain?

Copy link
Copy Markdown
Member

@kit-ty-kate kit-ty-kate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass of review. I haven't looked at everything yet but i think that's enough for now

| Some opams -> opams
| None -> OpamPackage.Map.empty
in
let process_file acc file ~is_removal =
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's done elsewhere or something, but this function seems to assume every files that can change are opam files. What about changes from their files directory or other non-opam file related changes?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function purpose is to load opam files to update their definitions in rt, all other files are ignored as it was already the case with original load_opams_from_dir function. Any other extra files are read by OpamFileTools.read_repo_opam and stored through metadata

@arozovyk arozovyk force-pushed the update_read_opams_incremental branch 3 times, most recently from 6b19d06 to 74dbeb6 Compare September 2, 2025 13:25
@arozovyk arozovyk requested a review from kit-ty-kate September 2, 2025 13:26
(slog OpamFilename.to_string) f;
let preprocess =
match repo.repo_url.OpamUrl.backend with
| `http | `rsync -> false
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we should start running the the preprocessing for local and http repositories. Could you bring the ?preprocess argument back?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this PR depends on the completion of #6599 first

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not, this is naturally done by the fact that OpamSystem.parse_patch_file is only called in OpamVCS (in OpamAction.prepare_package_build as well, but that's a different story), for local and http repositories we reuse the diffs directly.

OpamProcess.Job.with_text text @@
OpamRepository.update r repo_root @@+ fun has_changes ->
let has_changes = if redirect then `Changes else has_changes in
let has_changes = if redirect then `Changes [] else has_changes in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems unnecessary. @rjbou do you know why this line was added in #5146 ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's needed to update the repo config on redirection even though there are no actual changes.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember, I need to investigate for that

@arozovyk arozovyk requested a review from kit-ty-kate September 5, 2025 16:23
Copy link
Copy Markdown
Member

@kit-ty-kate kit-ty-kate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm overall, however it's a hard PR to review without its final clean commit structure so i'd rather wait for whenever this is finished.

I also think this PR should be queued on a PR adding a benchmark test for opam update as we currently don't have any. It would ideal to have a test for three kinds of cases:

with both local path and local git repositories

Copy link
Copy Markdown
Member

@kit-ty-kate kit-ty-kate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, i've fixed some of the shell files, moved the last commit adding the opam-rt test up the stack, and added a "tmp" commit to show that doing an operation on an extra file does not read the opam file more than once and read it even if the opam file itself isn't modified (for debug reason that we went over together)

I left that commit as "tmp" just because in my opinion the 4 first commits can be squashed into one, i think having several commits is already redundant with the different sections that are already in the file itself, so if you're alright with that, feel free to squash it and rename the commit as you think it should be

@arozovyk arozovyk force-pushed the update_read_opams_incremental branch from fab1644 to a1d9f53 Compare October 1, 2025 09:38
@kit-ty-kate kit-ty-kate force-pushed the update_read_opams_incremental branch 2 times, most recently from d6b57b7 to 7dfc112 Compare October 2, 2025 03:48
@kit-ty-kate kit-ty-kate added the PR: QUEUED Pending pull request, waiting for other work to be merged or closed label Oct 2, 2025
Copy link
Copy Markdown
Member

@kit-ty-kate kit-ty-kate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good to go. #6715 should be merged first as it fixes the CI failure on Windows

@arozovyk arozovyk force-pushed the update_read_opams_incremental branch from 7dfc112 to 324f545 Compare October 2, 2025 14:07
@kit-ty-kate kit-ty-kate force-pushed the update_read_opams_incremental branch from f26d730 to 6038872 Compare October 2, 2025 18:31
Co-authored-by: Raja Boujbel <raja.boujbel@ocamlpro.com>
Co-authored-by: Kate <kit-ty-kate@exn.st>
@rjbou rjbou removed the PR: QUEUED Pending pull request, waiting for other work to be merged or closed label Oct 3, 2025
@rjbou rjbou force-pushed the update_read_opams_incremental branch from 6038872 to a707c01 Compare October 3, 2025 18:15
@rjbou
Copy link
Copy Markdown
Collaborator

rjbou commented Oct 3, 2025

Updated & rebased

Copy link
Copy Markdown
Collaborator

@rjbou rjbou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go for me!

@kit-ty-kate kit-ty-kate force-pushed the update_read_opams_incremental branch from a707c01 to 6159ab2 Compare October 3, 2025 19:39
@kit-ty-kate
Copy link
Copy Markdown
Member

Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

opam update completely reloads the repository

3 participants