Optimise repository loading #3752

Merged
AltGr merged 5 commits into ocaml:master from OCamlPro:fast-repos
Mar 12, 2019

@AltGr
Member

@AltGr AltGr commented Feb 13, 2019

Repositories stored in ~/.opam/repo contain thousands of tiny files
and directories, and can take a huge amount of time to load on HDDs,
networked or old filesystems. This is not visible in normal use
because there is a marshalled cache, but it can make opam update
extremely slow (it now needs to compute a diff on the files, which
requires re-loading the repository from disk), or cause big lags when
changing your opam version (while the full file tree is not yet in the
OS memory cache).

The solution proposed here is extremely pragmatic, yet quite
efficient: we store the repository contents as .tar.gz files in
~/.opam/repo instead. Rather than resorting to a complex in-memory
structure, we just untar them to /tmp when they need to be read, and
re-tar them after modification (opam update, or format upgrade
only). Then we let the OS disk cache do the job: in normal operation,
the tree never needs to be flushed to disk, and loading the .tar.gz
is orders of magnitude faster than loading the individual files.
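
To make the round trip concrete, here is a minimal sketch in Python (for illustration only; it is not the OCaml code from this PR). The helper names `extract_repo` and `repack_repo` are hypothetical, and the sketch assumes the archive holds the repository's top-level entries directly.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_repo(archive: Path) -> Path:
    """Unpack a repository archive into a fresh temporary directory."""
    workdir = Path(tempfile.mkdtemp(prefix="repo-"))
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(workdir, filter="data")
        else:
            tar.extractall(workdir)
    return workdir


def repack_repo(workdir: Path, archive: Path) -> None:
    """Rebuild the archive from the working tree, then swap it into place."""
    tmp = archive.with_name(archive.name + ".tmp")
    with tarfile.open(tmp, "w:gz") as tar:
        # Store top-level entries under their own names, not the temp path.
        for entry in sorted(workdir.iterdir()):
            tar.add(entry, arcname=entry.name)
    # Renaming over the old archive avoids leaving a half-written file behind.
    tmp.replace(archive)
```

Reads would call `extract_repo` and work on the returned directory; only an update would call `repack_repo` afterwards.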

Note that this is done even for rsync or git repositories, which
is not particularly clean, but works. In the case of git, it would
be possible to just store a bare repository, and use git to extract
the individual files (this could even be done explicitly, directly to
memory, see how Camelus performs). But it does not seem worth the
additional implementation cost at the moment.
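
For reference, the git alternative could look roughly like the sketch below, which reads one file from a bare repository straight into memory using `git show <rev>:<path>`. This is a Python illustration under assumptions, not something implemented in this PR; `read_from_bare_repo` is a hypothetical name, and it assumes a `git` executable is on the PATH.

```python
import subprocess
from pathlib import Path


def read_from_bare_repo(git_dir: Path, path: str, rev: str = "HEAD") -> bytes:
    """Return the contents of `path` at `rev` without checking out a work tree."""
    result = subprocess.run(
        ["git", f"--git-dir={git_dir}", "show", f"{rev}:{path}"],
        check=True,           # raise CalledProcessError if the path or rev is missing
        capture_output=True,  # keep the blob in memory instead of writing to disk
    )
    return result.stdout
```

Spawning one process per file would be slow for thousands of packages, so a real implementation would probably batch the reads, which is part of the extra implementation cost mentioned above.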

@AltGr
Member Author

AltGr commented Feb 13, 2019

Broken for packages which use extra-files at the moment

@AltGr
Member Author

AltGr commented Feb 13, 2019

Note that on existing opam installs this will work transparently, and the repo dirs will be tar-gzipped on the next opam update.

@AltGr AltGr force-pushed the fast-repos branch 2 times, most recently from cb2c91e to 278bd72 on February 25, 2019 10:06
@rjbou
Collaborator

rjbou commented Feb 28, 2019

LGTM! thanks!
Related to #3721

@rjbou rjbou added this to the 2.1.0 milestone Feb 28, 2019
AltGr added 5 commits March 6, 2019 10:42
Makes things *much* faster on older HDDs by keeping local repositories
as .tar.gz instead of thousands of scattered files. The implementation
isn't very nice at the moment though.
Keep a closer eye on the lifespan of the extracted dirs: they are generally
not needed, so only expand/keep them when required.
Allowing for loading extra files or raw opam files whenever needed.
Cleanup is done when releasing the repository state, or on exit.
We need to be more careful since they now require a finaliser to clean
up their associated temporary directory.
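
The lifecycle these commit messages describe (extract only when needed, clean up on release or on exit) can be sketched as follows. This is a Python illustration, not the PR's OCaml implementation; the class `RepoState` and its methods are hypothetical names, and `weakref.finalize` stands in for the finaliser mentioned above.

```python
import shutil
import tarfile
import tempfile
import weakref
from pathlib import Path
from typing import Optional


class RepoState:
    """Holds a repository archive and extracts it lazily, at most once."""

    def __init__(self, archive: Path) -> None:
        self.archive = archive
        self._workdir: Optional[Path] = None
        self._finalizer: Optional[weakref.finalize] = None

    def workdir(self) -> Path:
        """Extract on first use; later calls reuse the same directory."""
        if self._workdir is None:
            path = Path(tempfile.mkdtemp(prefix="repo-"))
            with tarfile.open(self.archive, "r:gz") as tar:
                if hasattr(tarfile, "data_filter"):
                    tar.extractall(path, filter="data")
                else:
                    tar.extractall(path)
            # Fires when this object is garbage-collected or at interpreter
            # exit, whichever comes first, unless release() ran already.
            self._finalizer = weakref.finalize(
                self, shutil.rmtree, path, ignore_errors=True
            )
            self._workdir = path
        return self._workdir

    def release(self) -> None:
        """Remove the extracted tree now; safe to call more than once."""
        if self._finalizer is not None:
            self._finalizer()
            self._finalizer = None
            self._workdir = None
```

Code that only needs the cached state never calls `workdir()`, so nothing is extracted; code that reads extra files or raw opam files calls it and can call `release()` when done.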
@AltGr AltGr merged commit 03d6870 into ocaml:master Mar 12, 2019
@dra27 dra27 mentioned this pull request Jun 25, 2019