Improve performance by disabling repo tarring#5015
Conversation
|
I just tried #4586 (comment) on this PR and I get the exact same behaviour: |
|
Actually, the tar call appears a bit after the first archives retrivals if that helps. |
kit-ty-kate
left a comment
There was a problem hiding this comment.
As mentioned in #4586 (comment) I think #3752 should be scraped entirely instead. An opt-in variable doesn’t make much sense in my opinion.
|
From dev meeting: not arrived a name yet, but for 2.2 let's consider flipping this around the other way and disabling the optimisation by default and having the variable to opt in. Some benchmarking comparing the two on NVMe/SSD (i.e. "modern") systems before committing this would be handy. |
b635713 to
8af59ab
Compare
|
Updated! |
| 009e00fa | ||
| ### OPAMYES=1 | ||
| ### tar xzf ${OPAMROOT}/repo/default.tar.gz | ||
| ### cp -R ${OPAMROOT}/repo/default . |
There was a problem hiding this comment.
Should we untie this test from opam repo? @AltGr
kit-ty-kate
left a comment
There was a problem hiding this comment.
So fast!!! ❤️
I did some timings and here is what i got locally on my macOS system:
For opam update default (which is a local git repository):
master(5311f9b) takes 17 seconds- This PR takes 14 secondes
- This PR (+ some manual tasks, see below) takes 6 seconds!!
In the third case I had to do some manual clean up tasks to get it to properly work and I think this is a bug in the PR:
$ ls ~/.opam/repo
alpha.tar.gz
default.tar.gz
lock
repos-config
state-205E9585.cache
$ opam update default
$ ls ~/.opam/repo
alpha.tar.gz
default.tar.gz
lock
repos-config
state-205E9585.cache
Here we can see that with this PR, opam does not remove the .tar.gz and simply use them as before (with some consistant improvements in time somehow). However as soon as I removed the .tar.gz files there, I got a really big x2 speedup and no tar.gz were reintroduced.
There must be something missing in this PR in its current state for opam to clean up the tar.gz if they already exist.
|
The test also fails on Windows: |
770c0ae to
f01cc2d
Compare
|
Rebased for CI to start |
src/state/opamRepositoryState.ml
Outdated
| repositories | ||
| else | ||
| OpamRepositoryName.Map.iter (fun _name repo -> | ||
| OpamFilename.remove (OpamRepositoryPath.tar gt.root repo.repo_name)) |
There was a problem hiding this comment.
I’m a bit surprised to see the remove being in lock instead of OpamUpdate.
I would expect only opam update to actually do anything.
I haven’t tried locally yet but unless I’ve missed something, I would expect such code to make opam crash for users calling opam install <pkg> just after having upgraded from opam 2.1 because they would have to get the content of the repository but it wouldn’t be available anywhere anymore.
But it’s just a hunch from reading the code, I’ll try locally later.
There was a problem hiding this comment.
Yep it’s making it fail:
opam@feac57c1cc94:~/opam$ opam-2.1 install ocamlfind
The following actions will be performed:
- install ocamlfind 1.9.2
<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><><><>
-> retrieved ocamlfind.1.9.2 (http://download.camlcity.org/download/findlib-1.9.2.tar.gz)
-> installed ocamlfind.1.9.2
Done.
opam@feac57c1cc94:~/opam$ ls ~/.opam/repo
beta default lock repos-config state-0287C1DB.cache
opam@feac57c1cc94:~/opam$ opam-2.1 update
This development version of opam requires an update to the layout of /home/opam/.opam from version 2.0 to version 2.1, which can't be reverted.
You may want to back it up before going further.
Continue? [Y/n] y
Format upgrade done.
<><> Updating package repositories ><><><><><><><><><><><><><><><><><><><><><><>
[beta] no changes from git+https://github.com/ocaml/ocaml-beta-repository
[default] synchronised from file:///home/opam/opam-repository
<><> Synchronising development packages <><><><><><><><><><><><><><><><><><><><>
opam@feac57c1cc94:~/opam$ ls ~/.opam/repo
beta.tar.gz default.tar.gz lock repos-config state-0287C1DB.cache
opam@feac57c1cc94:~/opam$ opam-2.1 remove ocamlfind
The following actions will be performed:
- remove ocamlfind 1.9.2
<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><><><>
-> removed ocamlfind.1.9.2
Done.
opam@feac57c1cc94:~/opam$ ls ~/.opam/repo
beta.tar.gz default.tar.gz lock repos-config state-0287C1DB.cache
opam@feac57c1cc94:~/opam$ ./opam install ocamlfind
[ERROR] No package named ocamlfind found.
opam@feac57c1cc94:~/opam$ ls ~/.opam/repo
lock repos-config state-0287C1DB.cache state-030998F6.cache
I believe it’s also making the opam-rt testsuite fail
There was a problem hiding this comment.
Indeed, It is the wrong place
|
I had a shot at fixing it: 1907fee I pushed it in your branch, feel free to remove it if you prefer another solution. as well as its corresonding steps 11 and 13: |
tests/reftests/repository.test
Outdated
| + /usr/bin/tar "xfz" "${BASEDIR}/OPAM/repo/tarred.tar.gz" "-C" "${OPAMTMP}" | ||
| + /usr/bin/diff "-ruaN" "tarred" "tarred.new" (CWD=${OPAMTMP}) | ||
| + /usr/bin/tar "xfz" "${BASEDIR}/OPAM/repo/tarred.tar.gz" "-C" "${BASEDIR}/OPAM/repo/tarred" |
There was a problem hiding this comment.
It’s a bit wasteful to extract twice. Maybe we can do better
There was a problem hiding this comment.
To me it looks good to merge as-is (once CI is green of course)
I did some more tests in docker. Using this dockerfile:
FROM ocaml/opam:debian-11-ocaml-4.14
RUN git clone https://github.com/rjbou/opam -b disable-repo-optim
WORKDIR opam
RUN opam exec -- ./configure && opam exec -- make
RUN opam-2.1 update
RUN sudo apt install -yy time
RUN time opam-2.1 update
RUN ./opam update
RUN time ./opam update
- With opam-2.1,
opam updatetakes: 1m16s - With this PR,
opam updatetakes: 45s
I’ve also tested on bare metal on a linux machine with /tmp mounted as tmpfs (15G) and I’m also getting a x2 performance improvement:
- With opam 2.1,
opam updatetakes: 16s - With this PR,
opam updatetakes: 8s
(all the machines tested here have SSDs/NVMe)
🎉
85f4f6a to
e7f7b0d
Compare
|
Fly-by comment: Cygwin's |
60f3049 to
031930d
Compare
|
All green! Thanks a lot @dra27! |
…ent variable `OPAMREPOSITORYTARRING'
…`OPAMREPOSITORYTARRING' is given This can be much faster on filesystems that don't cope well with scanning large trees but have good caching in /tmp. However this is slower in the general case Co-authored-by: Raja Boujbel <raja.boujbel@ocamlpro.com>
Co-authored-by: Kate <kit.ty.kate@disroot.org>
c946ae2 to
8e42855
Compare
|
Thanks all! |
that need a new name, ftm it is hideously namedOPAMNOREPOTARRING/cc @AltGr
fix #4586
edit:
By default disabled, environment variable
OPAMREPOSITORYTARRINGpermit to enable it.