Conversation
|
So I cherry-picked these commits to the 4.14 branch to get meaningful benchmarks. As usual, I used the Irmin codebase as an example.
This seems to indicate that performing the indexing causes no perceptible increase in build time. Should I introduce a flag to enable indexing? And maybe another flag to avoid storing the typedtree in the cmt files? |
0d55900 to
5143818
|
@lpw25 I was able to fix the sharing issue (there was a very silly mistake in my previous attempt). I updated the results in the previous comment. On the whole Irmin codebase (65K lines in ml files), the indexed data adds 30M (from 330M to 360M), which is roughly a 10% increase. But the index is incomplete for now; the only nodes I currently consider are:
Do you have in mind which other nodes would be important to consider for this first version? |
|
I guess the potential candidates are every occurrence of |
Right, I am not sure where we should draw the line for the first iteration. I will make a list and start filling in cases. What do you think about making the indexing opt-in with a flag? |
|
I think it's worth avoiding a flag if possible. Before adding one, I think it would be worth investigating the sources of the size increase, in case there are simple things that can be done to reduce it. It would also be worth benchmarking on a different project: my understanding is that Irmin has particularly large shapes for some reason. |
|
I would expect the size increase to be directly related to the shapes we store. This does mean that Irmin should be more impacted than other projects. Since I have switches with all Irmin dependencies I can simply check the sizes of the cmts in these switches:
It looks quite good! The switch contains 207 packages, including Irmin itself.
Full package list: alcotest alcotest-lwt angstrom arp asn1-combinators astring awa awa-mirage base base-bigarray base-bytes base-threads base-unix base64 bentov bheap bigarray-compat bigstringaf bisect_ppx bos ca-certs ca-certs-nss carton carton-git carton-lwt cf cf-lwt checkseum cmdliner cohttp cohttp-lwt cohttp-lwt-unix conduit conduit-lwt conduit-lwt-unix conf-gmp conf-gmp-powm-sec conf-gnuplot conf-libffi conf-pkg-config cppo crunch csexp cstruct cstruct-lwt cstruct-sexp cstruct-unix ctypes ctypes-foreign decompress digestif dispatch dns dns-client dns-client-lwt dns-client-mirage domain-name duff dune dune-configurator duration either emile encore eqaf ethernet faraday fmt fpath fsevents fsevents-lwt functoria-runtime git git-mirage git-paf git-unix gmap graphql graphql-cohttp graphql-lwt graphql_parser h2 happy-eyeballs happy-eyeballs-lwt happy-eyeballs-mirage hex hkdf hpack httpaf hxd index integers ipaddr ipaddr-cstruct ipaddr-sexp irmin irmin-bench irmin-chunk irmin-cli irmin-containers irmin-fs irmin-git irmin-graphql irmin-http irmin-mirage irmin-mirage-git irmin-mirage-graphql irmin-pack irmin-test irmin-tezos irmin-tezos-utils irmin-watcher jsonm ke libirmin logs lru lwt lwt-dllist macaddr macaddr-cstruct magic-mime menhir menhirLib menhirSdk metrics metrics-unix mimic mimic-happy-eyeballs mirage-clock mirage-clock-unix mirage-crypto mirage-crypto-ec mirage-crypto-pk mirage-crypto-rng mirage-crypto-rng-lwt mirage-flow mirage-kv mirage-net mirage-random mirage-runtime mirage-time mirage-unix mtime notty num ocaml ocaml-base-compiler ocaml-compiler-libs ocaml-config ocaml-options-vanilla ocaml-syntax-shims ocamlbuild ocamlfind ocamlgraph ocplib-endian optint paf parsexp pbkdf pecu ppx_cstruct ppx_derivers ppx_deriving ppx_irmin ppx_repr ppx_sexp_conv ppxlib printbox printbox-text progress psq ptime qcheck-alcotest qcheck-core randomconv re repr result rresult rusage semaphore-compat seq sexplib sexplib0 stdlib-shims stringext tcpip terminal tezos-base58 tls tls-lwt tls-mirage topkg uri uri-sexp uucp uuidm uutf vector webmachine x509 yaml yojson zarith |
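For anyone who wants to reproduce this kind of measurement on another project, here is a minimal sketch of one way to compare the total size of the cmt files in two switches. It is not the script used for the numbers above; the function names are hypothetical, and it assumes each switch is simply a directory tree containing `.cmt` files (`.cmti` files are deliberately not counted).

```python
import os


def total_cmt_size(root):
    """Sum the sizes, in bytes, of all .cmt files found under `root`.

    Only files whose name ends in ".cmt" are counted, so .cmti files
    are excluded.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".cmt"):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total


def relative_increase(before, after):
    """Return the increase from `before` to `after` as a percentage."""
    return (after - before) / before * 100.0


def compare_switches(baseline_root, indexed_root):
    """Return (baseline bytes, indexed bytes, percentage increase).

    Both arguments are paths to switch prefixes; which switch is built
    with which compiler is up to the caller.
    """
    before = total_cmt_size(baseline_root)
    after = total_cmt_size(indexed_root)
    return before, after, relative_increase(before, after)
```

Calling `compare_switches` with the prefix of a switch built with the unmodified compiler and the prefix of one built with the indexing compiler gives the two totals and the relative increase in one go.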
495f2bd to
3438624
3438624 to
bbf671a
|
Superseded by #12508 |
This is a prototype for option 3 in #11983 (comment)
Context:
The quick fix proposed in #11983, storing full shape information in the cmt file, is expensive in terms of space.
This PR takes another approach: the compiler indexes values by their locally-reduced shape directly in the cmt file. Two optimizations are performed:
I need to rebase this PR to 500 to evaluate its cost in terms of cmt size and compilation time; I will un-draft it when I have some results. This PR also includes changes from #11782; only the last four commits are new.
cc @lpw25