[shapes] Add support for project-wide occurrences#12508
[shapes] Add support for project-wide occurrences#12508Octachron merged 59 commits intoocaml:trunkfrom
Conversation
trefis
left a comment
There was a problem hiding this comment.
I gave this a first look, mostly it looks good.
I left a few comments here and there, and I'm leaving some of the review work for others (sharing is caring).
Looking forward to seeing those benchmarks you mentioned.
|
I made some benchmarks: Compiler
I ran the benchmark multiple times and duration was always a bit longer with these changes, although it's mostly in the error margin.
That's a 11% increase in size. Irmin 3.7I backported the changed to 5.1+trunk to be able to test them on real-world projects.
Did that benchmark multiple times, no perceptible build time difference.
A 2% increase in size. Size of the opam switches' cmt filesThe opam switches have all Irmin's dependencies installed. Package listalcotestalcotest-lwt angstrom arp asn1-combinators astring awa awa-mirage base base-bigarray base-bytes base-domains base-nnp base-threads base-unix base64 bentov bheap bigarray-compat bigstringaf bos ca-certs ca-certs-nss carton carton-git carton-lwt cf cf-lwt checkseum cmdliner cohttp cohttp-lwt cohttp-lwt-unix conduit conduit-lwt conduit-lwt-unix conf-gmp conf-gmp-powm-sec conf-gnuplot conf-libffi conf-pkg-config cppo crunch csexp cstruct cstruct-lwt cstruct-unix ctypes ctypes-foreign decompress digestif dispatch dns dns-client dns-client-lwt dns-client-mirage domain-name duff dune dune-configurator duration either emile encore eqaf ethernet faraday fmt fpath fsevents fsevents-lwt functoria-runtime git git-mirage git-paf git-unix gmap graphql graphql-cohttp graphql-lwt graphql_parser h2 happy-eyeballs happy-eyeballs-lwt happy-eyeballs-mirage hex hkdf hpack httpaf hxd index integers ipaddr ipaddr-cstruct ipaddr-sexp irmin-watcher jsonm ke logs lru lwt lwt-dllist macaddr macaddr-cstruct magic-mime menhir menhirLib menhirSdk metrics metrics-unix mimic mimic-happy-eyeballs mirage-clock mirage-clock-unix mirage-crypto mirage-crypto-ec mirage-crypto-pk mirage-crypto-rng mirage-crypto-rng-lwt mirage-flow mirage-kv mirage-net mirage-random mirage-runtime mirage-time mirage-unix mtime notty num ocaml ocaml-compiler-libs ocaml-config ocaml-syntax-shims ocaml-variants ocamlbuild ocamlfind ocamlgraph ocplib-endian optint paf parsexp pbkdf pecu ppx_derivers ppx_deriving ppx_repr ppx_sexp_conv ppxlib printbox printbox-text progress psq ptime randomconv re repr result rresult rusage semaphore-compat seq sexplib sexplib0 stdlib-shims stringext tcpip terminal tezos-base58 tls tls-lwt tls-mirage topkg uri uri-sexp uucp uuidm uutf vector webmachine x509 yaml yojson zarith
A 10.5% size increase of the cmt files. DiscussionI expect the additional size to come from the index' list of shapes and locations. Partially reduced shapes might not appear elsewhere in the cmt files and the list can be quite large so there is some administrative cost to it. (However the locs themselves should be entirely shared with the typedtree's ones.) Irmin's cmts might already contain more of the shape information that is added by the index, explaining the smaller size increase. All around, the 10.5% size increase is probably a good idea of the new cmt sizes after this PR. A 10% artifact size increase is not unreasonable when one expect its project to be indexed for full occurrences search. The total switch since increase is much less (1.5%), but most of the time we won't need this additional information. (it might be useful for opam-scale analysis, but not for individual projects). Maybe we should introduce a flag to trigger the indexation, so that the new indexing information would not be a part of the installed libraries but could be easily enabled (and probably by default) by the various build systems for user projects? (cc @lpw25) |
|
We discussed the performances with @trefis. We decided to introduce (in 456bf83) a flag to enable indexation: even if the overhead is reasonable, it is wasteful to build and store the index for every installed package. The flag make it simple for build systems to actually enable indexation of projects in developpement mode and not in release mode. |
| | Some Class_type -> (in_printing_env @@ Env.find_cltype path).clty_loc | ||
| | Some (Extension_constructor|Value) | None -> Location.none | ||
| | Some (Extension_constructor|Value|Constructor|Label) | None -> | ||
| Location.none |
There was a problem hiding this comment.
I'm not sure that this is the correct behaviour (though I guess this new code is unused, so it's going to be hard to check).
Ping @Octachron ? 🙂
There was a problem hiding this comment.
This is the "correct" behavior for item kinds whose paths cannot appear in type error messages (and thus we are never trying to disambiguate).
|
I had a meeting on Thursday September 28th with @voodoos, @trefis and @Ekdohibs to discuss this PR. @voodoos presented the design, the new documentation on shapes (thanks!), explained the various iterations and the current state. @Ekdohibs explained some bugs left over in the implementation of deep reduction (in particular in the corner case of terms whose reduction is aborted without reaching a normal form, due to fuel limit) and proposed various ways to fix the idea of "weak reductions" and improve the system. We reviewed the interface / documentation, and we decided to simplify the "No Fuel Left" case by removing the not-quite-reduced term in this case, simplifying @Ekdohibs' implementation. My current understanding:
(A minor point: the "remove |
9393609 to
033f075
Compare
|
An update as I been working mostly offline on this: I am updating Merlin to actually use this PR's changes to answer locate and occurrences queries. This is a way to check that nothing is missing / wrong with the PR with regards to its intended use-case. That led me to make a few changes and fixes that I will add to the PR before cleaning and rebasing it for another round of review. |
c23ab14 to
787a4bf
Compare
|
After discussing with @trefis we decided to reshape types and constructors that are now struct that can contain constructors or labels. This allows a correct model of inline records in types, type extensions and exceptions. These changes are done in d759cfd (@trefis , if you want to have a look: I had to handle more cases than we first thought of for type extensions and exceptions). I also rebased the PR on latest trunk and squashed all the changes into meaningful commits that should be reviewable one by one. I tried to please the CI as much as possible but:
@gasche you mentioned that you might review at some point, if that is still the case I think the PR is now ready for that :-) |
787a4bf to
eb2d7e1
Compare
|
For the check-typo issue, I would propose to simply disable the column width counting for lines that contain "https://". |
gasche
left a comment
There was a problem hiding this comment.
I looked at the change introducing constructor and labels in shapes (good idea!) and I had a few drive-by comments.
d14e449 to
bbdc93b
Compare
- Fix typo and reword uids' description. - Reword the description of shapes' structures that differ slightly from the meaning they usually have in the compiler.
Ident stamps often change when the compiler change, they made the tests a bit fragile.
84b5885 to
6d2e9d5
Compare
6d2e9d5 to
e71de27
Compare
gasche
left a comment
There was a problem hiding this comment.
Minor comments on the changelog entry.
Changes
Outdated
| locations. Add an option -bin-annot-occurrences to the driver which triggers | ||
| the indexing of values, modules, types, labels and constructors usages and | ||
| store that index to the cmt files. Document shapes, make the interface more | ||
| precise and share memoization tables to improve reduction perofrmance. |
There was a problem hiding this comment.
| precise and share memoization tables to improve reduction perofrmance. | |
| precise and share memoization tables to improve reduction performance. |
There was a problem hiding this comment.
The attribution part below is fine, but the description of the change below is too low-level: the main readers are people interested in understanding new features of an OCaml release. To record development choices we use nice commit messages.
I propose the following:
Add compiler-side support for project-wide occurrences in Merlin,
by generating index tables of all identifier occurrences.
This extra data in .cmt files is only added when the new flag
-bin-annot-occurrences
is passed.
| "-bin-annot", Arg.Unit f, " Save typedtree in <filename>.cmt" | ||
|
|
||
| let mk_store_occurrences f = | ||
| "-bin-annot-occurrences", Arg.Unit f, |
There was a problem hiding this comment.
I would prefer if the name of the option (as a variable) matched the name of the option (as a flag): mk_binannot_occurences for example.
file_formats/cmt_format.ml
Outdated
| | Texp_send _ | ||
| | Texp_letmodule _ | Texp_letexception _ | Texp_assert _ | Texp_lazy _ | ||
| | Texp_object _ | Texp_pack _ | Texp_letop _ | Texp_unreachable | ||
| | Texp_extension_constructor _ | Texp_open _ -> ()); |
There was a problem hiding this comment.
There is small missing occurrence here: Texp_extension_constructor represents [%extension_constructor Not_found] and thus may hide an occurrence of an extension constructor.
There was a problem hiding this comment.
Good catch, I just pushed a test showing the missing occurrence and a fix.
|
Congratulations everyone for the merge :-) |
[shapes] Add support for project-wide occurrences (cherry picked from commit a682d51)
|
Cherry-picked to 5.2 in 02b3970 . Thanks ! |
This PR implements required changes to provide
project-wide-occurrencesin OCaml projects.It is structured in 4 parts:
Shapes:tag. This was also taken as an opportunity to add some documentation to the mli of theShapemodule.cmtfile. This requires the changes made in 1. and makes PR Store uids' declarations instead of node locations #11782 obsolete. This is done in commit fda848b.cmtfile associated with their locally-reduced shape.Grouping all these changes together in the same branch was more convenient to iterate on the design, but if this feels too much for one PR I can split it.
List of the shape changes
About the indexation
Indexing values (or types, etc) requires reducing the shape of every usage of a value to obtain its definition. This process requires access to the full typing environment which contains all intermediary shapes. Since only a stripped down
environment is stored in the CMT file this processing cannot be done by an external tool with only the Typedtree and environment stored in the cmt files.
This was discussed in PR #11983 and we decided to perform (partial) reduction of the shape associated with each usage of a value in the compiler and store the result in the cmt file. Shapes that relies on external compilation units are not reduced completely to preserve separate compilation.
Project-wide occurrences
With the changes proposed in this PR, one should be able to build an index of all values in a project by following these steps:
-bin-annot.cmtfile and finish the reduction of the shapes in thecmt_usages_indexlists to get the definitions corresponding to the listed values. This step might require loading othercmtfiles.cmtfile.Size increase and performance discussion
It is expected that:
cmtshould not take too much additionalspace due to sharing with the already-present typedtree.
I will prepare some benchmarks next week to verify these assertions.