[WIP] Better stripping of environments in .cmt files #9039
alainfrisch wants to merge 2 commits into `ocaml:trunk`
I appreciate the idea of trying to make serialized Envs more compact, but I really don't like the idea of making them mutable, especially for merely a 3% gain.
I'm also a bit wary of making the whole environment mutable. Maybe you could split the … For the …
@alainfrisch what do you think of @lpw25's suggestion of splitting the PR? The bugfix part seems like something we should definitely merge.
I won't have time to work on that soon, but if someone else wants to give it a try, why not.
Each Typedtree node marshaled in a .cmt file contains an `Env.t` value, of which only the "summary" field is kept (it can be used to rebuild the full `Env.t` when needed). This PR changes this "reduction" process as follows: instead of mapping over the Typedtree and rewriting `Env.t` values, the `Env.t` values are modified in place to keep only the summary field, and then restored to their initial content after the .cmt file has been written. In particular, this preserves (physical) sharing between different environments in the Typedtree. Previously, only "adjacent" equal environments were kept identical (adjacency being defined by the depth-first traversal of the Typedtree).
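To illustrate why in-place stripping preserves sharing while the mapping approach does not, here is a minimal sketch under stated assumptions. The `env` and `node` types, their field names and the helpers `strip_by_copy`, `with_stripped_envs` and `marshaled_size` are all hypothetical stand-ins, not the compiler's actual `Env.t` or Typedtree; the only real APIs used are `Marshal.to_string` (which preserves sharing unless `No_sharing` is passed) and `Fun.protect` (OCaml ≥ 4.08).

```ocaml
(* Hypothetical stand-in for Env.t: a cheap [summary] plus heavy [tables]
   that could be rebuilt from the summary. Field names are invented for
   illustration; the real Env.t has a different shape and is immutable. *)
type env = {
  summary : string list;
  mutable tables : int array option;  (* None = stripped *)
}

(* Hypothetical stand-in for a Typedtree node that carries an env. *)
type node = { env : env; children : node list }

(* Copying approach: rebuild the tree with a freshly allocated stripped
   env at every node, so envs that were physically shared get duplicated. *)
let rec strip_by_copy n =
  { env = { n.env with tables = None };
    children = List.map strip_by_copy n.children }

(* In-place approach: strip every env, run [f] (e.g. write the file),
   then restore the saved tables, even if [f] raises. No env is
   reallocated, so physical sharing between nodes is preserved. *)
let with_stripped_envs root f =
  let saved = ref [] in
  let rec visit n =
    (match n.env.tables with
     | Some t ->
         saved := (n.env, t) :: !saved;
         n.env.tables <- None
     | None -> ()  (* already stripped, possibly via a shared env *));
    List.iter visit n.children
  in
  visit root;
  Fun.protect
    ~finally:(fun () -> List.iter (fun (e, t) -> e.tables <- Some t) !saved)
    f

(* Marshal preserves physical sharing unless the No_sharing flag is given. *)
let marshaled_size v = String.length (Marshal.to_string v [])
```

For a tree whose three nodes all point to one shared env, `with_stripped_envs tree (fun () -> marshaled_size tree)` should report a smaller size than `marshaled_size (strip_by_copy tree)`, since the copy contains three env records instead of one, and the original tables are back in place afterwards. The price, as noted in the review comments, is that the environment type has to become mutable.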
Since #2041 ("Refactor the construction of the initial environment", fixes #7841), the summary also contains an entry for each available "persistent module" found in the load path. This means that a .cmt file depends on the .cmi files available at the time the module was compiled, which makes the build less deterministic. In this PR, the summary of the initial environment (a "tail" shared by all other environments in the same Typedtree) is trimmed of entries corresponding to .cmi files that haven't been loaded, restoring a more deterministic behavior.
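The trimming rule can be sketched as a filter over a summary chain. The `summary` type below, its constructors and the `trim_unloaded` function are hypothetical simplifications, not the compiler's `Env.summary`; `loaded` is an assumed predicate telling whether a module's .cmi was actually read. The sketch only shows the filtering rule and does not try to preserve sharing with environments built on top of the trimmed tail.

```ocaml
(* Hypothetical, simplified summary: each entry points to the rest of the
   chain, ending at the initial environment. Not the real Env.summary. *)
type summary =
  | Empty
  | Persistent of summary * string  (* a .cmi found in the load path *)
  | Other of summary * string       (* any other kind of entry *)

(* Keep a Persistent entry only if [loaded] says the module was actually
   loaded during type-checking; all other entries are kept unchanged. *)
let rec trim_unloaded ~loaded = function
  | Empty -> Empty
  | Persistent (rest, m) ->
      let rest = trim_unloaded ~loaded rest in
      if loaded m then Persistent (rest, m) else rest
  | Other (rest, x) -> Other (trim_unloaded ~loaded rest, x)
```

With this rule, two load paths that differ only in modules that were never loaded produce the same trimmed summary, which is the determinism property the PR aims for.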
With these two changes, .cmt files become smaller, and any tool that needs to unmarshal many .cmt files (e.g. global refactoring tools) will use less RAM (and RAM can be a bottleneck for such project-wide tools). For instance, `typecore.cmt` produced by `ocamlc -annot` goes from 2296736 bytes on trunk to 2230616 bytes with this PR (a 2.9% gain).
Note: the .cmt files are slightly different when produced with `ocamlc`, `ocamlc.opt` and `ocamlopt`; I suspect the difference is due only to differences in sharing.
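The quoted gain for `typecore.cmt` follows directly from the two sizes above (66120 bytes saved):

$$
\frac{2296736 - 2230616}{2296736} = \frac{66120}{2296736} \approx 0.0288 \approx 2.9\%
$$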
In addition, this PR removes support for `OCAML_BINANNOT_WITHENV` (an environment variable used to keep the "full environment" in .cmt files), which I suspect has never been used.