[WIP] Better stripping of environments in .cmt files#9039

Open
alainfrisch wants to merge 2 commits into ocaml:trunk from alainfrisch:afrisch_strip_envs

@alainfrisch
Contributor

Each Typedtree node marshaled in a .cmt file contains an Env.t value, of which only the "summary" field is kept (it can be used to rebuild the full Env.t when needed). This PR changes this "reduction" process as follows:

  • Instead of mapping over the Typedtree and rewriting Env.t values, the Env.t values are modified in place to keep only the summary field, and then restored to their initial content after the .cmt file has been written. In particular, this preserves (physical) sharing between different environments in the Typedtree. Previously, only "adjacent" equal environments were kept identical ("adjacent" with respect to the depth-first traversal of the Typedtree).

  • Since #2041 ("Refactor the construction of the initial environment", fixes #7841), the summary also contains an entry for each available "persistent module" found in the load path. This means that a .cmt file depends on which .cmi files were available when the module was compiled, which makes the build less deterministic. In this PR, the summary for the initial environment (a "tail" shared by all other environments in the same Typedtree) is trimmed of entries corresponding to .cmi files that haven't been loaded, thus restoring more deterministic behavior.
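
To illustrate the first point, here is a minimal, self-contained sketch of the difference between stripping by copy and stripping in place. It is a toy model, not the code in this PR: the `env` and `tree` types, the `tables` field, and the function names are all hypothetical stand-ins for `Env.t` and the Typedtree. It only relies on the fact that `Marshal` preserves physical sharing by default.

```ocaml
(* Toy stand-in for Env.t (hypothetical): a cheap [summary] plus a heavy
   payload that should not end up in the marshaled file. *)
type env = {
  summary : string list;
  mutable tables : int array;  (* stands for everything except the summary *)
}

(* A tiny "typed tree" whose nodes each carry an environment. *)
type tree = Node of env * tree list

let empty_tables : int array = [||]

(* Strategy A: map over the tree and build a fresh stripped env per node.
   Nodes that shared one env now each get their own copy. *)
let rec strip_by_copy (Node (e, kids)) =
  Node ({ summary = e.summary; tables = empty_tables },
        List.map strip_by_copy kids)

(* Strategy B: strip every env in place, run [f], then restore the saved
   payloads. Envs shared between nodes stay shared while [f] runs. *)
let with_stripped_envs tree f =
  let saved = ref [] in
  let rec strip (Node (e, kids)) =
    (* An env reached a second time is already stripped: skip it. *)
    if e.tables != empty_tables then begin
      saved := (e, e.tables) :: !saved;
      e.tables <- empty_tables
    end;
    List.iter strip kids
  in
  strip tree;
  Fun.protect
    ~finally:(fun () -> List.iter (fun (e, t) -> e.tables <- t) !saved)
    (fun () -> f tree)

(* Three nodes sharing one environment. *)
let shared = { summary = ["Stdlib"; "List"]; tables = Array.make 1000 0 }
let tree = Node (shared, [Node (shared, []); Node (shared, [])])

let size_copy = String.length (Marshal.to_string (strip_by_copy tree) [])
let size_inplace =
  with_stripped_envs tree (fun t -> String.length (Marshal.to_string t []))

let () =
  Printf.printf "by copy: %d bytes, in place: %d bytes\n" size_copy size_inplace
```

In this sketch the in-place variant marshals the shared environment once, whereas the copying variant emits one record per node; the original payload is restored once the callback returns, even if it raises.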

With these two changes, the size of .cmt files is reduced, and any tool that needs to unmarshal many .cmt files (e.g. global refactoring tools) will use less RAM (which can be a bottleneck for such project-wide tools). For instance, typecore.cmt produced by ocamlc -annot goes from 2296736 bytes on trunk to 2230616 bytes with this PR (a 2.9% gain).

Note: the .cmt files are slightly different when produced with ocamlc, ocamlc.opt and ocamlopt; I suspect the difference is due only to differences in sharing.

In addition, this PR also removes support for OCAML_BINANNOT_WITHENV (an env variable used to keep the "full environment" in .cmt files), which I suspect has never been used.

@Drup
Contributor

Drup commented Oct 16, 2019

I appreciate the idea of trying to make serialized Envs more compact, but I really don't like the idea of making it mutable, especially for merely a 3% gain.

@lpw25
Contributor

lpw25 commented Oct 17, 2019

I'm also a bit wary of making the whole environment mutable. Maybe you could split the Env_initial bit from the other bit. The first is basically a bugfix whilst the second is an optimisation.

For the Env_initial part I wonder if it would be better to just have Env_initial with no arguments and then keep the (stripped) initial environment in another field of the cmt file. Obviously that would mean that env_of_only_summary would need to take an initial environment.
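
As a rough illustration of that shape, and only as a sketch under assumptions: the types below are a simplified, hypothetical model rather than the compiler's actual `Env.summary` or .cmt record, and the field names `cmt_initial` and `cmt_envs` are made up for the example.

```ocaml
(* Simplified, hypothetical model of environment summaries. *)
type summary =
  | Env_initial                    (* no arguments: the tail of every summary *)
  | Env_value of summary * string  (* a value bound on top of [summary] *)
  | Env_open of summary * string   (* a module opened on top of [summary] *)

(* The initial environment is described once per file, listing only the
   persistent modules that were actually loaded. *)
type initial = { persistent : string list }

(* Hypothetical .cmt payload: the stripped initial environment lives in its
   own field instead of inside each summary. *)
type cmt = {
  cmt_initial : initial;
  cmt_envs : summary list;
}

module StringSet = Set.Make (String)

(* Keep only the modules whose .cmi was loaded, so the result no longer
   depends on what else happened to be in the load path. *)
let strip_initial ~loaded ~available =
  { persistent = List.filter (fun m -> StringSet.mem m loaded) available }

(* Toy reconstruction: an "environment" is just the list of names in scope,
   most recent first. The initial environment must now be passed in. *)
let rec env_of_only_summary ~initial = function
  | Env_initial -> initial.persistent
  | Env_value (s, x) -> x :: env_of_only_summary ~initial s
  | Env_open (s, m) -> ("open " ^ m) :: env_of_only_summary ~initial s

let available = ["Stdlib"; "Unix"; "Str"]    (* found in the load path *)
let loaded = StringSet.of_list ["Stdlib"]    (* actually loaded *)

let cmt = {
  cmt_initial = strip_initial ~loaded ~available;
  cmt_envs =
    [ Env_initial;
      Env_value (Env_open (Env_initial, "Stdlib"), "x") ];
}

let envs =
  List.map (env_of_only_summary ~initial:cmt.cmt_initial) cmt.cmt_envs
```

With this layout, a consumer of the file reads the initial environment once and passes it to every reconstruction, and the stored summaries are the same whether or not unused .cmi files were present at compile time.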

@ghost

ghost commented Feb 19, 2020

@alainfrisch what do you think of @lpw25's suggestion of splitting the PR? The bugfix part seems like something we should definitely merge.

@alainfrisch
Contributor Author

> @alainfrisch what do you think of @lpw25's suggestion of splitting the PR? The bugfix part seems like something we should definitely merge.

I won't have time to work on that soon, but if someone else wants to give it a try, why not.
