Reduce installation size by another ~20MiB #12467
For reasons of safety in _user_ programs, ocamlcommon.cma and ocamlcommon.cmxa set -linkall. This is not necessary for the distribution's own programs, and it makes various tools (ocamldep, etc.) considerably larger than they need to be. The compiler's build now uses ocamlcommon-private, which is linked without -linkall, but it installs the same ocamlcommon as before. The compilers use all the modules in ocamlcommon, so this results in identical compilers, but smaller tools.
Several of the tools' binaries explicitly linked the compiler modules they needed, partly for legacy reasons and partly because they substantially increased in size when linked with the -linkall version of ocamlcommon. Now that ocamlcommon-private exists, use it to simplify the build.
|
If we go in this direction, we could also have a tool that copies a .cma or .cmxa while adding the -linkall flag, and use it at installation time the way we use tools/stripdebug to tweak bytecode executables at installation time. I'm still thinking of ways to refine |
Not so fond of the idea of having a grouping mechanism inside an already grouping mechanism. Fundamentally, except for the names it introduces, you can already simply split your cma into multiple cmas to achieve this. Wouldn't it be sufficient and simpler to have a "linkme" boolean on |
We have it already! But if you put The mechanism I propose would let you say "if any of Typecore, Typeclass and Typemod is needed, link all three at the same time" (and drag in all of the typechecker). But if none of these three modules is needed, nothing gets dragged.
Yes, but you're making it more difficult for end users. If ocamlcommon.cma is split into ocamlbase.cma, ocamlparsing.cma, ocamltyping.cma and a few others, end users (incl. PPX writers?) need to link with those they need (but not all of them, otherwise -linkall drags in too much code), in the right order. In other words: the current .cma/.cmxa mechanism conflates two different needs: 1- grouping of a bunch of object files so that end users have only one library name to remember and pass to the linker; 2- determining which object files must be linked and which object files can be omitted. Two levels of grouping would separate these needs. |
Not really. You can make an empty library that depends on the sub-libraries, in the right order, and let users link against that library instead.
Then the question becomes: how often do you really need that? Is it worth adding the complexity because of this use case, especially since it can likely be worked around without fuss for end users? Do other projects also have that need? Also I witnessed a lot of side-effecting code in the compiler which may have made sense 20 years ago on more constrained machines but which looks rather odd and contrived nowadays (e.g. treating the |
But then all sub-libraries are given to the OCaml linker on the command line, and those marked -linkall will be linked even if unused and drag too much code in. Can you please trust me when I tell you there's a subtle problem here that only a more sophisticated linking model can solve?
One of the problems that hit the OCaml compiler is recursive functions spread across multiple modules: User code may reference I expect this idiom to occur in user code too. There may be other, more gratuitous uses of mutable global state in the OCaml compiler, but such state tends to be initialized in the same module that uses it, so it doesn't complicate linking. |
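For readers unfamiliar with the idiom, here is a minimal single-file sketch of a forward reference spread across two modules. The names `Core`, `Klass`, `check` and `check_class` are hypothetical and are not taken from the compiler sources; the two sub-modules merely stand in for two separate compilation units. The `before` binding mimics what a caller would observe if the unit containing the initializer were omitted at link time.

```ocaml
(* Hypothetical names throughout; two sub-modules stand in for two
   compilation units of a library. *)
module Core = struct
  (* Forward reference: a placeholder that another unit fills in later. *)
  let check_class : (string -> string) ref =
    ref (fun _ -> failwith "check_class: forward reference not initialized")

  (* Calls through the reference, whatever it currently holds. *)
  let check name = !check_class name
end

(* Evaluated before Klass's initializer has run: this models a program
   in which the unit defining the real function was not linked. *)
let before =
  try Core.check "point" with Failure msg -> "error: " ^ msg

module Klass = struct
  let check_class name = "class " ^ name

  (* Top-level side effect: it only happens if this unit is linked. *)
  let () = Core.check_class := check_class
end

(* Evaluated after the initializer has run. *)
let after = Core.check "point"

let () = print_endline before; print_endline after
```

Nothing in `Core` mentions `Klass`, so a linker that only follows references from user code to `Core` has no reason to keep `Klass`; that is the situation `-linkall` guards against.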
That doesn't mean I need to accept the solution you propose. Proofs by authority don't work with me and I don't trust you more than anyone else to come up with good solutions. Sorry. Back to the point. You are proposing a solution which effectively amounts to adding yet another module-level grouping mechanism to the system. This complicates the mental model of linking for all users and does so at a layer that makes it more difficult to diagnose problems or understand what is happening during linking. Personally I don't think the forward reference pattern is widespread enough to justify making the linking model much less obvious than it is now, and I think we should seek a solution that does not entail adding a new grouping mechanism at the module level. For example, a solution compatible with the approach I suggested in my preceding message (which just fell short) would be an all-or-nothing mechanism at the archive level: if no compilation unit is referenced, nothing gets linked; if one is, all are. That gives us three linking modes for archives: |
|
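As a toy model only, the archive-level all-or-nothing rule described above can be placed next to the two behaviours OCaml archives already have (link only the referenced units, or link everything with `-linkall`). The type `mode`, the function `select` and the constructor names are hypothetical and exist purely for illustration; the model ignores transitive dependencies between units and is not how the linker is implemented.

```ocaml
(* Hypothetical toy model: which units of an archive end up linked,
   given the units that the rest of the program references. *)
type mode =
  | On_demand       (* existing default: only referenced units *)
  | Link_all        (* existing -linkall: every unit, always *)
  | All_or_nothing  (* proposed: every unit, but only if one is referenced *)

let select mode units referenced =
  let is_needed u = List.mem u referenced in
  match mode with
  | On_demand -> List.filter is_needed units
  | Link_all -> units
  | All_or_nothing ->
    if List.exists is_needed units then units else []

(* An imaginary archive holding three units. *)
let typing = ["Typecore"; "Typeclass"; "Typemod"]

let used = select All_or_nothing typing ["Typecore"]
let unused = select All_or_nothing typing ["Parser"]
```

In this model `used` contains all three units and `unused` is empty, whereas `Link_all` would keep the three units in both cases.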
The problem could also be solved in the source directly, at the cost of a small naming overhead. |
|
Another refinement of |
|
The current discussion about extensions/generalizations of @xavierleroy proposed a slightly more elegant approach instead of linking
@dra27 do you think this could work? if yes, could you give it a try? |
|
Rather than extending the features of cma files, or creating two slightly different versions of ocamlcommon.cma, would it not be simpler in this case to add an entry point module to the typechecker that enforces that the typechecker is correctly initialized when the entry module is linked? |
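A minimal sketch of what such an entry point could look like, assuming the forward references are filled by explicit `init` functions rather than by top-level side effects. All names here (`Core`, `Klass`, `Entry`, `hook`, `init`) are hypothetical and do not correspond to modules in compiler-libs.

```ocaml
(* Hypothetical names; sub-modules stand in for compilation units. *)
module Core = struct
  (* The forward reference starts empty instead of holding a stub. *)
  let hook : (string -> string) option ref = ref None

  let check name =
    match !hook with
    | Some f -> f name
    | None -> invalid_arg "Core.check: typechecker not initialized"
end

module Klass = struct
  let check_class name = "class " ^ name

  (* Explicit initializer: nothing happens merely because Klass is linked. *)
  let init () = Core.hook := Some check_class
end

(* The entry point refers to Klass by name, so linking Entry also pulls
   in Klass, and evaluating Entry performs the initialization. *)
module Entry = struct
  let () = Klass.init ()
  let check = Core.check
end

let result = Entry.check "point"
```

With this shape the dependency of the entry point on the initializing units is visible in the source, so ordinary dependency-driven linking would be enough for clients that only go through `Entry`.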
|
We just discussed this during the triaging meeting and agreed we My take is that since it does improve the situation we should take it, But then @Octachron expressed a concern about having two versions of |
|
But then why not solve the root of the problem and sanitize compiler libs to follow the pattern @hhugo or @Octachron mention so that |
|
This PR adds more complexity in the build system, for benefits that seem fairly thin to me. If informed people cannot make a decision, I would be happy to roleplay the uninformed masses and suggest that we close and move along. |
Adding a namespace for compiler-libs (like what was done with the Stdlib) would probably help. Was it ever considered/discussed? |
|
I was wondering the same recently. Currently compiler libs reserves a lot of useful toplevel names (eg `Trace`) and makes them unavailable to libraries that care about being usable interactively. Namespacing would help.
|
|
It seems #9694 was shut down after discussions during a developers' meeting, but there is no explanation of why. Can anyone explain the blockers?
|
As far as I remember (but it was a long time ago) the PR itself was not really discussed during the developer meeting; it was more a discussion about PRs that get stuck and are left open... rotting... which is not very healthy for the project as a whole. For the PR in question, the main issue with #2218 and #9694 is that they were rather invasive in the build system, increasing complexity considerably. I think everyone agrees that having a namespaced compiler-libs would be a good thing, but not everyone is convinced that complicating the build system like that is the right approach... Perhaps there are simpler ways to achieve the same thing; perhaps the recent and ongoing simplifications of the build system by @shindere can also help; not sure... |
Good opportunity for upstream to finally skip over the library linking proposal's first step and bring it to its logical conclusion which was: namespaces !1 In any case @hhugo's comment reminded me that when I was playing with data driven link dependency discovery I came to the conclusion that any implicit link order could be made explicit at the source level using various statically or non-statically enforced schemes and it was likely better to have that in code (self-describing sources) rather than in the build system . So if namespacing is not an option then simply stop using and instruct Footnotes
|
|
In case it is useful to anyone, I've been playing with tracking forward references in lambda and typing to understand the issue better. The code is in https://github.com/hhugo/ocaml/tree/forward-refs. Here is a generated list of all forward references and where they are initialized. |
`ocamlcommon.cma` sets the `-linkall` flag (see #53) to ensure that users of `ocamlcommon.cma` don't accidentally use modules in an unsound way. Unpicking that, so that `-linkall` is not needed, remains non-trivial.

While working on something else, I had cause to want a version of `ocamlcommon.cma` to use in the build system and another to install. This was a "cheap trick" to ensure that a module would cease to be used in the codebase, but still installed for compatibility, but that's a story for another day. It occurred to me that the "trick" of having two versions of `ocamlcommon.cma` would allow us to have a version in the build which doesn't use `-linkall` (trusting ourselves to link the modules correctly) but still install a safe one.

There are two benefits: it allows more of the executables in `tools/` to just link against `ocamlcommon` rather than tediously listing the modules they needed (and requiring these lists to be updated any time the module graph of ocamlcommon is changed) and, more importantly, it reduces the size of `ocamlcmt`, `ocamldep.byte`/`ocamldep.opt` and `ocamlobjinfo.byte`/`ocamlobjinfo.opt` by ~20MiB on my Ubuntu system. This trick seems cleaner than trying to add yet more flags to control `-linkall` (although it'd be nice to do that for the debugger, so that it could just use ocamlbytecomp.cma...).