Reduce installation size by another ~20MiB by dra27 · Pull Request #12467 · ocaml/ocaml

dra27 · 2023-08-04T19:07:05Z

ocamlcommon.cma sets the -linkall flag (see #53) to ensure that users of ocamlcommon.cma don't accidentally use modules in an unsound way. Unpicking that, so that -linkall is not needed, remains non-trivial.

While working on something else, I had cause to want a version of ocamlcommon.cma to use in the build system and another to install. This was a "cheap trick" to ensure that a module would cease to be used in the codebase, but still installed for compatibility, but that's a story for another day. It occurred me that the "trick" of having two versions of ocamlcommon.cma would allow us to have a version in the build which doesn't use -linkall (trusting ourselves to link the modules correctly) but still install a safe one.

There are two benefits: it allows more of the executables in tools/ to just link against ocamlcommon rather than tediously listing the modules they needed (and requiring these lists to be updated any time the module graph of ocamlcommon is changed) and, more importantly, it reduces the size of ocamlcmt, ocamldep.byte/ocamldep.opt and ocamlobjinfo.byte/ocamlobjinfo.opt by ~20MiB on my Ubuntu system. This trick seems cleaner than trying to add yet more flags to control -linkall (although it'd be nice to do that for the debugger, so that it could just use ocamlbytecomp.cma...).

For reasons of safety in _user_ programs, ocamlcommon.cma and ocamlcommon.cmxa set -linkall. This is not necessary for the distributions programs themselves, and makes the size of various tools (ocamldep, etc.) considerably larger than they need to be. The compiler's build now uses ocamlcommon-private, which is linked without -linkall, but it installs the same ocamlcommon as before. The compilers use all the modules in ocamlcommon, so this results in identical compilers, but smaller tools.

Various of the tools binaries explicitly linked the compiler modules they needed, partly from legacy and partly because they substantially increased in size when linked with the -linkall version of ocamlcommon. Now that ocamlcommon-private exists, use it to simplify the build.

xavierleroy · 2023-08-07T09:08:17Z

If we go in this direction, we could also have a tool that copies a .cma or .cmxa while adding the -linkall flag, and use it at installation time the way we use tools/stripdebug to tweak bytecode executables at installation time.

I'm still thinking of ways to refine -linkall into a mechanism that lets users of ocamlc -a and ocamlopt -a specify groups of compilation units that must be linked all at the same time (or not at all), at a finer granularity than -linkall.

dbuenzli · 2023-08-07T09:27:34Z

lets users of ocamlc -a and ocamlopt -a specify groups of compilation units that must be linked all at the same time (or not at all), at a finer granularity than -linkall.

Not so fond of the idea to have a grouping mecanism into an already grouping mecanism. Fundamentally except for the names it introduce you can already simply split your cma into multiple cmas to achieve this.

Wouldn't it be sufficient and simpler to have a "linkme" boolean on cmo units ?

xavierleroy · 2023-08-07T09:42:19Z

Wouldn't it be sufficient and simpler to have a "linkme" boolean on cmo units ?

We have it already! But if you put -linkall on, say, Typemod, with the meaning that typemod.cmo must always be linked, it just drags in all of the typechecker plus a buch of other modules from ocamlcommon.cma in your bytecode executable, which remains way too big.

The mechanism I propose would let you say "if any of Typecore, Typeclass and Typemod is needed, link all three at the same time" (and drag in all of the typechecker). But if none of these three modules is needed, nothing gets dragged.

you can already simply split your cma into multiple cmas to achieve this.

Yes, but you're making it more difficult for end users. If ocamlcommon.cma is split into ocamlbase.cma, ocamlparsing.cma, ocamltyping.cma and a few others, end users (incl. PPX writers?) need to link with those they need (but not all of them otherwise -linkall drags in too much code), in the right order.

In other words: the current .cma/.cmxa mechanisms conflates two different needs: 1- grouping of a bunch of object files so that end users have only one library name to remember and pass to the linker; 2- determining which object files must be linked and which object files can be omitted. Two levels of grouping would separate these needs.

dbuenzli · 2023-08-07T10:16:27Z

Yes, but you're making it more difficult for end users.

Not really. You can make an empty library that depends on the sub libraries, in the right order and let users link on that library instead.

In other words: the current .cma/.cmxa mechanisms conflates two different needs: 1- grouping of a bunch of object files so that end users have only one library name to remember and pass to the linker; 2- determining which object files must be linked and which object files can be omitted. Two levels of grouping would separate these needs.

Then the question becomes how often do you really need that ? Is it worth adding the complexity because of this use case, especially since it can likely be worked around without fuss for end-users ? Do other projects also have that need ?

Also I witnessed a lot of side effecting code in the compiler which may have made sense 20 years ago on more constrained machines but which look rather odd and contrived nowadays (e.g. treating the Bytelibrarian module as a stateful object rather than a module that operates on a structure). Can't maybe the library be refactored so that it fits more naturally in the compiler machinery we already have ?

xavierleroy · 2023-08-07T14:32:50Z

You can make an empty library that depends on the sub libraries, in the right order and let users link on that library instead.

But then all sub-libraries are given to the OCaml linker on the command line, and those marked -linkall will be linked even if unused and drag too much code in.

Can you please trust me when I tell you there's a subtle problem here that only a more sophisticated linking model can solve?

Then the question becomes how often do you really need that ? Is it worth adding the complexity because of this use case, especially since it can likely be worked around without fuss for end-users ? Do other projects also have that need ?

One of the problems that hit the OCaml compiler is recursive functions spread across multiple modules:

A.ml:   let forward_f = ref (fun _ -> assert false)
        let g x = ... !forward_f y ...

B.ml:  let f x = ...  
       let () = A.forward_f := f

User code may reference A.g only, yet B must be linked in, otherwise forward_f has the wrong value.

I expect this idiom to occur in user code too.

There may be other, more gratuitous uses of mutable global state in the OCaml compiler, but such state tends to be initialized in the same module that uses it, so it doesn't complicate linking.

dbuenzli · 2023-08-07T15:56:48Z

Can you please trust me when I tell you there's a subtle problem here that only a more sophisticated linking model can solve?

That doesn't mean I need to accept the solution you propose. Proofs by authority don't work with me and I don't trust you more than anyone else to come up with good solutions. Sorry.

Back to the point. You are proposing a solution which effectively accounts to add yet another module level grouping mechanism in the system. This complexifies the linking mental model for all users and does so at a layer that makes it more difficult to diagnose problems or understand what is happening during linking.

Personally I don't think the forward reference pattern is widespread enough to make the linking model much less obvious than what it is now and I think we should seek a solution that does not entail adding a new grouping mechanism at the module level.

For example a solution compatible with the way – which just fell short – I suggested in my preceding message would be an all or nothing mechanism at the archive level: if no compilation unit is referenced, nothing get linked, if one is, all are. That gives us three linking modes for archives: link-by-need (default), linkall and linkall-by-need which I find easy to explain and understand.

hhugo · 2023-08-07T16:33:09Z

The problem could also be solved in the source directly, at the cost of a small naming overhead.
Below, a reference to Exposed_api.A.g would force both A_internal and B_internal to be linked.

A_internal.ml:
    let forward_f = ref (fun _ -> assert false)
    let g x = ... !forward_f y ...

B_internal.ml:
    let f x = ...  
    let () = A_internal.forward_f := f

Exposed_api.ml:
    module A = struct include A_internal end
    module B = struct include B_internal end

gasche · 2023-08-07T18:32:01Z

Another refinement of -linkall that would tolerate the recursive-module kludge would be let users specify "weak" dependencies among compilation units, that must be present in the linked units but not necessarily in dependency order.

nojb · 2023-08-08T05:54:25Z

The current discussion about extensions/generalizations of -linkall is rather interesting, but the proposal in this PR is low-hanging fruit that can be accepted independently.

@xavierleroy proposed a slightly more elegant approach instead of linking ocamlcommon twice:

If we go in this direction, we could also have a tool that copies a .cma or .cmxa while adding the -linkall flag, and use it at installation time the way we use tools/stripdebug to tweak bytecode executables at installation time.

@dra27 do you think this could work? if yes, could you give it a try?

Octachron · 2023-08-08T09:31:01Z

Rather than extending the features of cma files, or creating two slightly different version of ocamlcommon.cma, would it not be simpler in this case to add an entry point module to the typechecker that enforces that the typechecker is correctly initialized when the entry module is linked?

shindere · 2023-09-20T13:54:17Z

We just discussed this during the triaging meeting and agreed we
must make a decision on whether we want to merge or not.

My take is that since it does improve the situaiton we should take it,
because it's already there, rather than waiting for something even better
but which has the (small!) disadvantage of not existing yet.

But then @Octachron expressed a concern about having two versions of
ocamlcommon, although only one of them gets installed as before and the
change is thus invisible for users.

dbuenzli · 2023-09-20T14:10:49Z

But then why not solve the root of the problem an sanitize compiler libs to follow the pattern @hhugo or @Octachron mentions so that linkall can be dropped once an for all ? Not sure that's much more work, makes the whole thing future proof and keeps your build system more pedestrian (always a win in my opinion). And there's no need to extend the linking model in any way.

gasche · 2023-09-20T14:15:38Z

This PR adds more complexity in the build system, for benefits that seem fairly thin to me. If informed people cannot make a decision, I would be happy to roleplay the uninformed masses and suggest that we close and move along.

hhugo · 2023-10-11T12:00:02Z

But then why not solve the root of the problem an sanitize compiler libs to follow the pattern @hhugo or @Octachron mentions so that linkall can be dropped once an for all ? Not sure that's much more work, makes the whole thing future proof and keeps your build system more pedestrian (always a win in my opinion). And there's no need to extend the linking model in any way.

Adding a namespace for compiler-libs (like what was done with the Stdlib) would probably help. Was it ever considered/discussed?

c-cube · 2023-10-11T12:32:53Z

I was wondering the same recently. Currently compiler libs reserves a lot of useful toplevel names (eg `Trace`) and makes them unavailable to libraries that care about being usable interactively. Namespacing would help.

dbuenzli · 2023-10-11T12:33:04Z

Was it ever considered/discussed?

@nojb tried a few times see #2218 and #9694

hhugo · 2023-10-12T08:15:31Z

It seems #9694 was shutdown after discussions during a developer's meeting but there is no explanation for why that is. Can anyone explain the blockers ?

Following discussion in the developer's meeting and in the spirit of trying to exercise a bit of PR discipline, I'm shutting down this one down, since it is not clear that it has a viable path forward.

nojb · 2023-10-12T08:25:07Z

It seems #9694 was shutdown after discussions during a developer's meeting but there is no explanation for why that is. Can anyone explain the blockers ?

As far as I remember (but it was a long time ago) the PR itself was not really discussed during the developer meeting; it was more a discussion about PRs that get stuck and are left open... rotting... which is not very healthy for the project as a whole.

For the PR in question, the main issue with #2218 and #9694 is that they were rather invasive in the build system, increasing complexity considerably. I think everyone agrees that having a namespaced compiler-libs would be a good thing, but not everyone is convinced that complicating the build system like that is the right approach...

Perhaps there are simpler ways to achieve the same thing; perhaps the recent and ongoing simplifications of the build system by @shindere can also help; not sure...

dbuenzli · 2023-10-12T10:26:35Z

I think everyone agrees that having a namespaced compiler-libs would be a good thing, but not everyone is convinced that complicating the build system like that is the right approach...

Good opportunity for upstream to finally skip over the library linking proposal's first step and bring it to its logical conclusion which was: namespaces !¹

In any case @hhugo's comment reminded me that when I was playing with data driven link dependency discovery I came to the conclusion that any implicit link order could be made explicit at the source level using various statically or non-statically enforced schemes and it was likely better to have that in code (self-describing sources) rather than in the build system .

So if namespacing is not an option then simply stop using -linkall, add a new module with an init function like:

let init () = 
  Sys.opaque_identity @@ fun () -> 
  let module Link_A = A in
  let module Link_B = B in 
  ()

and instruct compiler-libs users to either call that function before using the library or use -linkall themselves if they need to support earlier versions of the compiler.

Rant. At this point we can safely say that namespacing through module aliases is a hack and will remain so. It complicates the whole compilation model with elaborate build logics, adds dubious names (we never have enough of them in OCaml) that need to be special cased in various contexts (e.g. doc generation) and does not bring the promise it claims that you will never see those; the number of times I see error messages which include modules with __ identifiers is non-zero, including from the Stdlib. Stop the madness ! Give us a simple, usable and non-confusing system ! ↩

hhugo · 2024-11-25T12:42:12Z

In case it is useful to anyone, I've been playing with tracking forward references in lambda and typing to understand the issue better. The code is in https://github.com/hhugo/ocaml/tree/forward-refs.

Here is a generated list of all forward references and where they are initialized.

Forward refs:
File "lambda/translcore.ml", line 44, characters 14-21 INITIALIZED by File "lambda/translclass.ml", line 1287, characters 23-30
File "lambda/translcore.ml", line 39, characters 14-21 INITIALIZED by File "lambda/translmod.ml", line 758, characters 18-25
File "typing/typemod.ml", line 195, characters 16-23 INITIALIZED by File "typing/typemod.ml", line 3021, characters 18-25
File "typing/typeclass.ml", line 119, characters 19-26 INITIALIZED by File "typing/typemod.ml", line 3020, characters 18-25
File "typing/typecore.ml", line 239, characters 14-21 INITIALIZED by File "typing/typeclass.ml", line 1926, characters 23-30
File "typing/typecore.ml", line 235, characters 14-21 INITIALIZED by File "typing/typemod.ml", line 3019, characters 18-25
File "typing/typecore.ml", line 230, characters 14-21 INITIALIZED by File "typing/typemod.ml", line 3018, characters 18-25
File "typing/typecore.ml", line 224, characters 14-21 INITIALIZED by File "typing/typemod.ml", line 3016, characters 18-25
File "typing/typecore.ml", line 215, characters 14-21 INITIALIZED by File "typing/typemod.ml", line 3013, characters 18-25
File "typing/typetexp.ml", line 400, characters 14-21 INITIALIZED by File "typing/typemod.ml", line 3017, characters 18-25
File "typing/typetexp.ml", line 340, characters 54-61 INITIALIZED by File "typing/typemod.ml", line 731, characters 18-25
File "typing/typetexp.ml", line 339, characters 33-40 INITIALIZED by File "typing/typemod.ml", line 3015, characters 18-25
File "typing/typetexp.ml", line 338, characters 43-50 INITIALIZED by File "typing/typemod.ml", line 3014, characters 18-25
File "typing/ctype.ml", line 2600, characters 34-41 INITIALIZED by File "typing/typemod.ml", line 2082, characters 18-25
File "typing/ctype.ml", line 2599, characters 31-38 INITIALIZED by File "typing/ctype.ml", line 5459, characters 25-32
File "typing/ctype.ml", line 1521, characters 14-21 INITIALIZED by File "typing/ctype.ml", line 3314, characters 25-32
File "typing/ctype.ml", line 786, characters 14-21 INITIALIZED by File "typing/ctype.ml", line 1711, characters 25-32
File "typing/env.ml", line 3528, characters 57-64 INITIALIZED by File "typing/printtyp.ml", line 58, characters 32-39
File "typing/env.ml", line 782, characters 14-21 INITIALIZED by File "typing/mtype.ml", line 129, characters 30-37
File "typing/env.ml", line 770, characters 14-21 INITIALIZED by File "typing/includemod.ml", line 1099, characters 18-25
File "typing/env.ml", line 765, characters 14-21 INITIALIZED by File "typing/env.ml", line 2182, characters 18-25
File "typing/env.ml", line 760, characters 14-21 INITIALIZED by File "typing/env.ml", line 2183, characters 18-25
File "typing/env.ml", line 678, characters 43-50 INITIALIZED by File "typing/typemod.ml", line 410, characters 18-25
File "typing/env.ml", line 676, characters 30-37 INITIALIZED by File "typing/ctype.ml", line 5631, characters 25-32
File "typing/env.ml", line 30, characters 44-51 INITIALIZED by File "typing/typecore.ml", line 7112, characters 18-25
File "typing/persistent_env.ml", line 24, characters 49-56 INITIALIZED by File "typing/typecore.ml", line 7111, characters 18-25

dra27 added 2 commits August 4, 2023 16:14

dra27 added the no-change-entry-needed label Aug 4, 2023

dra27 mentioned this pull request Aug 21, 2023

Merge ocamltest/Makefile into the root Makefile #12321

Merged

Conversation

dra27 commented Aug 4, 2023

Uh oh!

xavierleroy commented Aug 7, 2023

Uh oh!

dbuenzli commented Aug 7, 2023

Uh oh!

xavierleroy commented Aug 7, 2023

Uh oh!

dbuenzli commented Aug 7, 2023

Uh oh!

xavierleroy commented Aug 7, 2023

Uh oh!

dbuenzli commented Aug 7, 2023

Uh oh!

hhugo commented Aug 7, 2023

Uh oh!

gasche commented Aug 7, 2023

Uh oh!

nojb commented Aug 8, 2023

Uh oh!

Octachron commented Aug 8, 2023

Uh oh!

shindere commented Sep 20, 2023

Uh oh!

dbuenzli commented Sep 20, 2023

Uh oh!

gasche commented Sep 20, 2023

Uh oh!

hhugo commented Oct 11, 2023

Uh oh!

c-cube commented Oct 11, 2023 via email

Uh oh!

dbuenzli commented Oct 11, 2023

Uh oh!

hhugo commented Oct 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nojb commented Oct 12, 2023

Uh oh!

dbuenzli commented Oct 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Footnotes

Uh oh!

hhugo commented Nov 25, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

hhugo commented Oct 12, 2023 •

edited

Loading

dbuenzli commented Oct 12, 2023 •

edited

Loading