Skip to content

Reduce installation size by another ~20MiB#12467

Open
dra27 wants to merge 2 commits intoocaml:trunkfrom
dra27:reduce-size
Open

Reduce installation size by another ~20MiB#12467
dra27 wants to merge 2 commits intoocaml:trunkfrom
dra27:reduce-size

Conversation

@dra27
Copy link
Copy Markdown
Member

@dra27 dra27 commented Aug 4, 2023

ocamlcommon.cma sets the -linkall flag (see #53) to ensure that users of ocamlcommon.cma don't accidentally use modules in an unsound way. Unpicking that, so that -linkall is not needed, remains non-trivial.

While working on something else, I had cause to want a version of ocamlcommon.cma to use in the build system and another to install. This was a "cheap trick" to ensure that a module would cease to be used in the codebase, but still installed for compatibility, but that's a story for another day. It occurred me that the "trick" of having two versions of ocamlcommon.cma would allow us to have a version in the build which doesn't use -linkall (trusting ourselves to link the modules correctly) but still install a safe one.

There are two benefits: it allows more of the executables in tools/ to just link against ocamlcommon rather than tediously listing the modules they needed (and requiring these lists to be updated any time the module graph of ocamlcommon is changed) and, more importantly, it reduces the size of ocamlcmt, ocamldep.byte/ocamldep.opt and ocamlobjinfo.byte/ocamlobjinfo.opt by ~20MiB on my Ubuntu system. This trick seems cleaner than trying to add yet more flags to control -linkall (although it'd be nice to do that for the debugger, so that it could just use ocamlbytecomp.cma...).

dra27 added 2 commits August 4, 2023 16:14
For reasons of safety in _user_ programs, ocamlcommon.cma and
ocamlcommon.cmxa set -linkall. This is not necessary for the
distributions programs themselves, and makes the size of various tools
(ocamldep, etc.) considerably larger than they need to be.

The compiler's build now uses ocamlcommon-private, which is linked
without -linkall, but it installs the same ocamlcommon as before. The
compilers use all the modules in ocamlcommon, so this results in
identical compilers, but smaller tools.
Various of the tools binaries explicitly linked the compiler modules
they needed, partly from legacy and partly because they substantially
increased in size when linked with the -linkall version of ocamlcommon.
Now that ocamlcommon-private exists, use it to simplify the build.
@xavierleroy
Copy link
Copy Markdown
Contributor

If we go in this direction, we could also have a tool that copies a .cma or .cmxa while adding the -linkall flag, and use it at installation time the way we use tools/stripdebug to tweak bytecode executables at installation time.

I'm still thinking of ways to refine -linkall into a mechanism that lets users of ocamlc -a and ocamlopt -a specify groups of compilation units that must be linked all at the same time (or not at all), at a finer granularity than -linkall.

@dbuenzli
Copy link
Copy Markdown
Contributor

dbuenzli commented Aug 7, 2023

lets users of ocamlc -a and ocamlopt -a specify groups of compilation units that must be linked all at the same time (or not at all), at a finer granularity than -linkall.

Not so fond of the idea to have a grouping mecanism into an already grouping mecanism. Fundamentally except for the names it introduce you can already simply split your cma into multiple cmas to achieve this.

Wouldn't it be sufficient and simpler to have a "linkme" boolean on cmo units ?

@xavierleroy
Copy link
Copy Markdown
Contributor

Wouldn't it be sufficient and simpler to have a "linkme" boolean on cmo units ?

We have it already! But if you put -linkall on, say, Typemod, with the meaning that typemod.cmo must always be linked, it just drags in all of the typechecker plus a buch of other modules from ocamlcommon.cma in your bytecode executable, which remains way too big.

The mechanism I propose would let you say "if any of Typecore, Typeclass and Typemod is needed, link all three at the same time" (and drag in all of the typechecker). But if none of these three modules is needed, nothing gets dragged.

you can already simply split your cma into multiple cmas to achieve this.

Yes, but you're making it more difficult for end users. If ocamlcommon.cma is split into ocamlbase.cma, ocamlparsing.cma, ocamltyping.cma and a few others, end users (incl. PPX writers?) need to link with those they need (but not all of them otherwise -linkall drags in too much code), in the right order.

In other words: the current .cma/.cmxa mechanisms conflates two different needs: 1- grouping of a bunch of object files so that end users have only one library name to remember and pass to the linker; 2- determining which object files must be linked and which object files can be omitted. Two levels of grouping would separate these needs.

@dbuenzli
Copy link
Copy Markdown
Contributor

dbuenzli commented Aug 7, 2023

Yes, but you're making it more difficult for end users.

Not really. You can make an empty library that depends on the sub libraries, in the right order and let users link on that library instead.

In other words: the current .cma/.cmxa mechanisms conflates two different needs: 1- grouping of a bunch of object files so that end users have only one library name to remember and pass to the linker; 2- determining which object files must be linked and which object files can be omitted. Two levels of grouping would separate these needs.

Then the question becomes how often do you really need that ? Is it worth adding the complexity because of this use case, especially since it can likely be worked around without fuss for end-users ? Do other projects also have that need ?

Also I witnessed a lot of side effecting code in the compiler which may have made sense 20 years ago on more constrained machines but which look rather odd and contrived nowadays (e.g. treating the Bytelibrarian module as a stateful object rather than a module that operates on a structure). Can't maybe the library be refactored so that it fits more naturally in the compiler machinery we already have ?

@xavierleroy
Copy link
Copy Markdown
Contributor

You can make an empty library that depends on the sub libraries, in the right order and let users link on that library instead.

But then all sub-libraries are given to the OCaml linker on the command line, and those marked -linkall will be linked even if unused and drag too much code in.

Can you please trust me when I tell you there's a subtle problem here that only a more sophisticated linking model can solve?

Then the question becomes how often do you really need that ? Is it worth adding the complexity because of this use case, especially since it can likely be worked around without fuss for end-users ? Do other projects also have that need ?

One of the problems that hit the OCaml compiler is recursive functions spread across multiple modules:

A.ml:   let forward_f = ref (fun _ -> assert false)
        let g x = ... !forward_f y ...

B.ml:  let f x = ...  
       let () = A.forward_f := f

User code may reference A.g only, yet B must be linked in, otherwise forward_f has the wrong value.

I expect this idiom to occur in user code too.

There may be other, more gratuitous uses of mutable global state in the OCaml compiler, but such state tends to be initialized in the same module that uses it, so it doesn't complicate linking.

@dbuenzli
Copy link
Copy Markdown
Contributor

dbuenzli commented Aug 7, 2023

Can you please trust me when I tell you there's a subtle problem here that only a more sophisticated linking model can solve?

That doesn't mean I need to accept the solution you propose. Proofs by authority don't work with me and I don't trust you more than anyone else to come up with good solutions. Sorry.

Back to the point. You are proposing a solution which effectively accounts to add yet another module level grouping mechanism in the system. This complexifies the linking mental model for all users and does so at a layer that makes it more difficult to diagnose problems or understand what is happening during linking.

Personally I don't think the forward reference pattern is widespread enough to make the linking model much less obvious than what it is now and I think we should seek a solution that does not entail adding a new grouping mechanism at the module level.

For example a solution compatible with the way – which just fell short – I suggested in my preceding message would be an all or nothing mechanism at the archive level: if no compilation unit is referenced, nothing get linked, if one is, all are. That gives us three linking modes for archives: link-by-need (default), linkall and linkall-by-need which I find easy to explain and understand.

@hhugo
Copy link
Copy Markdown
Contributor

hhugo commented Aug 7, 2023

The problem could also be solved in the source directly, at the cost of a small naming overhead.
Below, a reference to Exposed_api.A.g would force both A_internal and B_internal to be linked.

A_internal.ml:
    let forward_f = ref (fun _ -> assert false)
    let g x = ... !forward_f y ...

B_internal.ml:
    let f x = ...  
    let () = A_internal.forward_f := f

Exposed_api.ml:
    module A = struct include A_internal end
    module B = struct include B_internal end

@gasche
Copy link
Copy Markdown
Member

gasche commented Aug 7, 2023

Another refinement of -linkall that would tolerate the recursive-module kludge would be let users specify "weak" dependencies among compilation units, that must be present in the linked units but not necessarily in dependency order.

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Aug 8, 2023

The current discussion about extensions/generalizations of -linkall is rather interesting, but the proposal in this PR is low-hanging fruit that can be accepted independently.

@xavierleroy proposed a slightly more elegant approach instead of linking ocamlcommon twice:

If we go in this direction, we could also have a tool that copies a .cma or .cmxa while adding the -linkall flag, and use it at installation time the way we use tools/stripdebug to tweak bytecode executables at installation time.

@dra27 do you think this could work? if yes, could you give it a try?

@Octachron
Copy link
Copy Markdown
Member

Rather than extending the features of cma files, or creating two slightly different version of ocamlcommon.cma, would it not be simpler in this case to add an entry point module to the typechecker that enforces that the typechecker is correctly initialized when the entry module is linked?

@shindere
Copy link
Copy Markdown
Contributor

We just discussed this during the triaging meeting and agreed we
must make a decision on whether we want to merge or not.

My take is that since it does improve the situaiton we should take it,
because it's already there, rather than waiting for something even better
but which has the (small!) disadvantage of not existing yet.

But then @Octachron expressed a concern about having two versions of
ocamlcommon, although only one of them gets installed as before and the
change is thus invisible for users.

@dbuenzli
Copy link
Copy Markdown
Contributor

But then why not solve the root of the problem an sanitize compiler libs to follow the pattern @hhugo or @Octachron mentions so that linkall can be dropped once an for all ? Not sure that's much more work, makes the whole thing future proof and keeps your build system more pedestrian (always a win in my opinion). And there's no need to extend the linking model in any way.

@gasche
Copy link
Copy Markdown
Member

gasche commented Sep 20, 2023

This PR adds more complexity in the build system, for benefits that seem fairly thin to me. If informed people cannot make a decision, I would be happy to roleplay the uninformed masses and suggest that we close and move along.

@hhugo
Copy link
Copy Markdown
Contributor

hhugo commented Oct 11, 2023

But then why not solve the root of the problem an sanitize compiler libs to follow the pattern @hhugo or @Octachron mentions so that linkall can be dropped once an for all ? Not sure that's much more work, makes the whole thing future proof and keeps your build system more pedestrian (always a win in my opinion). And there's no need to extend the linking model in any way.

Adding a namespace for compiler-libs (like what was done with the Stdlib) would probably help. Was it ever considered/discussed?

@c-cube
Copy link
Copy Markdown
Contributor

c-cube commented Oct 11, 2023 via email

@dbuenzli
Copy link
Copy Markdown
Contributor

Was it ever considered/discussed?

@nojb tried a few times see #2218 and #9694

@hhugo
Copy link
Copy Markdown
Contributor

hhugo commented Oct 12, 2023

It seems #9694 was shutdown after discussions during a developer's meeting but there is no explanation for why that is. Can anyone explain the blockers ?

Following discussion in the developer's meeting and in the spirit of trying to exercise a bit of PR discipline, I'm shutting down this one down, since it is not clear that it has a viable path forward.

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Oct 12, 2023

It seems #9694 was shutdown after discussions during a developer's meeting but there is no explanation for why that is. Can anyone explain the blockers ?

As far as I remember (but it was a long time ago) the PR itself was not really discussed during the developer meeting; it was more a discussion about PRs that get stuck and are left open... rotting... which is not very healthy for the project as a whole.

For the PR in question, the main issue with #2218 and #9694 is that they were rather invasive in the build system, increasing complexity considerably. I think everyone agrees that having a namespaced compiler-libs would be a good thing, but not everyone is convinced that complicating the build system like that is the right approach...

Perhaps there are simpler ways to achieve the same thing; perhaps the recent and ongoing simplifications of the build system by @shindere can also help; not sure...

@dbuenzli
Copy link
Copy Markdown
Contributor

dbuenzli commented Oct 12, 2023

I think everyone agrees that having a namespaced compiler-libs would be a good thing, but not everyone is convinced that complicating the build system like that is the right approach...

Good opportunity for upstream to finally skip over the library linking proposal's first step and bring it to its logical conclusion which was: namespaces !1

In any case @hhugo's comment reminded me that when I was playing with data driven link dependency discovery I came to the conclusion that any implicit link order could be made explicit at the source level using various statically or non-statically enforced schemes and it was likely better to have that in code (self-describing sources) rather than in the build system .

So if namespacing is not an option then simply stop using -linkall, add a new module with an init function like:

let init () = 
  Sys.opaque_identity @@ fun () -> 
  let module Link_A = A in
  let module Link_B = B in 
  ()

and instruct compiler-libs users to either call that function before using the library or use -linkall themselves if they need to support earlier versions of the compiler.

Footnotes

  1. Rant. At this point we can safely say that namespacing through module aliases is a hack and will remain so. It complicates the whole compilation model with elaborate build logics, adds dubious names (we never have enough of them in OCaml) that need to be special cased in various contexts (e.g. doc generation) and does not bring the promise it claims that you will never see those; the number of times I see error messages which include modules with __ identifiers is non-zero, including from the Stdlib. Stop the madness ! Give us a simple, usable and non-confusing system !

@hhugo
Copy link
Copy Markdown
Contributor

hhugo commented Nov 25, 2024

In case it is useful to anyone, I've been playing with tracking forward references in lambda and typing to understand the issue better. The code is in https://github.com/hhugo/ocaml/tree/forward-refs.

Here is a generated list of all forward references and where they are initialized.

Forward refs:
File "lambda/translcore.ml", line 44, characters 14-21 INITIALIZED by File "lambda/translclass.ml", line 1287, characters 23-30
File "lambda/translcore.ml", line 39, characters 14-21 INITIALIZED by File "lambda/translmod.ml", line 758, characters 18-25
File "typing/typemod.ml", line 195, characters 16-23 INITIALIZED by File "typing/typemod.ml", line 3021, characters 18-25
File "typing/typeclass.ml", line 119, characters 19-26 INITIALIZED by File "typing/typemod.ml", line 3020, characters 18-25
File "typing/typecore.ml", line 239, characters 14-21 INITIALIZED by File "typing/typeclass.ml", line 1926, characters 23-30
File "typing/typecore.ml", line 235, characters 14-21 INITIALIZED by File "typing/typemod.ml", line 3019, characters 18-25
File "typing/typecore.ml", line 230, characters 14-21 INITIALIZED by File "typing/typemod.ml", line 3018, characters 18-25
File "typing/typecore.ml", line 224, characters 14-21 INITIALIZED by File "typing/typemod.ml", line 3016, characters 18-25
File "typing/typecore.ml", line 215, characters 14-21 INITIALIZED by File "typing/typemod.ml", line 3013, characters 18-25
File "typing/typetexp.ml", line 400, characters 14-21 INITIALIZED by File "typing/typemod.ml", line 3017, characters 18-25
File "typing/typetexp.ml", line 340, characters 54-61 INITIALIZED by File "typing/typemod.ml", line 731, characters 18-25
File "typing/typetexp.ml", line 339, characters 33-40 INITIALIZED by File "typing/typemod.ml", line 3015, characters 18-25
File "typing/typetexp.ml", line 338, characters 43-50 INITIALIZED by File "typing/typemod.ml", line 3014, characters 18-25
File "typing/ctype.ml", line 2600, characters 34-41 INITIALIZED by File "typing/typemod.ml", line 2082, characters 18-25
File "typing/ctype.ml", line 2599, characters 31-38 INITIALIZED by File "typing/ctype.ml", line 5459, characters 25-32
File "typing/ctype.ml", line 1521, characters 14-21 INITIALIZED by File "typing/ctype.ml", line 3314, characters 25-32
File "typing/ctype.ml", line 786, characters 14-21 INITIALIZED by File "typing/ctype.ml", line 1711, characters 25-32
File "typing/env.ml", line 3528, characters 57-64 INITIALIZED by File "typing/printtyp.ml", line 58, characters 32-39
File "typing/env.ml", line 782, characters 14-21 INITIALIZED by File "typing/mtype.ml", line 129, characters 30-37
File "typing/env.ml", line 770, characters 14-21 INITIALIZED by File "typing/includemod.ml", line 1099, characters 18-25
File "typing/env.ml", line 765, characters 14-21 INITIALIZED by File "typing/env.ml", line 2182, characters 18-25
File "typing/env.ml", line 760, characters 14-21 INITIALIZED by File "typing/env.ml", line 2183, characters 18-25
File "typing/env.ml", line 678, characters 43-50 INITIALIZED by File "typing/typemod.ml", line 410, characters 18-25
File "typing/env.ml", line 676, characters 30-37 INITIALIZED by File "typing/ctype.ml", line 5631, characters 25-32
File "typing/env.ml", line 30, characters 44-51 INITIALIZED by File "typing/typecore.ml", line 7112, characters 18-25
File "typing/persistent_env.ml", line 24, characters 49-56 INITIALIZED by File "typing/typecore.ml", line 7111, characters 18-25

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants