Skip to content

Emancipate dynlink from compilerlibs#11996

Merged
shindere merged 5 commits intoocaml:trunkfrom
shindere:emancipate-dynlink-from-compilerlibs
Jun 18, 2024
Merged

Emancipate dynlink from compilerlibs#11996
shindere merged 5 commits intoocaml:trunkfrom
shindere:emancipate-dynlink-from-compilerlibs

Conversation

@shindere
Copy link
Copy Markdown
Contributor

@shindere shindere commented Feb 7, 2023

As is well-known, currently the dynlink other library depends on compilerlibs. More precisely, it has to embed a copy of compilerlibs, meaning that compilerlibs has to be compiled twice, its dynlink-embedded version being compiled with -for-pack.

This PR, based on a suggestion by @stedolan, is a sequence of changes that emancipate dynlink from compilerlibs, making it depend on a smaller module, CamlinternalDynlink, which is added to the standard library.

Roughly speaking, the involved changes are the following ones:

  • Translating the structured constants into their final representation at compile-time rather than at runtime, to simplify the way they are stored in cmo files and make their stored representaiton more independent from the compilers' internals
  • Changing the way identifiers are stored in cmo files, to use only standard types rather than their representation as used by the type-checker
  • Putting the definition of magic numbers under configure's control, to make it possible and easy to use it at several places in the build system without dupplicating code
  • Roughly speaking, moving code around and sometimes splitting modules into two parts (e.g. the Dll module), one part used for compilation and static linkingthat remains in bytecomp and the other used for dynamic loadin gof objects by both dynlink and the toplevels, moved to the CamlinternalDynlink standard library module.

Each of the two first changes has required a compielr bootstrap. After these two changes, the types used to represent CMO and CMA files only use standard OCaml types and do not rely on any type internal to the compiler.

Currently, the PR works, meaning that itpasses the testsuite in both lambda and flambda modes.

On my machine, not compiling compilerlibs twice saves ~6s of total build time, which goes down from 4m58,261s to 4m52,251s. The size of dynlink.cma is reduced from 4,1M to 112K and the size of dynlink.cmxa is divided by two, going down from 8K to 4K (these numbers feel surprisingly small to me).

I didn't undertake any diff testing so far because I wanted to gather some input first and make sure the version I'd use for diff testing would be as close as possible to what peope would agree to merge, especially given that rebasing this on 5.0 is non-trivial and could introduce errors, in particular because of the changes to the bytepackager module that happened after 5.0 had been branched.

Among the aspects that will IMO require some discussion, there is the way the Symtable module has been split. Currently, much of its code is duplicated and it'd certainly be good to avoid such a duplication as much as possible, given that, as far as symbol table is concerned, it may not be possible to make the toplevel as independent of compilerlibs as dynlink because it more heavily relies on the Lambda module, thus oon compiler internals.

Obviously this PR is best approached in a per-commit way.

@shindere shindere force-pushed the emancipate-dynlink-from-compilerlibs branch from a3cec01 to 4e61d24 Compare February 7, 2023 10:08
@nojb
Copy link
Copy Markdown
Contributor

nojb commented Feb 7, 2023

the size of dynlink.cmxa is divided by two, going down from 8K to 4K (these numbers feel surprisingly small to me).

This file does not contain any code, you should look at the size of dynlink.a.

@shindere
Copy link
Copy Markdown
Contributor Author

shindere commented Feb 7, 2023 via email

@gasche
Copy link
Copy Markdown
Member

gasche commented Feb 7, 2023

Translating the structured constants into their final representation at compile-time rather than at runtime, to simplify the way they are stored in cmo files and make their stored representaiton more independent from the compilers' internals.

This change sounds like something that would be very useful for camlboot. I would be interested in considering it as a separate buildup PR that could be discussed and reviewed separately.

@shindere
Copy link
Copy Markdown
Contributor Author

shindere commented Feb 7, 2023

Of course, one possibility is to split this PR in several smaller / easier to chew ones. The aim of this PR is to provide a complete story, it does not necessarily mean it has to be merged all at once, assuming it's appreciated.

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Feb 7, 2023

  • Translating the structured constants into their final representation at compile-time rather than at runtime, to simplify the way they are stored in cmo files and make their stored representaiton more independent from the compilers' internals
  • Changing the way identifiers are stored in cmo files, to use only standard types rather than their representation as used by the type-checker

I think these two changes (at least) should be split off into separate PRs since they have independent interest and can be reviewed in a self-contained manner.

@shindere
Copy link
Copy Markdown
Contributor Author

shindere commented Feb 7, 2023 via email

@Octachron
Copy link
Copy Markdown
Member

What is the motivation for adding an internal dynlink module to the standard library? This move seems to go in the opposite direction of the rest of the PR. This is an internal module to the compiler that exposes internal implementation details of the compiler, ,which is not used by other modules in the standard library. Do we really want to make this module visible to all users of the standard library and start offering backward-compatibility guarantee for this module?

@shindere
Copy link
Copy Markdown
Contributor Author

shindere commented Feb 7, 2023 via email

@shindere
Copy link
Copy Markdown
Contributor Author

shindere commented Feb 7, 2023 via email

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Feb 7, 2023

start offering backward-compatibility guarantee for this module?

I thought we didn't offer any guarantees at all for the camlinternal* modules... Their API is not even documented anywhere.

when starting the running program. *)
val init_toplevel: string -> unit

val reset: unit -> unit
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that it is unused doesn't mean it should be removed. I think we have a reset function every time there is a mutable global state.

| Const_block(tag, fields) ->
let block = Obj.new_block tag (List.length fields) in
let pos = ref 0 in
List.iter
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List.iteri ?

tools/dumpobj.ml Outdated
Reloc_literal sc -> print_obj sc; printf "\n"
| Reloc_getglobal id -> printf "require %s\n" (Ident.name id)
| Reloc_setglobal id -> printf "provide %s\n" (Ident.name id)
| Reloc_getglobal id | Reloc_getpredef id ->
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a newline between the or pattern so that the printf can be aligned with others and we can check the number of spaces is correct.


let unsafe_get_global_value ~bytecode_or_asm_symbol =
let id = Ident.create_persistent bytecode_or_asm_symbol in
let id = bytecode_or_asm_symbol in
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this binding is no longer useful

let fold_initial_units ~init ~f =
List.fold_left (fun acc (comp_unit, interface) ->
let id = Ident.create_persistent comp_unit in
let id = comp_unit in
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This binding is no longer useful

type bytecode
external reify_bytecode :
bytes array -> Instruct.debug_event list array -> string option ->
bytes array -> 'a -> string option ->
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should try again to use an abstract type and resolve the issue in toplevel.ml using the newType.Equal

@shindere
Copy link
Copy Markdown
Contributor Author

shindere commented Feb 7, 2023

uJust submitted #11997 for the translation of structured constants bit.

Since changing the representation of identifiers as done here touches
code which is also changed by #11997, I plan to wait until we are done
with #11997 and it (hopefully) gets merged before opening the PR
about identifiers.

@Octachron
Copy link
Copy Markdown
Member

@nojb , we don't offer support for the internal runtime API either, this still doesn't stop us from caring about eventual breakage there. Moreover, for a module with function sthat look useful in the standard library, there is a non-zero risk that people start to use it independently on how scary is the module name.

@shindere: if the purpose was just to make it is easier to build a proof-of-concept with the module in the standard library, this is perfectly fine. However, the module should really be moved to its own separate library beyond the proof-of-concept stage.

@shindere
Copy link
Copy Markdown
Contributor Author

shindere commented Feb 7, 2023 via email

@shindere
Copy link
Copy Markdown
Contributor Author

shindere commented Feb 7, 2023 via email

@shindere
Copy link
Copy Markdown
Contributor Author

Now that #11997 has been merged, the present PR has been rebased on top of
it to keep track of the
complete not-yet-merged emancipation of dynlink from compilerlibs history.

Simultaneously, #12031 has just been opened to propose a second bit
of this history that may be worth being discussed and reviewed in isolation.

@shindere shindere force-pushed the emancipate-dynlink-from-compilerlibs branch 2 times, most recently from c13e7fd to 1f80a01 Compare February 23, 2023 07:40
@lthls
Copy link
Copy Markdown
Contributor

lthls commented Feb 24, 2023

I've had an occasion to discuss with @shindere on this subject, and for clarification I'll restate my position here: I'm 100% in favour of reducing the dependencies of Dynlink, but I think that as a library that manipulates artifacts produced by the compiler it will always naturally depend on (a subset of) compiler-libs, and we shouldn't try to hide that.
If the only thing we want is to stop building the special copy of compiler-libs for dynlink, we could reconsider linking against compiler-libs normally. Most users should have namespaced their own modules correctly by now, so I think the risks of conflicts are low. And we could always consider namespacing compiler-libs ourselves in a more systematic way (like we did for the stdlib) if we still want to support users with conflicting modules.

@shindere
Copy link
Copy Markdown
Contributor Author

shindere commented Feb 24, 2023 via email

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Feb 24, 2023

I'd just like to register my disagreement with Vincent. The dependency on compiler-libs has always been a huge pain, and seems entirely unnecessary. I strongly support this PR.

@shindere
Copy link
Copy Markdown
Contributor Author

shindere commented Feb 24, 2023 via email

@shindere
Copy link
Copy Markdown
Contributor Author

shindere commented May 16, 2024 via email

@gasche
Copy link
Copy Markdown
Member

gasche commented May 16, 2024

Yes, my position is that reviewing this stuff is a pain, and that we should limit the number of people that we send do it. Either @damiendoligez is concerned with something and he should clarify, or he actually believes that the code is fine and he should approve. It is a waste to send someone completely new to do all the work of getting into the water again.

On the other hand, personally I find the request to have an explicit comment about code being duplicated rather reasonable -- this sounds like something that could be useful later on, to improve our chances of thinking of synchronizing the code. If this is too cumbersome for @shindere, would someone else be willing to do it?

@shindere shindere force-pushed the emancipate-dynlink-from-compilerlibs branch 2 times, most recently from 911539f to 8d35c9f Compare May 27, 2024 15:33
@shindere
Copy link
Copy Markdown
Contributor Author

I have just rebased this PR on latest trunk to resolve the (trivial)
conflicts in configure and otherlibs/dynlink/Makefile.

I also took this opportunity to update the associated Changes entry by
moving it to the right place (that is, from 5.2 to 5.3) and crediting
the reviewers.

As the discussion is quite long, I may very well have forgotten somebody and
would appreciate to be told, so that all the reviewers can be
credited as deserved.

@gasche
Copy link
Copy Markdown
Member

gasche commented May 27, 2024

You listed me in the entry but I don't remember reviewing the current iteration of this PR. It started as a much larger PR and I reviewed part of it (#12031), but not the current version/PR, you should remove the mention of me. Looking at the discussion again, I only see recognizable review activity from @hhugo and @damiendoligez.

@shindere shindere force-pushed the emancipate-dynlink-from-compilerlibs branch from 8d35c9f to 97e00cd Compare May 27, 2024 17:16
@shindere
Copy link
Copy Markdown
Contributor Author

shindere commented May 27, 2024 via email

@shindere
Copy link
Copy Markdown
Contributor Author

shindere commented Jun 6, 2024

I just merged this PR from @Octachron. It annotates the Dynlink_symtable module with #line directives showing where the code comes from and has a script to regenerate it from its source modules, which mechanically proves that it only duplicates code.

Big thanks to Florian for this very nice contribution.

With this addition, I hope all the blockers have been removed and this can (finally) be considered for merging. :)

@Octachron
Copy link
Copy Markdown
Member

Note that some functions of the module Dynlink_symtable are specialized version of functions from the Dll and Symtable, so there is a bit a new code, but those specialization are now commented.

This commit introduces a dedicated configured Dynlink_config module that
contains only the configuration values needed by dynlink.

Co-authored-by: Stephen Dolan <sdolan@janestreet.com>
@shindere shindere force-pushed the emancipate-dynlink-from-compilerlibs branch from 249023a to 13571f8 Compare June 17, 2024 10:40
Copy link
Copy Markdown
Member

@damiendoligez damiendoligez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got a briefing from the PR authors, then reviewed again, this time with confidence that I understand what's going on.

A few small changes and this is good to go.

else begin
seek_in ic compunit.cu_debug;
[| (Compression.input_value ic : Instruct.debug_event list) |]
[| (Compression.input_value ic) |]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unmarshalling without a type constraint is a big red flag, and it's easy enough to add in this case.

Suggested change
[| (Compression.input_value ic) |]
[| (Compression.input_value ic : instruct_debug_event list) |]

(* *)
(* Florian Angeletti, projet Cambium, Inria Paris *)
(* *)
(* Copyright 2021 Institut National de Recherche en Informatique et *)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* Copyright 2021 Institut National de Recherche en Informatique et *)
(* Copyright 2024 Institut National de Recherche en Informatique et *)

(* *)
(* Florian Angeletti, projet Cambium, Inria Paris *)
(* *)
(* Copyright 2021 Institut National de Recherche en Informatique et *)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(* Copyright 2021 Institut National de Recherche en Informatique et *)
(* Copyright 2024 Institut National de Recherche en Informatique et *)

@Octachron
Copy link
Copy Markdown
Member

Before merging, there are few line directives which should be updated, and the symtable.ml might use a very small bit of refactoring to reduce the imprint of the formatting module used. (I have discussed those changes with @shindere this afternoon.)

shindere and others added 4 commits June 18, 2024 08:08
Co-authored-by: Stephen Dolan <sdolan@janestreet.com>
This commit, kindly provided by Florian Angeletti as part of his review
of PR ocaml#11996, adds line directives to

  otherlibs/dynlink/byte/dynlink_symtable.ml

to make it possible to verify where each fragment of that file comes from.

(See next commit for tooling to automate this verification.)
Also kindly provided by Florian Angeletti as part of his review of
PR ocaml#11996, this commit automates the verification that
  otherlibs/dynlink/byte/dynlink_symtable.ml
can indeed be re-generated from its bytecomp sources.

Run the verification with:

  make sync_dynlink
@shindere shindere force-pushed the emancipate-dynlink-from-compilerlibs branch from f8ac549 to 1244e1a Compare June 18, 2024 06:21
@shindere
Copy link
Copy Markdown
Contributor Author

shindere commented Jun 18, 2024 via email

@shindere
Copy link
Copy Markdown
Contributor Author

shindere commented Jun 18, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants