Emancipate dynlink from compilerlibs#11996
Conversation
a3cec01 to
4e61d24
Compare
This file does not contain any code, you should look at the size of |
|
Nicolás Ojeda Bär (2023/02/07 02:11 -0800):
> the size of dynlink.cmxa is divided by two, going down from 8K to 4K (these numbers feel surprisingly small to me).
This file does not contain any code, you should look at the size of
`dynlink.a`.
My bad, sorry! Then the decrease in size is from 4,3M to 128K.
|
This change sounds like something that would be very useful for camlboot. I would be interested in considering it as a separate buildup PR that could be discussed and reviewed separately. |
|
Of course, one possibility is to split this PR in several smaller / easier to chew ones. The aim of this PR is to provide a complete story, it does not necessarily mean it has to be merged all at once, assuming it's appreciated. |
I think these two changes (at least) should be split off into separate PRs since they have independent interest and can be reviewed in a self-contained manner. |
|
Gabriel Scherer (2023/02/07 02:21 -0800):
> Translating the structured constants into their final representation at compile-time rather than at runtime, to simplify the way they are stored in cmo files and make their stored representaiton more independent from the compilers' internals.
This change sounds like something that would be very useful for
[camlboot](https://github.com/Ekdohibs/camlboot).
Oh! Nice if it has another positive outcome!
I would be
interested in considering it as a separate buildup PR that could be
discussed and reviewed separately.
Shall I go ahead and split that out right now or rather wait a bit?
One thing to keep in mind about this change is that it has an impact on
`ocamlobjinfo`, more precisely on the way constants are displayed.
Indeed, structured constants carry some type information which are used
to display constants in a nice way. With this change, this type
information is lost and the way constants are printed becomes slightly
less friendly (well, in other words, untyped).
|
|
What is the motivation for adding an internal dynlink module to the standard library? This move seems to go in the opposite direction of the rest of the PR. This is an internal module to the compiler that exposes internal implementation details of the compiler, ,which is not used by other modules in the standard library. Do we really want to make this module visible to all users of the standard library and start offering backward-compatibility guarantee for this module? |
|
Nicolás Ojeda Bär (2023/02/07 02:32 -0800):
> - Translating the structured constants into their final representation at compile-time rather than at runtime, to simplify the way they are stored in cmo files and make their stored representaiton more independent from the compilers' internals
> - Changing the way identifiers are stored in cmo files, to use only standard types rather than their representation as used by the type-checker
I think these two changes (at least) should be split off into separate
PRs since they have independent interest and can be reviewed in a
self-contained manner.
Okay, will do.
|
|
Florian Angeletti (2023/02/07 02:34 -0800):
What is the motivation for adding an internal dynlink module to the
standard library? This move seems to go in the opposite direction of
the rest of the PR. This is an internal module to the compiler that
exposes internal implementation details of the compiler, ,which is not
used by other modules in the standard library. Do we really want to
make this module visible to all users of the standard library and
start offering backward-compatibility guarantee for this module?
Well, I think the high level goal was to have a place from which it's
easy to share code.
It was part of Stephen's suggestion to add thisshared module to the
standard library but it may be possible to add it to the compiler
itself, I guess it would be enough to achieve the purpose.
|
I thought we didn't offer any guarantees at all for the |
| when starting the running program. *) | ||
| val init_toplevel: string -> unit | ||
|
|
||
| val reset: unit -> unit |
There was a problem hiding this comment.
The fact that it is unused doesn't mean it should be removed. I think we have a reset function every time there is a mutable global state.
bytecomp/symtable.ml
Outdated
| | Const_block(tag, fields) -> | ||
| let block = Obj.new_block tag (List.length fields) in | ||
| let pos = ref 0 in | ||
| List.iter |
tools/dumpobj.ml
Outdated
| Reloc_literal sc -> print_obj sc; printf "\n" | ||
| | Reloc_getglobal id -> printf "require %s\n" (Ident.name id) | ||
| | Reloc_setglobal id -> printf "provide %s\n" (Ident.name id) | ||
| | Reloc_getglobal id | Reloc_getpredef id -> |
There was a problem hiding this comment.
Maybe a newline between the or pattern so that the printf can be aligned with others and we can check the number of spaces is correct.
otherlibs/dynlink/byte/dynlink.ml
Outdated
|
|
||
| let unsafe_get_global_value ~bytecode_or_asm_symbol = | ||
| let id = Ident.create_persistent bytecode_or_asm_symbol in | ||
| let id = bytecode_or_asm_symbol in |
There was a problem hiding this comment.
this binding is no longer useful
otherlibs/dynlink/byte/dynlink.ml
Outdated
| let fold_initial_units ~init ~f = | ||
| List.fold_left (fun acc (comp_unit, interface) -> | ||
| let id = Ident.create_persistent comp_unit in | ||
| let id = comp_unit in |
There was a problem hiding this comment.
This binding is no longer useful
bytecomp/meta.mli
Outdated
| type bytecode | ||
| external reify_bytecode : | ||
| bytes array -> Instruct.debug_event list array -> string option -> | ||
| bytes array -> 'a -> string option -> |
There was a problem hiding this comment.
I think you should try again to use an abstract type and resolve the issue in toplevel.ml using the newType.Equal
|
@nojb , we don't offer support for the internal runtime API either, this still doesn't stop us from caring about eventual breakage there. Moreover, for a module with function sthat look useful in the standard library, there is a non-zero risk that people start to use it independently on how scary is the module name. @shindere: if the purpose was just to make it is easier to build a proof-of-concept with the module in the standard library, this is perfectly fine. However, the module should really be moved to its own separate library beyond the proof-of-concept stage. |
|
Many thanks for all your feedback Hugo and for having taken the time to
review and done so in such a prompt way.
All your suggestions make a lot of sense to me and I will make sure to
integrate them as it goes and to give feedback for each of them (not all
of them have been integratedyet).
hhugo (2023/02/07 03:17 -0800):
+(* Translate structured constants *)
+
+let rec transl_const = function
+ Const_base(Const_int i) -> Obj.repr i
+ | Const_base(Const_char c) -> Obj.repr c
+ | Const_base(Const_string (s, _, _)) -> Obj.repr s
+ | Const_base(Const_float f) -> Obj.repr (float_of_string f)
+ | Const_base(Const_int32 i) -> Obj.repr i
+ | Const_base(Const_int64 i) -> Obj.repr i
+ | Const_base(Const_nativeint i) -> Obj.repr i
+ | Const_immstring s -> Obj.repr s
+ | Const_block(tag, fields) ->
+ let block = Obj.new_block tag (List.length fields) in
+ let pos = ref 0 in
+ List.iter
`List.iteri` ?
Yes, absolutely, you are right. Thanks. As you can see this has been
integrated to #11997.
|
|
Florian Angeletti (2023/02/07 04:59 -0800):
@shindere: if the purpose was just to make it is easier to build a
proof-of-concept with the module in the standard library, this is
perfectly fine. However, the module should really be moved to its own
separate library beyond the proof-of-concept stage.
What I understand is that it's important that allthe clients use the
very same symbol table. I think it is for this reasoon that Stephen has
suggested a module in the standard library. I, personnally, am not
attached to any particular solution so whatever will work and make
people happy will be okay for me.
|
4e61d24 to
c4dc1d0
Compare
|
Now that #11997 has been merged, the present PR has been rebased on top of Simultaneously, #12031 has just been opened to propose a second bit |
c13e7fd to
1f80a01
Compare
|
I've had an occasion to discuss with @shindere on this subject, and for clarification I'll restate my position here: I'm 100% in favour of reducing the dependencies of Dynlink, but I think that as a library that manipulates artifacts produced by the compiler it will always naturally depend on (a subset of) compiler-libs, and we shouldn't try to hide that. |
|
Many thanks for having taken the time to state your position publicly,
Vincent.
I don't know how exactly we should proceed from here, whether e.g. we
should organise a short dedicated online meeting, but what I can
tell is that it would help if we could make a decision, something I
can be reasonably sure we all agree on before continuing to invest
time in the project at the risk of doing it in a direction which
ends up not being the right one.
|
|
I'd just like to register my disagreement with Vincent. The dependency on compiler-libs has always been a huge pain, and seems entirely unnecessary. I strongly support this PR. |
|
Leo White (2023/02/24 09:21 -0800):
I'd just like to register my disagreement with Vincent. The dependency
on compiler-libs has always been a huge pain, and seems entirely
unnecessary. I strongly support this PR.
Many thanks, Leo.
The devil is in the details, so I am fully ocnvinced that some parts of
the history as it currently is are ugly and will need to be improved,
that's where I'd really appreciate any feedback I can receive.
|
|
Leo White (2024/04/26 07:34 -0700):
Can we just press merge now, or is there something else we are waiting
for?
The PR got discussed during yesterday's triaging meeting.
There was the request for a review to make sure that the refactoring
does preserve the previous behaviour.
It has also been suggested that the added module somehow embeds
references to the code fragments it duplicates, either as comments or as
`#line` directives.
I felt slightly reluctant to do so, first because i am slightly
unconvinced of the interest and exhausted of working on this PR, second
because the actual author of this portion of the code is @stedolan.
During the triaging meeting, @gasche also pushed for @damiendoligez to
do the review, rather than us finding another reviewer.
After having slept on all that (notmuch, though), I came to the idea
that the best way to go might be that @damiendoligez and @stedolan go
over the code together, in a synchronous way.
Anyway, thanks for your interest, Leo!
|
|
Yes, my position is that reviewing this stuff is a pain, and that we should limit the number of people that we send do it. Either @damiendoligez is concerned with something and he should clarify, or he actually believes that the code is fine and he should approve. It is a waste to send someone completely new to do all the work of getting into the water again. On the other hand, personally I find the request to have an explicit comment about code being duplicated rather reasonable -- this sounds like something that could be useful later on, to improve our chances of thinking of synchronizing the code. If this is too cumbersome for @shindere, would someone else be willing to do it? |
911539f to
8d35c9f
Compare
|
I have just rebased this PR on latest trunk to resolve the (trivial) I also took this opportunity to update the associated Changes entry by As the discussion is quite long, I may very well have forgotten somebody and |
|
You listed me in the entry but I don't remember reviewing the current iteration of this PR. It started as a much larger PR and I reviewed part of it (#12031), but not the current version/PR, you should remove the mention of me. Looking at the discussion again, I only see recognizable review activity from @hhugo and @damiendoligez. |
8d35c9f to
97e00cd
Compare
|
I wanted to be generous in crediting and mention all those who helped the discussion to progress in one way or another but I went with your suggestion.
|
|
I just merged this PR from @Octachron. It annotates the Big thanks to Florian for this very nice contribution. With this addition, I hope all the blockers have been removed and this can (finally) be considered for merging. :) |
|
Note that some functions of the module |
This commit introduces a dedicated configured Dynlink_config module that contains only the configuration values needed by dynlink. Co-authored-by: Stephen Dolan <sdolan@janestreet.com>
249023a to
13571f8
Compare
damiendoligez
left a comment
There was a problem hiding this comment.
I got a briefing from the PR authors, then reviewed again, this time with confidence that I understand what's going on.
A few small changes and this is good to go.
otherlibs/dynlink/byte/dynlink.ml
Outdated
| else begin | ||
| seek_in ic compunit.cu_debug; | ||
| [| (Compression.input_value ic : Instruct.debug_event list) |] | ||
| [| (Compression.input_value ic) |] |
There was a problem hiding this comment.
Unmarshalling without a type constraint is a big red flag, and it's easy enough to add in this case.
| [| (Compression.input_value ic) |] | |
| [| (Compression.input_value ic : instruct_debug_event list) |] |
tools/sync_dynlink.ml
Outdated
| (* *) | ||
| (* Florian Angeletti, projet Cambium, Inria Paris *) | ||
| (* *) | ||
| (* Copyright 2021 Institut National de Recherche en Informatique et *) |
There was a problem hiding this comment.
| (* Copyright 2021 Institut National de Recherche en Informatique et *) | |
| (* Copyright 2024 Institut National de Recherche en Informatique et *) |
tools/sync_dynlink.mli
Outdated
| (* *) | ||
| (* Florian Angeletti, projet Cambium, Inria Paris *) | ||
| (* *) | ||
| (* Copyright 2021 Institut National de Recherche en Informatique et *) |
There was a problem hiding this comment.
| (* Copyright 2021 Institut National de Recherche en Informatique et *) | |
| (* Copyright 2024 Institut National de Recherche en Informatique et *) |
|
Before merging, there are few line directives which should be updated, and the |
Co-authored-by: Stephen Dolan <sdolan@janestreet.com>
This commit, kindly provided by Florian Angeletti as part of his review of PR ocaml#11996, adds line directives to otherlibs/dynlink/byte/dynlink_symtable.ml to make it possible to verify where each fragment of that file comes from. (See next commit for tooling to automate this verification.)
Also kindly provided by Florian Angeletti as part of his review of PR ocaml#11996, this commit automates the verification that otherlibs/dynlink/byte/dynlink_symtable.ml can indeed be re-generated from its bytecomp sources. Run the verification with: make sync_dynlink
f8ac549 to
1244e1a
Compare
|
Thanks a lot for the approval!
And thanks a lot, too, for the very relevant suggestions which are taken
into account by the updated version of the PR.
The type annotation was not possible at the time when the initial change
was done, hence its mere removal but it's definitely better style to
restore it and it addresses a request by @hhugo earlier in the
discussion.
|
|
Florian Angeletti (2024/06/17 07:56 -0700):
Before merging, there are few line directives which should be updated,
and the `symtable.ml` might use a very small bit of refactoring to
reduce the imprint of the formatting module used. (I have discussed
those changes with @shindere this afternoon.)
This has been integrated, too.
|
As is well-known, currently the dynlink other library depends on compilerlibs. More precisely, it has to embed a copy of compilerlibs, meaning that compilerlibs has to be compiled twice, its dynlink-embedded version being compiled with
-for-pack.This PR, based on a suggestion by @stedolan, is a sequence of changes that emancipate dynlink from compilerlibs, making it depend on a smaller module,
CamlinternalDynlink, which is added to the standard library.Roughly speaking, the involved changes are the following ones:
bytecompand the other used for dynamic loadin gof objects by bothdynlinkand the toplevels, moved to theCamlinternalDynlinkstandard library module.Each of the two first changes has required a compielr bootstrap. After these two changes, the types used to represent CMO and CMA files only use standard OCaml types and do not rely on any type internal to the compiler.
Currently, the PR works, meaning that itpasses the testsuite in both lambda and flambda modes.
On my machine, not compiling compilerlibs twice saves ~6s of total build time, which goes down from
4m58,261sto4m52,251s. The size ofdynlink.cmais reduced from 4,1M to 112K and the size ofdynlink.cmxais divided by two, going down from 8K to 4K (these numbers feel surprisingly small to me).I didn't undertake any diff testing so far because I wanted to gather some input first and make sure the version I'd use for diff testing would be as close as possible to what peope would agree to merge, especially given that rebasing this on
5.0is non-trivial and could introduce errors, in particular because of the changes to thebytepackagermodule that happened after5.0had been branched.Among the aspects that will IMO require some discussion, there is the way the
Symtablemodule has been split. Currently, much of its code is duplicated and it'd certainly be good to avoid such a duplication as much as possible, given that, as far as symbol table is concerned, it may not be possible to make the toplevel as independent of compilerlibs as dynlink because it more heavily relies on theLambdamodule, thus oon compiler internals.Obviously this PR is best approached in a per-commit way.