Relocatable OCaml - Searching and Suffixing#14245
shym
left a comment
There was a problem hiding this comment.
I won’t have more time this week to finish my review of this PR, I reviewed all the commits up to the bootstrap. It’s, again, a very impressive piece of work, congratulations!
I think I dug deeper than what I did for 14244, or maybe it’s just that I needed less context to grasp what was happening. And it always made perfect sense.
Apart from the following, my remarks are attached to the corresponding lines.
- 130074e commit message doesn’t do that commit justice: it contains a whole lot more than just a configure update (which is pretty nice, by the way :-D)
bytecomp/bytelink.ml
Outdated
| | Config.Executable, bindir -> | ||
| Executable, bindir | ||
| | Config.Shebang sh, bindir -> | ||
| Shebang_bin_sh (Option.value ~default:"sh" sh), bindir |
There was a problem hiding this comment.
Reading the documentation for launch_method in Config.mli, I expected to find a case that would yield Shebang_runtime. Is that not the case because, when -launch-method is not specified, Shebang_runtime is the default launch method when possible (which is how I understand the logic in write_header)?
There was a problem hiding this comment.
This is legacy from #12751 (i.e. that -launch-method is just replacing some of the information which was read from the header before). Before, runtime-launch-info was parsed early in the process, mainly to report errors as soon as possible, but that doesn't matter now - I think this block can be folded into the logic at L472 and will consequently look a lot less unclear, yes! The evil part is the wildcard pattern in L488-489:
| _ ->
Executable
which subtly encodes the fallback to the executable header if sh can't be found when needed.
utils/config.mli
Outdated
| | Shebang of string option (** Use shebang-style launcher, either directly to | ||
| the runtime, or via sh. The parameter if | ||
| specified is the full path to sh, otherwise the | ||
| linker searches for it. *) |
There was a problem hiding this comment.
Could the documentation be updated to specify when the runtime will be used? (Related to my comment on bytelink.ml)
There was a problem hiding this comment.
Yes, I see what you mean - I guess I was thinking of it in terms of just selecting between executable / shebang, where shebang has a little bit of extra metadata (the way to find sh, if it's needed), but it makes sense to make it explicit that that's what's used when the path to the runtime itself is not valid in a shebang.
utils/config.mli
Outdated
| type search_method = | ||
| | Absolute (** Check fixed absolute location only *) | ||
| | Absolute_then_search (** Check fixed absolute location, but perform a search | ||
| if that fails *) | ||
| | Search (** Always search for the interpreter *) |
There was a problem hiding this comment.
In my recurring fixation with names, I’d suggest to pick only one name in each of the pairs Config vs command-line:
- Config.search_method vs -runtime-search
- Config.Absolute vs disable
- etc.
An alternate proposition: type runtime_search = Always | Never | Lazily.
I would let the documentation in Config (and the commandline help, as far as it must be) expand what it all means.
There was a problem hiding this comment.
The fixation is most helpful, thank you! For the constructors vs the command line parameter, I think I was going with the idea that the name (enable, disable, fallback) specified user intent where the constructor (Absolute, Absolute_then_search, Search) specified implementation. That possibly made more sense when this was being read from runtime-launch-info and all the same file, but I guess it does make sense split across command-line parsing, Config, and so forth to be using one name.
But in which case, is using the CLI names OK - i.e. is -runtime-search enable|disable|fallback OK, and use those to drive the names of and in the type? (I prefer fallback to lazily, FWIW, though as ever I'm not super attached to names!)
There was a problem hiding this comment.
But in which case, is using the CLI names OK - i.e. is -runtime-search enable|disable|fallback OK, and use those to drive the names of and in the type?
Yes, I think that would be nice.
(I prefer fallback to lazily, FWIW, though as ever I'm not super attached to names!)
Yes, fallback is a good choice. Lazily is maybe a bit too λ-calculus-y 😅
bytecomp/bytelink.ml
Outdated
|
|
||
| (* Writes the shell script version of the bytecode launcher to outchan *) | ||
| let write_sh_launcher outchan bin_sh bindir search runtime = | ||
| let open struct type tag = D | A | E end in |
There was a problem hiding this comment.
write_sh_launcher is pretty fun to read, even though I must say I’m not entirely convinced that to keep the scripts factorised that way will be helpful for maintenance (with the l function and the extra comment, it might even run longer?). I would personally unshare at least D from A and E.
Anyway, changing the tags to
| let open struct type tag = D | A | E end in | |
| let open struct type tag = DAE | AE | E end in |
would have helped me decode the meaning of the tags and the generation code, I think.
There was a problem hiding this comment.
I can possibly dig out one of the old commits where these were separated - there was an intermediate state where they got updated a second time, and it was really unclear (that's why I ended up rewriting it this way).
I wasn't clear what you mean by "it might even run longer" - the code's physically longer, or it's slower? (I'm not sure I'm bothered in either case - the main thing was to be able to physically see the entire shell script).
There was a problem hiding this comment.
The revised constructor names are a great idea, ta!
There was a problem hiding this comment.
I wasn't clear what you mean by "it might even run longer" - the code's physically longer, or it's slower?
I meant physically longer, because to share the D’s 2 lines you need to define exec.
(I don’t expect anything noticeable regarding speed there)
bytecomp/bytelink.ml
Outdated
| if tag = D || tag = A && search <> Config.Absolute | ||
| || tag = E && search = Config.Absolute_then_search then begin | ||
| output_string outchan (String.trim s); | ||
| output_char outchan '\n' | ||
| end |
There was a problem hiding this comment.
I would find it easier to read that way, I think:
| if tag = D || tag = A && search <> Config.Absolute | |
| || tag = E && search = Config.Absolute_then_search then begin | |
| output_string outchan (String.trim s); | |
| output_char outchan '\n' | |
| end | |
| match (tag, search) with | |
| | D, _ | |
| | A, (Search | Absolute_then_search) | |
| | E, Absolute_then_search -> | |
| output_string stdout (String.trim s); | |
| output_char stdout '\n' | |
| | _ -> () |
(especially if the tag and search constructors were renamed to match)
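A minimal sketch checking that the boolean condition and the suggested pattern match select the same script lines. The `tag` and `search` types here are local stand-ins for the ones in `bytecomp/bytelink.ml` and `Config`, and the helper names `emit_bool`, `emit_match` and `agree` are made up for this illustration; both functions return a `bool` rather than writing to a channel.

```ocaml
(* Stand-in types; the real ones live in bytecomp/bytelink.ml and Config. *)
type tag = D | A | E
type search = Absolute | Absolute_then_search | Search

(* The boolean condition as written in the patch. *)
let emit_bool tag search =
  tag = D || tag = A && search <> Absolute
  || tag = E && search = Absolute_then_search

(* The pattern-matching form suggested in review. *)
let emit_match tag search =
  match tag, search with
  | D, _
  | A, (Search | Absolute_then_search)
  | E, Absolute_then_search -> true
  | _ -> false

let all_tags = [D; A; E]
let all_searches = [Absolute; Absolute_then_search; Search]

(* True when the two forms agree on all nine (tag, search) pairs. *)
let agree =
  List.for_all
    (fun t ->
       List.for_all (fun s -> emit_bool t s = emit_match t s) all_searches)
    all_tags
```

Enumerating the pairs like this is cheap because both types are small, so the rewrite can be checked exhaustively rather than by inspection.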
Makefile
Outdated
| runtime/$(1)$(EXE) \ | ||
| "$(INSTALL_BINDIR)/$(call MANGLE_RUNTIME_NAME,$(1))" | ||
| ifeq "$(SUFFIXING)" "true" | ||
| cd "$(INSTALL_BINDIR)" && \ | ||
| $(LN) "$(TARGET)-$(1)-$(BYTECODE_RUNTIME_ID)$(EXE)" "$(1)$(EXE)" | ||
| cd "$(INSTALL_BINDIR)" && \ | ||
| $(LN) "$(TARGET)-$(1)-$(BYTECODE_RUNTIME_ID)$(EXE)" \ | ||
| "$(1)-$(ZINC_RUNTIME_ID)$(EXE)" |
There was a problem hiding this comment.
A couple of lines are not starting with a Tab. I didn’t know make was accepting continuation lines not starting with Tab, but that breaks visual indentation for me. Did you mean to write:
| runtime/$(1)$(EXE) \ | |
| "$(INSTALL_BINDIR)/$(call MANGLE_RUNTIME_NAME,$(1))" | |
| ifeq "$(SUFFIXING)" "true" | |
| cd "$(INSTALL_BINDIR)" && \ | |
| $(LN) "$(TARGET)-$(1)-$(BYTECODE_RUNTIME_ID)$(EXE)" "$(1)$(EXE)" | |
| cd "$(INSTALL_BINDIR)" && \ | |
| $(LN) "$(TARGET)-$(1)-$(BYTECODE_RUNTIME_ID)$(EXE)" \ | |
| "$(1)-$(ZINC_RUNTIME_ID)$(EXE)" | |
| runtime/$(1)$(EXE) \ | |
| "$(INSTALL_BINDIR)/$(call MANGLE_RUNTIME_NAME,$(1))" | |
| ifeq "$(SUFFIXING)" "true" | |
| cd "$(INSTALL_BINDIR)" && \ | |
| $(LN) "$(TARGET)-$(1)-$(BYTECODE_RUNTIME_ID)$(EXE)" "$(1)$(EXE)" | |
| cd "$(INSTALL_BINDIR)" && \ | |
| $(LN) "$(TARGET)-$(1)-$(BYTECODE_RUNTIME_ID)$(EXE)" \ | |
| "$(1)-$(ZINC_RUNTIME_ID)$(EXE)" |
There was a problem hiding this comment.
That was intentional, indeed - in fact, I think check-typo might reject your suggestion? The tab is only required at the start of the recipe line itself. It should work if tabstop is set to 2 😉
There was a problem hiding this comment.
Actually, looking again, it may have been intentional, but it's inconsistent! I've corrected them 🫣
stdlib/Makefile
Outdated
| @{ printf '$(if $(MANGLING),$(ZINC_RUNTIME_ID_HI),\000)'; \ | ||
| cat $^; } > $@ |
There was a problem hiding this comment.
Another missing Tab.
| @{ printf '$(if $(MANGLING),$(ZINC_RUNTIME_ID_HI),\000)'; \ | |
| cat $^; } > $@ | |
| @{ printf '$(if $(MANGLING),$(ZINC_RUNTIME_ID_HI),\000)'; \ | |
| cat $^; } > $@ |
There’s another in 3a625a9, but it vanishes post-bootstrap.
Makefile
Outdated
| runtime/lib$(1)_shared$(EXT_DLL) \ | ||
| "$(INSTALL_LIBDIR)/$(call MANGLE_RUNTIME_DLL_NAME,$(1),$(2))" | ||
| ifeq "$(SUFFIXING)" "true" | ||
| cd "$(INSTALL_LIBDIR)" && \ | ||
| $(LN) "$(call MANGLE_RUNTIME_DLL_NAME,$(1),$(2))" \ | ||
| "lib$(1)_shared$(EXT_DLL)" |
There was a problem hiding this comment.
Other missing Tabs.
| runtime/lib$(1)_shared$(EXT_DLL) \ | |
| "$(INSTALL_LIBDIR)/$(call MANGLE_RUNTIME_DLL_NAME,$(1),$(2))" | |
| ifeq "$(SUFFIXING)" "true" | |
| cd "$(INSTALL_LIBDIR)" && \ | |
| $(LN) "$(call MANGLE_RUNTIME_DLL_NAME,$(1),$(2))" \ | |
| "lib$(1)_shared$(EXT_DLL)" | |
| runtime/lib$(1)_shared$(EXT_DLL) \ | |
| "$(INSTALL_LIBDIR)/$(call MANGLE_RUNTIME_DLL_NAME,$(1),$(2))" | |
| ifeq "$(SUFFIXING)" "true" | |
| cd "$(INSTALL_LIBDIR)" && \ | |
| $(LN) "$(call MANGLE_RUNTIME_DLL_NAME,$(1),$(2))" \ | |
| "lib$(1)_shared$(EXT_DLL)" |
| if Config.suffixing then | ||
| [Misc.RuntimeID.shared_runtime Sys.Native] | ||
| else | ||
| ["-lasmrun_shared"] |
There was a problem hiding this comment.
I suppose this is intentionally not Load_path.find "libasmrun..." (I miss context to know whether that could make any difference but it seems to bring the asmlink closer to the bytelink).
There was a problem hiding this comment.
Both Bytelink and Asmlink are linking the _shared variant in the same way (with -l) - I've left well alone on the difference in load_path stuff, just because the history of why they're that way wasn't totally clear to me (and _shared is an abuse of -runtime-variant that I'd like to fix some day... static/shared linking should be an option for all the runtime variants!)
bytecomp/dll.mli
Outdated
| (* Extract the name of a DLLs from its external name (xxx.so or -lxxx) *) | ||
| val extract_dll_name: string -> string | ||
| val extract_dll_name: bool * string -> string |
There was a problem hiding this comment.
IIUC, the extension (.so) is no longer supported when in suffixed mode, aka when the bool is true. (It’s unclear why the suffix is no longer supported, though).
There was a problem hiding this comment.
The suffix wasn't embedded before either, I think? It's a nod to portability that we specify the name of the dll (e.g. dllunixbyt) and Unix appends .so and Windows appends .dll. In suffixed mode, I was figuring that you're really just giving the central portion and allowing the runtime to do all of the remaining demangling - the problem with allowing .so is that it means the runtime has to insert characters before the extension which seemed a bit odd?
There was a problem hiding this comment.
Sorry I wasn’t clear: I don’t think supporting .so extensions is useful either; but I’d suggest to update the comment to what is actually supported.
shym
left a comment
There was a problem hiding this comment.
End of my review, with only tiny extra comments.
testsuite/tools/harness.mli
Outdated
| (** Indicates whether bytecode executables in the compiler distribution | ||
| use a launcher that is capable of searching PATH to find ocamlrun. At | ||
| present, only native Windows has this behaviour. *) | ||
| a launcher that is capable of searching PATH to find ocamlrun. This |
There was a problem hiding this comment.
Slight over-removal in fee65df
| a launcher that is capable of searching PATH to find ocamlrun. This | |
| use a launcher that is capable of searching PATH to find ocamlrun. This |
bytecomp/bytelink.ml
Outdated
| l E {|fi |}; | ||
| l A {|if test -z "$c"; then |}; | ||
| l A {| echo 'This program requires an OCaml %s interpreter'>&2|} release; | ||
| l A {| echo "$r not found either with $0 or in \$PATH">&2 |}; |
There was a problem hiding this comment.
I vaguely wonder whether “with” might not be clear enough. Would “alongside” be better? Or explicitly “in the directory of”?
There was a problem hiding this comment.
"alongside" sounds better, indeed!
driver/main_args.ml
Outdated
| "<method> Specify the mechanism for the bytecode launcher:\n\ | ||
| \ exe - use the executable launcher in runtime-launch-info\n\ | ||
| \ sh - use a #!, using sh if the interpreter path cannot be used\n\ | ||
| \ /path/interpreter - use #!, or the given sh-compatible \ |
There was a problem hiding this comment.
| \ /path/interpreter - use #!, or the given sh-compatible \ | |
| \ /path/interpreter - use #!, or the given sh-compatible\n\ |
There was a problem hiding this comment.
I thought that this was formatting correctly without that one, but perhaps not? I'll double-check
There was a problem hiding this comment.
Oh no, indeed - corrected!
Thanks, @shym! I'll endeavour to rebase this and then put the various fixups inline as extra commits to ease checking (I'll double-check whether the tabs-in-Makefiles is definitely what I'd had in mind as well!)
4cbc8b8 to
1bbf1ea
Compare
Rebased - review responses to follow
1bbf1ea to
7018d70
Compare
How's that looking, @shym? I still need to update the man pages and manual for
7018d70 to
abd3e4c
Compare
manpages and documentation now updated, too
f4222b9 to
fa12746
Compare
fa12746 to
f43e35c
Compare
Surfacing a side-channel discussion with @shym - I've updated the code around
shym
left a comment
There was a problem hiding this comment.
I went through the range-diff and the “Review” commits (thank you so much for that clean update!): I managed to find only tiny typos:
- in e808946 commit message: is there an “is” missing, in “Misc.RuntimeID is added…”?
- and the two inlined suggestions.
So this looks good to me (with, as you say, a preference for a D/F/E version)!
driver/main_args.ml
Outdated
| Printf.sprintf | ||
| " Control the way the bytecode header searches for the interpreter\n\ | ||
| \ The following settings are supported:\n\ | ||
| \ disable use a fixed absolute path to the nterpreter\n\ |
There was a problem hiding this comment.
| \ disable use a fixed absolute path to the nterpreter\n\ | |
| \ disable use a fixed absolute path to the interpreter\n\ |
utils/config.mli
Outdated
| | Disable | ||
| (** Interpreter searching disabled - check fixed absolute location only *) | ||
| | Enable | ||
| (** Check fixed absolute location first, but fallback to a search if that |
There was a problem hiding this comment.
Wordnet suggests “fallback” is only a noun, not a verb; but it may be overly conservative.
| (** Check fixed absolute location first, but fallback to a search if that | |
| (** Check fixed absolute location first, but fall back to a search if that |
damiendoligez
left a comment
There was a problem hiding this comment.
Approved after interactive review; a few changes pending.
1d44ac7 to
f6240ed
Compare
(rebased prior to addressing @damiendoligez's review comments)
8e876f9 to
0d887f5
Compare
I messed up the renaming of the options for The changes made so far can be seen in here.
Minor tweaks needed to allow configuring, say, for "$PWD/install'd here"
ocamlobjinfo now parses both RNTM and shebang lines in order to display the runtime being used by a bytecode executable.
When linking a normal bytecode executable, allows an explicit selection of either the executable or shebang header, regardless of the value in runtime-launch-info.
-launch-method encapsulates the first line of runtime-launch-info. The argument to -launch-method is extended slightly to encompass the second line, thus `-launch-method 'sh /usr/local/bin'` represents the default runtime-launch-info file on Unix. Additional fields are added to Config so that the installed compiler simply uses default values, rather than reading the two lines from runtime-launch-info. The build of the compiler itself explicitly uses `-launch-method`, which leaves only the executable launcher compiled from stdlib/header.c in runtime-launch-info.
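A minimal sketch of how an argument such as `sh /usr/local/bin` might be split into a launch method and a binary directory. The type mirrors the `Executable` and `Shebang of string option` constructors quoted from `Config` earlier in the review. The function `parse_launch_method`, its error message, and the rule of splitting at the first space are assumptions made for illustration, not the code in the PR.

```ocaml
(* Hypothetical stand-in for the launch method type in Config. *)
type launch_method =
  | Executable
  | Shebang of string option  (* optional full path to sh *)

(* Split "METHOD [BINDIR]" at the first space; BINDIR is optional.
   Everything after the first space is kept verbatim, so a binary
   directory containing spaces or quotes survives unchanged. *)
let parse_launch_method arg =
  let meth, bindir =
    match String.index_opt arg ' ' with
    | None -> arg, None
    | Some i ->
        String.sub arg 0 i,
        Some (String.sub arg (i + 1) (String.length arg - i - 1))
  in
  match meth with
  | "exe" -> Ok (Executable, bindir)
  | "sh" -> Ok (Shebang None, bindir)
  | path when path <> "" && path.[0] = '/' ->
      Ok (Shebang (Some path), bindir)
  | other -> Error ("unrecognised launch method: " ^ other)
```

One limitation of this sketch is that a path to `sh` which itself contains a space would be split in the wrong place; how the real option handles that case is not stated here.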
dd4a619 to
7176a48
Compare
-runtime-search {disable|enable|always} adds new features to the
launcher used for bytecode executables which do not embed their own
runtime. By default, the header continues to behave as before - the
launcher will attempt to start the runtime using the absolute path which
the compiler was configured with.
The new search mode will then search for the runtime first in the
directory containing the running executable and then in PATH.
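A minimal sketch of the order in which the three modes would try candidate locations, following the description in this PR: `disable` checks only the configured absolute path, `always` checks the executable's own directory and then `PATH`, and `enable` tries the absolute path first and then falls back to the search. The type and function names are hypothetical, the `PATH` separator is assumed to be `:`, and the real launchers are written in C and shell rather than OCaml.

```ocaml
type runtime_search = Disable | Enable | Always

(* Split a PATH-like string on ':' (Unix convention assumed). *)
let path_dirs path =
  List.filter (fun d -> d <> "") (String.split_on_char ':' path)

(* Candidate locations, in the order they would be tried. *)
let candidates mode ~configured_bindir ~exe_dir ~path ~runtime =
  let in_dir d = Filename.concat d runtime in
  let absolute = [in_dir configured_bindir] in
  let searched = in_dir exe_dir :: List.map in_dir (path_dirs path) in
  match mode with
  | Disable -> absolute
  | Always -> searched
  | Enable -> absolute @ searched

(* First candidate accepted by [exists], e.g. Sys.file_exists. *)
let find_runtime ~exists mode ~configured_bindir ~exe_dir ~path ~runtime =
  List.find_opt exists
    (candidates mode ~configured_bindir ~exe_dir ~path ~runtime)
```

Passing `exists` as a parameter keeps the ordering logic testable without touching the file system; `Sys.file_exists` would be the natural argument in real use.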
The configuration of a given version of OCaml is now described using a combination of the host triplet and a new ID value (documented in runtime/Mangling.md). Misc.RuntimeID is added to manipulate these IDs and configure calculates the values required for the bytecode and native runtimes. These are then intended to be used to mangle filenames so that different configurations which store files in public search paths cease interfering with each other.
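A minimal sketch of the file names this produces for a bytecode interpreter, based on the install rule reviewed above, which links `$(TARGET)-$(1)-$(BYTECODE_RUNTIME_ID)$(EXE)` to both `$(1)$(EXE)` and `$(1)-$(ZINC_RUNTIME_ID)$(EXE)`. The function names are hypothetical and the ID strings used with them are placeholders; the real values are computed by `configure` and `Misc.RuntimeID`.

```ocaml
(* Installed name: <triplet>-<runtime>-<bytecode runtime ID><exe>. *)
let installed_name ~triplet ~bytecode_id ~exe runtime =
  Printf.sprintf "%s-%s-%s%s" triplet runtime bytecode_id exe

(* Names under which the installed file is also linked:
   the plain name, and the name suffixed with the Zinc runtime ID. *)
let link_names ~zinc_id ~exe runtime =
  [ runtime ^ exe; Printf.sprintf "%s-%s%s" runtime zinc_id exe ]
```

For example, with placeholder values `xxxx`, `bbbb` and `zzzz`, `ocamlrun` would be installed as `xxxx-ocamlrun-bbbb` and linked as `ocamlrun` and `ocamlrun-zzzz`.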
New option --disable-suffixing controls whether the build should use any of the computed values for mangling its own files.
New names for libcamlrun_shared.so and libasmrun_shared.so without the _shared suffix and using the target triplet and runtime ID. Both ocamlc and ocamlopt explicitly recognise `-runtime-variant _shared` and select the correct name. Symbolic links for libcamlrun_shared.so and libasmrun_shared.so to allow any C programs which linked against the output of `-output-obj` to continue to work.
ocamlc -dllib-suffixed appends the runtime's host triplet and bytecode runtime ID to the supplied name when searching for the DLL, and records the base name only in .cma / executable files. ocamlmklib -suffixed instructs ocamlmklib to use -dllib-suffixed when generating .cma files instead of -dllib. The effect is that stub libraries built this way have names which will be unique for a given configuration of OCaml and so will be ignored by other runtimes.
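A minimal sketch of the distinction this draws between the name recorded in the `.cma` (the base name only) and the file name searched for at load time. The shape `dllunixbyt-xxxx-bbbb.so` is taken from the PR description; the function `search_name` and its parameters are hypothetical, and in the real implementation the mangling is done by the runtime rather than by OCaml code like this.

```ocaml
(* File name searched for, given the base name recorded in the
   .cma / executable. With [suffixed] the triplet and bytecode
   runtime ID are inserted before the platform extension. *)
let search_name ~suffixed ~triplet ~bytecode_id ~ext_dll base =
  if suffixed then
    Printf.sprintf "%s-%s-%s%s" base triplet bytecode_id ext_dll
  else
    base ^ ext_dll
```

Because only the base name is recorded, the same `.cma` stays valid across configurations, while each runtime resolves it to its own stub library.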
boot/ocamlc now supports everything that the main compiler supports.
--enable-runtime-search controls the -runtime-search setting used to build the compiler's own bytecode executables; --enable-runtime-search-target controls the default value of -runtime-search that ocamlc itself uses.
7176a48 to
7e71861
Compare
Hopefully final rebase. Thank you @shym and @damiendoligez for the reviewing for this one as well! It's going through precheck#1089 and, assuming nothing gets thrown up by any of the final CI checks, let's be relocatable...
Merged. Congratulations @dra27!
This is the third of three PRs which implement Relocatable OCaml as proposed in ocaml/RFCs#53. Bytecode executables (including those produced when building the compiler distribution itself) usually contain an absolute path for the location of `ocamlrun`, which is incompatible with Relocatable OCaml. The patches here provide an alternate mechanism for these executables to find the interpreter without needing its absolute location. This change is combined with a name mangling scheme which is used both for the bytecode interpreter executables' filenames and holistically to fix long-standing issues with the naming of shared libraries (both the shared runtime libraries and shared bytecode C stub libraries). Together, the patches address section 2 of the RFC.

There are several mechanisms for linking bytecode executables. This PR is exclusively concerned with standalone bytecode executables, which are those where the compiled bytecode image is prefixed with a launcher, but not with the OCaml runtime interpreter itself. This launcher can be a simple "shebang" line (e.g. `#!/path/to/ocamlrun`) or a small executable, compiled from `stdlib/header.c`. In this case, "standalone" refers to the image being in a separate file from the runtime, rather than that the executable itself is standalone. There are situations where the interpreter itself cannot be used in a shebang line, and in this case, the compiler today instead emits a tiny shell script using `#!/bin/sh` as the interpreter.

Windows does not support shebang executables, always using the executable stub. In order to assist the old binary distributions of Windows OCaml, the Windows version of the executable stub has always performed a `PATH`-search for `ocamlrun`. However, this was done at a time when it was expected that a user would have a single installation of a single version of OCaml on their system, which is no longer true. It is very much the case today that an `ocamlrun` in `PATH` has little to no guarantee of being the `ocamlrun` required by a given bytecode executable on a user's machine. Therefore, a name mangling scheme is also proposed - i.e. we increase the ways in which a bytecode executable may seek to find its runtime, but by refining the name of the file it's searching for, we increase the chances of finding the correct binary and of multiple installations of OCaml not interfering with each other. Fundamentally, this simply means that two different versions of OCaml (either a different release, or a relevantly different configuration) have different names for the bytecode interpreter. There are two crucial consequences to this: the error messages when things do go wrong are much better (being along the lines of "I can't find an interpreter for OCaml 5.5" rather than "bad magic number", "symbol not found" or just a segfault) and it also stops things from "silently working" and then suddenly failing one day because a release of OCaml happened to add a new function to the `Unix` library. This name mangling scheme is likewise applied to the shared libraries loaded by the interpreter.

The key changes are:
- `ocamlc`, `-launch-method`, allows dynamic selection of either the shebang (`#!/usr/bin/ocamlrun`) or executable-stub launcher for standalone bytecode executables. This option allows the metadata in the `runtime-launch-info` file to be removed.
- `ocamlc`, `-runtime-search`, allows a new mechanism to be specified for the header of standalone bytecode executables where, instead of only executing `ocamlrun` from a fixed location, they are able to search for it.
- A name mangling scheme for the bytecode interpreter executables (`ocamlrun`, etc.).
- `ocamlmklib`, `-suffixed` and `-no-suffixed`, and a new command line option for `ocamlc`, `-dllib-suffixed`, provide a mechanism for using this name mangling scheme for bytecode C stub shared libraries. This mechanism is transparent to the user, for example `#use "unix.cma"` continues to work in the toplevel, but the interpreter executing the toplevel (i.e. `ocamlrun`) searches for a DLL based on its configuration.
- `configure` option, `--enable-suffixing`, which is enabled by default, uses this name mangling scheme for the bytecode interpreter executables and the shared library versions of the OCaml runtime. In particular, this means that the `bin` directory of two different OCaml compilers may appear in `PATH` and the `lib/ocaml` directory of two different OCaml runtimes in `LD_LIBRARY_PATH` but executables compiled for either of those versions of OCaml continue to load correctly.
- `configure` options, `--enable-runtime-search` and `--enable-runtime-search-target`, control how the bytecode executables of the compiler distribution and those produced by the compiler distribution respectively search for the runtime. In particular, `--enable-runtime-search[=always]` builds a compiler whose bytecode executables will continue to work correctly if the compiler is moved or copied to a new location after installation.

The commit series is in three phases:
Searching
The goal of the commits in this first phase of the series is to provide the ability to have the launcher not require the absolute location of the interpreter. Fundamentally, this involves extending both `stdlib/header.c` and the `sh`-script produced by `bytecomp/bytelink.ml`. As with `-set-runtime-default` in #14244, this is a facility which is needed by some user executables (in particular, any bytecode executables installed in a "relocatable" opam switch) but not by others. This is a subtlety which I missed in the original implementation of this in 2021, which added the configuration to the `runtime-launch-info` file, providing the ability to build the compiler distribution with relocatable bytecode binaries (by setting `stdlib/runtime-launch-info` appropriately), but forcing executables produced by that compiler distribution either to be all relocatable or all not relocatable (by setting `stdlib/target_runtime-launch-info`). There is therefore a clear need for a command line option to select the search mode of the launcher.

Additionally, with most of the changes in this PR needing to be made equivalently between `bytecomp/bytelink.ml` (in POSIX Shell Command Language) and `stdlib/header.c`, it's desirable to be able to test executables produced with both the shebang launcher and executable launcher on the same system, but the only way to control this option is by changing the `runtime-launch-info` file. It'd be just about acceptable to have to do that for the test harness, but it hints at the desirability of a command line option to select between the shebang launcher and executable launcher.

Having accepted that a command line option is needed to control the search mode of the launcher, it then seems strange to be encoding a default value for it in `stdlib/target_runtime-launch-info`, rather than in the `Config` module. Similarly, having accepted the addition of a command line option to select between shebang/executable for the launcher, it made me revisit the design in #12751 for `runtime-launch-info`. In addition to the launcher kind (shebang/executable), `runtime-launch-info` also contains the configured installation location of the binaries (which, for the installed `runtime-launch-info` file, will also match `Config.bindir` in `ocamlcommon`) and the executable launcher itself. Given that both launcher kind and search mode are proposed to be conveyable by command line option, it seems to me to be sensible to add a command line mechanism to convey the location of the runtime interpreter executables to `ocamlc` and change `runtime-launch-info` to be just the compiled executable from `stdlib/header.c`, with the default values for launcher kind, search mode and binary directory residing where they belong in the `Config` module. This makes `runtime-launch-info` a cross-compilation concern only, and eliminates any difference between `boot/ocamlc` and `./ocamlc` during the build. The only caveat is that when linking we must always be explicit about the launcher kind, `Config.bindir` and the search mode because `boot/ocamlc` cannot have defaults for these. I think using command line options this way is not only simpler than #12751's use of `runtime-launch-info` but is semantically simpler than the `camlheader` files which it replaced. That change is therefore made as part of this commit series, but the explanation is here to motivate why it has been done and also to make it clear that it's a necessary change to make, especially as the older implementation not doing it this way still exists. The underlying principle is that `runtime-launch-info` contains only things which are properties of the library around it and not things which can be altered when the compiler is invoked (i.e. `stdlib/header.c` has been compiled for a given configuration).

- `sh` is incorrect on Solaris.
- An indentation error of a large part of a function in Relocatable OCaml - test harness #14014 slipped through but, more importantly, the error reporting in the bytecode binaries test left something to be desired - especially given that the aforementioned Solaris problem triggered a failure in this test.
- `bytecomp/bytelink.ml`. Previously, `-use-runtime` and `-runtime-variant` were processed first, yielding a boolean `use_runtime` which indicated whether `-use-runtime` had been specified and a value for `runtime`. If `-use-runtime` was not specified and the compilation is not for Windows, then the `runtime` value is appended to the configured location for the interpreters specified in `runtime-launch-info`, i.e. `"ocamlrun"` computed in the first check becomes `"/usr/local/bin/ocamlrun"` in the second step on Unix, but remains `"ocamlrun"` on Windows. However, if `-use-runtime` is specified, then the value is unaltered. This is all a bit obtuse (and I think probably my fault originally...), and it's much clearer to combine these.
- `--prefix`. If someone can, they probably will (in this case, it was useful to test that the various de-quoting functions needed in the harness would not trip over single quotes in the values themselves).
- `ocamlobjinfo` is extended to display information on the runtime of a standalone bytecode executable (the `tools/ocamlsize` Perl script already has this ability). Given the various changes with name mangling, this seems a very pertinent piece of information to be able to obtain. For the executable launcher, this is simply the content of the `RNTM` bytecode section. For the shebang launcher, it has to be read from the shebang or shell script. Code to do this is already present in the test harness from Relocatable OCaml - test harness #14014, but because this parsing gets more complex with subsequent changes, it's instead rewritten as a lexer with the new function `Byterntm.read_runtime`.
- `-launch-method` is added to `ocamlc` and then used in the in-prefix tests so that systems which support shebang scripts test both the executable launcher and the shebang launcher. The option takes a single parameter which exactly corresponds to the first line of `runtime-launch-info`.
- `-launch-method` is then extended to support an additional part of the argument specifying the directory containing the binaries so that `-launch-method 'sh /home/user/bin'` simultaneously conveys both that a shebang launcher is to be used and that the interpreters reside in `/home/user/bin`. Once bootstrapped, `-launch-method` can now be used in `boot/ocamlc` to remove the need for the first two lines of `boot/runtime-launch-info`.
- `Config.target_bindir` and `Config.launch_method` contain the two lines otherwise added to `stdlib/target_runtime-launch-info`. The rest is plumbing, with the removal of all the parsing code for `runtime-launch-info` and a certain amount of temporary plumbing to cope with whether `boot/ocamlc` has been bootstrapped or not.
- `-runtime-search` is then implemented, which provides three modes for locating the bytecode interpreter. `disable` is the existing behaviour, and requires the runtime to be located at the absolute location the compiler was configured with. `always` instead first looks in the directory containing the bytecode executable itself and then searches `PATH`, if necessary. In particular, binaries installed in an opam switch's `bin` directory always use the runtime in that switch's `bin` directory. Finally, `enable` provides a hybrid of both approaches where the bytecode executable looks for the runtime in the absolute location the compiler was configured with (the `disable` mode), then looks in the directory containing the bytecode executable, then searches in `PATH` (the `always` mode). The `-launch-method` and `-runtime-search` options together make it trivial to test all 6 combinations in the test harness. At this stage, Windows defaults to `-runtime-search always` where Unix defaults to `-runtime-search disable`, which just about corresponds to the behaviour of the Windows executable launcher, with one slight improvement. Previously, the runtime (e.g. `ocamlrun`) was searched for in `Path` using `SearchPathW`. The new behaviour first checks the directory containing the bytecode executable, which is a general usability improvement (this would have been useful, for example, when the change was originally made, as it would have removed the need to put `C:\Program Files\OCaml\bin` into `Path`) and also brings the search behaviour more into line with `LoadLibraryW` (https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryw) which, when loading a DLL, first looks in the directory containing the executable.

Suffixing
With the compiler now able to search for its runtimes, the next commits introduce an approach to name mangling and use it to suffix the filenames of the bytecode interpreters, shared runtime libraries and shared bytecode stub libraries.
I have attempted to document the mangling scheme in-tree in `runtime/Mangling.md`. In summary:
- […] `--enable-flambda`, `--disable-flat-float-array`, `--without-zstd`, etc.)
- […] `int63` constants, shared libraries, etc.)
- `ocamlrun` interpreter then loads `.so` based on the bits affecting the bytecode runtime - which includes bits which do not affect the execution of the bytecode itself. In particular, `.so` files are mangled with the machine triplet

The sequence:
- `--disable-suffixing` added to `configure` to keep the existing behaviour.
- `ocamlrun` is now installed as `xxxx-ocamlrun-bbbb` where `xxxx` is the triplet that the runtime executes on and `bbbb` is the Bytecode Runtime ID. This executable is symlinked as `ocamlrun` and also as `ocamlrun-zzzz` where `zzzz` is the Zinc Runtime ID. When the tree is configured with `--enable-suffixing`, `bytecomp/bytelink.ml` uses `ocamlrun-zzzz` rather than `ocamlrun` when determining the runtime name. Note that `-use-runtime` is unaffected - if the runtime to use is explicitly stated then it overrides the name mangling. The information for the Zinc Runtime ID is split into two halves - the low bits, consisting of the release number and, intentionally, bits which are always zero, are universal and so come from `Config` (these are the first two characters). The high bits, which are configuration-specific, are put in `runtime-launch-info` (recall from the earlier stripping of information - this is information which cannot be dynamically changed when invoking the compiler) and merged by `bytecomp/bytelink.ml`.
- `libasmrun.so` and `libcamlrun.so` then get the same treatment, becoming `libasmrun-xxxx-bbbb.so` and `libcamlrun-xxxx-nnnn.so` with a symlink created with the original unmangled name. The use of `-runtime-variant` for enabling the shared runtime has been something of a hack since it was originally added, but that's for another day - for now, both `asmcomp/asmlink.ml` and `bytecomp/bytelink.ml` recognise `-runtime-variant "_shared"` and correctly mangle the name.
- `dllunixbyt.so` becomes `dllunixbyt-xxxx-bbbb.so`. The implementation is mostly mechanical. Although the cma format is updated, note that the bootstrap can be delayed because `boot/ocamlc` is, by definition, only ever passed cma files which have `lib_dllibs = []`, so it doesn't matter that `boot/ocamlc` has the wrong "type", because it will never see a list.

Bootstrap and utilisation
The final phase of the commits allows the compiler to use all of these features. Each one of the introduced changes in the second phase notionally requires a bootstrap, but the commit series has been organised such that only a single bootstrap is required, with a series of ~~elegant workarounds~~ gross hacks then removed in the following commit. Finally, the plumbing to implement `--enable-runtime-search` and `--enable-runtime-search-target` is available, along with the updates to the tests. Note that in the interests of sanity, but not for any particular implementation reasons, `--enable-runtime-search` and `--enable-runtime-search-target` both require `--enable-suffixing` (i.e. the escape hatch is there to allow all of this to be disabled, but it must then all be disabled).