Relocatable OCaml - --with-relative-libdir #14244

Merged
gasche merged 17 commits into ocaml:trunk from dra27:enable-relative on Dec 8, 2025

dra27 (Member) commented Sep 15, 2025

This is the second of three PRs which implement Relocatable OCaml as proposed in ocaml/RFCs#53. The series of changes in this PR combine to allow the absolute location of the Standard Library (e.g. /usr/lib/ocaml) to be removed from both the C runtime (ocamlrun and libcamlrun.a, etc.) and also from the Config module in the ocamlcommon compiler-libs library. The patches address sections 3 & 4 of the RFC.

The key changes are:

  • A new primitive %standard_library_default allows an OCaml program to determine the default location of the Standard Library as a compile-time constant. In particular, it means that rather than Config.standard_library_default being fixed once when config.ml itself is compiled, the compiler determines its value each time a program is linked.
  • A new command line option, -set-runtime-default, allows the calculated value for the %standard_library_default primitive to be overridden when linking an executable.
  • A configure option, --with-relative-libdir, which allows the compiler to be configured to expect to find the Standard Library in a location specified relative to where the compiler is running from. When ./configure is run with no arguments, the default location of the Standard Library is /usr/local/lib/ocaml and binaries are installed to /usr/local/bin. The equivalent relative compiler would be configured with --with-relative-libdir=../lib/ocaml.
  • Utilisation of both the BUILD_PATH_PREFIX_MAP support added in Honor BUILD_PATH_PREFIX_MAP #1515 and usage of -fdebug-prefix-map-style options to the C compiler to make the compiler's artefacts considerably more reproducible when --with-relative-libdir has been specified.

These changes necessitate considerable churn in the start-up routines for the bytecode runtime, and in the way argv[0] is being processed. While this code is in motion, there are several additional changes which aren't a strict requirement of the main change, but are here because this is all being shaken up:

  • Building the mingw-w64 port using MSYS2 improves slightly, in that configure now correctly recognises that it is a Cygwin-like environment, and uses cygpath et al as necessary.
  • On Windows, if backslashes are included in --prefix (e.g. ./configure --prefix='C:\OCaml') then these are preserved in the resulting compiler (this implements a slightly more sensible version of Display paths using backslashes on Windows #658). Apart from grinding my own axe where this is concerned, it prevents "mixed" slash paths (which look particularly amateur) from being generated when the compiler is configured --with-relative-libdir.
  • There is an obscure, but nonetheless present, bug in the order in which the bytecode runtime processes the various options for starting up. In particular, if an executable has been compiled with -custom, it is possible to direct the resulting executable to load a different bytecode image by manipulating argv[0]. This issue is fixed here, and on normal systems (where caml_executable_name is implemented), an executable compiled with -custom only loads the bytecode image it was compiled with and can no longer be directed to load a different one.
  • As part of improving reproducibility, the Cygwin version of ocamlopt now calls as directly, just as Linux does, rather than going via gcc.

Technical background

The crux of this PR is this pair of innocuous-looking lines:

https://github.com/dra27/ocaml/blob/0728f6af2aae32a97c2a7a1214c25736a26a479b/runtime/dynlink.c#L91
and
https://github.com/dra27/ocaml/blob/0728f6af2aae32a97c2a7a1214c25736a26a479b/utils/config.generated.ml.in#L23
which, by default, are:

  if (stdlib == NULL) stdlib = "/usr/local/lib/ocaml";

and

let standard_library_default = {|/usr/local/lib/ocaml|}

Here, /usr/local is the installation prefix. A consequence of being relocatable, as defined in the RFC, is that this string cannot appear (in any encoding!) in the compiler binaries. At present, the string is present in the Config module and in the bytecode runtime. Its presence in the bytecode runtime means that essentially all bytecode executables contain the location of the OCaml Standard Library, either directly, if compiled with -output-obj, -output-complete-exe, etc., or through the ocamlrun executable, if compiled with default options or -custom. Conversely, native executables only contain the location of the OCaml Standard Library if they link the Config module from ocamlcommon.cmxa (see footnote 1).

For the compiler or runtime, installed to /usr/local/bin/ocamlopt or /usr/local/bin/ocamlrun, it is straightforward for ocamlopt/ocamlrun to determine the directory containing the running executable (/usr/local/bin) and to instead contain the location ../lib/ocaml and thus combine the two to derive /usr/local/lib/ocaml. In the code, ../lib/ocaml is the default value, which is static and may be explicit-relative or absolute, whereas the calculated /usr/local/lib/ocaml is the effective value, which is dynamic and must be absolute. Thus, at module initialisation in:

let standard_library =
  try
    Sys.getenv "OCAMLLIB"
  with Not_found ->
  try
    Sys.getenv "CAMLLIB"
  with Not_found ->
    standard_library_default

we can instead calculate the effective value "/usr/local/lib/ocaml" from the default standard_library_default = {|../lib/ocaml|}.

If only it were that straightforward! The calculation of the effective location requires the caller to agree to be invoked from a binary located in a specific place relative to the effective location. The compiler is not the only consumer of Config.standard_library. Consider the trivial program display.ml:

Printf.printf "Config.standard_library = %S\n" Config.standard_library

compiled with:

$ pwd
/home/dra/work
$ command -v ocamlopt
/usr/local/bin/ocamlopt
$ ocamlopt -o display -I +compiler-libs ocamlcommon.cmxa display.ml
$ ocamlopt -where
/usr/local/lib/ocaml
$ ./display
/usr/local/lib/ocaml

Today, the two final commands display the same result. If Config.standard_library always uses the effective value as the default for the Standard Library location, ocamlopt -where will continue to display /usr/local/lib/ocaml but ./display will suddenly display /home/dra/work/../lib/ocaml (or something similar, but nonetheless not /usr/local/lib/ocaml).
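
The calculation described above, and the way it goes wrong for a program that is not installed alongside the compiler, can be sketched roughly as follows. This is only an illustration under stated assumptions: is_explicit_relative and effective_stdlib are hypothetical names, the join is purely lexical (the PR's real implementation is in C and resolves the path), and Unix-style separators are assumed.

```ocaml
(* Hypothetical helper: an explicit-relative path starts with ./ or ../ *)
let is_explicit_relative path =
  String.starts_with ~prefix:"./" path
  || String.starts_with ~prefix:"../" path

(* Hypothetical sketch: derive the effective Standard Library location from
   the default value and the directory containing the running executable.
   Absolute defaults are returned unchanged; no normalisation is done. *)
let effective_stdlib ~exe_dir default =
  if is_explicit_relative default then exe_dir ^ "/" ^ default
  else default

let () =
  (* The compiler, installed in /usr/local/bin *)
  print_endline (effective_stdlib ~exe_dir:"/usr/local/bin" "../lib/ocaml");
  (* ./display, built in /home/dra/work: same default, different answer *)
  print_endline (effective_stdlib ~exe_dir:"/home/dra/work" "../lib/ocaml");
  (* An absolute default does not depend on where the executable lives *)
  print_endline
    (effective_stdlib ~exe_dir:"/home/dra/work" "/usr/local/lib/ocaml")
```

The second call shows why the effective value cannot simply be used everywhere: the same relative default resolves differently depending on which executable asks.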

Finally, bytecode executables pose an additional problem. Let us extend the trivial program slightly:

Unix.realpath Config.standard_library
|> Printf.printf "Config.standard_library = %S\n"

compiled with:

$ /usr/local/bin/ocamlc -o display -I +unix -I +compiler-libs unix.cma ocamlcommon.cma display.ml

Suppose we have two installations of the same version and configuration of OCaml, one in /usr/local and another in ~/.opam/default. In this case, we have:

$ /usr/local/bin/ocamlrun -config
standard_library_default: /usr/local/lib/ocaml
...
shared_libs_path:
  /usr/local/lib/ocaml/stublibs

$ /usr/local/bin/ocamlc -where
/usr/local/lib/ocaml

$ ~/.opam/default/bin/ocamlrun -config
standard_library_default: /home/dra/.opam/default/lib/ocaml
...
shared_libs_path:
  /home/dra/.opam/default/lib/ocaml/stublibs

$ /home/dra/.opam/default/bin/ocamlc -where
/home/dra/.opam/default/lib/ocaml

Now, executing ./display involves loading dllunixbyt.so from the stublibs directory. In these two invocations:

$ /usr/local/bin/ocamlrun ./display
/usr/local/lib/ocaml

$ ~/.opam/default/bin/ocamlrun ./display
/usr/local/lib/ocaml

there are two important things we reasonably expect to happen:

  1. ./display will display the same path in each case, which will be /usr/local/lib/ocaml (Config.standard_library for the compiler it was built with)
  2. Each ocamlrun will load dllunixbyt.so from its own installation - i.e. /usr/local/bin/ocamlrun will load /usr/local/lib/ocaml/stublibs/dllunixbyt.so and ~/.opam/default/bin/ocamlrun will load ~/.opam/default/lib/ocaml/stublibs/dllunixbyt.so

This leads to what I think is not an entirely obvious distinction. There are two locations to consider: the default location of the Standard Library for the runtime, and the default Standard Library location for the mutator. Config.standard_library_default refers to the mutator (i.e. the program's view), but this may not necessarily be the same value as the runtime needs for OCAML_STDLIB_DIR in runtime/dynlink.c. In practice, this only affects standalone bytecode images - i.e. the situation where the runtime executable and the bytecode image are in separate locations. In all other compilations (including native code; though the native runtime doesn't ever care about the location of the Standard Library), there are still two values, but they are always the same.
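
A toy model may help to keep the two values apart. It is not the runtime's data structure: the record types and function names below are made up for illustration, and the fallback to the runtime's value when an image carries none is an assumption based on the description of -custom executables in this PR.

```ocaml
(* Toy model only; none of these names exist in the runtime. *)
type runtime = { runtime_default : string }  (* baked into each ocamlrun *)
type image = { osld : string option }        (* mutator value carried by
                                                the bytecode image, if any *)

(* Where a given runtime looks for stub libraries such as dllunixbyt.so *)
let stublibs_dir rt = rt.runtime_default ^ "/stublibs"

(* What the program sees: the image's own value when it carries one,
   otherwise (assumed) the value of the runtime executing it. *)
let mutator_default rt img =
  Option.value img.osld ~default:rt.runtime_default

let usr_local = { runtime_default = "/usr/local/lib/ocaml" }
let opam = { runtime_default = "/home/dra/.opam/default/lib/ocaml" }

(* ./display was linked by the compiler in /usr/local *)
let display = { osld = Some "/usr/local/lib/ocaml" }

let () =
  List.iter
    (fun rt ->
      Printf.printf "program sees %s, stubs loaded from %s\n"
        (mutator_default rt display) (stublibs_dir rt))
    [usr_local; opam]
```

Both runs report the same mutator value, while each runtime loads stubs from its own installation, which matches the two expectations listed above.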

The changeset is best reviewed commit-by-commit (and, I'm afraid, armed with the "Technical background" explanation...):

  • enum caml_byte_program_mode is augmented with a new APPENDED option which is used for -custom. caml_main then uses caml_byte_program_mode to differentiate between a #!-style "standalone" bytecode image running via ocamlrun and a -custom executable. While moving the code around, I renamed the COMPLETE_EXE enumeration constant to EMBEDDED as the mode is also used with -output-obj.
  • The runtime-launch-info is one of two places where the binary directory, rather than library directory is embedded in a file. The format for runtime-launch-info is trivially extended to recognise . as referring to the directory containing the compiler (note that runtime-launch-info is not a generally-configurable file - supporting arbitrary relative paths here would only be needed for a hypothetical installation where the runtime executables are installed to a different location from the compiler, which isn't supported or needed at present).
  • A symbol caml_runtime_standard_library_default is added for the runtime default value, principally so that OCAML_STDLIB_DIR is only referred to in one place.
  • %standard_library_default is the most involved change, introducing a compile-time constant to retrieve the mutator default value. In native code, the Standard Library location is only present if the Config module is linked. ocamlopt therefore has to create the string as part of linking an executable, which is done using a similar mechanism to the caml_apply, caml_curry and caml_send functions - a new field in the cmx header records that the compilation unit uses the %standard_library_default primitive. At link-time, if any of the compilation units has set this flag, the linker creates caml_standard_library_nat containing the location of the standard library and synthesises references to it. For bytecode, a similar technique is used, except that the linker already has the full list of %-primitives which are used by the program, so there is no need for the cmo format to be changed. There are various strategies on offer for exactly where and how the value is stored. It is needed to link the bytecode runtime (i.e. the value has to be included in the same places where C tables of primitives and so forth are generated). It also has to be included in bytecode images where no C is being produced. Given that -output-complete-exe and so forth share the same value for both runtime and mutator, for bytecode images I've opted to add a new OSLD bytecode section containing the mutator value, which is read to caml_standard_library_default (not to caml_runtime_standard_library_default). Then, as with the other compile-time constants, it's just a matter of adding another caml_sys_const_ primitive in runtime/sys.c which returns an OCaml copy of that string. I've proposed this as a %-primitive, rather than a "known" C primitive for two reasons. Firstly, the behaviour of this is a compile-time constant (i.e. something which the compiler must work out when linking the program), and the other compile-time constants are % primitives.
    Secondly, the standard library location only wants to be embedded when it's actually needed; if the native runtime used a C primitive, there would have to be a sentinel value or a default for caml_standard_library_nat - it seems worse to have a C primitive sat in the runtime which could return an invalid value (i.e. caml_sys_const_standard_library_default always returns a correct value in bytecode, but if exposed in native code, it would be possible to end up with a program which called it, but got a default empty string back). Given that these primitives are never intended to be called by user-code (because of the difference in linking), unlike in Always define caml_startup et al in bytecode, as already done in native code #13465, I think it's better to have the C primitive for bytecode only, and to completely synthesise the function in ocamlopt only when needed.
  • %standard_library_default provides a mechanism which means that the string constant carved into utils/config.ml when the compiler was built can now be determined when the compiler is run. The default value determined by the compiler is simply that same absolute path. The -set-runtime-default option provides a mechanism to change that default value when linking a specific program.
  • These two features together combine to allow Config.standard_library_default to be changed to be the result of %standard_library_default. This mechanism is a necessary consequence of %standard_library_default for cross-compilation. When linking a cross-compiling version of ocamlopt, that compiler is built with a compiler which uses a host standard library but the resulting cross-compiler it's linking should default to a different target standard library (see the change in Makefile.cross).
  • At this point, %standard_library_default means that "/usr/local/lib/ocaml" has now moved from the Config module to the executables themselves. Where before, ocamlopt and ocamlc.byte both had the string location linked in code via Config.standard_library, it's now instead embedded in caml_standard_library_nat for ocamlopt (and ocamlc.opt) and in an OSLD section for ocamlc.byte. The next commit then allows explicit-relative values of this path to be interpreted by both the runtime and compiler. This is then activated using -set-runtime-default, so that caml_standard_library_nat is changed to be "../lib/ocaml". Most of the complexity here arises from the fact that the bytecode C runtime needs to be able to do this. Especially in the light of the work on ld.conf in Relocatable OCaml - explicit-relative paths in ld.conf #14243, rather than implementing the logic in both C and OCaml, the logic is implemented just in C and the Config module uses a new caml_sys_get_stdlib_dirs primitive, which is passed the result of %standard_library_default and returns both the effective value and also the directory containing the executable (which is used to implement Config.bindir for ocamlmklib). Note, in passing, that ocamlmklib on Windows now searches for tools in the same way as on Unix, as there's no need for the PATH-search previously there. The C implementation itself is in caml_locate_standard_library which is implemented separately for Unix and Windows in runtime/unix.c and runtime/win32.c. The tools used to implement this (dirname, realpath, etc.) are not equivalently available on Windows, by which I mean that the functions don't have exactly equivalent semantics. In particular, an exact dirname is not available on Windows, but GetFullPathName has the required semantics for this specific use and actually has an option to return the dirname (albeit indirectly).
    On this occasion, it therefore seemed easier to make the entire "work out the location of the Standard Library" operation platform-specific, than to go to more effort to construct platform-specific building blocks for a generic version of this function.
  • The CI matrix for pull requests is altered to test a mix of absolute/relative builds by default. An additional test is added to deal with the "cross-runtime" example in the Technical background above. Specifically, after building the compiler, it is duplicated to a new location, and a check is added to ensure ocamlc.byte -where produces the expected result regardless of whether the original or copied ocamlrun is used. If the CI: Full matrix label of GHA: add an optional wider test matrix (Cygwin, static, minimal, etc.) #14013 is added, then each of the jobs builds an additional compiler with the alternate configuration (i.e. the jobs which by default are --without-relative-libdir then have an additional --with-relative-libdir build created, and vice versa). These two compilers are likewise used for the "cross-runtime" ocamlc.byte -where test.
  • Finally, there's a little bit of additional plumbing added to go from "Relocatable" to "Reproducible" by utilising BUILD_PATH_PREFIX_MAP and directly adding -ffile-prefix-map to our internal C flags. Ignoring #! lines and RNTM sections (which are addressed in the final PR), when building with --with-relative-libdir, none of the binaries in the build contain either the build path or the installation prefix on Windows (with mingw-w64) or Linux.

Footnotes

  1. Note that prior to Emancipate dynlink from compilerlibs #11996 in 5.3.0, native code executables which linked dynlink.cmxa also contained the location of the OCaml Standard Library through the copy of the Config module in the Dynlink_compilerlibs library.

dra27 added the run-crosscompiler-tests (Makes the CI run the Cross compilers test workflow) and CI: Full matrix (Makes the CI test a bigger set of configurations) labels Sep 15, 2025
dra27 added the relocatable (towards a relocatable compiler) label Sep 15, 2025
shym (Contributor) left a comment

I’ve spent a little bit of time reading through this impressive PR, if only because it touches upon cross compilation :-)
My reading of the test part of the PR was really perfunctory, I dug deeper in the other parts but I can’t pretend I always understand what’s at stake. The best comment I can make already is that I didn’t find anything suspicious :-)
Anyway, I thought I’d post my review as it is now, as I have a couple of small remarks that came up during that read.

Comment on lines +485 to +489
if (caml_byte_program_mode != APPENDED || proc_self_exe == NULL) {
  /* First, try argv[0] (when ocamlrun is called by a bytecode program) */
  exe_name = argv[0];
  fd = caml_attempt_open(&exe_name, &trail, 0);
}
Contributor

Naïve question: it’s unclear to me why the condition isn’t just if (proc_self_exe == NULL). Shouldn’t proc_self_exe always take precedence over argv[0] when it can be used?

Member Author

There are so many cases that I don't think any question can be naïve! I think some of the commit message for 5395c37 should be in the comment here - my idea here is that when we compile a -custom executable, this only works if a /proc/self/exe mechanism is available.

The rationale behind this is because there's actually an obscure, but nonetheless present, security issue with a -custom executable, since you can abuse argv[0] to cause it to load a different bytecode image from the one actually appended to it.

I haven't done it in this PR, but gpakosz/whereami gives us all the code needed to have caml_executable_name work on all platforms we support.

Contributor

Yes, so many cases…

> my idea here is that when we compile a -custom executable, this only works if a /proc/self/exe mechanism is available.

I agree with that principle. I would have expected then that the condition would be an && so that argv[0] is never used in APPENDED mode.

While I’m digging again in that code, I wonder something.
IIUC, caml_main is only ever called in APPENDED and STANDARD (where it opens itself to behave as APPENDED if it finds itself the concatenation of ocamlrun with some bytecode executable) modes, never in EMBEDDED. Now that you distinguish APPENDED from STANDARD, could attempt_open be really run only in the APPENDED case and the control flow be simplified?

Member Author

Ah, I realise my comment above describes what I think we should be doing after we know that caml_executable_name always works on all platforms... the || at the moment is a relaxation for platforms where caml_executable_name is not implemented. So on Linux/macOS/native-Windows, where proc_self_exe will never be NULL, the argv[0] branch can't be executed for a -custom executable, but on, say, Cygwin/*BSD, etc. there is still the fallback.

For caml_attempt_open, I think it's worth considering this w.r.t. #13465, as in that I'm suggesting that EMBEDDED also go through caml_main (but not caml_attempt_open, obviously) for the EMBEDDED case.
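
The effect of the || in the condition under discussion can be tabulated with a small OCaml model of the C test. This is a sketch for illustration only: tries_argv0 is a hypothetical name, the variant mirrors the mode names used in this thread, and the boolean stands in for proc_self_exe being non-NULL.

```ocaml
(* Model of the modes named in the PR; not code from the runtime. *)
type mode =
  | STANDARD  (* bytecode image run via ocamlrun *)
  | APPENDED  (* -custom executable *)
  | EMBEDDED  (* -output-complete-exe / -output-obj; listed for completeness *)

(* Mirrors: caml_byte_program_mode != APPENDED || proc_self_exe == NULL *)
let tries_argv0 ~mode ~proc_self_exe_available =
  mode <> APPENDED || not proc_self_exe_available

let () =
  List.iter
    (fun (name, mode) ->
      List.iter
        (fun available ->
          Printf.printf "%s, proc_self_exe available: %b -> argv[0] tried: %b\n"
            name available
            (tries_argv0 ~mode ~proc_self_exe_available:available))
        [true; false])
    [("STANDARD", STANDARD); ("APPENDED", APPENDED)]
```

Only the APPENDED case with a working proc_self_exe skips argv[0]; replacing || with && would also skip it in APPENDED mode on platforms without such a mechanism, which is the stricter behaviour suggested in the review.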

EMBEDDED /* bytecode embedded in C (e.g. -output-complete-exe/-output-obj) */
};

extern enum caml_byte_program_mode caml_byte_program_mode;
Contributor

Do I understand correctly that caml_byte_program_mode could now be declared const?

Member Author

Yes (good catch!)

Comment on lines +203 to +204
| ({ui_need_stdlib = true; _}, _, _) -> true
| _ -> false
Contributor

Suggested change
- | ({ui_need_stdlib = true; _}, _, _) -> true
- | _ -> false
+ | ({ui_need_stdlib; _}, _, _) -> ui_need_stdlib

maybe?

Member Author

Oops - I hope that was from it having been more complicated at some point in the past 🫣

(* Link in a compilation unit *)

- let link_compunit output_fun currpos_fun inchan file_name compunit =
+ let link_compunit accu output_fun currpos_fun inchan file_name compunit =
Contributor

Nitpick: the name of the new parameter is the only one not conveying its meaning. acc_need_stdlib is too long, acc_stdlib might be enough?

Member Author

Possibly, although this combines with a follow-up PR in #14247 where fold_primitive below is now:

let fold_primitive (needs_stdlib, uses_dynlink) name =

so it's really just an accumulator being threaded?

Comment on lines +528 to +546
let standard_library_default =
  if standalone && needs_stdlib then
    (* -set-runtime-default *)
    if !Clflags.standard_library_default = None then
      Some Config.standard_library_effective
    else
      !Clflags.standard_library_default
  else
    (* -custom executables don't need OSLD sections - the correct value
       is already included in the runtime. *)
    None
in
begin match standard_library_default with
| Some value ->
    (* OCaml Standard Library Default location *)
    output_string outchan value;
    Bytesections.record toc_writer OSLD
| None -> ()
end;
Contributor

I was surprised to see this chunk that way (building an option to destruct it immediately) instead of something like what is in emit_runtime_standard_library_default, namely something like:

Suggested change
- let standard_library_default =
-   if standalone && needs_stdlib then
-     (* -set-runtime-default *)
-     if !Clflags.standard_library_default = None then
-       Some Config.standard_library_effective
-     else
-       !Clflags.standard_library_default
-   else
-     (* -custom executables don't need OSLD sections - the correct value
-        is already included in the runtime. *)
-     None
- in
- begin match standard_library_default with
- | Some value ->
-     (* OCaml Standard Library Default location *)
-     output_string outchan value;
-     Bytesections.record toc_writer OSLD
- | None -> ()
- end;
+ if standalone && needs_stdlib then begin
+   let standard_library_default =
+     Option.value
+       (* -set-runtime-default *)
+       !Clflags.standard_library_default
+       ~default:Config.standard_library_effective
+   in
+   (* OCaml Standard Library Default location *)
+   output_string outchan standard_library_default;
+   Bytesections.record toc_writer OSLD
+ end;
+ (* else: -custom executables don't need OSLD sections - the correct value is
+    already included in the runtime. *)

Member Author

Indeed - this particular part (the whole OSLD mechanism) went through so many iterations, I've just ended up with code that's much more obtuse than it needs to be!
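
For readers comparing the two versions in this thread, the choice of value can be checked in isolation. The sketch below is not code from the PR: original and suggested are hypothetical names, override stands in for !Clflags.standard_library_default, effective for Config.standard_library_effective, and the paths are placeholders.

```ocaml
(* Shape of the code as submitted: build an option, to be matched later. *)
let original ~override ~effective =
  if override = None then Some effective else override

(* Shape of the suggested change: pick the string directly. *)
let suggested ~override ~effective =
  Option.value override ~default:effective

let () =
  let effective = "/usr/local/lib/ocaml" in
  List.iter
    (fun override ->
      (* Both forms select the same string for every input. *)
      assert (original ~override ~effective
              = Some (suggested ~override ~effective)))
    [None; Some "../lib/ocaml"];
  print_endline "both forms agree"
```

The suggested form avoids constructing an option only to destructure it immediately, at the cost of moving the -custom explanation into a trailing comment.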

Makefile Outdated
rm -f $(FLEXDLL_SOURCE_DIR)/flexlink.exe
$(MAKE) -C $(FLEXDLL_SOURCE_DIR) $(FLEXLINK_BUILD_ENV) \
- OCAMLOPT='$(FLEXLINK_OCAMLOPT) -nostdlib -I ../stdlib' flexlink.exe
+ OCAMLOPT='$(FLEXLINK_OCAMLOPT) -nostdlib -I ../stdlib $(SET_RELATIVE_STDLIB)' flexlink.exe
Contributor

Maybe, to shorten the line (if I’m not mistaken about USE_STDLIB)?

Suggested change
- OCAMLOPT='$(FLEXLINK_OCAMLOPT) -nostdlib -I ../stdlib $(SET_RELATIVE_STDLIB)' flexlink.exe
+ OCAMLOPT='$(FLEXLINK_OCAMLOPT) $(USE_STDLIB) $(SET_RELATIVE_STDLIB)' \
+   flexlink.exe

configure.ac Outdated
AS_CASE([$build_bindir_to_libdir],
[./*],[libdir="$bindir${build_bindir_to_libdir[#].}"],
[../*],[libdir="$bindir/$build_bindir_to_libdir"],
[AC_MSG_ERROR([--with-relative-libdir requires a relative path])])])],
Contributor

Suggested change
- [AC_MSG_ERROR([--with-relative-libdir requires a relative path])])])],
+ [AC_MSG_ERROR(m4_normalize([--with-relative-libdir requires an explicit
+   relative path, starting with either ./ or ../]))])])],

runtime/unix.c Outdated
/* If realpath fails, use the non-normalised path for error messages. */
if (resolved_candidate != NULL) {
caml_stat_free(candidate);
/* caml_realpath uses malloc */
Contributor

Suggested change
- /* caml_realpath uses malloc */
+ /* realpath uses malloc */

runtime/win32.c Outdated
Comment on lines +1396 to +1397
UNC paths are returned as \\?\UNC\ - in this case, reuse the \\ from
\\?\. */
Contributor

Did you mean to say that the last \ in \\?\UNC\ is reused?

Member Author

Oh yeah - you may be unsurprised that that comment relates to a different iteration where it was copying the string back on itself - I've tried to reduce all the manually-rolled string functions and switched to caml_stat_wcsdup instead, but missed that comment!

p "version" version;
- p "standard_library_default" standard_library_default;
+ p "standard_library_default" standard_library_effective;
+ p "standard_library_relative" standard_library_relative;
shym (Contributor) commented Sep 30, 2025

Minor remark: Config.standard_library_relative is a bool but ocamlopt -config-var standard_library_relative is a string; maybe it doesn’t matter though.

Member Author

I haven't actually ended up using Config.standard_library_relative anywhere (in the compiler itself or in any of the adapted programs), but it felt worth having it as an explicitly stated property. We could indeed rename the string here. At the moment, you have a relocatable compiler if test -n $(ocamlopt -config-var standard_library_relative) but it could instead be relocatable if test $(ocamlopt -config-var standard_library_raw) != $(ocamlopt -config-var standard_library_default), say?

Contributor

Maybe I should have noted also that ocamlopt -config-var standard_library_default is Config.standard_library_effective rather than ...default. I’m wary of the same names being used to stand for different things between the Config module and the -config cli option, which was really my point (having an empty value correspond to a false makes sense to me).

Member Author

Yes, I see what you mean. Perhaps it's just better to have Config.standard_library_default (newly introduced in this PR) / ocamlopt -config-var standard_library_default (which already exists) being the "effective value" (and therefore always an absolute path) and Config.standard_library_relative as a string option - so it's None if the compiler is configured with an absolute libdir, and Some "../lib/ocaml", etc., otherwise? ocamlopt -config-var standard_library_relative then keeps its current interpretation (which maps quite readily from a string option)

Contributor

That would indeed make a lot of sense to me!
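
The mapping settled on in this thread, from a string option in Config to the string printed by -config-var, might look roughly like this. It is a sketch of the proposal only: config_var_of_relative and is_relocatable are hypothetical names, and the final interface in the PR may differ.

```ocaml
(* Hypothetical: the proposed Config.standard_library_relative would be
   None for an absolute libdir and Some "../lib/ocaml" (etc.) otherwise.
   The -config-var output keeps its current shape: empty means "absolute". *)
let config_var_of_relative (relative : string option) : string =
  Option.value relative ~default:""

(* The compiler is relocatable exactly when a relative libdir was given. *)
let is_relocatable (relative : string option) : bool =
  Option.is_some relative

let () =
  List.iter
    (fun relative ->
      Printf.printf "standard_library_relative: %S (relocatable: %b)\n"
        (config_var_of_relative relative) (is_relocatable relative))
    [None; Some "../lib/ocaml"]
```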

configure.ac Outdated
Comment on lines +667 to +668
[AS_IF([test x"$withval" = 'xno'],
[bindir_to_libdir=''],
Contributor

A quality-of-life suggestion (that could wait for later):

Suggested change
- [AS_IF([test x"$withval" = 'xno'],
-   [bindir_to_libdir=''],
+ [AS_CASE([$withval],
+   [no],
+     [bindir_to_libdir=''],
+   [yes],
+     [bindir_to_libdir=../lib/ocaml],

(it would also need a nested AS_CASE to yield a Windows path when applicable in the yes case, I suppose).
Noticed this while building a cross-compiler to Windows with relative libdir, which seemed to work as expected (ie I cannot rename the directory in which it is installed, but IIUC only the combination with the other PRs would make that work).

Member Author

This is a brilliant idea, thanks! It gets rid of the two cases in the opam file, too (I absolutely want the backslashes on Windows, but I didn't like having to do that so explicitly in the opam file)

 mutable ui_force_link: bool; (* Always linked *)
-mutable ui_for_pack: string option } (* Part of a pack *)
+mutable ui_for_pack: string option; (* Part of a pack *)
+mutable ui_need_stdlib: bool} (* caml_standard_library_nat needed *)
Contributor

What is the release process for .cmx changes? In the semver world this would be a backwards-incompatible change, and that would require a major version bump. OCaml obviously doesn't do that. Instead, I think it implicitly treats each minor version change as possibly backwards-incompatible, and explicitly bumps Misc.Magic_number when necessary?

Do we need to bump Misc.Magic_number in this commit?

Contributor

The magic numbers are bumped unconditionally at each release, so that compiler artifacts are never compatible across different major versions of the compiler.

| Word_size -> cst make_const_int (8*B.size_int)
| Int_size -> cst make_const_int (8*B.size_int - 1)
| Max_wosize -> cst make_const_int ((1 lsl ((8*B.size_int) - 10)) - 1)
| Ostype_unix -> cst make_const_bool (Config.target_os_type = "Unix")
Contributor

Sigh. TIL that the split between Windows and Unix is plumbed all the way down to closures. Hopefully this is just for some internal flambda optimization. (Unrelated to this PR, but just something I'll have to research later since it sounds like a barrier for Windows<->Unix cross-compilation.)

Member Author

This isn't part of this PR (it just got refactored slightly to add the code to emit the link-time constant), but it shouldn't be antagonistic to cross-compilation, as it's entirely about it! The idea here is that Sys.os_type (and the various other things) take on the value of the target (i.e. the executable being created) rather than those of the host (the compiler).
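To make the target-versus-host point concrete, here is a minimal sketch. `Sys.os_type`, `Sys.unix` and `Sys.win32` are the standard library values backed by these compile-time primitives; the helper name `default_path_separator` is purely illustrative.

```ocaml
(* [Sys.win32] describes the platform the program was compiled for, so in a
   cross-compiled executable it reflects the target, not the build host. *)
let default_path_separator () =
  if Sys.win32 then ';' else ':'

let () =
  Printf.printf "os_type=%s separator=%c\n"
    Sys.os_type (default_path_separator ())
```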


let mk_set_runtime_default f =
"-set-runtime-default", Arg.String f, "<param>=<value> Set the default for \
runtime parameter <param> to <value>"
Contributor

This could use a list of supported parameters, i.e. <param>: standard_library_default

Member Author

Yes, indeed - if (as I hope) we end up also doing PR 11 in #14247 (which sits independently at dra27#186), then this list might get a bit big for a help screen. I wonder if it would be worth pre-emptively putting the list of parameters (even when it's just one) in both the ocamlc/ocamlopt man pages and the manual and then referring to that here? (or possibly going further and having "verbose" help, but as I don't think we do that for anything else yet, that's maybe too excessive!)
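For illustration, here is a rough sketch of how a `<param>=<value>` argument could be split and checked against a list of supported parameters. The helper name, the list and the error messages are hypothetical; this is not the parsing code in the PR.

```ocaml
(* Hypothetical list of supported parameters; only this one is named in
   the discussion above. *)
let supported_params = ["standard_library_default"]

(* Split "<param>=<value>" at the first '=' and validate the parameter. *)
let parse_runtime_default arg =
  match String.index_opt arg '=' with
  | None -> Error (Printf.sprintf "expected <param>=<value>, got %S" arg)
  | Some i ->
      let param = String.sub arg 0 i in
      let value = String.sub arg (i + 1) (String.length arg - i - 1) in
      if List.mem param supported_params then Ok (param, value)
      else Error (Printf.sprintf "unknown runtime parameter %S" param)

let () =
  match parse_runtime_default "standard_library_default=/usr/lib/ocaml" with
  | Ok (param, value) -> Printf.printf "%s -> %s\n" param value
  | Error msg -> prerr_endline msg
```

Keeping the supported names in one list would also give the help text and the manual a single source to refer to.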

@jonahbeckford
Contributor

status: I have reviewed so far the following commits:

  • Fix the detection of Cygwin-like build environments
  • Preserve backslashes in --prefix
  • Interpret . in runtime-launch-info
  • Harden startup of -custom executables
  • Add caml_runtime_standard_library_default
  • Add %standard_library_default - the change to .cmx means this PR can't be backported safely
  • Add -set-runtime-default
  • Bootstrap (I didn't review the binaries)
  • Use %standard_library_default in Config
  • Bootstrap (I didn't review the binaries)

@dra27
Member Author

dra27 commented Oct 4, 2025

  • Add %standard_library_default - the change to .cmx means this PR can't be backported safely

What's the rationale behind the backporting comment? Even without a magic number bump (and any backports would certainly need one), there's no guarantee that the artefacts of one point release can be used with another? i.e. the magic number exists to protect marshalling and to ensure the compiler doesn't segfault, but I don't think we've ever made the stronger guarantee that, say, artefacts compiled with OCaml x.y.0 can be freely linked with artefacts compiled with x.y.1

@dra27
Member Author

dra27 commented Oct 4, 2025

Thank you for the reviews so far, @jonahbeckford and @shym!

Comment on lines +220 to +222
let ui_need_stdlib =
List.fold_left (fun acc unit -> acc || unit.ui_need_stdlib) false units
in
Contributor

Suggested change
-let ui_need_stdlib =
-List.fold_left (fun acc unit -> acc || unit.ui_need_stdlib) false units
-in
+let ui_need_stdlib = List.exists (fun info -> info.ui_need_stdlib) units in

There's a similar pattern below for ui_force_link.

Member Author

I'm not sure what possessed me to write that originally! I blame the pandemic
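A small check of why this rewrite is safe: when the function only reads a field, the fold and `List.exists` agree on every input. The record below is a simplified, hypothetical stand-in for the real unit info type.

```ocaml
(* Simplified stand-in for the compiler's unit info record. *)
type unit_info = { ui_name : string; ui_need_stdlib : bool }

let need_stdlib_fold units =
  List.fold_left (fun acc unit -> acc || unit.ui_need_stdlib) false units

let need_stdlib_exists units =
  List.exists (fun info -> info.ui_need_stdlib) units

let () =
  let units =
    [ { ui_name = "A"; ui_need_stdlib = false };
      { ui_name = "B"; ui_need_stdlib = true } ]
  in
  assert (need_stdlib_fold units = need_stdlib_exists units);
  Printf.printf "need_stdlib=%b\n" (need_stdlib_exists units)
```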

link_archive accu output_fun currpos_fun file_name units

let link_files output_fun currpos_fun =
List.fold_left (link_file output_fun currpos_fun) false
Contributor

Suggested change
-List.fold_left (link_file output_fun currpos_fun) false
+List.exists (link_file output_fun currpos_fun)

Member Author

That's not equivalent, though - we do need to link all of the files, regardless of whether they use that symbol or not!

Contributor

Woops, I got carried away.
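The difference is easy to reproduce in isolation. In this sketch, a hypothetical stand-in for link_file records every file it processes: the fold visits all files, while `List.exists` stops at the first file that returns true.

```ocaml
(* Each file is (name, needs_stdlib_symbol); "linking" just records the name. *)
let link_all_fold files =
  let linked = ref [] in
  let link_file acc (name, needs) =
    linked := name :: !linked;
    acc || needs
  in
  let result = List.fold_left link_file false files in
  (result, List.rev !linked)

let link_all_exists files =
  let linked = ref [] in
  let link_file (name, needs) =
    linked := name :: !linked;
    needs
  in
  let result = List.exists link_file files in
  (result, List.rev !linked)

let () =
  let files = [ ("a.cmx", true); ("b.cmx", false) ] in
  let _, with_fold = link_all_fold files in
  let _, with_exists = link_all_exists files in
  Printf.printf "fold linked %d file(s), exists linked %d file(s)\n"
    (List.length with_fold) (List.length with_exists)
```

Both versions return the same boolean, but only the fold performs the side effect for every file.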

configure.ac Outdated
Comment on lines +353 to +354
# It is possible to use MSYS2's "Cygwin" gcc (the equivalent of compiling native
# Cygwin), in which case $build is *-*-msys*.
Contributor

@MisterDA MisterDA Oct 8, 2025

2025-06-20 - Replacing x86_64-pc-msys with x86_64-pc-cygwin

As part of our ongoing effort to move MSYS2 closer to Cygwin, we have now replaced the x86_64-pc-msys triplet with x86_64-pc-cygwin as the default host triplet for the MSYS environment.

Not sure if our config.guess / config.sub have caught up with that, or if they're relevant here.

Member Author

I think the answer is technically "no" in both cases - however, I think it is relevant inasmuch as it would be worth noting that trying to differentiate MSYS2/Cygwin here (which it isn't, fortunately) is clearly becoming harder/unnecessary.

@jonahbeckford
Contributor

What's the rationale behind the backporting comment? Even without a magic number bump (and any backports would certainly need one), there's no guarantee that the artefacts of one point release can be used with another? i.e. the magic number exists to protect marshalling and to ensure the compiler doesn't segfault, but I don't think we've ever made the stronger guarantee that, say, artefacts compiled with OCaml x.y.0 can be freely linked with artefacts compiled with x.y.1

I didn't understand when the magic number was bumped. It was explained to me. Thanks; ignore the comment.

/* See Filename.generic_dirname */
size_t n = strlen(path) - 1;
char *res;
if (n < 0) /* path is "" */
Contributor

This should give a compiler warning, no? n is an unsigned size_t which can never be less than zero. I think n should be ssize_t or equivalent.

Contributor

I reviewed the ptrdiff_t fix. LGTM.
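For reference, the OCaml function that the C comment points to treats the empty path as a special case, which is the branch the unsigned `n < 0` test could never reach. A quick check using only the standard `Filename` module:

```ocaml
let () =
  (* The empty path maps to the current directory name. *)
  assert (Filename.dirname "" = Filename.current_dir_name);
  print_endline (Filename.dirname "")
```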

runtime/win32.c Outdated
return caml_stat_wcsdup(stdlib_default);
}

CAMLassert(basename != root && Is_separator(*(basename - 1)));
Contributor

@jonahbeckford jonahbeckford Oct 10, 2025

Suggested change
-CAMLassert(basename != root && Is_separator(*(basename - 1)));
+CAMLassert(basename && basename != root && Is_separator(*(basename - 1)));

for safety since https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfullpathnamea says:

If lpBuffer refers to a directory and not a file, lpFilePart receives zero.

edit: Actually, what would be better is to return stdlib_default if basename==0, since basename is used later.

Member Author

Hang on, is it worth considering the domain of the values possible here - exe_name will be the result of GetModuleFileName so it cannot possibly be a directory nor - given Windows semantics - can it have been replaced by a directory in a race (because the running program can't be deleted).

So I agree it's worth the assertion checking that basename isn't NULL, but surely it's not worth a code path to handle that case, because basename being NULL can only happen through a coding error (i.e. anything outside an assertion checking this should be an unreachable code path)?

Contributor

yes, that is true.

@jonahbeckford
Contributor

Looks good to me after my last two comments.

@lthls
Contributor

lthls commented Oct 29, 2025

@dra27 reminded me today to have a look at the middle-end part of the patch (around the %standard_library_default primitive). Most of it looks reasonable, but I think it should not be mixed with compile-time primitives. I would advocate a dedicated Lambda/Clambda primitive, translated in Cmmgen to the appropriate symbol.
If you give me a few days I should be able to send you a patch with the changes I have in mind.

@dra27 dra27 force-pushed the enable-relative branch 2 times, most recently from 23733fc to ce0009a Compare November 9, 2025 11:57
@dra27
Member Author

dra27 commented Nov 9, 2025

Rebased - review responses to follow

@dra27
Member Author

dra27 commented Nov 10, 2025

I have hopefully addressed all the review comments, thank you - both the "Compare" should be working above to see the changes without the noise of the rebase and I have put each change in separate "Review" commits for squashing later.

@dra27
Member Author

dra27 commented Nov 10, 2025

@lthls - ah, I see what you mean (I think); so instead of compiling it away as we go into {f,c}lambda, instead introduce a new middle-end primitive (which should get rid of the hacking around with symbol_for_global, I think?)

@lthls
Contributor

lthls commented Nov 10, 2025

@lthls - ah, I see what you mean (I think); so instead of compiling it away as we go into {f,c}lambda, instead introduce a new middle-end primitive (which should get rid of the hacking around with symbol_for_global, I think?)

Yes. I did give it a try last week and it turned out to be less elegant (and more work) than I expected, so I'm now inclined to let the current version stay.

@dra27
Member Author

dra27 commented Nov 28, 2025

precheck#1082 seems happy. This one looks ready to go, next - thanks, everyone!

@dra27
Member Author

dra27 commented Nov 29, 2025

Let's try that again - an enhanced battery of tests is running in precheck#1084 as well.

@dra27 dra27 added the merge-me label Nov 29, 2025
dra27 added 17 commits December 5, 2025 10:58
Both Cygwin and MSYS2 are now consistently detected on MSYS2. In
particular, this means that ./configure --prefix $PWD/install and
similar will cause the prefix to be correctly translated to a Windows
path, as already happens on Cygwin.
Previously, the --prefix argument was always normalised with cygpath -m
which meant that regardless of the argument, the paths used in the
compiler would always use slashes.

This behaviour is preserved if a slash is detected in the argument, i.e.
the caller explicitly uses mixed notation (e.g. `--prefix=C:/Prefix` or
`--prefix $PWD/install`). In particular, it means that a Cygwin-style
path will be correctly converted to a Windows-style path.

If the path uses backslashes, then it is still converted to use forward
slashes for the installation commands, but the backslashes are otherwise
preserved and used within the build itself.
The runtime-launch-info file includes the location of the binary
directory. The compiler is extended so that . refers to the directory of
the compiler binary.
By default, ocamlrun first tries to resolve argv[0] to determine where
the bytecode image is and then tries opening the executable image
itself. This is obviously correct for ocamlrun, when being called using
a shebang or executable header, but it's not correct for -custom
executables where we _know_ that the bytecode image should be with the
executable. To achieve this, a new mode is added to
caml_byte_program_mode (and the existing ones renamed) such that
caml_byte_program_mode is now STANDARD (for ocamlrun - the existing
behaviour), APPENDED (for -custom executables - the new behaviour) and
EMBEDDED (for -output-complete-exe/-output-obj - the original use of
it).

The mode is also set directly by the linker, rather than having a
default in libcamlrun which is then overridden by the startup code for
-output-complete-exe.

In the new APPENDED mode, if caml_executable_name is implemented (i.e.
it returns a string) then this file _must_ contain the bytecode image
and no other mechanisms are used. On platforms where
caml_executable_name is not implemented, APPENDED falls back to STANDARD
for compatibility.

Technically, this stops an argv[0] injection attack on setuid/setgid
-custom bytecode executables, although setuid should be used with
-output-complete-exe, if at all.
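The three modes described in this commit message can be summarised with a small OCaml model. This is only a sketch of the decision described in the text: the real code is C in the runtime, and the type and constructor names here are hypothetical.

```ocaml
(* Hypothetical model of caml_byte_program_mode and where the bytecode
   image is looked for in each mode. *)
type byte_program_mode = Standard | Appended | Embedded

type image_source =
  | Argv0_then_executable (* ocamlrun: resolve argv[0], then the executable *)
  | Executable_only       (* -custom: the image must be in the executable *)
  | Linked_in             (* -output-complete-exe / -output-obj *)

(* [executable_name] models caml_executable_name: None when the platform
   does not implement it. *)
let image_source mode ~executable_name =
  match mode, executable_name with
  | Standard, _ -> Argv0_then_executable
  | Appended, Some _ -> Executable_only
  | Appended, None -> Argv0_then_executable (* falls back to STANDARD *)
  | Embedded, _ -> Linked_in

let () =
  match image_source Appended ~executable_name:(Some "/usr/bin/prog") with
  | Executable_only -> print_endline "image read from the executable only"
  | Argv0_then_executable -> print_endline "argv[0] search, then executable"
  | Linked_in -> print_endline "image linked into the program"
```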
Previously, the bytecode runtime just used OCAML_STDLIB_DIR from
build_config.h. This value is now stored once in dynlink.o as
caml_runtime_standard_library_default.
%standard_library_default allows Config.standard_library_default to be
converted to a compile-time derived value, as with existing compile-time
constants such as %backend_type, etc. This paves the way for allowing
Config.standard_library_default to be changed at link-time, rather than
fixed when the Config module itself is compiled.
Allows the default location used by the bytecode runtime for the
Standard Library to be overridden when creating bytecode executables.
Config.standard_library_default is now implemented using the
%standard_library_default primitive. This allows a convenient test which
can be added for `-set-runtime-default`.

The change also makes the host-like nature of
Config.standard_library_default clearer, as the build of the
cross-compiler must now (correctly) specify the location of its (target)
Standard Library.
When configured with --with-relative-libdir, the runtime uses the
directory of the executable to determine the location of the Standard
Library. Thus, ocamlrun and the compilers look for ../lib/ocaml by
default.

This is implemented by changing caml_standard_library_default to be a
relative path, and then computing the actual value at startup (for
bytecode) and when queried (for native).

Executables (and objects) produced by the compiler always have an
absolute value of caml_standard_library_default. ocamlc.opt and
ocamlopt.opt are built using -set-runtime-default to force
caml_standard_library_default to be a relative value.
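As a rough illustration of the resolution step described in this commit message, a relative default can be interpreted against the directory of the running executable. The function name is hypothetical, and the real computation is done in C by the runtime; no path normalisation is attempted here.

```ocaml
(* Resolve a possibly-relative default against the directory containing the
   executable; absolute defaults are returned unchanged. *)
let resolve_standard_library ~executable_name default =
  if Filename.is_relative default then
    Filename.concat (Filename.dirname executable_name) default
  else default

let () =
  print_endline
    (resolve_standard_library
       ~executable_name:"/usr/local/bin/ocamlc.opt" "../lib/ocaml")
```

Moving the whole installation prefix keeps the executable and its `../lib/ocaml` sibling together, which is what makes the relative form relocatable.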
mingw-w64 is based on GCC, so supports -fdebug-prefix-map, but the test
for it is skipped in configure. The test is no longer skipped (which
means that Config.c_has_debug_prefix_map returns true) but the flag is
still explicitly not used by the compilers (as before).
Indication as to whether ocamlopt assembles files via the C compiler or
by calling the assembler directly.
@dra27
Member Author

dra27 commented Dec 5, 2025

Rebased to clear the merge conflict - while the double-bootstrap provides a welcome warming for my very cold office, can I cajole/goad/bribe another core dev into merging this before the next merge conflict arises? 🙂

@gasche
Member

gasche commented Dec 8, 2025

AppVeyor failed, and I have no idea why -- it fails after installing Cygwin, apparently after doing anything OCaml-related, and there is no error message in the logs. Oh well, let's merge anyway.

@gasche gasche merged commit cfbf210 into ocaml:trunk Dec 8, 2025
31 of 32 checks passed
@dra27
Member Author

dra27 commented Dec 8, 2025

AppVeyor failed, and I have no idea why -- it fails after installing Cygwin, apparently after doing anything OCaml-related, and there is no error message in the logs. Oh well, let's merge anyway.

It passed on trunk! 😱

Labels

  • CI: Full matrix (Makes the CI test a bigger set of configurations)
  • merge-me
  • relocatable (towards a relocatable compiler)
  • run-crosscompiler-tests (Makes the CI run the Cross compilers test workflow)

8 participants