Skip to content

Add deployment phase support for BUILD_PATH_PREFIX_MAP#12126

Open
richardlford wants to merge 2 commits intoocaml:trunkfrom
richardlford:bpp-map-1
Open

Add deployment phase support for BUILD_PATH_PREFIX_MAP#12126
richardlford wants to merge 2 commits intoocaml:trunkfrom
richardlford:bpp-map-1

Conversation

@richardlford
Copy link
Copy Markdown
Contributor

@richardlford richardlford commented Mar 21, 2023

Add deployment phase support for BUILD_PATH_PREFIX_MAP
The BUILD_PATH_PREFIX_MAP specification tells how to use that
environment variable to achieve reproducible build, i.e. builds of
products that do not leak absolute paths from the build
environment. See
https://reproducible-builds.org/specs/build-path-prefix-map.

However, that specification only describes half of the story. Let us
call the building of reproducible products the "Build Phase". That is
the phase covered by the existing specification.

Let us defined the "Deployment phase" as the phase where you accept a
built reproducible product and make use of it, i.e. deploy it. An
example would be the debugger taking a reproducible binary and letting
the user debug it, showing the user the source code, etc.

We use the same mechanism in the deployment phase. Then
the BUILD_PATH_PREFIX_MAP must be setup with the logical
inverse of the mapping used during the build phase.

Also, this is part of a larger PR,
#12085, that is being split up
because it was too large. In addition to these changes, that PR
includes ocamltest enhancements plus the tests for
these changes. See it for prior discussion and the other changes.

I will now describe the essence of the changes.

  1. utils/build_path_prefix_map.{ml,mli}

Added these functions to aid in use of the mapping
during the deployment phase. They allow to skip
mapping entries that are not applicable (find_rewrite).
find_rewrite is then used to implement rewrite_exists and
matching_dirs.

val find_rewrite : map -> (path -> bool) -> path -> path option
(** [rewrite_opt map pred path] tries to find a source in [map]
    that is a prefix of the input [path] that produces a result
    that satisfies predicate [pred]. If it succeeds,
    it replaces this prefix with the corresponding target.
    If it fails, it just returns [None]. *)

val rewrite_exists : map -> path -> path option
(** [rewrite_exists map path] tries to find a source in [map]
    that maps to a result that exists in the file system.
    If so, returns Some result. Otherwise, is there was
    at least one source that was a prefix of path, it is
    assumed that the map is deficient and Not_found is raised.
    If no source prefixes were found, None is returned.
*)

val matching_dirs : map -> path -> path list
(** [matching_dirs map absdir] accumulates a list of existing directories,
  [dirs], that are the result of mapping abstract directory, [absdir],
  over all the mapping pairs in [map]. The list [dirs] will be in
  priority order (head as highest priority). If there is success,
  returns [dirs]. If no source in the map was a prefix of [absdir],
  returns [[]]. If at least one source in the map was a prefix of
  [absdir], but none of the mappings were existing directories,
  raise Not_found.
*)
  1. parsing/location.{ml, mli}

This has convenience functions for using the above
mapping API. It reads and caches the environment
variable so the end-user does not need to.

Location.absolute_path was modified so that BUILD_PATH_PREFIX_MAP
rewriting is done for both absolute and relative paths. Relative paths
are made absolute by appending to the cwd. Previously only relative
paths were rewritten. One discovery during testing was that if the
compiler is given an absolute path as the input source, the debug
event information had that absolute path (see below).

New functions are:

val rewrite_exists: string -> string option option
(** [rewrite_exists path] uses a BUILD_PATH_PREFIX_MAP mapping
    (https://reproducible-builds.org/specs/build-path-prefix-map/)
    and tries to find a source in mapping
    that maps to a result that exists in the file system.
    There are the following return values:
    - None, means BUILD_PATH_PREFIX_MAP is not set.
    - Some None, no source prefixes of [path] in the mapping were found,
    - Some (Some target), means target is the first file (in priority
      order) that [path] mapped to that exists in the file system.
    - Not_found raised, means some source prefixes in the map
      were found that matched [path], but none of them existed
      in the file system. *)

val matching_dirs: string -> string list option
(** [matching_dirs absdir] accumulates a list of existing directories,
    [dirs], that are the result of mapping abstract directory, [absdir],
    over all the mapping pairs in the BUILD_PATH_PREFIX_MAP environment
    variable. The list [dirs] will be in priority order (head as highest
    priority). The possible results are:
    - None, means BUILD_PATH_PREFIX_MAP is not set.
    - Some [], no source prefixes in the mapping were found,
    - Some dirs, means dirs are the directories found
    - Not_found raised, means some source prefixes in the map
      were found that matched [path], but none of mapping results
      were existing directories.
    See the BUILD_PATH_PREFIX_MAP spec at
    (https://reproducible-builds.org/specs/build-path-prefix-map/)
    *)
  1. bytecomp/emitcode.ml

Added sanitizing of paths in the debug events.
If the compiler is given absolute source paths
this was leaking absolute build paths.

  1. bytecomp/bytelink.ml

Rewrite when producing the path for a shebang.
The value written is from the user "-use-runtime"
option. The one setting BUILD_PATH_PREFIX_MAP
will have control whether this does anything or not.

  1. debugger/command_line.ml

A debugger variable, "mapping", was added, which is tied to the
BUILD_PATH_PREFIX_MAP environment variable. So "set mapping value"
will set BUILD_PATH_PREFIX_MAP, and "show mapping" will show the value
of BUILD_PATH_PREFIX_MAP. This allows the user to set it
programmaticaly, possibly from the ".ocamldebug" file.

  1. debugger/source.ml

In source_of_module, use the Location.rewrite_exists.

  1. debugger/symbols.{ml, mli}

a. Avoid adding directories redundantly

b. Expose bppm_expand_path and get_load_path so they
can be called from a fake printer which, when
loaded and installed in a test, can do unit testing
of ocamldebug.

c. Use Location.matching_dirs to expand directories.

  1. Man page and manual updated.

Fixes #12083

@richardlford
Copy link
Copy Markdown
Contributor Author

richardlford commented Mar 21, 2023

@gasche, as you requested, this is the first part from #12085. @dra27, @dbuenzli, @shindere, would you also like to take a look?

Note that there is no longer a need for two versions of Location.absolute_path as it now will rewrite both relative and absolute paths.

@dbuenzli
Copy link
Copy Markdown
Contributor

dbuenzli commented Mar 21, 2023

I have a proposed revision of that specification. I have capture the original specification on github, and then created a pull requests with my proposed changes. See

I'm afraid I don't have the time to look into all this for the forseeable future.

But I doubt you need to add more concepts to that specification. If the tool setting up the prefixes does it in a smart way so that the map is bijective (i.e. distinct source prefixes map on distinct target prefixes) I don't think there's any need for search lists. You should always be able to redirect each target prefix to the location you need at "deploy time".

@richardlford
Copy link
Copy Markdown
Contributor Author

richardlford commented Mar 22, 2023

But I doubt you need to add more concepts to that specification. If the tool setting up the prefixes does it in a smart way so that the map is bijective (i.e. distinct source prefixes map on distinct target prefixes) I don't think there's any need for search lists. You should always be able to redirect each target prefix to the location you need at "deploy time".

What you say is true, but a much more complex mapping is needed in that case. If the product involves n packages, then the map would need n separate entries, one for each package. If search lists are used, then the mapping can be done at a higher level and usually only 3 entries are needed. That would mean it would be possible to setup up the mapping manually, whereas without search lists it would certainly require tools to set it up.

And the generalization to allow search lists is very small (even a small part of this PR), so I think it is worthwhile in order to gain simplicity.

You could think of this as being analogous to the PATH environment variable, which is a search list. Instead of having a PATH environment variable, you could just have a map that specifies where each executable you want is located. But that would be more complex than being able to use PATH. It would be like requiring you to look only in a single directory for any executable. It would work, because you could setup symbolic links from the directory to the real locations (and to some extent that is what /usr/bin is), but it would not be as convenient as having PATH. Usually a person's PATH will be much smaller than the number of execuables reachable from it.

@dbuenzli
Copy link
Copy Markdown
Contributor

dbuenzli commented Mar 22, 2023

What you say is true, but a much more complex mapping is needed in that case. If the product involves n packages, then the map would need n separate entries, one for each package. If search lists are used,

To make that a little bit more precise, I think it would involve 2 + k mappings, where k are the packages to be installed that are built by your project. Basically you'd have, in right to left order:

  1. k mappings (/lib/$P, _build/…/lib/$P) for the k packages built and to be installed by your project (note: lots of projects only install one package).
  2. One mapping (/lib, $(opam var lib)) for the installed packages used by your project (and compiled according to this convention).
  3. One mapping (/private, _build/…/private/lib), for the artefacts that your project builds but keeps for itself.

(With _build path above refined according to the concrete build structure but that's the gist of the idea)

With that you can simply follow exactly the same lookup specified in BUILD_PATH_PREFIX_MAP specification. To map your reproducible paths to concrete paths.

That would mean it would be possible to setup up the mapping manually, whereas without search lists it would certainly require tools to set it up.

Personally I'd rather have a simpler mechanism than catering for this use case. Because:

  1. I'm not keen on having to understand one more lookup specification when things go wrong.
  2. I don't expect programmers to setup this manually.

Regarding 2., a significant trigger for programmers to use debuggers are the setup costs. Basically they use them if they are push button, otherwise they'll just reach for printf.

So it should be expected that a tool will set up the mapping for you. Namely the one which perfectly knows all this information: your build system.

@richardlford
Copy link
Copy Markdown
Contributor Author

To make that a little bit more precise, I think it would involve 2 + k mappings

Good point. Only one mapping is needed to cover the whole opam library. As you point out, k will usually be small. You almost have me convinced. Search lists could reduce the k to 1.

Also, a possible scenario that search lists would handle better would be a situation where there was more than one opam-like repository, e.g. experimental vs. stable. Then at deploy-time, the /lib could map to a search list with /experimental/lib and /stable/lib. Without search lists that would be handled by an additional m mappings, one for each package from the experimental repository.

I should point out that the API with search lists is (almost) backward compatible with the old API. I say almost, because the function to encode a pair in the new API takes a list of targets rather than a single target. But if the idea of including search lists is accepted, we could have a backward-compatible API and an enhanced API if search-lists are desired.

I will leave it up to whomever has the authority to decide, whether to include the search-list feature. My preference is to include it. But I will abide by whatever decision is made. Would @gasche or one of the others maintainers please make a decision on this point?

Speaking of producers of mappings, the mappings Dune produces for the build phase are correct if the developer has their source layout isomorphic to the installation layout (so reproducible paths will look like /workspace_root/package/item). But if the developer does not use that layout, I don't believe that Dune makes any special effort to make the reproducible paths correct. That is something I am investigating.

And, of course, Dune will need to have some features to make the proper deploy-time map, and perhaps invoke the debugger. The emacs or other interfaces will also need to be updated. I will also be investigating that.

@richardlford
Copy link
Copy Markdown
Contributor Author

@gasche, I see that you are the author of the build_path_prefix_map package. Do you see any merit in adding the search-list capability?

@gasche
Copy link
Copy Markdown
Member

gasche commented Mar 22, 2023

I wrote the support code for the current BUILD_PATH_PREFIX_MAP specification indeed -- it was upstreamed into the compiler, and also published as a separate library in case other OCaml programs would want to implement it.

I am hesitant to follow your road of changing the specification to do more things. Initially the reproducible-builds people sold the specification as a done deal, soon to be supported by gcc/clang, but I am not sure that gcc/clang actually followed this approach in the end. In any case, I liked the idea of using the same approach as others and sharing a common specification. With your suggestion we would go down the route of adopting and extending the specification for the needs of (a small fragment of) the OCaml community only, and we would either have to gather support from a non-existent or at least unknown group of outside users, or be responsible for maintaining it moving forward. If we wanted to do our own thing independently of others, I am not sure that I would use this specification with its weird quoting rules; maybe something simpler could be used.

This is all said without looking at the details of your proposal, which does not sound unreasonable or badly designed. (The idea of quoting an additional special character is better than your previous approach of reusing :.)

Because I haven't invested the time yet to understand your work in details, I am left with the same question that I already asked previously: In a previous PR a suggestion was made to just "revert" an existing BUILD_PATH_PREFIX_MAP mapping, which in the general case may give a one-to-many relation. Is that not enough to support what you call "deployment phase"?
If artifact A is compiled with the mapping (path-a -> root), and artifact B is compiled with the mapping (path-b -> root), then the union of the mappings is (path-a -> root, path-b -> root), and asking ocamldebug to interpret this mapping "in reverse" gives the non-deterministic (or one-to-many) relation (root -> {path-a, path-b}). I don't know if all interesting one-to-many mappings can be expressed in this way (as the reverse of a standard one-to-one mapping) -- maybe you already answered this before, apologies.

Finally, @dbuenzli suggested using different "workspace root" names for different packages to avoid the issue of needing one-to-many mappings: maybe we want to use the build-time mapping path-a -> root(a), path-b -> root(b) (for example: '~/Prog/foo/_build" -> "workspace_root(foo)"), and the deploy-time mapping remains one-to-one, we just need to remap "workspace_root(foo)" to "~/.opam/lib/foo" and "workspace_root(bar)" to "~/.opam/lib/bar".

@gasche
Copy link
Copy Markdown
Member

gasche commented Mar 22, 2023

(Note: when I talk of a mapping foo -> bar, what I have in mind is that applying this mapping rewrites the foo prefix in a path into bar. This is worth pointing out because the BUILD_PATH_PREFIX_MAP format presents this, for reasons that are unknown to me, as the pair (bar,foo) in that order.)

@richardlford
Copy link
Copy Markdown
Contributor Author

OK. You all have convinced me that it is not worth the effort to introduce search lists and get them accepted into the community. I will revert that part of my change. It will be up to what the spec calls "producers", i.e. those that produce the BUILD_PATH_PREFIX_MAP, (both during the build phase and the deploy phase) to get it right. "Low-level tools", like the compiler and debugger, will just do the mapping requested. For Dune users, Dune will need to add the capability for the deploy phase.

@richardlford richardlford force-pushed the bpp-map-1 branch 2 times, most recently from abbe6e1 to 881e4c1 Compare March 22, 2023 20:07
@richardlford
Copy link
Copy Markdown
Contributor Author

richardlford commented Mar 22, 2023

I have removed the search list feature from the current push.

When using Dune, one thing that occurs to me I have lost is the ability (without great effort) to preferentially find the original source files rather than finding the mirror file in the _build directory. The abstract path to a source file will be something like

/workspace_root/apackage/adir/mod.ml

At build time, when I want to do debugging, we will have the original source:

srcdir/apackage/adir/mod.ml and its mirror in the build directory:

srcdir/_build/default/apackage/adir/mod.ml

But the abstract path to a .cma file will typically be:

/workspace_root/apackage/adir/adir.cma

and it will exist in the build directory at:

srcdir/_build/default/apackage/adir/adir.cma

If I preferentially map /workspace_root/apackage to srcdir/apackage, then
it would also map /workspace_root/apackage/adir/adir.cma to
srcdir/apackage/adir/adir.cma, which does not exist.

The workaround would be to have mappings at the individual artifact level,
but that will increase the size of the mapping considerably.

I suppose it is OK to get the location of the sources in the build directory, rather than the originals, though when using the emacs interface, whenever that happens, it asks whether I want to go to the original, which is rather annoying.

Although, I happened to think of a small tweak to the mapping functions that might add power without much complexity. What if we had a variation of Build_path_prefix_map.rewrite_opt that took a predicate? This variation would find the first rewrite that satisfied the predicate. If the predicate was that the rewritten file existed, then the above problem would be solved. First the mapping of the .cma file to the srcdir would be found, but rejected, so the next mapping would be tried.

Here is a draft implementation:

let pred_rewrite_opt prefix_map pred path =
  let is_prefix { target = _; source } =
      String.length source <= String.length path
      && String.equal source (String.sub path 0 (String.length source))
  in
  let subst path { source; target } =
    target ^ (String.sub path (String.length source)
                (String.length path - String.length source)) in
  match
    List.find (function
     | None -> false
     | Some pair -> (is_prefix pair) && pred (subst path pair))

      (* read key/value pairs from right to left, as the spec demands *)
      (List.rev prefix_map)
  with
  | exception Not_found -> None
  | None -> None
  | Some pair -> Some (subst path pair)

What would you think of a small enhancement like that?

@richardlford
Copy link
Copy Markdown
Contributor Author

richardlford commented Mar 23, 2023

The more I think about it, the more important I think it is to have the pred_rewrite_opt function mentioned above. During the "build phase', abstract paths are produced that are not expected to exist. But during the "deploy phase" we are producing paths that we expect to exist. Therefore I think it makes sense to be able to have a mode where mapping pairs that do not produce existing paths are skipped.

This has all of the advantages of the search-list strategy I had without any of the complexities. I think if the writers of the BUILD_PATH_PREFIX_MAP spec had been thinking about the deploy phase as well as the build phase they might have suggested something like this. I'm going to go ahead and add this one function (and a companion in Location), and do some testing with it. We could consider it a "quality of implementation" feature.

I'm also going to temporarily bring over my ocamltest and testsuite changes for the purpose of testing these changes. But the reviewers should ignore the ocamltest and testsuite changes. I will revert those when the review of this PR is complete so that they will not be included.

@gasche
Copy link
Copy Markdown
Member

gasche commented Mar 23, 2023

I am fine with pred_rewrite_opt. This is a form of lookup into the map that is different from the one defined in the specification, but it does not change the syntax or extend the map data structure in any way.

On the other hand:

  • The implementation is very redundant with the rewrite_opt function already provided, I would want a single implementation of the lookup logic. (For example rewrite_opt can be an instance of pred_rewrite_opt with a trivial predicate.)
  • The implementation calls subst twice on each path, which I find a bit distasteful. You could implement this using List.find_map instead.
  • I'm not excited by the name pred_rewrite_opt, maybe find_rewrite?

A more vague question: rewrite_absolute_path_existing does not rewrite the input path at all if there are several possible rewrites, but none of them exists. I don't see a clearly better choice, but I wonder if doing this silently is the right behavior. Intuitively I would distinguish two cases:

  1. There is no matching source prefix in the prefix map, so not doing any rewrite is clearly the right behavior.
  2. There are matching prefixes in the prefix map, but none of them satisfy the predicate. This sounds like a situation where the BUILD_PATH_PREFIX_MAP value is incorrect or at least does not match the intentions of the caller of rewrite_absolute_path_existing. It might make sense to raise an exception in this case, or at least to signal the situation with a return value different from the one from case (1).

@gasche
Copy link
Copy Markdown
Member

gasche commented Mar 23, 2023

Also: have you ever commented on the proposed feature of asking tools to "invert" a given BUILD_PATH_PREFIX_MAP mapping? (The idea would be that producers and consumers could use the same mapping, but consumers would be asked to 'invert' the mapping.)

This was suggested there and there by @dbuenzli in your original issue, I asked about it there in the larger PR (my question was: have you considered this suggestion and does it solve the question of how to specify one-to-many relations?) and in the present discussion again, but I haven't been able to locate a comment from you about this proposed feature or an answer to my questions about it (does it solve your problem? did you consider it?). I would be curious to hear what you think about it. I wonder if there is some misunderstanding about this suggestion that makes you not consider it, because from a distance it feels like it could be part of a nice solution to the overall problem of "deployment phase support".

@richardlford
Copy link
Copy Markdown
Contributor Author

I am fine with pred_rewrite_opt. This is a form of lookup into the map that is different from the one defined in the specification, but it does not change the syntax or extend the map data structure in any way.

Thank you.

On the other hand:

  • The implementation is very redundant with the rewrite_opt function already provided, I would want a single implementation of the lookup logic. (For example rewrite_opt can be an instance of pred_rewrite_opt with a trivial predicate.)
  • The implementation calls subst twice on each path, which I find a bit distasteful. You could implement this using List.find_map instead.

I had thought about that. I was concerned that the version with the predicate might slow down the version used in the build phase. But I would be happy to consolidate them and have the build phase version just use a true predicate.

  • I'm not excited by the name pred_rewrite_opt, maybe find_rewrite?

I like find_rewrite.

A more vague question: rewrite_absolute_path_existing does not rewrite the input path at all if there are several possible rewrites, but none of them exists. I don't see a clearly better choice, but I wonder if doing this silently is the right behavior. Intuitively I would distinguish two cases:

  1. There is no matching source prefix in the prefix map, so not doing any rewrite is clearly the right behavior.
  2. There are matching prefixes in the prefix map, but none of them satisfy the predicate. This sounds like a situation where the BUILD_PATH_PREFIX_MAP value is incorrect or at least does not match the intentions of the caller of rewrite_absolute_path_existing. It might make sense to raise an exception in this case, or at least to signal the situation with a return value different from the one from case (1).

Those thoughts had occurred to me too. During the deployment phase it seems that it is the responsibility of the map producer should be to give a map where the abstract artifacts are found, and if not, that should be an error. I will make a corresponding change.

@richardlford
Copy link
Copy Markdown
Contributor Author

Also: have you ever commented on the proposed feature of asking tools to "invert" a given BUILD_PATH_PREFIX_MAP mapping? (The idea would be that producers and consumers could use the same mapping, but consumers would be asked to 'invert' the mapping.)

No, I haven't, so I will now. The mapping Dune uses is cwd -> /workspace_root. That will have been used, conceivably by different users on different machines in different countries with different operating systems. In each of those cases cwd had different values. The result is the products in the opam repository. The BUILD_PATH_PREFIX_MAP spec says the maps are not to be saved, and even if they had been, it would be no help as the build environment is different from the deployment environment.

The only case where inverting the map might make sense is for the local developer for the mapping used to build the products in his own development directory, because then the build environment and the deployment environment are the same. But even here, the map is used only in the _build/<context> directory, and it misses the desired reference to the source directory, and also the references to the opam library.

It is true that the logical inversion of cwd -> /workspace_root will be used. It will be

opamlib=/workspace_root:builddir=/workspace_root:sourcedir=/workspace_root

where again, the right-most mapping takes precedence. Using such a map is made possible by the existence of the newly-named find_rewrite. So thank you again.

I hope this adequately answers that suggestion.

@gasche
Copy link
Copy Markdown
Member

gasche commented Mar 23, 2023

I was concerned that the version with the predicate might slow down the version used in the build phase.

Just to clarify, there is nothing performance-sensitive about these functions, so we have to optimize for clarity and maintainability instead of performance. (If we were to find out that there is a cost, or possibly for memory-sharing reasons, we could add a memoization step.)

@richardlford
Copy link
Copy Markdown
Contributor Author

Just to clarify, there is nothing performance-sensitive about these functions, so we have to optimize for clarity and maintainability instead of performance. (If we were to find out that there is a cost, or possibly for memory-sharing reasons, we could add a memoization step.)

Understood.

@dbuenzli
Copy link
Copy Markdown
Contributor

@richardlford I think the problem we are having is that you are taking the current map made by dune as cast in stone. I think dune should change the way it maps build artefacts: see this comment.

With the map I suggest in this comment, there is:

  1. No need to save it, that would defeat the purpose anyways.
  2. The fact that cwd can have different values on different machines (and even your own) is irrelevant.
  3. The map setup by dune for compiling reproducible artefacts becomes usable by ocamldebug to map back these paths to the concrete artefacts it needs to look up.

I still fail to see what's the problem with this simple approach.

@richardlford
Copy link
Copy Markdown
Contributor Author

I still fail to see what's the problem with this simple approach.

You are right. I think the problem is that I thought you were somehow suggesting there was a map that needed to be inverted. Instead, what I now understand is that the build tool, e.g. Dune, is responsible for making the map during the build phase. And that the build tool must also be responsible for building the deployment phase map. That deployment map will be a logical inverse of the build phase map. Then it will likely build a map as you suggested in your comment.

If the user has their source layout isomorphic to the install layout, then Dune can get by with the simple maps I suggested. But if the user has a more complex source layout, then Dune will have more work to create the maps, both for build and deployment phases.

The find_rewrite feature is still needed in order to find sources preferentially in the source directory rather than the build directory.

And, as you suggested, invoking the debugger should be a push-button operation and the build tool should handle the setup. I totally agree.

@richardlford
Copy link
Copy Markdown
Contributor Author

I have a question about the CI testing environment. I'd like to write some tests that use Dune. I have written an ocamltest action, has_dune, that checks whether Dune is available, and would write the tests guarded by that action.

My question is, does the CI testing environment have Dune? If not, then the tests I write would only be useful locally. Presumably one might also care about which version of dune was available.

If there were other build systems that used BUILD_PATH_PREFIX_MAP we could consider testing with them as well, but for now I'm focused on getting things to work for Dune.

Any opinions on having some build system tests in the OCaml test suite?

@richardlford
Copy link
Copy Markdown
Contributor Author

richardlford commented Mar 23, 2023

The find_rewrite feature is still needed in order to find sources preferentially in the source directory rather than the build directory.

This is not strictly true, since the build tool could make a mapping pair for each individual source file that mapped to its location in the source directory rather than in the build directory. This would be a more specific mapping than the mappings that map directories, so that would work.

But having find_rewrite makes it possible for the build tool to get by with directory mappings, thus simplifying things. But the user need not be concerned with this as that will be handled transparently by the build tool. But the build tool does need to be aware that deployment-phase tools like ocamldebug are using find_rewrite.

@richardlford
Copy link
Copy Markdown
Contributor Author

richardlford commented Mar 23, 2023

I have pushed a commit with changes to reflect the suggestions from @gasche. One additional helper not mentioned above is

val matching_dirs : map -> path -> path list
(** [matching_dirs map absdir] accumulates a list of existing directories,
  [dirs], that are the result of mapping abstract directory, [absdir],
  over all the mapping pairs in [map]. The list [dirs] will be in
  priority order (head as highest priority). If there is success,
  returns [dirs]. If no source in the map was a prefix of [absdir],
  returns [[]]. If at least one source in the map was a prefix of
  [absdir], but none of the mappings were existing directories,
  raise Not_found.
*)

This is needed for handling the directories in the debug information.
An alternate would have been to collect the abstract directories, and then,
when looking something up (like a .cma), make the full abstract path
and use rewriite_exists on that. For now, I decided to go with this
approach.

I'll now proceed to do more testing.

@richardlford
Copy link
Copy Markdown
Contributor Author

I've squashed and pushed again, with a small change to do a List.rev on the list of debug source directories, so they will be in priority order.

@richardlford
Copy link
Copy Markdown
Contributor Author

richardlford commented Mar 23, 2023

I pushed some of the tests from other PR and added some new unit tests for matching_dirs and rewrite_exists. Reminder: the ocamltest and testsuite changes are not in this PR, but are temporarily here for the purpose of testing.

@richardlford
Copy link
Copy Markdown
Contributor Author

OK. Based on my testing, I think the compiler and debugger changes are complete and functionally correct. So I think this is ready for a detailed code review.

The changes have been squashed into two commits. One is the compiler and debugger changes. That is the commit to review. The other has ocamltest enhancements and the tests for the functionality. They will not be merged as part of the PR, but are there for your reference, and to verify that the code is passing the tests. When the main part is approved this part will be removed before the merge.

I wanted to do some testing with Dune, but that is something at a higher level. Since there does not appear to be an easy way to do Dune testing in the ocaml/testsuite, I will do that separately in the Dune test suite, and start investigating how to implement a dune debug command.

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a review pass on the first commit. As usual, this is rather impressive and overall fine but also there are too many things in a single commit to my taste. I would have preferred a commit with the build_path_prefix_map and Location logic, a commit in bytecode, and a commit for the debugger. But so it is, and I don't know that you should spend more effort splitting it.

I don't understand the potential impact of rewriting all paths provided to the compiler, not just relative paths. I don't know if anyone else does, and I'm not sure if there is a testing strategy to learn of issues in advance. (I suspect that we don't know what we want to check, and that complaints if any would come from people with non-standard use-cases we don't know about, not building/running code from the opam repository.) I have gathered my courage to approve the change in any case, with the idea that we would revert or at least change course if we learned of major issues after the release.

order) that [path] mapped to that exists in the file system.
- Not_found raised, means some source prefixes in the map
were found that matched [path], but none of them existed
in the file system. *)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function has four subtly-different non-standard return modes. This is too many, possibly three too many.

From the perspective of the user of Location, the map is an internal detail. They only need to know that some rewriting may happen depending on some configuration choice, and maybe we also want to expose the possibility that this rewriting fails due to misconfiguration. But I don't see why there should be any difference between an unset map, an empty map, or a map without any matching prefix.

Proposal:

  • If a rewrite gives a path p that exists, return Some p
  • If a matching prefix exists but no rewrites exists, the function prints a warning itself and then behaves as if no matching prefix exists.
  • If no map is set or no match is found or exists: if the input path p exists, return Some p, otherwise return None.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed as you suggested.


See the BUILD_PATH_PREFIX_MAP spec at
(https://reproducible-builds.org/specs/build-path-prefix-map/)
*)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same opinion here. You take a directory path and rewrite it into a list of directories that exist, possibly emitting a warning along the way. I would call it rewrite_all_existing_dirs maybe?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed as you suggested.

val decode_map : string -> (map, error_message) result

val find_rewrite : map -> (path -> bool) -> path -> path option
(** [rewrite_opt map pred path] tries to find a source in [map]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewrite_opt => find_rewrite

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uses your new API, rewrite_first.

| Some { source; target } ->
Some (target ^ (String.sub path (String.length source)
(String.length path - String.length source)))
let check = (function
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(no need for parentheses here; I don't usually comment on pure style but this is really unidiomatic)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That code is now gone.

if (is_relative path) then (concat (Sys.getcwd ()) path)
else path
in
let mapped_abspath = Location.rewrite_absolute_path unmapped_abspath in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this equivalent to let mapped_abspath = Location.absolute_path path?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I wanted the unmapped path available so it could be compared to the mapped path. If the mapping did nothing, I was just going to keep the existing event.

var_action = environ_variable false "BUILD_PATH_PREFIX_MAP";
var_help =
"BUILD_PATH_PREFIX_MAP used to map abstract\n\
paths to runtime environment."}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could include a trailing semicolon here, so that the next person does not have to change line 1237 to add a new element.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

where mapped to abstract reproducible paths.

Before ocamldebug is invoked, ``BUILD_PATH_PREFIX_MAP`` must again
be set, but this time to cause the logical inverse of the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The phrasing of this paragraph suggests that setting BUILD_PATH_PREFIX_MAP is mandatory. The documentation should be clear that this is an optional feature.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Before ocamldebug is invoked, ``BUILD_PATH_PREFIX_MAP`` must again
be set, but this time to cause the logical inverse of the
first mapping. Typically that will be one by a build tool,
but it may be done manually by a knowledgeable user.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider adding support in the compiler codebase for asking the BUILD_PATH_PREFIX_MAP reader (here ocamldebug) to invert the mapping themselves, to make life simpler for tools. (I think that @dbuenzli also had such a convenience feature in mind, but I am not sure.) But this could be done separately from the current PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will consider this. The mapping done during the build phase will involve the build directory, but during the deployment phase we also want to have the source directory and the opam directory as part of the map, so the deployment map cannot be purely derived from the build-phase map. I think it makes more sense for the build tool, e.g. Dune, to produce the inverse map than for the low-level tool.

returns [[]]. If at least one source in the map was a prefix of
[absdir], but none of the mappings were existing directories,
raise Not_found.
*)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that we need to expose those two specialized functions in this module, given that they can easily be defined from the outside using find_rewrite. I don't have a strong opinion either and including them here is acceptable to me.

I also wondered if we should include a function that returns the list of possible rewrites, and not the first one only, so that one could implement matching_dirs without the false-predicate hack.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: if we expose a function that returns all possible rewrites in priority order, then we don't actually need find_rewrite anymore, the user can define it on top. (There is the slight performance disadvantage of computing all rewrites to use just one, we could use Seq.t to make the computation lazy if we wanted to be really fancy, but a list is fine I think.)

This would give the following API:

val rewrite_first : path -> path option
val rewrite_all : path -> path list option   (or: path Seq.t option)

val rewrite : path -> path (* convenience function *)

I find the simplicity of the resulting API seductive. Everything else, including showing a warning, can be done in Location or some other module.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adopted your suggestion and I think things turned out quite nicely.

*)

val absolute_path: string -> string
(** Makes absolute and uses rewrite_absolute_path. *)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have many BUILD_PATH_PREFIX_MAP functions in this module. I think they would deserve their own documentation section (currently they are in {1 Printing locations}, they could move to {1 Rewriting paths} or something). We could also think about their naming as a group, instead of individual functions.

  • should all functions that apply some rewriting start with the rewrite_ prefix? Then absolute_path should change now. (But maybe we don't want that.)
  • rewrite_exists and matching_dirs are closely related but have inconsistent names; maybe rewrite_find_first_existing and rewrite_find_all_existing_dirs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@richardlford
Copy link
Copy Markdown
Contributor Author

I don't understand the potential impact of rewriting all paths provided to the compiler, not just relative paths. I don't know if anyone else does, and I'm not sure if there is a testing strategy to learn of issues in advance. (I suspect that we don't know what we want to check, and that complaints if any would come from people with non-standard use-cases we don't know about, not building/running code from the opam repository.) I have gathered my courage to approve the change in any case, with the idea that we would revert or at least change course if we learned of major issues after the release.

I don't know what other build systems use the OCaml compiler and also do BUILD_PATH_PREFIX_MAP mapping, but Dune at least is only doing rewriting of paths in the current build directory. So if ocamlc ask for rewriting for some other absolute paths, nothing will happen. So it seems that rewriting other absolute paths will only have effect if the tool setting BUILD_PATH_PREFIX_MAP intends it to.

One case where ocamlc was missing rewriting was the case where the user passed an absolute path to the source file (ocamltest does this). In that case the absolute path to the source was leaking in the debug event paths.

In a separate conversion with @lpw25, #12106 (comment), I determined that the list of debug directories includes all the directories where the source files are found. The conclusion is that if the debugger just looked up the basename of the source path using the provided directories, then the sources would be found.

So an alternate for the debug events, rather than using BUILD_PATH_PREFIX_MAP mapping, is just to replace the source path with the basename. That would prevent absolute paths from leaking. And even when the debug event path is relative, it often will have an irrelevant part, so it would be good to get rid of the directory part of the debug event path. So I'm going to try that approach for now.

@gasche
Copy link
Copy Markdown
Member

gasche commented Mar 26, 2023

From a distance it sounds like BUILD_PATH_PREFIX_MAP is a more precise approach that would help avoid weird ambiguities and under-specifications -- if several modules in different subdirectories have the same filename for example, but the other approach, less precise, is less work to implement.

@richardlford
Copy link
Copy Markdown
Contributor Author

@gasche, I squashed, rebased, and force-pushed, but I kept the changes in response to your review separate until you have a chance to look at them. The commits are now the following:

  1. The original compiler and debugger commit.
  2. Changed for the compiler and debugger in response to your review. The most significant change is to use the new build_path_prefix_map API you suggested.
  3. The original ocamltest and testsuite.
  4. Changes to ocamltest you suggested, but also changes to my tests to use the new API and avoid the need for ocamldebug to expose internals.

As before, only commits 1 and 2 are part of this PR. Commits 3 and 4 are for reference and will be moved to a separate PR.

As mentioned above, I'm not yet sure about the rewriting of debug events. I believe the basename of the debug event path along with the debug directories should be sufficient to find the sources, but I've not yet gotten that to work.

If you wish, I can split out a separate PR with just the Build_path_prefix_map and Location changes if you think those parts are ready.

@richardlford
Copy link
Copy Markdown
Contributor Author

Another quick push to fix alldepend problem.

richardlford added a commit to richardlford/ocaml that referenced this pull request Mar 27, 2023
The BUILD_PATH_PREFIX_MAP specification tells how to use that
environment variable to achieve reproducible build, i.e. builds of
products that do not leak absolute paths from the build
environment. See
https://reproducible-builds.org/specs/build-path-prefix-map.

However, that specification only describes half of the story.  Let us
call the building of reproducible products the "Build Phase". That is
the phase covered by the existing specification.

Let us defined the "Deployment phase" as the phase where you accept a
built reproducible product and make use of it, i.e. deploy it. An
example would be the debugger taking a reproducible binary and letting
the user debug it, showing the user the source code, etc.

We use the same mechanism in the deployment phase. Then
the BUILD_PATH_PREFIX_MAP must be setup with the logical
inverse of the mapping used during the build phase.

This PR generalized the functions in Build_path_prefix_map
and Location to facility the inverse map.

Also, this is part of a larger PR,
ocaml#12126, which itself
was part of
ocaml#12085, that is being split up
because it was too large. In addition to these changes, that PR
includes ocamltest enhancements plus the tests for
these changes. See it for prior discussion and the other changes.

I will now describe the essence of the changes.

1. utils/build_path_prefix_map.{ml,mli}

The API is now:

    val rewrite_first : map -> path -> path option
    (** [rewrite_first map path] tries to find a source in [map]
	that is a prefix of the input [path]. If it succeeds,
	it replaces this prefix with the corresponding target.
	If it fails, it just returns [None]. *)

    val rewrite_all : map -> path -> path list
    (** [rewrite_all map path] finds all sources in [map]
	that are a prefix of the input [path]. For each matching
	source, in priority order, it replaces this prefix with
	the corresponding target and adds the result to
	the returned list.
	If there are no matches, it just returns [[]]. *)

    val rewrite : map -> path -> path

2. parsing/location.{ml,mli}

Use the new API and renamed the rewriting functions.

    val rewrite_find_first_existing: string -> string option
    (** [rewrite_find_first_existing path] uses a BUILD_PATH_PREFIX_MAP mapping
	(https://reproducible-builds.org/specs/build-path-prefix-map/)
	and tries to find a source in mapping
	that maps to a result that exists in the file system.
	There are the following return values:
	- None, means BUILD_PATH_PREFIX_MAP is not set.
	- Some None, no source prefixes of [path] in the mapping were found,
	- Some (Some target), means target is the first file (in priority
	  order) that [path] mapped to that exists in the file system.
	- Not_found raised, means some source prefixes in the map
	  were found that matched [path], but none of them existed
	  in the file system. *)

    val rewrite_find_all_existing_dirs: string -> string list
    (** [rewrite_find_all_existing_dirs absdir] accumulates a list of existing
	directories, [dirs], that are the result of mapping abstract directory,
	[absdir], over all the mapping pairs in the BUILD_PATH_PREFIX_MAP
	environment variable, if any. The list [dirs] will be in priority order
	(head as highest priority). The possible results are:
	- [], means BUILD_PATH_PREFIX_MAP is not set, or if set, then
	  there were no matching prefixes of path.
	- Some dirs, means dirs are the directories found. A possibility
	  is that there was no mapping, but the input path is an existing
	  directory and the result if [[path]].
	- Not_found raised, means some source prefixes in the map
	  were found that matched [path], but none of mapping results
	  were existing directories (possibly due to misconfiguration)

	See the BUILD_PATH_PREFIX_MAP spec at
	(https://reproducible-builds.org/specs/build-path-prefix-map/)
	*)

In addition to those new APIs, Location.absolute_path was modified
to do rewriting on absolute paths (in addition to relative paths
which it was already doing).

At the present time their is no code depending on these APIs.
They are used in the other parts of the PRs mentioned.
richardlford added a commit to richardlford/ocaml that referenced this pull request Mar 27, 2023
The BUILD_PATH_PREFIX_MAP specification tells how to use that
environment variable to achieve reproducible build, i.e. builds of
products that do not leak absolute paths from the build
environment. See
https://reproducible-builds.org/specs/build-path-prefix-map.

However, that specification only describes half of the story.  Let us
call the building of reproducible products the "Build Phase". That is
the phase covered by the existing specification.

Let us defined the "Deployment phase" as the phase where you accept a
built reproducible product and make use of it, i.e. deploy it. An
example would be the debugger taking a reproducible binary and letting
the user debug it, showing the user the source code, etc.

We use the same mechanism in the deployment phase. Then
the BUILD_PATH_PREFIX_MAP must be setup with the logical
inverse of the mapping used during the build phase.

This PR generalized the functions in Build_path_prefix_map
and Location to facility the inverse map.

Also, this is part of a larger PR,
ocaml#12126, which itself
was part of
ocaml#12085, that is being split up
because it was too large. In addition to these changes, that PR
includes ocamltest enhancements plus the tests for
these changes. See it for prior discussion and the other changes.

I will now describe the essence of the changes.

1. utils/build_path_prefix_map.{ml,mli}

The API is now:

    val rewrite_first : map -> path -> path option
    (** [rewrite_first map path] tries to find a source in [map]
	that is a prefix of the input [path]. If it succeeds,
	it replaces this prefix with the corresponding target.
	If it fails, it just returns [None]. *)

    val rewrite_all : map -> path -> path list
    (** [rewrite_all map path] finds all sources in [map]
	that are a prefix of the input [path]. For each matching
	source, in priority order, it replaces this prefix with
	the corresponding target and adds the result to
	the returned list.
	If there are no matches, it just returns [[]]. *)

    val rewrite : map -> path -> path

2. parsing/location.{ml,mli}

Use the new API and renamed the rewriting functions.

    val rewrite_find_first_existing: string -> string option
    (** [rewrite_find_first_existing path] uses a BUILD_PATH_PREFIX_MAP mapping
	(https://reproducible-builds.org/specs/build-path-prefix-map/)
	and tries to find a source in mapping
	that maps to a result that exists in the file system.
	There are the following return values:
	- None, means BUILD_PATH_PREFIX_MAP is not set.
	- Some None, no source prefixes of [path] in the mapping were found,
	- Some (Some target), means target is the first file (in priority
	  order) that [path] mapped to that exists in the file system.
	- Not_found raised, means some source prefixes in the map
	  were found that matched [path], but none of them existed
	  in the file system. *)

    val rewrite_find_all_existing_dirs: string -> string list
    (** [rewrite_find_all_existing_dirs absdir] accumulates a list of existing
	directories, [dirs], that are the result of mapping abstract directory,
	[absdir], over all the mapping pairs in the BUILD_PATH_PREFIX_MAP
	environment variable, if any. The list [dirs] will be in priority order
	(head as highest priority). The possible results are:
	- [], means BUILD_PATH_PREFIX_MAP is not set, or if set, then
	  there were no matching prefixes of path.
	- Some dirs, means dirs are the directories found. A possibility
	  is that there was no mapping, but the input path is an existing
	  directory and the result if [[path]].
	- Not_found raised, means some source prefixes in the map
	  were found that matched [path], but none of mapping results
	  were existing directories (possibly due to misconfiguration)

	See the BUILD_PATH_PREFIX_MAP spec at
	(https://reproducible-builds.org/specs/build-path-prefix-map/)
	*)

In addition to those new APIs, Location.absolute_path was modified
to do rewriting on absolute paths (in addition to relative paths
which it was already doing).

At the present time their is no code depending on these new APIs,
so the only functional change is the change to Location.absolute_path

They new API functions are used in the other parts of the PRs mentioned.
richardlford added a commit to richardlford/ocaml that referenced this pull request Mar 27, 2023
The BUILD_PATH_PREFIX_MAP specification tells how to use that
environment variable to achieve reproducible build, i.e. builds of
products that do not leak absolute paths from the build
environment. See
https://reproducible-builds.org/specs/build-path-prefix-map.

However, that specification only describes half of the story.  Let us
call the building of reproducible products the "Build Phase". That is
the phase covered by the existing specification.

Let us defined the "Deployment phase" as the phase where you accept a
built reproducible product and make use of it, i.e. deploy it. An
example would be the debugger taking a reproducible binary and letting
the user debug it, showing the user the source code, etc.

We use the same mechanism in the deployment phase. Then
the BUILD_PATH_PREFIX_MAP must be setup with the logical
inverse of the mapping used during the build phase.

This PR generalized the functions in Build_path_prefix_map
and Location to facility the inverse map.

Also, this is part of a larger PR,
ocaml#12126, which itself
was part of
ocaml#12085, that is being split up
because it was too large. In addition to these changes, that PR
includes ocamltest enhancements plus the tests for
these changes. See it for prior discussion and the other changes.

I will now describe the essence of the changes.

1. utils/build_path_prefix_map.{ml,mli}

The API is now:

    val rewrite_first : map -> path -> path option
    (** [rewrite_first map path] tries to find a source in [map]
	that is a prefix of the input [path]. If it succeeds,
	it replaces this prefix with the corresponding target.
	If it fails, it just returns [None]. *)

    val rewrite_all : map -> path -> path list
    (** [rewrite_all map path] finds all sources in [map]
	that are a prefix of the input [path]. For each matching
	source, in priority order, it replaces this prefix with
	the corresponding target and adds the result to
	the returned list.
	If there are no matches, it just returns [[]]. *)

    val rewrite : map -> path -> path

2. parsing/location.{ml,mli}

Use the new API and renamed the rewriting functions.

    val rewrite_find_first_existing: string -> string option
    (** [rewrite_find_first_existing path] uses a BUILD_PATH_PREFIX_MAP mapping
	(https://reproducible-builds.org/specs/build-path-prefix-map/)
	and tries to find a source in mapping
	that maps to a result that exists in the file system.
	There are the following return values:
	- None, means BUILD_PATH_PREFIX_MAP is not set.
	- Some None, no source prefixes of [path] in the mapping were found,
	- Some (Some target), means target is the first file (in priority
	  order) that [path] mapped to that exists in the file system.
	- Not_found raised, means some source prefixes in the map
	  were found that matched [path], but none of them existed
	  in the file system. *)

    val rewrite_find_all_existing_dirs: string -> string list
    (** [rewrite_find_all_existing_dirs absdir] accumulates a list of existing
	directories, [dirs], that are the result of mapping abstract directory,
	[absdir], over all the mapping pairs in the BUILD_PATH_PREFIX_MAP
	environment variable, if any. The list [dirs] will be in priority order
	(head as highest priority). The possible results are:
	- [], means BUILD_PATH_PREFIX_MAP is not set, or if set, then
	  there were no matching prefixes of path.
	- Some dirs, means dirs are the directories found. A possibility
	  is that there was no mapping, but the input path is an existing
	  directory and the result if [[path]].
	- Not_found raised, means some source prefixes in the map
	  were found that matched [path], but none of mapping results
	  were existing directories (possibly due to misconfiguration)

	See the BUILD_PATH_PREFIX_MAP spec at
	(https://reproducible-builds.org/specs/build-path-prefix-map/)
	*)

In addition to those new APIs, Location.absolute_path was modified
to do rewriting on absolute paths (in addition to relative paths
which it was already doing).

At the present time their is no code depending on these new APIs,
so the only functional change is the change to Location.absolute_path

They new API functions are used in the other parts of the PRs mentioned.
richardlford added a commit to richardlford/ocaml that referenced this pull request Mar 27, 2023
The BUILD_PATH_PREFIX_MAP specification tells how to use that
environment variable to achieve reproducible build, i.e. builds of
products that do not leak absolute paths from the build
environment. See
https://reproducible-builds.org/specs/build-path-prefix-map.

However, that specification only describes half of the story.  Let us
call the building of reproducible products the "Build Phase". That is
the phase covered by the existing specification.

Let us defined the "Deployment phase" as the phase where you accept a
built reproducible product and make use of it, i.e. deploy it. An
example would be the debugger taking a reproducible binary and letting
the user debug it, showing the user the source code, etc.

We use the same mechanism in the deployment phase. Then
the BUILD_PATH_PREFIX_MAP must be setup with the logical
inverse of the mapping used during the build phase.

This PR generalized the functions in Build_path_prefix_map
and Location to facility the inverse map.

Also, this is part of a larger PR,
ocaml#12126, which itself
was part of
ocaml#12085, that is being split up
because it was too large. In addition to these changes, that PR
includes ocamltest enhancements plus the tests for
these changes. See it for prior discussion and the other changes.

For the new functions, see the '.mli' file for details.

1. utils/build_path_prefix_map.{ml,mli}

The new api functions are:

    val rewrite_first : map -> path -> path option
    val rewrite_all : map -> path -> path list

2. parsing/location.{ml,mli}

The new api functions are:

    val rewrite_find_first_existing: string -> string option
    val rewrite_find_all_existing_dirs: string -> string list

In addition to those new APIs, Location.absolute_path was modified
to do rewriting on absolute paths (in addition to relative paths
which it was already doing).

At the present time their is no code depending on these new APIs,
so the only functional change is the change to Location.absolute_path

The new API functions are used in the other parts of the PRs mentioned.
@richardlford
Copy link
Copy Markdown
Contributor Author

@gasche, continuing to follow your suggestion I split out part 2 which is just the bytecomp changes. They are now in a new PR, #12139.

@richardlford
Copy link
Copy Markdown
Contributor Author

I converted to draft because it is not intended to merge this PR. Instead PRs are being split off it.

@richardlford richardlford marked this pull request as ready for review May 18, 2023 03:06
The BUILD_PATH_PREFIX_MAP specification tells how to use that
environment variable to achieve reproducible build, i.e. builds of
products that do not leak absolute paths from the build
environment. See
https://reproducible-builds.org/specs/build-path-prefix-map.

However, that specification only describes half of the story.  Let us
call the building of reproducible products the "Build Phase". That is
the phase covered by the existing specification.

Let us defined the "Deployment phase" as the phase where you accept a
built reproducible product and make use of it, i.e. deploy it. An
example would be the debugger taking a reproducible binary and letting
the user debug it, showing the user the source code, etc.

We use the same mechanism in the deployment phase, but use a different
variable, DEPLOY_PATH_PREFIX_MAP.  It must be setup with the logical
inverse of the mapping used during the build phase.

Also, this is part of a larger PR,
ocaml#12085, that is being split up
because it was too large. In addition to these changes, that PR
includes ocamltest enhancements plus the tests for these changes. See
it for prior discussion and the other changes.

I will now describe the essence of the changes.

1. bytecomp/emitcode.ml

Added sanitizing of paths in the debug events.
If the compiler is given absolute source paths
this was leaking absolute build paths.

2. debugger/command_line.ml

A debugger variable, "mapping", was added, which is tied to the
DEPLOY_PATH_PREFIX_MAP environment variable.  So "set mapping value"
will set DEPLOY_PATH_PREFIX_MAP, and "show mapping" will show the value
of DEPLOY_PATH_PREFIX_MAP. This allows the user to set it
programmaticaly, possibly from the ".ocamldebug" file.

3. debugger/source.ml

In source_of_module, use the Location.rewrite_exists.

4. debugger/symbols.{ml, mli}

a. Avoid adding directories redundantly

b. Use Location.matching_dirs to expand directories.

5. Man page and manual updated.

Updated new tests to take into account:

1. The new ocamltest syntax that is C-like rather than
org-mode like.

2. The debugger now uses DEPLOY_PATH_PREFIX_MAP for its
inverse mapping, rather than BUILD_PATH_PREFIX_MAP.

Fixes ocaml#12083
@richardlford
Copy link
Copy Markdown
Contributor Author

This PR has now been updated and consists of the following parts:

  1. bytecomp, parsing, and utils changes. These are from Modify bytecomp for BUILD_PATH_PREFIX_MAP #12139 and will go away from this PR once that is merged.
  2. ocamltest and testsuite changes. These are not part of this PR, but are left in for now to show that this code has been tested. Note that I had to revised the tests since the ocamltest syntax changed since these were written, and also now ocamldebug uses the DEPLOY_PATH_PREFIX_MAP environment variable.
  3. debugger, man, and manual changes. This is the code to be reviewed.
    Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Modify ocamldebug to handle Dune workspace mapping

3 participants