Add deployment phase support for BUILD_PATH_PREFIX_MAP#12126
Add deployment phase support for BUILD_PATH_PREFIX_MAP#12126richardlford wants to merge 2 commits intoocaml:trunkfrom
Conversation
I'm afraid I don't have the time to look into all this for the forseeable future. But I doubt you need to add more concepts to that specification. If the tool setting up the prefixes does it in a smart way so that the map is bijective (i.e. distinct source prefixes map on distinct target prefixes) I don't think there's any need for search lists. You should always be able to redirect each target prefix to the location you need at "deploy time". |
What you say is true, but a much more complex mapping is needed in that case. If the product involves And the generalization to allow search lists is very small (even a small part of this PR), so I think it is worthwhile in order to gain simplicity. You could think of this as being analogous to the |
To make that a little bit more precise, I think it would involve
(With With that you can simply follow exactly the same lookup specified in
Personally I'd rather have a simpler mechanism than catering for this use case. Because:
Regarding 2., a significant trigger for programmers to use debuggers are the setup costs. Basically they use them if they are push button, otherwise they'll just reach for So it should be expected that a tool will set up the mapping for you. Namely the one which perfectly knows all this information: your build system. |
Good point. Only one mapping is needed to cover the whole Also, a possible scenario that search lists would handle better would be a situation where there was more than one opam-like repository, e.g. experimental vs. stable. Then at deploy-time, the /lib could map to a search list with /experimental/lib and /stable/lib. Without search lists that would be handled by an additional I should point out that the API with search lists is (almost) backward compatible with the old API. I say almost, because the function to encode a pair in the new API takes a list of targets rather than a single target. But if the idea of including search lists is accepted, we could have a backward-compatible API and an enhanced API if search-lists are desired. I will leave it up to whomever has the authority to decide, whether to include the search-list feature. My preference is to include it. But I will abide by whatever decision is made. Would @gasche or one of the others maintainers please make a decision on this point? Speaking of producers of mappings, the mappings Dune produces for the build phase are correct if the developer has their source layout isomorphic to the installation layout (so reproducible paths will look like And, of course, Dune will need to have some features to make the proper deploy-time map, and perhaps invoke the debugger. The emacs or other interfaces will also need to be updated. I will also be investigating that. |
|
@gasche, I see that you are the author of the build_path_prefix_map package. Do you see any merit in adding the search-list capability? |
|
I wrote the support code for the current BUILD_PATH_PREFIX_MAP specification indeed -- it was upstreamed into the compiler, and also published as a separate library in case other OCaml programs would want to implement it. I am hesitant to follow your road of changing the specification to do more things. Initially the reproducible-builds people sold the specification as a done deal, soon to be supported by gcc/clang, but I am not sure that gcc/clang actually followed this approach in the end. In any case, I liked the idea of using the same approach as others and sharing a common specification. With your suggestion we would go down the route of adopting and extending the specification for the needs of (a small fragment of) the OCaml community only, and we would either have to gather support from a non-existent or at least unknown group of outside users, or be responsible for maintaining it moving forward. If we wanted to do our own thing independently of others, I am not sure that I would use this specification with its weird quoting rules; maybe something simpler could be used. This is all said without looking at the details of your proposal, which does not sound unreasonable or badly designed. (The idea of quoting an additional special character is better than your previous approach of reusing Because I haven't invested the time yet to understand your work in details, I am left with the same question that I already asked previously: In a previous PR a suggestion was made to just "revert" an existing BUILD_PATH_PREFIX_MAP mapping, which in the general case may give a one-to-many relation. Is that not enough to support what you call "deployment phase"? Finally, @dbuenzli suggested using different "workspace root" names for different packages to avoid the issue of needing one-to-many mappings: maybe we want to use the build-time mapping |
|
(Note: when I talk of a mapping |
|
OK. You all have convinced me that it is not worth the effort to introduce search lists and get them accepted into the community. I will revert that part of my change. It will be up to what the spec calls "producers", i.e. those that produce the BUILD_PATH_PREFIX_MAP, (both during the build phase and the deploy phase) to get it right. "Low-level tools", like the compiler and debugger, will just do the mapping requested. For Dune users, Dune will need to add the capability for the deploy phase. |
abbe6e1 to
881e4c1
Compare
|
I have removed the search list feature from the current push. When using Dune, one thing that occurs to me I have lost is the ability (without great effort) to preferentially find the original source files rather than finding the mirror file in the _build directory. The abstract path to a source file will be something like
At build time, when I want to do debugging, we will have the original source:
But the abstract path to a
and it will exist in the build directory at:
If I preferentially map The workaround would be to have mappings at the individual artifact level, I suppose it is OK to get the location of the sources in the build directory, rather than the originals, though when using the emacs interface, whenever that happens, it asks whether I want to go to the original, which is rather annoying. Although, I happened to think of a small tweak to the mapping functions that might add power without much complexity. What if we had a variation of Build_path_prefix_map.rewrite_opt that took a predicate? This variation would find the first rewrite that satisfied the predicate. If the predicate was that the rewritten file existed, then the above problem would be solved. First the mapping of the Here is a draft implementation: let pred_rewrite_opt prefix_map pred path =
let is_prefix { target = _; source } =
String.length source <= String.length path
&& String.equal source (String.sub path 0 (String.length source))
in
let subst path { source; target } =
target ^ (String.sub path (String.length source)
(String.length path - String.length source)) in
match
List.find (function
| None -> false
| Some pair -> (is_prefix pair) && pred (subst path pair))
(* read key/value pairs from right to left, as the spec demands *)
(List.rev prefix_map)
with
| exception Not_found -> None
| None -> None
| Some pair -> Some (subst path pair)
What would you think of a small enhancement like that? |
|
The more I think about it, the more important I think it is to have the This has all of the advantages of the search-list strategy I had without any of the complexities. I think if the writers of the I'm also going to temporarily bring over my ocamltest and testsuite changes for the purpose of testing these changes. But the reviewers should ignore the ocamltest and testsuite changes. I will revert those when the review of this PR is complete so that they will not be included. |
|
I am fine with On the other hand:
A more vague question:
|
|
Also: have you ever commented on the proposed feature of asking tools to "invert" a given BUILD_PATH_PREFIX_MAP mapping? (The idea would be that producers and consumers could use the same mapping, but consumers would be asked to 'invert' the mapping.) This was suggested there and there by @dbuenzli in your original issue, I asked about it there in the larger PR (my question was: have you considered this suggestion and does it solve the question of how to specify one-to-many relations?) and in the present discussion again, but I haven't been able to locate a comment from you about this proposed feature or an answer to my questions about it (does it solve your problem? did you consider it?). I would be curious to hear what you think about it. I wonder if there is some misunderstanding about this suggestion that makes you not consider it, because from a distance it feels like it could be part of a nice solution to the overall problem of "deployment phase support". |
Thank you.
I had thought about that. I was concerned that the version with the predicate might slow down the version used in the build phase. But I would be happy to consolidate them and have the build phase version just use a
I like find_rewrite.
Those thoughts had occurred to me too. During the deployment phase it seems that it is the responsibility of the map producer should be to give a map where the abstract artifacts are found, and if not, that should be an error. I will make a corresponding change. |
No, I haven't, so I will now. The mapping Dune uses is The only case where inverting the map might make sense is for the local developer for the mapping used to build the products in his own development directory, because then the build environment and the deployment environment are the same. But even here, the map is used only in the It is true that the logical inversion of where again, the right-most mapping takes precedence. Using such a map is made possible by the existence of the newly-named I hope this adequately answers that suggestion. |
Just to clarify, there is nothing performance-sensitive about these functions, so we have to optimize for clarity and maintainability instead of performance. (If we were to find out that there is a cost, or possibly for memory-sharing reasons, we could add a memoization step.) |
Understood. |
|
@richardlford I think the problem we are having is that you are taking the current map made by With the map I suggest in this comment, there is:
I still fail to see what's the problem with this simple approach. |
You are right. I think the problem is that I thought you were somehow suggesting there was a map that needed to be inverted. Instead, what I now understand is that the build tool, e.g. Dune, is responsible for making the map during the build phase. And that the build tool must also be responsible for building the deployment phase map. That deployment map will be a logical inverse of the build phase map. Then it will likely build a map as you suggested in your comment. If the user has their source layout isomorphic to the install layout, then Dune can get by with the simple maps I suggested. But if the user has a more complex source layout, then Dune will have more work to create the maps, both for build and deployment phases. The And, as you suggested, invoking the debugger should be a push-button operation and the build tool should handle the setup. I totally agree. |
|
I have a question about the CI testing environment. I'd like to write some tests that use Dune. I have written an ocamltest action, has_dune, that checks whether Dune is available, and would write the tests guarded by that action. My question is, does the CI testing environment have Dune? If not, then the tests I write would only be useful locally. Presumably one might also care about which version of dune was available. If there were other build systems that used BUILD_PATH_PREFIX_MAP we could consider testing with them as well, but for now I'm focused on getting things to work for Dune. Any opinions on having some build system tests in the OCaml test suite? |
This is not strictly true, since the build tool could make a mapping pair for each individual source file that mapped to its location in the source directory rather than in the build directory. This would be a more specific mapping than the mappings that map directories, so that would work. But having |
|
I have pushed a commit with changes to reflect the suggestions from @gasche. One additional helper not mentioned above is val matching_dirs : map -> path -> path list
(** [matching_dirs map absdir] accumulates a list of existing directories,
[dirs], that are the result of mapping abstract directory, [absdir],
over all the mapping pairs in [map]. The list [dirs] will be in
priority order (head as highest priority). If there is success,
returns [dirs]. If no source in the map was a prefix of [absdir],
returns [[]]. If at least one source in the map was a prefix of
[absdir], but none of the mappings were existing directories,
raise Not_found.
*)This is needed for handling the directories in the debug information. I'll now proceed to do more testing. |
|
I've squashed and pushed again, with a small change to do a |
|
I pushed some of the tests from other PR and added some new unit tests for matching_dirs and rewrite_exists. Reminder: the ocamltest and testsuite changes are not in this PR, but are temporarily here for the purpose of testing. |
|
OK. Based on my testing, I think the compiler and debugger changes are complete and functionally correct. So I think this is ready for a detailed code review. The changes have been squashed into two commits. One is the compiler and debugger changes. That is the commit to review. The other has ocamltest enhancements and the tests for the functionality. They will not be merged as part of the PR, but are there for your reference, and to verify that the code is passing the tests. When the main part is approved this part will be removed before the merge. I wanted to do some testing with Dune, but that is something at a higher level. Since there does not appear to be an easy way to do Dune testing in the ocaml/testsuite, I will do that separately in the Dune test suite, and start investigating how to implement a |
gasche
left a comment
There was a problem hiding this comment.
I did a review pass on the first commit. As usual, this is rather impressive and overall fine but also there are too many things in a single commit to my taste. I would have preferred a commit with the build_path_prefix_map and Location logic, a commit in bytecode, and a commit for the debugger. But so it is, and I don't know that you should spend more effort splitting it.
I don't understand the potential impact of rewriting all paths provided to the compiler, not just relative paths. I don't know if anyone else does, and I'm not sure if there is a testing strategy to learn of issues in advance. (I suspect that we don't know what we want to check, and that complaints if any would come from people with non-standard use-cases we don't know about, not building/running code from the opam repository.) I have gathered my courage to approve the change in any case, with the idea that we would revert or at least change course if we learned of major issues after the release.
parsing/location.mli
Outdated
| order) that [path] mapped to that exists in the file system. | ||
| - Not_found raised, means some source prefixes in the map | ||
| were found that matched [path], but none of them existed | ||
| in the file system. *) |
There was a problem hiding this comment.
This function has four subtly-different non-standard return modes. This is too many, possibly three too many.
From the perspective of the user of Location, the map is an internal detail. They only need to know that some rewriting may happen depending on some configuration choice, and maybe we also want to expose the possibility that this rewriting fails due to misconfiguration. But I don't see why there should be any difference between an unset map, an empty map, or a map without any matching prefix.
Proposal:
- If a rewrite gives a path
pthat exists, returnSome p - If a matching prefix exists but no rewrites exists, the function prints a warning itself and then behaves as if no matching prefix exists.
- If no map is set or no match is found or exists: if the input path
pexists, returnSome p, otherwise returnNone.
There was a problem hiding this comment.
I changed as you suggested.
|
|
||
| See the BUILD_PATH_PREFIX_MAP spec at | ||
| (https://reproducible-builds.org/specs/build-path-prefix-map/) | ||
| *) |
There was a problem hiding this comment.
Same opinion here. You take a directory path and rewrite it into a list of directories that exist, possibly emitting a warning along the way. I would call it rewrite_all_existing_dirs maybe?
There was a problem hiding this comment.
I changed as you suggested.
utils/build_path_prefix_map.mli
Outdated
| val decode_map : string -> (map, error_message) result | ||
|
|
||
| val find_rewrite : map -> (path -> bool) -> path -> path option | ||
| (** [rewrite_opt map pred path] tries to find a source in [map] |
There was a problem hiding this comment.
Uses your new API, rewrite_first.
utils/build_path_prefix_map.ml
Outdated
| | Some { source; target } -> | ||
| Some (target ^ (String.sub path (String.length source) | ||
| (String.length path - String.length source))) | ||
| let check = (function |
There was a problem hiding this comment.
(no need for parentheses here; I don't usually comment on pure style but this is really unidiomatic)
There was a problem hiding this comment.
That code is now gone.
| if (is_relative path) then (concat (Sys.getcwd ()) path) | ||
| else path | ||
| in | ||
| let mapped_abspath = Location.rewrite_absolute_path unmapped_abspath in |
There was a problem hiding this comment.
Isn't this equivalent to let mapped_abspath = Location.absolute_path path?
There was a problem hiding this comment.
Yes, but I wanted the unmapped path available so it could be compared to the mapped path. If the mapping did nothing, I was just going to keep the existing event.
debugger/command_line.ml
Outdated
| var_action = environ_variable false "BUILD_PATH_PREFIX_MAP"; | ||
| var_help = | ||
| "BUILD_PATH_PREFIX_MAP used to map abstract\n\ | ||
| paths to runtime environment."} |
There was a problem hiding this comment.
You could include a trailing semicolon here, so that the next person does not have to change line 1237 to add a new element.
manual/src/cmds/debugger.etex
Outdated
| where mapped to abstract reproducible paths. | ||
|
|
||
| Before ocamldebug is invoked, ``BUILD_PATH_PREFIX_MAP`` must again | ||
| be set, but this time to cause the logical inverse of the |
There was a problem hiding this comment.
The phrasing of this paragraph suggests that setting BUILD_PATH_PREFIX_MAP is mandatory. The documentation should be clear that this is an optional feature.
| Before ocamldebug is invoked, ``BUILD_PATH_PREFIX_MAP`` must again | ||
| be set, but this time to cause the logical inverse of the | ||
| first mapping. Typically that will be one by a build tool, | ||
| but it may be done manually by a knowledgeable user. |
There was a problem hiding this comment.
I would consider adding support in the compiler codebase for asking the BUILD_PATH_PREFIX_MAP reader (here ocamldebug) to invert the mapping themselves, to make life simpler for tools. (I think that @dbuenzli also had such a convenience feature in mind, but I am not sure.) But this could be done separately from the current PR.
There was a problem hiding this comment.
I will consider this. The mapping done during the build phase will involve the build directory, but during the deployment phase we also want to have the source directory and the opam directory as part of the map, so the deployment map cannot be purely derived from the build-phase map. I think it makes more sense for the build tool, e.g. Dune, to produce the inverse map than for the low-level tool.
utils/build_path_prefix_map.mli
Outdated
| returns [[]]. If at least one source in the map was a prefix of | ||
| [absdir], but none of the mappings were existing directories, | ||
| raise Not_found. | ||
| *) |
There was a problem hiding this comment.
I am not sure that we need to expose those two specialized functions in this module, given that they can easily be defined from the outside using find_rewrite. I don't have a strong opinion either and including them here is acceptable to me.
I also wondered if we should include a function that returns the list of possible rewrites, and not the first one only, so that one could implement matching_dirs without the false-predicate hack.
There was a problem hiding this comment.
Note: if we expose a function that returns all possible rewrites in priority order, then we don't actually need find_rewrite anymore, the user can define it on top. (There is the slight performance disadvantage of computing all rewrites to use just one, we could use Seq.t to make the computation lazy if we wanted to be really fancy, but a list is fine I think.)
This would give the following API:
val rewrite_first : path -> path option
val rewrite_all : path -> path list option (or: path Seq.t option)
val rewrite : path -> path (* convenience function *)I find the simplicity of the resulting API seductive. Everything else, including showing a warning, can be done in Location or some other module.
There was a problem hiding this comment.
I adopted your suggestion and I think things turned out quite nicely.
parsing/location.mli
Outdated
| *) | ||
|
|
||
| val absolute_path: string -> string | ||
| (** Makes absolute and uses rewrite_absolute_path. *) |
There was a problem hiding this comment.
We now have many BUILD_PATH_PREFIX_MAP functions in this module. I think they would deserve their own documentation section (currently they are in {1 Printing locations}, they could move to {1 Rewriting paths} or something). We could also think about their naming as a group, instead of individual functions.
- should all functions that apply some rewriting start with the
rewrite_prefix? Thenabsolute_pathshould change now. (But maybe we don't want that.) rewrite_existsandmatching_dirsare closely related but have inconsistent names; mayberewrite_find_first_existingandrewrite_find_all_existing_dirs?
I don't know what other build systems use the OCaml compiler and also do One case where ocamlc was missing rewriting was the case where the user passed an absolute path to the source file (ocamltest does this). In that case the absolute path to the source was leaking in the debug event paths. In a separate conversion with @lpw25, #12106 (comment), I determined that the list of debug directories includes all the directories where the source files are found. The conclusion is that if the debugger just looked up the basename of the source path using the provided directories, then the sources would be found. So an alternate for the debug events, rather than using |
|
From a distance it sounds like BUILD_PATH_PREFIX_MAP is a more precise approach that would help avoid weird ambiguities and under-specifications -- if several modules in different subdirectories have the same filename for example, but the other approach, less precise, is less work to implement. |
|
@gasche, I squashed, rebased, and force-pushed, but I kept the changes in response to your review separate until you have a chance to look at them. The commits are now the following:
As before, only commits 1 and 2 are part of this PR. Commits 3 and 4 are for reference and will be moved to a separate PR. As mentioned above, I'm not yet sure about the rewriting of debug events. I believe the basename of the debug event path along with the debug directories should be sufficient to find the sources, but I've not yet gotten that to work. If you wish, I can split out a separate PR with just the Build_path_prefix_map and Location changes if you think those parts are ready. |
|
Another quick push to fix alldepend problem. |
The BUILD_PATH_PREFIX_MAP specification tells how to use that environment variable to achieve reproducible build, i.e. builds of products that do not leak absolute paths from the build environment. See https://reproducible-builds.org/specs/build-path-prefix-map. However, that specification only describes half of the story. Let us call the building of reproducible products the "Build Phase". That is the phase covered by the existing specification. Let us defined the "Deployment phase" as the phase where you accept a built reproducible product and make use of it, i.e. deploy it. An example would be the debugger taking a reproducible binary and letting the user debug it, showing the user the source code, etc. We use the same mechanism in the deployment phase. Then the BUILD_PATH_PREFIX_MAP must be setup with the logical inverse of the mapping used during the build phase. This PR generalized the functions in Build_path_prefix_map and Location to facility the inverse map. Also, this is part of a larger PR, ocaml#12126, which itself was part of ocaml#12085, that is being split up because it was too large. In addition to these changes, that PR includes ocamltest enhancements plus the tests for these changes. See it for prior discussion and the other changes. I will now describe the essence of the changes. 1. utils/build_path_prefix_map.{ml,mli} The API is now: val rewrite_first : map -> path -> path option (** [rewrite_first map path] tries to find a source in [map] that is a prefix of the input [path]. If it succeeds, it replaces this prefix with the corresponding target. If it fails, it just returns [None]. *) val rewrite_all : map -> path -> path list (** [rewrite_all map path] finds all sources in [map] that are a prefix of the input [path]. For each matching source, in priority order, it replaces this prefix with the corresponding target and adds the result to the returned list. If there are no matches, it just returns [[]]. *) val rewrite : map -> path -> path 2. parsing/location.{ml,mli} Use the new API and renamed the rewriting functions. val rewrite_find_first_existing: string -> string option (** [rewrite_find_first_existing path] uses a BUILD_PATH_PREFIX_MAP mapping (https://reproducible-builds.org/specs/build-path-prefix-map/) and tries to find a source in mapping that maps to a result that exists in the file system. There are the following return values: - None, means BUILD_PATH_PREFIX_MAP is not set. - Some None, no source prefixes of [path] in the mapping were found, - Some (Some target), means target is the first file (in priority order) that [path] mapped to that exists in the file system. - Not_found raised, means some source prefixes in the map were found that matched [path], but none of them existed in the file system. *) val rewrite_find_all_existing_dirs: string -> string list (** [rewrite_find_all_existing_dirs absdir] accumulates a list of existing directories, [dirs], that are the result of mapping abstract directory, [absdir], over all the mapping pairs in the BUILD_PATH_PREFIX_MAP environment variable, if any. The list [dirs] will be in priority order (head as highest priority). The possible results are: - [], means BUILD_PATH_PREFIX_MAP is not set, or if set, then there were no matching prefixes of path. - Some dirs, means dirs are the directories found. A possibility is that there was no mapping, but the input path is an existing directory and the result if [[path]]. - Not_found raised, means some source prefixes in the map were found that matched [path], but none of mapping results were existing directories (possibly due to misconfiguration) See the BUILD_PATH_PREFIX_MAP spec at (https://reproducible-builds.org/specs/build-path-prefix-map/) *) In addition to those new APIs, Location.absolute_path was modified to do rewriting on absolute paths (in addition to relative paths which it was already doing). At the present time their is no code depending on these APIs. They are used in the other parts of the PRs mentioned.
The BUILD_PATH_PREFIX_MAP specification tells how to use that environment variable to achieve reproducible build, i.e. builds of products that do not leak absolute paths from the build environment. See https://reproducible-builds.org/specs/build-path-prefix-map. However, that specification only describes half of the story. Let us call the building of reproducible products the "Build Phase". That is the phase covered by the existing specification. Let us defined the "Deployment phase" as the phase where you accept a built reproducible product and make use of it, i.e. deploy it. An example would be the debugger taking a reproducible binary and letting the user debug it, showing the user the source code, etc. We use the same mechanism in the deployment phase. Then the BUILD_PATH_PREFIX_MAP must be setup with the logical inverse of the mapping used during the build phase. This PR generalized the functions in Build_path_prefix_map and Location to facility the inverse map. Also, this is part of a larger PR, ocaml#12126, which itself was part of ocaml#12085, that is being split up because it was too large. In addition to these changes, that PR includes ocamltest enhancements plus the tests for these changes. See it for prior discussion and the other changes. I will now describe the essence of the changes. 1. utils/build_path_prefix_map.{ml,mli} The API is now: val rewrite_first : map -> path -> path option (** [rewrite_first map path] tries to find a source in [map] that is a prefix of the input [path]. If it succeeds, it replaces this prefix with the corresponding target. If it fails, it just returns [None]. *) val rewrite_all : map -> path -> path list (** [rewrite_all map path] finds all sources in [map] that are a prefix of the input [path]. For each matching source, in priority order, it replaces this prefix with the corresponding target and adds the result to the returned list. If there are no matches, it just returns [[]]. *) val rewrite : map -> path -> path 2. parsing/location.{ml,mli} Use the new API and renamed the rewriting functions. val rewrite_find_first_existing: string -> string option (** [rewrite_find_first_existing path] uses a BUILD_PATH_PREFIX_MAP mapping (https://reproducible-builds.org/specs/build-path-prefix-map/) and tries to find a source in mapping that maps to a result that exists in the file system. There are the following return values: - None, means BUILD_PATH_PREFIX_MAP is not set. - Some None, no source prefixes of [path] in the mapping were found, - Some (Some target), means target is the first file (in priority order) that [path] mapped to that exists in the file system. - Not_found raised, means some source prefixes in the map were found that matched [path], but none of them existed in the file system. *) val rewrite_find_all_existing_dirs: string -> string list (** [rewrite_find_all_existing_dirs absdir] accumulates a list of existing directories, [dirs], that are the result of mapping abstract directory, [absdir], over all the mapping pairs in the BUILD_PATH_PREFIX_MAP environment variable, if any. The list [dirs] will be in priority order (head as highest priority). The possible results are: - [], means BUILD_PATH_PREFIX_MAP is not set, or if set, then there were no matching prefixes of path. - Some dirs, means dirs are the directories found. A possibility is that there was no mapping, but the input path is an existing directory and the result if [[path]]. - Not_found raised, means some source prefixes in the map were found that matched [path], but none of mapping results were existing directories (possibly due to misconfiguration) See the BUILD_PATH_PREFIX_MAP spec at (https://reproducible-builds.org/specs/build-path-prefix-map/) *) In addition to those new APIs, Location.absolute_path was modified to do rewriting on absolute paths (in addition to relative paths which it was already doing). At the present time their is no code depending on these new APIs, so the only functional change is the change to Location.absolute_path They new API functions are used in the other parts of the PRs mentioned.
The BUILD_PATH_PREFIX_MAP specification tells how to use that environment variable to achieve reproducible build, i.e. builds of products that do not leak absolute paths from the build environment. See https://reproducible-builds.org/specs/build-path-prefix-map. However, that specification only describes half of the story. Let us call the building of reproducible products the "Build Phase". That is the phase covered by the existing specification. Let us defined the "Deployment phase" as the phase where you accept a built reproducible product and make use of it, i.e. deploy it. An example would be the debugger taking a reproducible binary and letting the user debug it, showing the user the source code, etc. We use the same mechanism in the deployment phase. Then the BUILD_PATH_PREFIX_MAP must be setup with the logical inverse of the mapping used during the build phase. This PR generalized the functions in Build_path_prefix_map and Location to facility the inverse map. Also, this is part of a larger PR, ocaml#12126, which itself was part of ocaml#12085, that is being split up because it was too large. In addition to these changes, that PR includes ocamltest enhancements plus the tests for these changes. See it for prior discussion and the other changes. I will now describe the essence of the changes. 1. utils/build_path_prefix_map.{ml,mli} The API is now: val rewrite_first : map -> path -> path option (** [rewrite_first map path] tries to find a source in [map] that is a prefix of the input [path]. If it succeeds, it replaces this prefix with the corresponding target. If it fails, it just returns [None]. *) val rewrite_all : map -> path -> path list (** [rewrite_all map path] finds all sources in [map] that are a prefix of the input [path]. For each matching source, in priority order, it replaces this prefix with the corresponding target and adds the result to the returned list. If there are no matches, it just returns [[]]. *) val rewrite : map -> path -> path 2. parsing/location.{ml,mli} Use the new API and renamed the rewriting functions. val rewrite_find_first_existing: string -> string option (** [rewrite_find_first_existing path] uses a BUILD_PATH_PREFIX_MAP mapping (https://reproducible-builds.org/specs/build-path-prefix-map/) and tries to find a source in mapping that maps to a result that exists in the file system. There are the following return values: - None, means BUILD_PATH_PREFIX_MAP is not set. - Some None, no source prefixes of [path] in the mapping were found, - Some (Some target), means target is the first file (in priority order) that [path] mapped to that exists in the file system. - Not_found raised, means some source prefixes in the map were found that matched [path], but none of them existed in the file system. *) val rewrite_find_all_existing_dirs: string -> string list (** [rewrite_find_all_existing_dirs absdir] accumulates a list of existing directories, [dirs], that are the result of mapping abstract directory, [absdir], over all the mapping pairs in the BUILD_PATH_PREFIX_MAP environment variable, if any. The list [dirs] will be in priority order (head as highest priority). The possible results are: - [], means BUILD_PATH_PREFIX_MAP is not set, or if set, then there were no matching prefixes of path. - Some dirs, means dirs are the directories found. A possibility is that there was no mapping, but the input path is an existing directory and the result if [[path]]. - Not_found raised, means some source prefixes in the map were found that matched [path], but none of mapping results were existing directories (possibly due to misconfiguration) See the BUILD_PATH_PREFIX_MAP spec at (https://reproducible-builds.org/specs/build-path-prefix-map/) *) In addition to those new APIs, Location.absolute_path was modified to do rewriting on absolute paths (in addition to relative paths which it was already doing). At the present time their is no code depending on these new APIs, so the only functional change is the change to Location.absolute_path They new API functions are used in the other parts of the PRs mentioned.
The BUILD_PATH_PREFIX_MAP specification tells how to use that environment variable to achieve reproducible build, i.e. builds of products that do not leak absolute paths from the build environment. See https://reproducible-builds.org/specs/build-path-prefix-map. However, that specification only describes half of the story. Let us call the building of reproducible products the "Build Phase". That is the phase covered by the existing specification. Let us defined the "Deployment phase" as the phase where you accept a built reproducible product and make use of it, i.e. deploy it. An example would be the debugger taking a reproducible binary and letting the user debug it, showing the user the source code, etc. We use the same mechanism in the deployment phase. Then the BUILD_PATH_PREFIX_MAP must be setup with the logical inverse of the mapping used during the build phase. This PR generalized the functions in Build_path_prefix_map and Location to facility the inverse map. Also, this is part of a larger PR, ocaml#12126, which itself was part of ocaml#12085, that is being split up because it was too large. In addition to these changes, that PR includes ocamltest enhancements plus the tests for these changes. See it for prior discussion and the other changes. For the new functions, see the '.mli' file for details. 1. utils/build_path_prefix_map.{ml,mli} The new api functions are: val rewrite_first : map -> path -> path option val rewrite_all : map -> path -> path list 2. parsing/location.{ml,mli} The new api functions are: val rewrite_find_first_existing: string -> string option val rewrite_find_all_existing_dirs: string -> string list In addition to those new APIs, Location.absolute_path was modified to do rewriting on absolute paths (in addition to relative paths which it was already doing). At the present time their is no code depending on these new APIs, so the only functional change is the change to Location.absolute_path The new API functions are used in the other parts of the PRs mentioned.
|
I converted to draft because it is not intended to merge this PR. Instead PRs are being split off it. |
The BUILD_PATH_PREFIX_MAP specification tells how to use that environment variable to achieve reproducible build, i.e. builds of products that do not leak absolute paths from the build environment. See https://reproducible-builds.org/specs/build-path-prefix-map. However, that specification only describes half of the story. Let us call the building of reproducible products the "Build Phase". That is the phase covered by the existing specification. Let us defined the "Deployment phase" as the phase where you accept a built reproducible product and make use of it, i.e. deploy it. An example would be the debugger taking a reproducible binary and letting the user debug it, showing the user the source code, etc. We use the same mechanism in the deployment phase, but use a different variable, DEPLOY_PATH_PREFIX_MAP. It must be setup with the logical inverse of the mapping used during the build phase. Also, this is part of a larger PR, ocaml#12085, that is being split up because it was too large. In addition to these changes, that PR includes ocamltest enhancements plus the tests for these changes. See it for prior discussion and the other changes. I will now describe the essence of the changes. 1. bytecomp/emitcode.ml Added sanitizing of paths in the debug events. If the compiler is given absolute source paths this was leaking absolute build paths. 2. debugger/command_line.ml A debugger variable, "mapping", was added, which is tied to the DEPLOY_PATH_PREFIX_MAP environment variable. So "set mapping value" will set DEPLOY_PATH_PREFIX_MAP, and "show mapping" will show the value of DEPLOY_PATH_PREFIX_MAP. This allows the user to set it programmaticaly, possibly from the ".ocamldebug" file. 3. debugger/source.ml In source_of_module, use the Location.rewrite_exists. 4. debugger/symbols.{ml, mli} a. Avoid adding directories redundantly b. Use Location.matching_dirs to expand directories. 5. Man page and manual updated. Updated new tests to take into account: 1. The new ocamltest syntax that is C-like rather than org-mode like. 2. The debugger now uses DEPLOY_PATH_PREFIX_MAP for its inverse mapping, rather than BUILD_PATH_PREFIX_MAP. Fixes ocaml#12083
|
This PR has now been updated and consists of the following parts:
|
Add deployment phase support for BUILD_PATH_PREFIX_MAP
The BUILD_PATH_PREFIX_MAP specification tells how to use that
environment variable to achieve reproducible build, i.e. builds of
products that do not leak absolute paths from the build
environment. See
https://reproducible-builds.org/specs/build-path-prefix-map.
However, that specification only describes half of the story. Let us
call the building of reproducible products the "Build Phase". That is
the phase covered by the existing specification.
Let us defined the "Deployment phase" as the phase where you accept a
built reproducible product and make use of it, i.e. deploy it. An
example would be the debugger taking a reproducible binary and letting
the user debug it, showing the user the source code, etc.
We use the same mechanism in the deployment phase. Then
the BUILD_PATH_PREFIX_MAP must be setup with the logical
inverse of the mapping used during the build phase.
Also, this is part of a larger PR,
#12085, that is being split up
because it was too large. In addition to these changes, that PR
includes ocamltest enhancements plus the tests for
these changes. See it for prior discussion and the other changes.
I will now describe the essence of the changes.
Added these functions to aid in use of the mapping
during the deployment phase. They allow to skip
mapping entries that are not applicable (find_rewrite).
find_rewrite is then used to implement rewrite_exists and
matching_dirs.
This has convenience functions for using the above
mapping API. It reads and caches the environment
variable so the end-user does not need to.
Location.absolute_path was modified so that BUILD_PATH_PREFIX_MAP
rewriting is done for both absolute and relative paths. Relative paths
are made absolute by appending to the cwd. Previously only relative
paths were rewritten. One discovery during testing was that if the
compiler is given an absolute path as the input source, the debug
event information had that absolute path (see below).
New functions are:
Added sanitizing of paths in the debug events.
If the compiler is given absolute source paths
this was leaking absolute build paths.
Rewrite when producing the path for a shebang.
The value written is from the user "-use-runtime"
option. The one setting BUILD_PATH_PREFIX_MAP
will have control whether this does anything or not.
A debugger variable, "mapping", was added, which is tied to the
BUILD_PATH_PREFIX_MAP environment variable. So "set mapping value"
will set BUILD_PATH_PREFIX_MAP, and "show mapping" will show the value
of BUILD_PATH_PREFIX_MAP. This allows the user to set it
programmaticaly, possibly from the ".ocamldebug" file.
In source_of_module, use the Location.rewrite_exists.
a. Avoid adding directories redundantly
b. Expose bppm_expand_path and get_load_path so they
can be called from a fake printer which, when
loaded and installed in a test, can do unit testing
of ocamldebug.
c. Use Location.matching_dirs to expand directories.
Fixes #12083