Skip to content

[RFC] Add a single file to load path#9056

Open
nojb wants to merge 9 commits intoocaml:trunkfrom
nojb:load_path_single_file
Open

[RFC] Add a single file to load path#9056
nojb wants to merge 9 commits intoocaml:trunkfrom
nojb:load_path_single_file

Conversation

@nojb
Copy link
Copy Markdown
Contributor

@nojb nojb commented Oct 18, 2019

This PR proposes a way to add a single file to the load path (which up to now consisted of a list of directories). The reasons for doing so are multiple but they revolve around the need for greater control over dependencies.

For the user interface, I reuse the -I flag, but for files: -I foo, where foo is a filename. Here, foo can be any file, but the situation to keep in mind is when foo is a .cmi and/or .cmx file.

The following interaction gives a sense of what this patch does:

$ cat > foo.ml <<EOF
> let _ = Unix.system
> EOF
$ ocamlopt -nostdlib -nopervasives -I +unix.cmi -c foo.ml
File "_none_", line 1:
Warning 58: no cmx file was found in path for module Unix, and its interface was not compiled with -opaque
$ ocamlopt -nostdlib -nopervasives -I +unix.cmi -I +unix.cmx -c foo.ml
$

In order not to pollute the patch, I left for later renaming dir for something more appropriate reflecting that now both files and directories can be used in the load path.

Opinions welcome!

@trefis @alainfrisch @diml

@trefis
Copy link
Copy Markdown
Contributor

trefis commented Oct 18, 2019

Also @dbuenzli

@dbuenzli
Copy link
Copy Markdown
Contributor

xref #7080

Any reason why you don't simply specify the files as positional arguments like other compilation objects ? (rather than overload -I).

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Oct 18, 2019

Any reason why you don't simply specify the files as positional arguments like other compilation objects ? (rather than overload -I).

If we use positional arguments it is a bit less clear what to do in some situations. For example, suppose the compiler is invoked to link something. Then .cmx files on the command line could be understood to mean both that they are added to the load path or that they should be linked in the output artifact (or both). While this last interpretation sounds potentially useful, I thought this choice required some more discussion...

So I as a first step I reused the -I flag to keep it simple. (This also had the added benefit of resulting in a tiny patch, since we reuse all the existing machinery.)

@dbuenzli
Copy link
Copy Markdown
Contributor

Then .cmx files on the command line could be understood to mean both that they are added to the load path or that they should be linked in the output artifact (or both).

I may be forgetting something but I think it is unambiguous:

  • If you are in the compile phase -c then a cmx cannot be meant to be linked.
  • If you are in the archive (-a) or link (-o) phase then a cmi on the line doesn't make sense and a cmx one is meant to be linked.
  • If you use the lets do everything in a single shot convenience mode then you likely want both cmi to be used and cmx to be linked.

else if Sys.is_directory dir then
Dir, Sys.readdir dir
else
File, [|Filename.basename dir|]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very familiar with the compiler code base, but where do you make sure that this file actually exists?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, fixed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks better, although I wish there was a stricter mode. In other words, the compiler would error if provided files that don't exist. This would be best for directories as well obviously, but there are some compatibility concerns. From the point of view of the build system, it would be more helpful if compiler is lax about this stuff.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We expect the number of .cmi/.cmx files passed on the command-line to be potentially very large (transitive dependencies), especially if the feature is used to implement libraries in the build system. So I'm slightly concerned by the extra cost induced by those Sys.is_directory and Sys.file_exists. Remember that we observed significant gains when we switch from file lookups (admittedly in may directories) to reading whole directories at once.

Cannot we not add any such calls? Just remember which .cmi/.cmx file have been added (based on their extension, or specific CLI options...) to make them available if the compiler needs them.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Oct 20, 2019

I may be forgetting something but I think it is unambiguous:

Yes, I think this makes sense.

Copy link
Copy Markdown
Contributor

@trefis trefis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order not to pollute the patch, I left for later renaming dir for something more appropriate reflecting that now both files and directories can be used in the load path.

I think that's a mistake.
The code just looks weird as a result of this PR.

Comment on lines 23 to 27
module Dir = struct
type kind =
| File
| Dir

type t = {
kind : kind;
path : string;
files : string list;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do change this in this PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed "dir" by "entry".

@trefis
Copy link
Copy Markdown
Contributor

trefis commented Oct 21, 2019

Also, could you add tests? (That -I works when given a single file, that positional arguments are interpreted as your last commit suggests, ...)

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Oct 21, 2019

I don't think .cmi files should be positional arguments. Positional arguments should be for inputs that are essentially consumed to produce the output. That is not true for .cmi files.

@dbuenzli
Copy link
Copy Markdown
Contributor

dbuenzli commented Oct 21, 2019

@lpw25 could you please expand on what it means to be essentially consumed ?

The digest of cmis is for example being consumed from them and written in the output objects. So I don't really understand the distinction you trying to make here.

As a cli user is see the compiler as a function taking input files and producing new ones. The type for these files can be detected by their extension and their purpose unambiguous in each of the tool's mode of operation, as such I don't see why we can't simply specify them as positional arguments.

@dbuenzli
Copy link
Copy Markdown
Contributor

And btw. I would actually rather be against overriding -I and keep it solely for directories as is now. That's the convention for many compilers and it makes it easier for me to spot in build logs if the compiler may have considered things that were not explicitly specified on the cli.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Oct 21, 2019

@dbuenzli What I'm trying to get at is that all OCaml's source files and intermediate files get used as positional argument exactly once -- when they are the main focus of the command and are being processed in some way to produce the output.

Whereas .cmi files are used by many different compilation commands in order to look things up from the interfaces. They are being used to help compile other modules, but they are not themselves being compiled.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Oct 21, 2019

I agree that it would be better to use a new parameter name rather than overloading -I.

@dbuenzli
Copy link
Copy Markdown
Contributor

dbuenzli commented Oct 21, 2019

@lpw25 I still don't see what kind of benefit you get from that perspective as far as cli usability is concerned. Your goal can't be achieved since compiled objects as positional arguments can already be used to add them to an archive or link them into an executable. So you already have two different positional meaning here for an object.

It seems to me that the line you are trying to draw is quite a brittle one. For example why do you mention cmi files only ? Why should be cmx files accompanying the cmi files for inlining be allowed to be positional while cmi should not ?

If people feel strongly about that I'm fine about another cli flag, I just find it totally redundant and unnecessary since the compilation mode and extensions allow us to understand what is happening without ambiguity.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Oct 21, 2019

It seems to me that the line you are trying to draw is quite a brittle one.

To me it seems both an obvious difference and a fundamental one. I'd find it really strange to pass cmi files positionally to a compilation command.

Maybe another way to look at it: if I passed in additonal cmi files it would not change the output. That is not true of the other positional arguments.

For example why do you mention cmi files only ? Why should be cmx files accompanying the cmi files for inlining be allowed to be positional while cmi should not ?

They aren't allowed to be positional either for the exact same reason.

@dbuenzli
Copy link
Copy Markdown
Contributor

dbuenzli commented Oct 21, 2019

They aren't allowed to be positional either for the exact same reason.

You didn't mention them in that message

Maybe another way to look at it: if I passed in additonal cmi files it would not change the output. That is not true of the other positional arguments.

So if a file is tagged with a cli flag, then it may or it may not change the output. If it's positional then it has to change the output. Well I don't know where you get these conventions from but it doesn't change anything as far as the compiler and the user are concerned and it's not helpful in anyway.

From an incremental build perspective what I want is simply the ability to specify which files I consider will be read by the compiler. After that what I'm really, really, really interested is knowing whether an input file actually influenced the output not, not whether it might have done so or not.

@lthls
Copy link
Copy Markdown
Contributor

lthls commented Oct 21, 2019

I've played a bit with the compiler's command line, and it happens that (for ocamlc) .cmo files passed on the command line will be silently ignored if the -c option is given. Even if the file doesn't exist.

Also, the error message for giving a .cmi file is don't know what to do with foo.cmi. I can think of only one sensible thing to do with a .cmi file, which is what is proposed in this pull request.

The problem is different with .cmx because they can be used either for linking (or making libraries), or for lookup by the inliner. Currently, .cmx files given as positional arguments will only be considered for linking, not lookup, so with the right setup you can have ocamlopt complaining that it cannot find a file that was given to it as a positional parameter.
Example: two files, a.ml and b.ml, with a.ml depending on b.ml, in two different directories dir_a and dir_b. Running ocamlopt -c b.ml in directory dir_b will produce b.cmi and b.cmx.
Copying b.cmi into dir_a then running ocamlopt -c a.ml will work, with a warning that the implementation for B is missing. Running ocamlopt ../dir_b/b.cmx -c a.ml will do exactly the same. And if you remove -c, it will complain during the compilation phase but link without problems.

(I'm fine with both the approaches of @dbuenzli and @lpw25. I just thought I'd point out that the current behaviour of the compiler might not be the best reference.)

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Oct 22, 2019

Ok, one last attempt to express the condition in a way that makes it clear:

When you call a compiler the positional arguments should be the files that are getting compiled. When you call a linker the positional arguments should be the files that are getting linked. They don't need flags because the action to be performed is implied by the name of the tool itself. Anything else the compiler or linker might require needs to be flagged to indicate what role it has in the process. You are not compiling those .cmi files so what are you doing with them?

@dbuenzli
Copy link
Copy Markdown
Contributor

Ok so let me clarify where exactly we differ.

When you call a compiler the positional arguments should be the files that are getting compiled. When you call a linker the positional arguments should be the files that are getting linked.

This is were we differ. This is some kind of esthetical rule which has no fundamental cli usability justification.

Anything else the compiler or linker might require needs to be flagged to indicate what role it has in the process.

The role of each file can be clearly identified by its extension so flagging them is redundant from a cli interaction perspective.

@alainfrisch
Copy link
Copy Markdown
Contributor

I don't have a strong opinion on the cli debate, but just one argument in favor of positional arguments: they make it easier to use shell wildcards (e.g. foo/*.cmi to give access to all .cmi files in a directory, but not their .cmx counterparts; or foo/lib1_*.cmi, etc) or to pass the output of e.g. a find command for recursive lookups. (Admittedly, the new style proposed in this PR with more explicit dependencies is not likely to be used from hand-written command lines.)

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Oct 22, 2019

The role of each file can be clearly identified by its extension so flagging them is redundant from a cli interaction perspective.

That is only the case if the role is both unique and obvious. It is not clear that holds for cmi files. For example, when using -pack you can pass .cmi positional arguments to be included in the pack as well as passing -I arguments to specify the type environment used for checking inclusion with the interface. So here .cmi files can be used as a genuine compilation input as well as being used as part of the type environment.

@dbuenzli
Copy link
Copy Markdown
Contributor

It is not clear that holds for cmi files

My suggestion is of course only if it's clearly unambiguous in all of the cases. Otherwise a new option should be added but I think it should be short and unique (if we can assume the "side" role will be unambiguous according to the phase). I would have proposed -i but it's already taken here's a brain dump: -m (metadata), -r (read) or -p (path).

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Oct 23, 2019

Also, could you add tests? (That -I works when given a single file, that positional arguments are interpreted as your last commit suggests, ...)

I added a test, thanks.

@trefis
Copy link
Copy Markdown
Contributor

trefis commented Nov 4, 2019

If I followed the discussion correctly, the idea of passing cmi/cmx files as positional arguments has been abandoned.

So the remaining propositions are:

  1. reuse -I
  2. use explicit flags -cmi / -cmx
  3. use short flags: -m, -p, -r.

For which I'm aware of the following opinions:

  • @dbuenzli seems to dislike (1), I initially shared his opinion but I'm now more open to the idea, considering the alternatives.
  • @dbuenzli seems to dislike (2) (this was discussed offline), apparently -cmi foo.cmi is too repetitive for him
  • I dislike (3).

I personally believe that this new option will primarily be used by build systems, not by end users, so the length of the flag doesn't matter so much, and we can afford to provide something verbose but easily readable / understandable. (Which does not mean that I believe -cmi to be the best option out there, just that I like it better than -m).

Would other participants in this discussion (in particular, @nojb) care to voice an opinion so we can move forward with this PR?

@gasche
Copy link
Copy Markdown
Member

gasche commented Nov 4, 2019

I also dislike short names (I spend more time reading logs of command-line invocations than writing them, so for me clearer names are better). I'm perfectly fine with -I, or with a dedicated option.

@alainfrisch
Copy link
Copy Markdown
Contributor

I have no strong opinion on the naming for options.

I'm rather against -I for the following reason: it forces at least one syscall per command-line option (to read .cmi files available in the directory), unless we decide that directory names cannot have a .cmx/.cmi extension. For performance reasons (esp. under Windows), I expect that a .cmi/.cmx file passed to the compiler should not produce any syscall if that file is not actually looked up. The build system might end up passing hundred or thousands of files on the command-line if it chooses to re-implement the notion of libraries itself. Even if it does some dependency analysis first, it will still need to pass the transitive closure of immediate dependencies (e.g. to resolve type aliases).

@dbuenzli
Copy link
Copy Markdown
Contributor

dbuenzli commented Nov 4, 2019

I also dislike short names (I spend more time reading logs of command-line invocations than writing them, so for me clearer names are better).

That's of course not the point, the point is to have readable logs. Compactness is also part of readability. In particular having -cmi FILE.cmi feels totally egregious and redundant to me.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Nov 4, 2019

I think we should not duplicate the extension in the command line flag since it is redundant. What about -If (as in "include file") ?

@dbuenzli
Copy link
Copy Markdown
Contributor

dbuenzli commented Nov 4, 2019

I'd rather keep these uppercase letters for dirs. What about -inc ?

@nojb nojb closed this Jan 8, 2020
@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 29, 2020

Remark: I believe that the two last models of @dbuenzli (concat (dirname f) p = f) and @nojb (basename p = p && concat (dirname f) = f) are equivalent: if f is a file path, then f = concat (dirname f) (basename f), so concat (dirname f) p = f implies p = basename f and thus basename p = basename (basename f) = basename f = p.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Jul 29, 2020

Remark: I believe that the two last models of @dbuenzli (concat (dirname f) p = f) and @nojb (basename p = p && concat (dirname f) = f) are equivalent: if f is a file path, then f = concat (dirname f) (basename f), so concat (dirname f) p = f implies p = basename f and thus basename p = basename (basename f) = basename f = p.

Indeed! Thanks for pointing it out. This means that the current implementation fits @dbuenzli's (second) proposed semantics.

@gasche
Copy link
Copy Markdown
Member

gasche commented Aug 21, 2020

I realize that my attempt at thinking about the problem and giving an opinion may have held back work on this PR, which is counterproductive as I think it is a very nice feature to have.

Let me summarize my understanding, and make it clear that I would welcome moving forward:

  • My preference would be to restrict the feature to the semantics that are nice and have no known issue: -inc is accepted only for .cmi files (currently) and used only during compilation-unit-name lookup when checking source files or -open, etc.
  • The alternative proposal to add a generic mecanism to add a single file to the load path has weird lookup-semantics discontinuities (it handles basename queries different from path queries, or it behaves unexpectedly in some corner case). This approach does not have my preference, but I trust @nojb and @dbuenzli to make decisions on this anyway, and I think it would be fine to move forward if that's the approach they want.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Sep 1, 2020

Thanks @gasche for the comments. I am coming back from the holidays and I expect to be too busy to work on this for the next week or two, but will take it up again after that.

@damiendoligez
Copy link
Copy Markdown
Member

As a way of pinging this PR, let me give my opinion: the latest proposal by @dbuenzli and @nojb is the one I find the easiest to understand. And it doesn't have "weird lookup-semantics discontinuities", the distinction between basename queries and path queries is only an optimization, not part of the semantics, and probably only worth documenting as a side note.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 12, 2021

That's great, thanks for the feedback, let's go for that!

@gasche
Copy link
Copy Markdown
Member

gasche commented May 14, 2021

@nojb would you be available to revive this PR? It seems that we have reached a weak consensus on the semantics, so the remaining bits should be the easy ones! (famous last words, etc.)

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented May 14, 2021

@nojb would you be available to revive this PR? It seems that we have reached a weak consensus on the semantics, so the remaining bits should be the easy ones! (famous last words, etc.)

Yes, I haven't forgotten about it... I was meaning to get back on it. I will try to do so soon.

@rgrinberg
Copy link
Copy Markdown
Member

@nojb does it make sense to add a toplevel directive to include a single .cmi? The equivalent of #directory I suppose.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 16, 2022

I discussed this PR with @Octachron a few weeks ago. We're all very busy but we think that it is a nice improvement, and would like to progress towards upstreaming.

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Sep 25, 2022

I discussed this PR with @Octachron a few weeks ago. We're all very busy but we think that it is a nice improvement, and would like to progress towards upstreaming.

Thanks for the ping, and sorry for the lag. This has thoroughly bitrotted, but I will rebase it when I get a chance, hopefully soon.

@gasche
Copy link
Copy Markdown
Member

gasche commented Dec 16, 2022

(New gentle ping, even though there is absolutely no pressure.)

@nojb
Copy link
Copy Markdown
Contributor Author

nojb commented Oct 5, 2024

I think this PR has gone beyond its shelf life; unfortunately I don't seem to be able to find the time to get back to it. Closing!

@nojb nojb closed this Oct 5, 2024
@gasche gasche self-assigned this Oct 5, 2024
@gasche gasche reopened this Oct 5, 2024
@gasche
Copy link
Copy Markdown
Member

gasche commented Oct 5, 2024

I'll try to see if I can push this to completion myself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.