Skip to content

Fix a 5.2 source-quotation-in-error-messages regression for dune+ppxlib#12991

Closed
gasche wants to merge 1 commit intoocaml:trunkfrom
gasche:dune-ppxlib-quoting-source-file
Closed

Fix a 5.2 source-quotation-in-error-messages regression for dune+ppxlib#12991
gasche wants to merge 1 commit intoocaml:trunkfrom
gasche:dune-ppxlib-quoting-source-file

Conversation

@gasche
Copy link
Copy Markdown
Member

@gasche gasche commented Feb 25, 2024

#12403 simplified the compiler logic to quote source file fragments, by keeping the whole input file in memory instead of reopening the input file when the error lies outside the lexing buffer. This fixed issue with source files that contain preprocessor directives (typically, they are the output of a source-to-source preprocessor), see #12238.

This simplification is a regression for build systems like Dune that perform a source-to-AST preprocessing step, then call the compiler with the binary AST as input: in this case there is no source input file in memory when printing errors, so source quotations are gone. This was reported by @anmonteiro in #12403 (comment) .

The present PR restores the fallback logic that we had in 5.1 and earlier to apply in this case: when there is nothing in the lexing buffer, try to reopen the input source file. (I suspect that this reintroduces the possibility of showing incoherent source fragments when the input-file information is wrong, but oh well.)

@anmonteiro tested the patch and confirms that it fixes the dune+ppxlib issue.

We recently simplified the compiler logic to quote source file
fragments, by keeping the whole input file in memory instead of
reopening the input file when the error lies outside the lexing
buffer. This fixes issue with source files that contain preprocessor
directives (typically, they are the output of a source-to-source
preprocessor).

This simplification is a regression for build systems like Dune that
perform a source-to-AST preprocessing step, then call the compiler
with the binary AST as input: in this case there is no source input file in
memory when printing errors, so source quotations are gone.

The present commit restores the fallback logic that we had in 5.1 and
earlier to apply in this case, when given AST as input.
@gasche gasche added this to the 5.2 milestone Feb 25, 2024
@gasche
Copy link
Copy Markdown
Member Author

gasche commented Feb 25, 2024

I decided to write a test for this case, and I realized something confusing: there appears to already be a test for this case, and it passes (on trunk):

https://github.com/ocaml/ocaml/blob/b2cd7e59e5b563b6a0527014689b7224bb678661/testsuite/tests/tool-ocamlc-locations/marshalled.ml

This test takes an input file with a type error, foo.ml, generates a binary AST file foo.marshalled.ml (using the code in the marshalled.ml file linked above to do it, by calling Pparse.write_ast), and then calls ocamlc on foo.marshalled.ml, and check that the typing error message quotes the source as intended : https://github.com/ocaml/ocaml/blob/b2cd7e59e5b563b6a0527014689b7224bb678661/testsuite/tests/tool-ocamlc-locations/marshalled.compilers.reference

Right now I am a bit confused because I don't understand why this test passes on trunk, and what the difference is with the ppxlib usecase.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Feb 25, 2024

The reason why the test succeeds in trunk (without the PR) is that there is already logic in the Pparse module to read the original source of binary ASTs for error-message purposes:

ocaml/driver/pparse.ml

Lines 186 to 199 in b2cd7e5

if is_ast_file then begin
let ast =
Fun.protect ~finally:close_ic @@ fun () ->
Location.input_name := (input_value ic : string);
begin match
In_channel.with_open_bin !Location.input_name set_input_lexbuf
with
| (_ : Lexing.lexbuf) -> ()
| exception Sys_error _ -> ()
end;
if !Clflags.unsafe then
Location.prerr_warning (Location.in_file !Location.input_name)
Warnings.Unsafe_array_syntax_without_parsing;
(input_value ic : a)

The call to In_channel.with_open_bin !Location.input_name set_input_lexbuf in the middle is "setting the input lexbuf" with the !Location.input_name value that is provided by the binary AST file (see the input_value call just above).

So when provided an input AST, the compiler immediately tries to read the original source file (as indicated in the binary-AST medata) and put it in the lexing buffer. In particular, when we see an error later on, the lexing buffer is available to quote the source (without the additional fallback logic I included in this PR).

Now something appears to go wrong in the dune+ppxlib usecase, so that this initial set_input_lexbuf logic fails, but then @anmonteiro reports that the fallback logic in this PR succeeds (while it also opens the file Location.input_name). There is a mystery here. Hypotheses include:

  • Location.input_name or the current working directory changed between the Pparse call and when the error message is printed
  • the input file was not yet available at the time of the Pparse call due to build-system concurrency
  • I'm reading the code wrong
  • the report of @anmonteiro is partly wrong, or applies to a different compiler than the one I am looking at

@anmonteiro being a distinguished OCaml frontend hacker by now, is there a chance that you would be motivated to sort out what is happening here?

@anmonteiro
Copy link
Copy Markdown
Contributor

anmonteiro commented Feb 25, 2024

Hey @gasche, I believe the reason is that ppxlib calls Parse.{implementation,interface} directly without going through Pparse: https://github.com/ocaml-ppx/ppxlib/blob/main/src/utils.ml#L100-L103, though it does set Location.input_name, allowing your fallback to work https://github.com/ocaml-ppx/ppxlib/blob/455f21787ad52a32813a34538a4d493f8d3efc64/astlib/location.ml#L3

Copy link
Copy Markdown
Contributor

@anmonteiro anmonteiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and fixes the regression I reported.

The code looks almost the same as what had been in the compiler for a while and then removed in 666de0e

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Feb 25, 2024

The code that you are pointing out is how ppxlib consumes its input. I previously assumed that the regression you reported occurred when the compiler consumes the output of the ppxlib driver. In fact, re-reading your report, the problematic error message is emitted by the ppxlib driver, not the compiler later on, and my fix has an effect because the driver calls the compiler error-printing logic through compiler-libs. Right?

@anmonteiro
Copy link
Copy Markdown
Contributor

anmonteiro commented Feb 25, 2024

the problematic error message is emitted by the ppxlib driver, not the compiler later on, and my fix has an effect because the driver calls the compiler error-printing logic through compiler-libs. Right?

This is correct.

EDIT: and that's why I wrote that I wasn't sure whether this needed to be fixed in ppxlib (perhaps, too, for future-proofing).

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Feb 25, 2024

The ppxlib fix would be to imitate the compiler by parsing the whole source at once and setting Location.input_lexbuf. This would also simplify the ppxlib code by avoiding weird stuff such as https://github.com/ocaml-ppx/ppxlib/blob/455f21787ad52a32813a34538a4d493f8d3efc64/src/utils.ml#L90-L96 .

(Of course ppxlib has a weirdly modified copy of the compiler driver because they have an additional use-case (reading source from standard input) that they never bothered to report upstream to have compiler-libs support it.)

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Feb 26, 2024

The ppxlib fix would be to imitate the compiler by parsing the whole source at once and setting Location.input_lexbuf. This would also simplify the ppxlib code by avoiding weird stuff such as https://github.com/ocaml-ppx/ppxlib/blob/455f21787ad52a32813a34538a4d493f8d3efc64/src/utils.ml#L90-L96 .

cc @NathanReb

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Feb 26, 2024

My current thinking: now that I understand the issue better, I think that changing ppxlib may be better than reintroducing (a small amount of) old complexity in the compiler, because both systems would end up simpler. If for some reason fixing ppxlib does not work out, the present PR is better than doing nothing.

@NathanReb
Copy link
Copy Markdown
Contributor

I'm a bit rusty on these matters. I'll look into the ppxlib side of things and report back to you ASAP!

@NathanReb
Copy link
Copy Markdown
Contributor

NathanReb commented Feb 27, 2024

From what I understand, it seems that properly setting Location.input_lexbuf in ppxlib should fix the issue right?

@anmonteiro do you have a reproduction somewhere that I can work with? I'm a bit surprised as our test suite does rely on error reporting working properly and I didn't see any such failures when working with 5.2 recently.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Feb 27, 2024

From what I understand, it seems that properly setting Location.input_lexbuf in ppxlib should fix the issue right?

Yes, modulo the fact that the compiler now expects input_lexbuf to contain the whole source, and to never have to refill. (If you are going to do this in ppxlib as well, you can get rid of the current logic to read the magic number first and then re-inject this inside the input. But then you need to figure out how to unmarshal the binary AST from the same string in the binary case, which can be done using the Marshal module but is a bit cumbersome -- you have to call Marshal.total_size to know the size of the current value to update the offset to read the next value.)

@NathanReb
Copy link
Copy Markdown
Contributor

@anmonteiro do you have a reproduction somewhere that I can work with? I'm a bit surprised as our test suite does rely on error reporting working properly and I didn't see any such failures when working with 5.2 recently.

Nevermind, I reproduced it and added it to our test suite. Most of our test that relied on reporting located exceptions used OCAML_ERROR_STYLE=short to work across our supported compiler version range.

I'll work on fixing this on ppxlib's trunk-support branch.

@anmonteiro
Copy link
Copy Markdown
Contributor

After reflecting for a bit and considering other use cases, I think I found one that justifies this change:

  • code that emits alerts via Ocaml_common.Location.prerr_alert in PPXes won't have their source quotation correct if the compiler doesn't have this source fallback anymore

@Octachron
Copy link
Copy Markdown
Member

Since the regression is fixed in ppxlib, I am removing this PR from the 5.2 milestone. I am not sure if it still is worth it to discuss this PR in the light of melange?

@Octachron Octachron removed this from the 5.2 milestone Mar 20, 2024
@gasche
Copy link
Copy Markdown
Member Author

gasche commented Mar 20, 2024

@anmonteiro what do you think?

@anmonteiro
Copy link
Copy Markdown
Contributor

This is not necessary for Melange if it's fixed in ppxlib, apart from the newly broken usecase that all OCaml programs will also face:

from #12991 (comment)

code that emits alerts via Ocaml_common.Location.prerr_alert in PPXes won't have their source quotation correct if the compiler doesn't have this source fallback anymore

So if you're comfortable breaking source quotations for alerts emitted from a PPX (which I don't even know if it's a supported use case, but we're using it in melange), then I'm OK closing it.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Mar 20, 2024

This is not such a big change in the compiler, so I would be tempted to try to accomodate these niche (but not unreasonable) use-cases. @Octachron, what do you think?

@Octachron
Copy link
Copy Markdown
Member

I am not sure I follow why the alert printer would behave differently than the error printer?

@anmonteiro
Copy link
Copy Markdown
Contributor

They’re different printers. In that case the error printer would come from ppxlib (which now has this fix), but the alert printer would come from Ocaml_common.Location which comes from the compiler and doesn’t have the lexbuf set up etc

@Octachron
Copy link
Copy Markdown
Member

They are the same printers? The Astlib module on the ppxlib side is including the compiler's Location module, and updating the shared Location.input_lexbuf.

@NathanReb
Copy link
Copy Markdown
Contributor

Yes I would assume it just works but I can add a test for it and check!

@anmonteiro
Copy link
Copy Markdown
Contributor

They are the same printers? The Astlib module on the ppxlib side is including the compiler's Location module, and updating the shared Location.input_lexbuf.

Got it, thanks. This seems to be a melange-specific issue. I created a failing repro that passes with OCaml + ppxlib. OK to close for me.

@anmonteiro
Copy link
Copy Markdown
Contributor

Actually, revisiting this again, it doesn't seem to be a Melange bug at all. I prepared a reproduction in this repository: https://github.com/anmonteiro/no-source-quotation-repro

If you run the instructions in the README (building a ppx rewriter and passing it to ocamlc as -ppx myppx) an alert will be generated that won't have a source quotation.

If you run it with OCaml 5.1, the source quotation is present.

@anmonteiro
Copy link
Copy Markdown
Contributor

OK, so I changed my mind again: there is a bug, but it's not in OCaml.

Turns out, I believe the ppxlib fix is incomplete. When reading binary AST, we want to set the input lexbuf as well. I submitted ocaml-ppx/ppxlib#483 with an example implementation.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Apr 3, 2024

My understanding -- for now -- is that having the fix on the ppxlib side worked, and that there is no change necessary in the compiler now. So I am closing this. Please let me know if my understanding is wrong and we can reconsider.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Dec 28, 2025

I am hitting an issue today that looks suspiciously close to the issue discussed above.

[gasche@framawork repro2 ]$ cat Foo.ml
let x = 1
[gasche@framawork repro2 ]$ cat Foo.mli
val x : NonExistingModule.t
[gasche@framawork repro2 ]$ cat dune-project 
(lang dune 3.20)
[gasche@framawork repro2 ]$ cat dune
(library
  (name test)
  (preprocess (pps ppx_compare))
)

[gasche@framawork repro2 ]$ opam exec --switch=5.3.0 dune build
File "Foo.mli", line 1, characters 8-27:
1 | val x : NonExistingModule.t
            ^^^^^^^^^^^^^^^^^^^
Error: Unbound module NonExistingModule

[gasche@framawork repro2 ]$ opam exec --switch=5.4.0 dune build
File "_none_", line 1:              
Error: Unbound module NonExistingModule

There seems to be a regression in 5.4 compared to 5.3, the file where the error occurs is not reported anymore. It is necessary for some ppx to be configured for the bug to occur (otherwise the error is located correctly).

This is using ppxlib 0.37.0.

@anmonteiro, @NathanReb, do you have an idea whether this is the same issue or something different? My understanding was that the issue had been fixed in ppxlib, does this turn out to be wrong?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants