In error messages, print the source fragment responsible for the error#2096
In error messages, print the source fragment responsible for the error#2096gasche merged 9 commits intoocaml:trunkfrom
Conversation
| (string * int) list | ||
| = | ||
| (* Converts a global position to one that is relative to the lexing buffer *) | ||
| let rel n = n - lb.lex_abs_pos in |
There was a problem hiding this comment.
Hum, I suspect this is in general incorrect, because I'm not taking lb.lex_start_pos into account?
There was a problem hiding this comment.
If I read the code of lexing.ml correctly, I believe it should indeed be something like: n - lb.lex_abs_pos + lb.lex_start_pos. Does the current code work?
BTW, I find looking directly into the lexing buffer internals a bit tedious. Personally, I would just keep the whole input as a single string and extract the source code from that string, that would make the code simpler.
There was a problem hiding this comment.
If I read the code of lexing.ml correctly, I believe it should indeed be something like: n - lb.lex_abs_pos + lb.lex_start_pos.
Yeah, that's what I suspected after reading lexing.ml a bit more closely
Does the current code work?
..yes :) [on the testsuite]
BTW, I find looking directly into the lexing buffer internals a bit tedious.
Agreed.
I would just keep the whole input as a single string and extract the source code from that string
Hum, which part of the code would take care of that? (note that the lexbuf business is only really needed for the toplevel, otherwise we can just go read the source file)
There was a problem hiding this comment.
The code that processes an input file. But yh, I agree that for plain files it's not an issue. For the toplevel, the code would need to be adapted. Without changing everything, we could keep all the input read since the start of the current loop iteration in a Buffer.t.
There was a problem hiding this comment.
@diml I just tried your suggestion, and it breaks the function (either stops quoting the source, or quotes wrong parts of it) :D. So I guess the current implementation is correct?
There was a problem hiding this comment.
I guess so. Thanks for trying, at least we know the current one is more likely to be right :)
|
This is superb! I've wanted this the whole time I've used OCaml! At the risk of asking to gild the lily, might I suggest also moving towards a more standard error location format as well? The FSF coding standards, which are followed by gcc and llvm and many other compilers, specify that line numbers and columns be printed as or alternatively as: See: https://www.gnu.org/prep/standards/html_node/Errors.html Doing it that way makes it easier for tools to parse the mentioned locations. That said, if this notion is controversial, please ignore it. I don't want to slow down getting error context printed for the user. |
|
I think this is orthogonal to this PR (it is not touching the |
|
@Armael Fair enough. But it should be done at some point, and given that so many tests suite error messages are going to have to get changed anyway to deal with the changed output, it would be good to do it all at about the same time. |
gasche
left a comment
There was a problem hiding this comment.
This looks very nice, thanks! Some comments from a first read.
parsing/location.ml
Outdated
| let highlight_quote ppf | ||
| ~(get_lines: | ||
| start_pos:position -> end_pos:position -> | ||
| (string (* line *) * int (* beginning of line *)) list) |
There was a problem hiding this comment.
Why not use a record type instead of comments in the type?
There was a problem hiding this comment.
Ehh, why not, but that sounds a bit heavywheight.
There was a problem hiding this comment.
If you feel the need to indicate what the values are for, I think that record or object fields are in order.
parsing/location.ml
Outdated
| let lines = | ||
| get_lines ~start_pos:leftmost ~end_pos:rightmost | ||
| |> List.map (fun (line, s) -> | ||
| line, try_get_line_number s (s + String.length line), s) |
There was a problem hiding this comment.
We don't expect this part of the compiler to be performance-critical, but I'm still a bit uneasy with the quadratic runtime of this code. Isn't there a linear or nlogn version? (It feels like you are only keeping a set around.)
parsing/location.ml
Outdated
| | Some line_nb -> max acc (String.length (string_of_int line_nb)) | ||
| | None -> acc | ||
| ) 0 lines | ||
| in |
There was a problem hiding this comment.
If you could separate the display code to go in utils/Misc, that could maybe be reused more easily later? I think what you are really implementing is two-column printing with a separator, with a parametrized justification function. I guess that Arthur would have reused that.
There was a problem hiding this comment.
(Besides this whole highlight_quote function breaks the "don't mix different abstraction levels" rule for good code. We shouldn't get from list manipulations of location sets to low-level padding code without a transition, we should have auxiliary functions for the logic of each domain.)
parsing/location.ml
Outdated
| | [(line, line_nb_opt, line_start_cnum)] -> | ||
| (* Single-line error *) | ||
| Format.fprintf ppf "%a%s@," (prefix ~sep:true) line_nb_opt line; | ||
| Format.fprintf ppf "%a" (prefix ~sep:false) None; |
There was a problem hiding this comment.
What's the benefit of the ~sep:true ... ~sep:false protocol over just having a command in-between to print the separator?
parsing/location.ml
Outdated
| highlighted := false; | ||
| Format.fprintf ppf "@}" | ||
| end; | ||
| if is_start pos then begin |
There was a problem hiding this comment.
I'm again uneasy with the linear lookup here, giving quadratic behavior. We could have a set of start positions.
parsing/location.ml
Outdated
| let ellipsed_first, ellipsed_last = | ||
| let lines_nb = List.length lines in | ||
| if lines_nb > max_lines then | ||
| let printed_lines = max_lines - 1 in (* ellipsis uses one line *) |
| (string * int) list | ||
| = | ||
| (* Converts a global position to one that is relative to the lexing buffer *) | ||
| let rel n = n - lb.lex_abs_pos in |
|
Thanks for the quick first review @gasche . I agree that the code grew into an untasteful mess in some places, I'll work on cleaning that up today. |
|
@gasche I just pushed a commit that moves the code for printing two columns into I'm not completely satisfied with the second part, though. The new code works by defining a structure for a "bag of start/end positions", created from a list of locations, and by providing cursors that allow traversing this bag in O(l) where l is the number of locs that have been put in the bag. But even ignoring the complexity, I agree the previous code was ugly and had to be rewritten... |
5baed97 to
068216c
Compare
parsing/location.ml
Outdated
| List.iter (function | ||
| | `Start, _ -> Format.fprintf ppf "@{<%s>" highlight_tag | ||
| | `End, _ -> Format.fprintf ppf "@}" | ||
| ) ps; |
There was a problem hiding this comment.
Does that really work correctly when there are multiple overlapping locations positions? I might, but it looks a bit subtle.
|
I agree that the cursor approach is a bit weird. From the change to the Bag structure, I like the impact on the Location code (nice factorization), not so much the cursor API, and the implementation indeed looks complex. Also, I don't want to waste your time over-designing an efficient solution when there is no strong need for now. So I would suggest a compromise:
My suggestion would be : imagine that you have an efficient implementation of an interval tree. The natural API would be to ask for leftmost and rightmost interval, and also whether at least one interval in the bag overlaps with a given absolute position (and maybe to request the list of overlapping intervals). Then implementing this with lists as you suggest could be ok. (Another approach would be to consider that we don't need to represent overlapping intervals in the structure, as they can always be fused at structure-construction time. Then a Map indexed on the starting position would be enough. But I don't remember if that is enough for your needs, and I'm not sure that trying to be both clever and specialized at the same time is a good idea.) |
|
In principle I approve of the change; we need to converge on a version the code in location.ml that @Armael and myself feel good about. |
068216c to
2919111
Compare
|
Take 3. (the first commit is from #2103, I'll rebase one it is merged) There is one issue remaining: check-typo (rightfully) complains about the tests because some of them now use more than 80 columns. Indeed, if an error happens in a 80 col line of code, the error message will contain |
gasche
left a comment
There was a problem hiding this comment.
I still have some minor comments, but overall this looks good, thanks! The other PR has been merged so you can rebase now.
For line limits, check-typo distinguishes "long lines" (>80) and "very long lines" alarms (>130), so we could conceivably say that long lines that are not very long are ok, or add an "overflowing_line" category at 85. None of this is very hard, but correctly adding this alarm-marker to all testsuite files is tricky for now, before #1910 is merged (and I'm the one that has to turn the crank there).
Another approach would be to print lines that are two long differently:
1 | foo bar
2 |
very long foo bar
3 | foo baz
parsing/location.ml
Outdated
| val of_intervals : (('a * int) * ('a * int)) list -> 'a t | ||
|
|
||
| val mem : 'a t -> pos:int -> bool | ||
| val find_in : 'a t -> s:int -> e:int -> 'a option |
There was a problem hiding this comment.
At first I couldn't tell what s and e were for here. Maybe just int * int, or more explicit label names.
The specification, if I understand the implementation correctly, is: return the value associated to one of the interval bounds that lies within the (s, e) parameters. Maybe find_bound_in?
There was a problem hiding this comment.
Agreed, s and e are just bad names.
More explicit label names are indeed required (and I'd be in favor adding a label even if the function is going to take a pair of ints).
There was a problem hiding this comment.
Should be better now (val find_bound_in : 'a t -> range:(int * int) -> 'a bound option).
Changes
Outdated
| ### Compiler user-interface and warnings: | ||
|
|
||
| - PR#????: Add source highlighting for errors & warnings in batch mode | ||
| (Armaël Guéneau, review by ?) |
parsing/location.ml
Outdated
| end | ||
| = | ||
| struct | ||
| type 'a boundary = 'a * int |
There was a problem hiding this comment.
In fact you could expose this type and have find_boundary_in return it.
parsing/location.ml
Outdated
| overlap). | ||
| *) | ||
|
|
||
| module IMapSet : sig |
parsing/location.ml
Outdated
| | `E, `Inside (s, n) -> `Inside (s, n-1), acc | ||
| ) (`Outside, []) pos in | ||
| if nesting <> `Outside then | ||
| raise (Invalid_argument "ISet.of_intervals: invalid intervals"); |
There was a problem hiding this comment.
When are intervals invalid in this sense?
Should the exception name be changed?
There was a problem hiding this comment.
If I read the implementation correctly, I don't see how this exception can ever be raised given that the bounds only inserted when the S is before the E. (So we have never read more Es than Ss, and we have always read exactly at much at the end). So it could be an assert (nesting = Outside)`.
There was a problem hiding this comment.
Indeed, the exceptions were actually impossible cases. I turned them into assert false.
| implementation if needed. | ||
|
|
||
| Note: the structure only stores maximal intervals (that therefore do not | ||
| overlap). |
There was a problem hiding this comment.
Out of curiosity, does the "maximal intervals only" aspect matter much? (It adds some complexity in of_intervals, and if I understand correctly it means that some of the user-provided bound annotations are lost.)
There was a problem hiding this comment.
Right. I think it simplifies the logic for inserting @{<..> and @} later, which will get trickier if it has to handled possibly overlapping intervals (and I think the previous implementation which tried to do it was incorrect).
So I think it's better to merge intervals directly in the interval set structure, additionally that's what an actual interval set implementation would do I believe.
parsing/location.ml
Outdated
| let highlight_quote ppf | ||
| ~(get_lines: | ||
| start_pos:position -> end_pos:position -> | ||
| (string (* line *) * int (* beginning of line *)) list) |
There was a problem hiding this comment.
If you feel the need to indicate what the values are for, I think that record or object fields are in order.
parsing/location.ml
Outdated
| Format.fprintf ppf "%s | %s@," line_nb line; | ||
| Format.fprintf ppf "%*s " (String.length line_nb) ""; | ||
| for pos = line_start_cnum to rightmost.pos_cnum - 1 do | ||
| (* NB: The two cases below are mutually exclusive *) |
There was a problem hiding this comment.
There are four cases below, which two do we mean? Maybe use else if if the tests are exclusive?
There was a problem hiding this comment.
Indeed, that wasn't clear. I used an else if as you suggest.
parsing/location.ml
Outdated
| It first tries to read from [!input_lexbuf], then if that fails (because the | ||
| lexbuf no longers contains the input we want), it reads from [!input_name] | ||
| directly *) | ||
| let get_lines_from_current_input ~start_pos ~end_pos = |
There was a problem hiding this comment.
lines_around_from_lexbuf, lines_around_from_file, lines_around_from_current_input?
2919111 to
570d84a
Compare
|
I'm getting fairly satisfied with the implementation, but here is an UI question: suppose a user strongly dislikes the new error formatting (maybe the extra lines break their workflow in some way), is there an easy way for them to disable it? |
|
(If there is, maybe it could be mentioned in the Changes entry.) |
|
I think I addressed all of your comments on the code! I still have to look at the long lines issue. I'm not so fond of the idea of printing long lines differently, it does not look too good, and it only makes a difference for lines between 76 (or so) and 80 characters long... It does not look too hard to set the long-line attribute on all testsuite files, maybe I could do that? |
There isn't, currently. It's not hard to implement, but we have to think carefully if 1) we need it (adding alternative codepaths has a maintenance cost) 2) how to pass this option (as an environment variable? a commandline flag?) |
It seems reasonable to me, but I think it would be better to wait for #1910 to be merged (hopefully today or tomorrow) to do it.
I think that an environment variable could be nice, but on the other hand the complexity of adding the configuration-plumbing makes me as uneasy as you. Maybe we could merge as-is and see what is the feedback in trunk testing and beta versions. |
570d84a to
80521ed
Compare
|
I imagine that people who compile from inside an editor or IDE probably don't want this output, since the editor is going to highlight the code in your buffer anyway. So an option to disable it would probably be appreciated. |
|
Right, although note that anyone using merlin (directly or through LSP) should not be affected (merlin will disable this part of the message directly through the compiler-libs api). |
|
My feeling is that Leo is speaking the voice of reason on this one. You could start with just an OCAML_QUOTE_SOURCE environment variable, or if you want the full command-line-parameter dance just look at how |
|
FYI, I don't think an option to disable such messages exists in other compilers I use, and almost all of them produce output like this now. |
|
While my first gut feeling was @lpw25's one I'm not entirely sure --- but would need to live with it to definitively answer the question. Here are a few points to ponder:
|
@gasche Just to make things a bit clearer. The way usually these kind of cli API is implemented/expected (and supported by e.g. Your program has a setting which has a default value and whose value can be changed on the cli via optional(s) argument(s). If the setting has a corresponding environment variable, setting the environment variable only changes the default of the setting and that's all (i.e. a cli specification always "wins").
|
gasche
left a comment
There was a problem hiding this comment.
I made two minor code comments but in general I'm fairly happy with the shape of the code now. I'm marking this Approved to show progress, although of course we should wait for the error-style naming suggestions to converge before merging.
|
I would also like to add that in this particular example build systems should always only go for the default. This allows end-users (or |
f2f3ab8 to
dd5b131
Compare
dd5b131 to
6b16bcc
Compare
|
It is my understanding that all issues have been resolved. Unless I hear otherwise, I plan to merge if CI passes. (Thanks @dbuenzli for the naming help!) |
|
Many thanks to everyone involved! |
|
Commit ef76e6e has just been pushed to |
|
What do you suggest? |
|
Armaël Guéneau (2019/02/26 08:26 -0800):
@trefis tells me on IRC that we should just make sure ocamltest explicitly sets the `OCAML_ERROR_STYLE` environment variable to the default.
This sounds fairly easy (and a job for @shindere :-)).
What a wonderful way for you @Armael to start diving into the wonderful
code of ocamltest!!! ;-)
|
|
I'd like to point out that I didn't suggest that part:
What I will suggest however is making sure that other variables which can affect the testsuite are handled the same way, e.g. |
|
@mshinwell I'll try to look at that tomorrow and implement @trefis suggestions (have ocamltest explicitly unset the environment variables that can modify the compiler output). |
|
Thomas Refis (2019/02/26 08:42 -0800):
I'd like to point out that I didn't suggest that part:
> This sounds fairly easy (and a job for @shindere :-)).
Too late, I'm already very angry at you @trefis ;-)
What I will suggest however is making sure that other variables which
can affect the testsuite are handled the same way, e.g.
`OCAMLRUNPARAM`.
That one should be handled correctly.
|
|
See #2271 . |
* now using ocamllex instead of sedlex (for simplicity) * reusable code now distributed as an opam package * use dune's latest features to vastly simplify built and config * use OCaml 4.08's beautiful error messages with quoted error location (ocaml/ocaml#2096) * travis CI
(please see sedlex branch for previous version of this repo) * now using ocamllex instead of sedlex (for simplicity) * reusable code now distributed as an opam package * use dune's latest features to vastly simplify built and config * use OCaml 4.08's beautiful error messages with quoted error location (ocaml/ocaml#2096) * travis CI
(please see sedlex branch for previous version of this repo) * now using ocamllex instead of sedlex (for simplicity) * reusable code now distributed as an opam package * use dune's latest features to vastly simplify built and config * use OCaml 4.08's beautiful error messages with quoted error location (ocaml/ocaml#2096) * travis CI
(please see sedlex branch for previous version of this repo) * now using ocamllex instead of sedlex (for simplicity) * reusable code now distributed as an opam package * use dune's latest features to vastly simplify built and config * use OCaml 4.08's beautiful error messages with quoted error location (ocaml/ocaml#2096) * travis CI
(please see sedlex branch for previous version of this repo) * now using ocamllex instead of sedlex (for simplicity) * reusable code now distributed as an opam package * use dune's latest features to vastly simplify built and config * use OCaml 4.08's beautiful error messages with quoted error location (ocaml/ocaml#2096) * travis CI
(please see sedlex branch for previous version of this repo) * now using ocamllex instead of sedlex (for simplicity) * reusable code now distributed as an opam package * use dune's latest features to vastly simplify built and config * use OCaml 4.08's beautiful error messages with quoted error location (ocaml/ocaml#2096) * travis CI
(please see sedlex branch for previous version of this repo) * now using ocamllex instead of sedlex (for simplicity) * reusable code now distributed as an opam package * use dune's latest features to vastly simplify built and config * use OCaml 4.08's beautiful error messages with quoted error location (ocaml/ocaml#2096) * travis CI
This PR generalizes to the standard compiler output the "source highlighting" mechanism that is already present in case of error in a "dumb" toplevel and in the testsuite expect tests.
Examples:


The motivation is to improve error readability: it should now be clearer what the mistake is when looking at the compiler output. AFAIK elm, rust or bucklescript implement similar error rendering, and were inspiration for this work.
The PR reuses the existing code for quoting source code, making a few improvements:
The code for the highlighting function got a bit more complicated, because we now have to be able to read the source to quote from the input file. The current function reads directly from the lexbuf and only works if it hasn't been refilled in the meantime -- which is reasonable for the toplevel but not so much for a .ml file. There are also a lot of changes in the testsuite, that come in a separate commit.
On the plus side, one consequence of these changes is the merge of three different codepaths for error printing into one, which makes me very happy. Specifically, "batch mode", "dumb toplevel" and "expect test" now use exactly the same code for printing errors.