Improve: format invalid files#1026
Improve: format invalid files#1026gpetiot merged 76 commits intoocaml-ppx:masterfrom gpetiot:merlin-parser
Conversation
|
The ocaml-ci target |
emillon
left a comment
There was a problem hiding this comment.
Thanks, that's starting to look good! If we're going to maintain this library here (I think we should), a subsequent step should be making sure it feels home in this project, including making the library private and probably under the same project as ocamlformat (while remaining under a different directory for maintenance and legal purposes). This would block a release (by making dune-release try to release both I think) so this should probably be done in this PR.
lib/Translation_unit.mli
Outdated
| { init_cmts: debug:bool -> Source.t -> 'a -> Cmt.t list -> Cmts.t | ||
| ; fmt: debug:bool -> Source.t -> Cmts.t -> Conf.t -> 'a -> Fmt.t | ||
| ; parse: Lexing.lexbuf -> 'a | ||
| ; make_parsable: string -> string |
There was a problem hiding this comment.
make_parsable is a bit of a strange name (and probably a frenchism). What do you think about recover? The type of the record field can also be (string -> string) option to indicate that there's a nice default implementation available (the identity function)
There was a problem hiding this comment.
I agree on renaming the function, but I don't see what having an option type here will achieve, it will force us to test both option values even though we know the appropriate recover function is always provided
There was a problem hiding this comment.
That was an API design comment. string -> string is great and it stay that way. Let me explain what I meant:
Suppose you're a user of that API and you want to create a value of type 'a t. You have to provide a make_parsable function of type string -> string for your language right? Suppose you do not care about that and want to not use that feature. What should you pass? fun x -> x? fun _ -> ""? fun _ -> assert false? Difficult to tell without looking at the other uses! Asking for a (string -> string) option makes that a bit more obvious - you can pass None. It clarifies the fact that there's a natural default value. Of course, you need to add a Option.value ~default:Fn.id where you use that function. (it's also possible to keep storing string -> string but do this default value thing in a constructor)
Julow
left a comment
There was a problem hiding this comment.
I don't really understand the CI error. I submitted an issue ocurrent/ocaml-ci#120
lib/Translation_unit.ml
Outdated
| in | ||
| match parse xunit.parse conf ~source:fmted with | ||
| let fmted_parsable = | ||
| if opts.Conf.partial then xunit.make_parsable fmted else fmted |
There was a problem hiding this comment.
Shouldn't we do that only if the parse failed ?
There was a problem hiding this comment.
Also, this seems to ignore parsing errors. Should we show them ? (eg. as warnings)
There was a problem hiding this comment.
If this option is set, I think the parsing errors should be ignored.
The code is written this way to avoid having to duplicate too much code and have a minimal impact on the current implementation. (as this PR is already hard enough to review)
There was a problem hiding this comment.
Why would this make duplicated code or a big impact ?
Showing all the parsing errors can be unhelpful indeed. What about showing a single warning in this case ? (eg. "This file contains parsing errors and is not fully formatted")
If you are concerned about the emacs plugin opening buffers, note that we can modify it.
There was a problem hiding this comment.
I think we can assume that errors are already reported by the editor.
There was a problem hiding this comment.
Not necessarily, OCamlformat must parse the entire file so it's sensitive to all parsing errors that the editor won't see. OCamlformat may be called more often than the compiler.
Also, the editor will most probably always pass this option. The user may be confused by the "Applied OCamlformat" message but nothing changed.
There was a problem hiding this comment.
I applied the recovery only when the parsing failed and a warning is emitted (maybe more than once, should be fixed in another PR)
| in | ||
| let mapper = make_mapper () in | ||
| ignore ((m mapper) ast); | ||
| normalize_locs !loc_list |
There was a problem hiding this comment.
This function is collecting the locs of invalid parts of the code.
But we have an ast here, why not use it instead of going back to a string in the next function ?
The Ast_mapper and Stack tricks used here can be used to map "structure item that contains an annot, deeply" to "invalid ast ext node" directly.
There was a problem hiding this comment.
I thought to have a string output was a simpler interface for ocamlformat, as it is what is processed in Translation_unit, and returning the ast types is tricky as we have an OMP layer which is not the same as the one of ocamlformat (Migrate_ast is duplicated)
There was a problem hiding this comment.
Having two different AST/parser seems like something that could cause bugs. How hard to implement and what are the drawback to use a single implementation ?
Having control on the parser's code may solve a few issues too (eg. to modify it or to avoid changes in behavior between OMP's versions, which could cause formatting changes between the same version of OCamlformat)
There was a problem hiding this comment.
It's necessary if we want this parser (pwyc) to be independent from ocamlformat, which is the what we agreed upon. Maybe the Migrate_ast could be factorized in the future, but I don't think it should be done in this PR.
There was a problem hiding this comment.
My question was about using the AST and parser from pwyc in OCamlformat, dropping OMP. The dependency stays the same.
My question is, what are the drawbacks of doing that ?
| Format.fprintf Format.err_formatter | ||
| "Warning: %s is invalid, recovering.\n%!" input_name ; | ||
| Ok t_new ) | ||
| else internal_error (`Cannot_parse exn) (exn_args ()) |
There was a problem hiding this comment.
This code is also in parse_result below. Let's share it.
|
|
||
| let parse_result xunit conf (opts : Conf.opts) ~source ~input_name = | ||
| match parse xunit.parse conf ~source with | ||
| | exception exn -> |
There was a problem hiding this comment.
Looking at format above, parse could raise several kinds of exceptions:
| exception Sys_error msg -> Error (User_error msg)
| exception Warning50 l -> internal_error (`Warning50 l) (exn_args ())
| exception exn -> internal_error (`Cannot_parse exn) (exn_args ())
The corresponding errors are no longer reported correctly. I'm not sure it makes sense to apply recovering when the first two (Sys_error, Warning50) happen.
|
I'd like to explain the dual parser design a bit. The idea is generally what's described in #906. One important aspect is that it leave the secondary parser out of the happy path. This ensures that:
|
emillon
left a comment
There was a problem hiding this comment.
I think this is good to go. There are probably tons of ways we can improve this but as long as this works in is properly isolated this can happen on master.
|
HUGE thanks to @gpetiot who's been tirelessly working on this feature for several months! 🎉 |
Thanks! I will merge it now, but everyone can of course improve it in subsequent PRs |
|
This is awesome ! Congrats I feel like this has been merged too fast, however. I think the error handling may be broken and the code in |
CHANGES: #### New features + Add an option `--format-invalid-files` to print unparsable parts of the input as verbatim text. This feature is still experimental. (ocaml-ppx/ocamlformat#1026) (Guillaume Petiot) + Support multi-indices extended indexing operators (ocaml-ppx/ocamlformat#1279, ocaml-ppx/ocamlformat#1277) (Jules Aguillon, Guillaume Petiot) This feature has been added in OCaml 4.10.0 + Handle OCaml 4.10.0 AST (ocaml-ppx/ocamlformat#1276) (Guillaume Petiot) + Preserve functor syntax for consistency (ocaml-ppx/ocamlformat#1312) (Guillaume Petiot) Previously both functor syntax: `module M = functor (K : S) -> struct end` and `module M (K : S) = struct end` would be formatted as the latter, the original syntax is now preserved. #### Changes + Add the option `doc-comments-val=before|after` (ocaml-ppx/ocamlformat#1012) (Jules Aguillon) This option set the placement of documentation comment on `val` and `external` only. It is set to `after` by default. + The default for `doc-comments` is changed from `after` to `before` (ocaml-ppx/ocamlformat#1012, ocaml-ppx/ocamlformat#1325) (Jules Aguillon) This affects both `conventional` (default) and `ocamlformat` profiles. + Some options are now deprecated: * `doc-comments` (ocaml-ppx/ocamlformat#1293, ocaml-ppx/ocamlformat#1012) This option depends on a flawed heuristic. It is replaced by `doc-comments-val` for `val` and `external` declarations. There is no equivalent to this option in the general case. * `escape-chars`, `escape-strings` and `extension-sugar` (ocaml-ppx/ocamlformat#1293) These options are rarely used and their default behavior is considered to be the right behavior. + Add space between `row_field` attributes and the label or arguments, to be consistent with the non-polymorphic case. (ocaml-ppx/ocamlformat#1299) (Craig Ferguson) #### Bug fixes + Fix missing parentheses around `let open` (ocaml-ppx/ocamlformat#1229) (Jules Aguillon) eg. `M.f (M.(x) [@attr])` would be formatted to `M.f M.(x) [@attr]`, which would crash OCamlformat + Remove unecessary parentheses with attributes in some structure items: * extensions and eval items (ocaml-ppx/ocamlformat#1230) (Jules Aguillon) eg. the expression `[%ext (() [@attr])]` or the structure item `(() [@attr]) ;;` * `let _ = ...` constructs (ocaml-ppx/ocamlformat#1244) (Etienne Millon) + Fix some bugs related to comments: * after a function on the rhs of an infix (ocaml-ppx/ocamlformat#1231) (Jules Aguillon) eg. the comment in `(x >>= fun y -> y (* A *))` would be dropped * in module unpack (ocaml-ppx/ocamlformat#1309) (Jules Aguillon) eg. in the module expression `module M = (val x : S (* A *))` + Fix formatting of empty signature payload `[%a:]` (ocaml-ppx/ocamlformat#1236) (Etienne Millon) + Fix parenthesizing when accessing field of construct application (ocaml-ppx/ocamlformat#1247) (Guillaume Petiot) + Fix formatting of attributes on object overrides `{< >}` (ocaml-ppx/ocamlformat#1238) (Etienne Millon) + Fix attributes on coercion (ocaml-ppx/ocamlformat#1239) (Etienne Millon) + Fix formatting of attributes on packed modules (ocaml-ppx/ocamlformat#1243) (Etienne Millon) + Fix parens around binop operations with attributes (ocaml-ppx/ocamlformat#1252, ocaml-ppx/ocamlformat#1306) (Guillaume Petiot, Craig Ferguson) + Remove unecessary parentheses in the argument of indexing operators (ocaml-ppx/ocamlformat#1280) (Jules Aguillon) + Retain attributes on various AST nodes: * field set expressions, e.g. `(a.x <- b) [@A]` (ocaml-ppx/ocamlformat#1284) (Craig Ferguson) * instance variable set expressions, e.g. `(a <- b) [@A]` (ocaml-ppx/ocamlformat#1288) (Craig Ferguson) * indexing operators, e.g. `(a.(b)) [@A]` (ocaml-ppx/ocamlformat#1300) (Craig Ferguson) * sequences, e.g. `(a; b) [@A]` (ocaml-ppx/ocamlformat#1291) (Craig Ferguson) + Avoid unnecessary spacing after object types inside records and polymorphic variants, e.g. `{foo : < .. > [@A]}` and `{ foo : < .. > }` (ocaml-ppx/ocamlformat#1296) (Craig Ferguson) + Fix missing parentheses around tuples with attributes. (ocaml-ppx/ocamlformat#1301) (Craig Ferguson) Previously, `f ((0, 0) [@A])` would be formatted to `f (0, 0) [@A]`, crashing OCamlformat. + Avoid emitting `>]` when an object type is contained in an extension point or attribute payload (ocaml-ppx/ocamlformat#1298) (Craig Ferguson) + Fix crash on the expression `(0).*(0)` (ocaml-ppx/ocamlformat#1304) (Jules Aguillon) It was formatting to `0.*(0)` which parses as an other expression. + Preserve empty doc-comments syntax. (ocaml-ppx/ocamlformat#1311) (Guillaume Petiot) Previously `(**)` would be formatted to `(***)`. + Do not crash when a comment contains just a newline (ocaml-ppx/ocamlformat#1290) (Etienne Millon) + Handle lazy patterns as arguments to `class` (ocaml-ppx/ocamlformat#1289) (Etienne Millon) + Preserve cinaps comments containing unparsable code (ocaml-ppx/ocamlformat#1303) (Jules Aguillon) Previously, OCamlformat would fallback to the "wrapping" logic, making the comment unreadable and crashing in some cases. + Fix normalization of attributes, fixing the docstrings in attributes (ocaml-ppx/ocamlformat#1314) (Guillaume Petiot) + Add missing parentheses around OR-patterns with attributes (ocaml-ppx/ocamlformat#1317) (Guillaume Petiot) + Fix spacing inside parens for symbols when the spacing was handled by the englobing exp (ocaml-ppx/ocamlformat#1316) (Guillaume Petiot) + Fix invalid (unparsable) docstrings (ocaml-ppx/ocamlformat#1315) (Guillaume Petiot) When parsing a comment raises an error in odoc, it is printed as-is. + Fix parenthesizing of optional arguments rebound to non-variables, e.g. `let f ?a:(A) = ()` rather than the unparsable `let f ?a:A = ()` (ocaml-ppx/ocamlformat#1305) (Craig Ferguson)
Fix #906