Skip to content

Improve: format invalid files#1026

Merged
gpetiot merged 76 commits intoocaml-ppx:masterfrom
gpetiot:merlin-parser
Feb 27, 2020
Merged

Improve: format invalid files#1026
gpetiot merged 76 commits intoocaml-ppx:masterfrom
gpetiot:merlin-parser

Conversation

@gpetiot
Copy link
Copy Markdown
Collaborator

@gpetiot gpetiot commented Sep 24, 2019

Fix #906

@gpetiot gpetiot requested review from Julow and emillon February 19, 2020 14:07
@gpetiot
Copy link
Copy Markdown
Collaborator Author

gpetiot commented Feb 21, 2020

The ocaml-ci target lint weirdly fails, does anyone has the same failure locally?
edit: well the error evolves each time we reload, that's new!

Copy link
Copy Markdown
Collaborator

@emillon emillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's starting to look good! If we're going to maintain this library here (I think we should), a subsequent step should be making sure it feels home in this project, including making the library private and probably under the same project as ocamlformat (while remaining under a different directory for maintenance and legal purposes). This would block a release (by making dune-release try to release both I think) so this should probably be done in this PR.

{ init_cmts: debug:bool -> Source.t -> 'a -> Cmt.t list -> Cmts.t
; fmt: debug:bool -> Source.t -> Cmts.t -> Conf.t -> 'a -> Fmt.t
; parse: Lexing.lexbuf -> 'a
; make_parsable: string -> string
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make_parsable is a bit of a strange name (and probably a frenchism). What do you think about recover? The type of the record field can also be (string -> string) option to indicate that there's a nice default implementation available (the identity function)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on renaming the function, but I don't see what having an option type here will achieve, it will force us to test both option values even though we know the appropriate recover function is always provided

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was an API design comment. string -> string is great and it stay that way. Let me explain what I meant:

Suppose you're a user of that API and you want to create a value of type 'a t. You have to provide a make_parsable function of type string -> string for your language right? Suppose you do not care about that and want to not use that feature. What should you pass? fun x -> x? fun _ -> ""? fun _ -> assert false? Difficult to tell without looking at the other uses! Asking for a (string -> string) option makes that a bit more obvious - you can pass None. It clarifies the fact that there's a natural default value. Of course, you need to add a Option.value ~default:Fn.id where you use that function. (it's also possible to keep storing string -> string but do this default value thing in a constructor)

Copy link
Copy Markdown
Collaborator

@Julow Julow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand the CI error. I submitted an issue ocurrent/ocaml-ci#120

in
match parse xunit.parse conf ~source:fmted with
let fmted_parsable =
if opts.Conf.partial then xunit.make_parsable fmted else fmted
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we do that only if the parse failed ?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this seems to ignore parsing errors. Should we show them ? (eg. as warnings)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this option is set, I think the parsing errors should be ignored.
The code is written this way to avoid having to duplicate too much code and have a minimal impact on the current implementation. (as this PR is already hard enough to review)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this make duplicated code or a big impact ?

Showing all the parsing errors can be unhelpful indeed. What about showing a single warning in this case ? (eg. "This file contains parsing errors and is not fully formatted")

If you are concerned about the emacs plugin opening buffers, note that we can modify it.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can assume that errors are already reported by the editor.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily, OCamlformat must parse the entire file so it's sensitive to all parsing errors that the editor won't see. OCamlformat may be called more often than the compiler.

Also, the editor will most probably always pass this option. The user may be confused by the "Applied OCamlformat" message but nothing changed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I applied the recovery only when the parsing failed and a warning is emitted (maybe more than once, should be fixed in another PR)

@gpetiot gpetiot changed the title Improve: add an option --partial to ignore unparsable parts of the input Improve: format invalid files Feb 24, 2020
@gpetiot gpetiot requested review from Julow and emillon February 24, 2020 19:21
in
let mapper = make_mapper () in
ignore ((m mapper) ast);
normalize_locs !loc_list
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is collecting the locs of invalid parts of the code.
But we have an ast here, why not use it instead of going back to a string in the next function ?
The Ast_mapper and Stack tricks used here can be used to map "structure item that contains an annot, deeply" to "invalid ast ext node" directly.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought to have a string output was a simpler interface for ocamlformat, as it is what is processed in Translation_unit, and returning the ast types is tricky as we have an OMP layer which is not the same as the one of ocamlformat (Migrate_ast is duplicated)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having two different AST/parser seems like something that could cause bugs. How hard to implement and what are the drawback to use a single implementation ?

Having control on the parser's code may solve a few issues too (eg. to modify it or to avoid changes in behavior between OMP's versions, which could cause formatting changes between the same version of OCamlformat)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's necessary if we want this parser (pwyc) to be independent from ocamlformat, which is the what we agreed upon. Maybe the Migrate_ast could be factorized in the future, but I don't think it should be done in this PR.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question was about using the AST and parser from pwyc in OCamlformat, dropping OMP. The dependency stays the same.
My question is, what are the drawbacks of doing that ?

@gpetiot gpetiot requested a review from Julow February 26, 2020 19:20
Format.fprintf Format.err_formatter
"Warning: %s is invalid, recovering.\n%!" input_name ;
Ok t_new )
else internal_error (`Cannot_parse exn) (exn_args ())
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is also in parse_result below. Let's share it.


let parse_result xunit conf (opts : Conf.opts) ~source ~input_name =
match parse xunit.parse conf ~source with
| exception exn ->
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at format above, parse could raise several kinds of exceptions:

      | exception Sys_error msg -> Error (User_error msg)
      | exception Warning50 l -> internal_error (`Warning50 l) (exn_args ())
      | exception exn -> internal_error (`Cannot_parse exn) (exn_args ())

The corresponding errors are no longer reported correctly. I'm not sure it makes sense to apply recovering when the first two (Sys_error, Warning50) happen.

@emillon
Copy link
Copy Markdown
Collaborator

emillon commented Feb 27, 2020

I'd like to explain the dual parser design a bit. The idea is generally what's described in #906. One important aspect is that it leave the secondary parser out of the happy path. This ensures that:

  • we keep using the system parser and OMP for all valid code.
  • we do not have to maintain a fork of the parser. When a new version comes out, updating the recovery parser is not on the critical path.
  • this feature is easy to revert.
  • the string -> string API allows a clear separation between the ocaml parser and the rest of ocamlformat. This is important for both technical and licensing reasons.

Copy link
Copy Markdown
Collaborator

@emillon emillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to go. There are probably tons of ways we can improve this but as long as this works in is properly isolated this can happen on master.

@emillon
Copy link
Copy Markdown
Collaborator

emillon commented Feb 27, 2020

HUGE thanks to @gpetiot who's been tirelessly working on this feature for several months! 🎉

@gpetiot
Copy link
Copy Markdown
Collaborator Author

gpetiot commented Feb 27, 2020

I think this is good to go. There are probably tons of ways we can improve this but as long as this works in is properly isolated this can happen on master.

Thanks! I will merge it now, but everyone can of course improve it in subsequent PRs

@gpetiot gpetiot merged commit 27466be into ocaml-ppx:master Feb 27, 2020
@gpetiot gpetiot deleted the merlin-parser branch February 27, 2020 17:31
@Julow
Copy link
Copy Markdown
Collaborator

Julow commented Feb 27, 2020

This is awesome ! Congrats

I feel like this has been merged too fast, however. I think the error handling may be broken and the code in Translation_unit needs improvement.

Julow added a commit to Julow/opam-repository that referenced this pull request Apr 2, 2020
CHANGES:

#### New features

  + Add an option `--format-invalid-files` to print unparsable parts of the input as verbatim text. This feature is still experimental. (ocaml-ppx/ocamlformat#1026) (Guillaume Petiot)

  + Support multi-indices extended indexing operators (ocaml-ppx/ocamlformat#1279, ocaml-ppx/ocamlformat#1277) (Jules Aguillon, Guillaume Petiot)
    This feature has been added in OCaml 4.10.0

  + Handle OCaml 4.10.0 AST (ocaml-ppx/ocamlformat#1276) (Guillaume Petiot)

  + Preserve functor syntax for consistency (ocaml-ppx/ocamlformat#1312) (Guillaume Petiot)
    Previously both functor syntax: `module M = functor (K : S) -> struct end` and `module M (K : S) = struct end` would be formatted as the latter, the original syntax is now preserved.

#### Changes

  + Add the option `doc-comments-val=before|after` (ocaml-ppx/ocamlformat#1012) (Jules Aguillon)
    This option set the placement of documentation comment on `val` and `external` only.
    It is set to `after` by default.

  + The default for `doc-comments` is changed from `after` to `before` (ocaml-ppx/ocamlformat#1012, ocaml-ppx/ocamlformat#1325) (Jules Aguillon)
    This affects both `conventional` (default) and `ocamlformat` profiles.

  + Some options are now deprecated:
    * `doc-comments` (ocaml-ppx/ocamlformat#1293, ocaml-ppx/ocamlformat#1012)
      This option depends on a flawed heuristic.
      It is replaced by `doc-comments-val` for `val` and `external` declarations.
      There is no equivalent to this option in the general case.
    * `escape-chars`, `escape-strings` and `extension-sugar` (ocaml-ppx/ocamlformat#1293)
      These options are rarely used and their default behavior is considered to be the right behavior.

  + Add space between `row_field` attributes and the label or arguments, to be
    consistent with the non-polymorphic case. (ocaml-ppx/ocamlformat#1299) (Craig Ferguson)

#### Bug fixes

  + Fix missing parentheses around `let open` (ocaml-ppx/ocamlformat#1229) (Jules Aguillon)
    eg. `M.f (M.(x) [@attr])` would be formatted to `M.f M.(x) [@attr]`, which would crash OCamlformat

  + Remove unecessary parentheses with attributes in some structure items:
    * extensions and eval items (ocaml-ppx/ocamlformat#1230) (Jules Aguillon)
      eg. the expression `[%ext (() [@attr])]` or the structure item `(() [@attr]) ;;`
    * `let _ = ...`  constructs (ocaml-ppx/ocamlformat#1244) (Etienne Millon)

  + Fix some bugs related to comments:
    * after a function on the rhs of an infix (ocaml-ppx/ocamlformat#1231) (Jules Aguillon)
      eg. the comment in `(x >>= fun y -> y (* A *))` would be dropped
    * in module unpack (ocaml-ppx/ocamlformat#1309) (Jules Aguillon)
      eg. in the module expression `module M = (val x : S (* A *))`

  + Fix formatting of empty signature payload `[%a:]` (ocaml-ppx/ocamlformat#1236) (Etienne Millon)

  + Fix parenthesizing when accessing field of construct application (ocaml-ppx/ocamlformat#1247) (Guillaume Petiot)

  + Fix formatting of attributes on object overrides `{< >}` (ocaml-ppx/ocamlformat#1238) (Etienne
    Millon)

  + Fix attributes on coercion (ocaml-ppx/ocamlformat#1239) (Etienne Millon)

  + Fix formatting of attributes on packed modules (ocaml-ppx/ocamlformat#1243) (Etienne Millon)

  + Fix parens around binop operations with attributes (ocaml-ppx/ocamlformat#1252, ocaml-ppx/ocamlformat#1306) (Guillaume Petiot, Craig Ferguson)

  + Remove unecessary parentheses in the argument of indexing operators (ocaml-ppx/ocamlformat#1280) (Jules Aguillon)

  + Retain attributes on various AST nodes:
    * field set expressions, e.g. `(a.x <- b) [@A]` (ocaml-ppx/ocamlformat#1284) (Craig Ferguson)
    * instance variable set expressions, e.g. `(a <- b) [@A]` (ocaml-ppx/ocamlformat#1288) (Craig Ferguson)
    * indexing operators, e.g. `(a.(b)) [@A]` (ocaml-ppx/ocamlformat#1300) (Craig Ferguson)
    * sequences, e.g. `(a; b) [@A]` (ocaml-ppx/ocamlformat#1291) (Craig Ferguson)

  + Avoid unnecessary spacing after object types inside records and polymorphic variants,
    e.g. `{foo : < .. > [@A]}` and `{ foo : < .. > }` (ocaml-ppx/ocamlformat#1296) (Craig Ferguson)

  + Fix missing parentheses around tuples with attributes. (ocaml-ppx/ocamlformat#1301) (Craig Ferguson)
    Previously, `f ((0, 0) [@A])` would be formatted to `f (0, 0) [@A]`, crashing OCamlformat.

  + Avoid emitting `>]` when an object type is contained in an extension point
    or attribute payload (ocaml-ppx/ocamlformat#1298) (Craig Ferguson)

  + Fix crash on the expression `(0).*(0)` (ocaml-ppx/ocamlformat#1304) (Jules Aguillon)
    It was formatting to `0.*(0)` which parses as an other expression.

  + Preserve empty doc-comments syntax. (ocaml-ppx/ocamlformat#1311) (Guillaume Petiot)
    Previously `(**)` would be formatted to `(***)`.

  + Do not crash when a comment contains just a newline (ocaml-ppx/ocamlformat#1290) (Etienne Millon)

  + Handle lazy patterns as arguments to `class` (ocaml-ppx/ocamlformat#1289) (Etienne Millon)

  + Preserve cinaps comments containing unparsable code (ocaml-ppx/ocamlformat#1303) (Jules Aguillon)
    Previously, OCamlformat would fallback to the "wrapping" logic, making the comment
    unreadable and crashing in some cases.

  + Fix normalization of attributes, fixing the docstrings in attributes (ocaml-ppx/ocamlformat#1314) (Guillaume Petiot)

  + Add missing parentheses around OR-patterns with attributes (ocaml-ppx/ocamlformat#1317) (Guillaume Petiot)

  + Fix spacing inside parens for symbols when the spacing was handled by the englobing exp (ocaml-ppx/ocamlformat#1316) (Guillaume Petiot)

  + Fix invalid (unparsable) docstrings (ocaml-ppx/ocamlformat#1315) (Guillaume Petiot)
    When parsing a comment raises an error in odoc, it is printed as-is.

  + Fix parenthesizing of optional arguments rebound to non-variables, e.g. `let
    f ?a:(A) = ()` rather than the unparsable `let f ?a:A = ()` (ocaml-ppx/ocamlformat#1305) (Craig Ferguson)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Do not fail on incorrect files

4 participants