Skip to content

[WIP] Warning for "syntactically dangerous" nested pattern matching#716

Closed
alainfrisch wants to merge 3 commits intoocaml:trunkfrom
alainfrisch:warning_nested_match
Closed

[WIP] Warning for "syntactically dangerous" nested pattern matching#716
alainfrisch wants to merge 3 commits intoocaml:trunkfrom
alainfrisch:warning_nested_match

Conversation

@alainfrisch
Copy link
Copy Markdown
Contributor

Follow up to #715 .

This is to address a common concern with OCaml syntax, namely the lack of a closing token for the pattern matching constructs. We want to detect cases such as:

  let f = function
     | A -> ...
     | B x ->
           match x with
           | A -> ...
           | B x -> ...
           | C -> ...
     | C -> ...

The compiler considers that the final clause belongs to the nested pattern matching while the intent of the programmer is that it should belong to the outermost one. This situation can typically arise after moving pattern matching clauses, e.g. starting from the valid:

  let f = function
     | A -> ...
     | C -> ...
     | B x ->
           match x with
           | A -> ...
           | B x -> ...
           | C -> ...

This PR adds a warning that would fire on the two cases above, telling the user that the inner pattern matching should be delimited (by parentheses or begin...end). This is done by keeping in the Parsetree the info about which sub-expressions are delimited (using an attribute) and checking for nested pattern matchings by traversing the Parsetree before type-checking (reusing an existing traversal).

As discussed in #715, this could be complemented/combined with another criteria based on badly aligned patterns in a given pattern matching, which would also indicate a suspicious case. This would be to really detect bad cases, while the current PR is more about encouraging a more robust style where nested pattern matching are explicitly delimited (even when not required because they are in the last clause of the surrounding pattern matching).

@alainfrisch alainfrisch changed the title Warning for "syntactically dangereous" nested pattern matching [WIP] Warning for "syntactically dangereous" nested pattern matching Jul 26, 2016
@alainfrisch alainfrisch mentioned this pull request Jul 26, 2016
@mshinwell mshinwell changed the title [WIP] Warning for "syntactically dangereous" nested pattern matching [WIP] Warning for "syntactically dangerous" nested pattern matching Jul 26, 2016
@trefis
Copy link
Copy Markdown
Contributor

trefis commented Jul 26, 2016

For what it's worth after the discussions of #278 I wrote something very similar to what you are doing here. My code was keep track of parentheses of if branches, I just made the result available here .

My implementation is slightly different from yours, following @damiendoligez input (cf. #278 (comment) ) I implemented the check directly in the parser instead of adding information to the parsetree to allow doing the check at a later stage.

We are still experimenting with this check internally (to see how useful it is, how heavy it makes the code, etc.) and I am not proposing it to be merged yet. However I thought you'd like to aware of the approach I linked since we basically perform the same check.

@alainfrisch
Copy link
Copy Markdown
Contributor Author

I implemented the check directly in the parser instead of adding information to the parsetree to allow doing the check at a later stage.

I was thinking about that, but it's more complex and intrusive in the parser.

Do you see any drawback in my approach? If you are concerned with extra attributes in the parsetree, they could easily be removed (from .cmt files; and one could also run the check and remove attributes before passing the AST to ppx). But keeping them could be useful for external tools as well (the check could be implemented purely like that, btw, but if it is not a built-in warning, beginners will not use it). One could also keep the kind of delimiter in the attribute payload (being..end or parenthesis) and use the info in pprintast.ml (to get printing closer to the source code formatting).

@yminsky
Copy link
Copy Markdown

yminsky commented Jul 27, 2016

Can you more precisely document under what circumstances the check fires? Is it only when a match is directly nested? e.g., how do these test cases fare?

match expr with
| pattern ->
  statement;
  match expr with
  | pattern -> statement

Or this one?

match expr with
| pattern ->
  begin 
     statement;
     match expr with
     | pattern -> statement
   end

I think that this patch should probably come along with a set of top-level expect tests that demonstrate when the check does and doesn't fire.

For what it's worth, my preference would be to think about this the other way: the default should be to require delimiting around every match statement, with some carefully considered safe harbors when it can be dropped. The idea is to teach users that they can reliably find the end of a match statement by looking for the begin/end or parens surrounding them. This brings us closer to what I think the "right" syntax would have been be, which is:

match <expr> with
| <pattern> -> <action>
| <pattern> -> <action>
end

-> check_nested_match e
| Pexp_ifthenelse (_, e1, Some e2)
| Pexp_sequence (e1, e2) ->
check_nested_match e1; check_nested_match e2
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really necessary to recurse on e1 here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not, considering the precedence in the parser.

@alainfrisch
Copy link
Copy Markdown
Contributor Author

alainfrisch commented Jul 27, 2016

e.g., how do these test cases fare?

The first would trigger the warning, not the second one.

Basically, the current criteria can be described as: whenever we reach a pattern matching case (in a match/try/function), we traverse some paths in the AST (see below) and if one path reaches another match/try/function construct, the warning fires.

The traversal stops on begin...end or parentheses. Otherwise, it follows some edges currently defined by:

    | Pexp_let (_, _, e)
    | Pexp_letmodule (_, _, e)
    | Pexp_letexception (_, e)
    | Pexp_ifthenelse (_, e, None)
      -> check_nested_match e
    | Pexp_ifthenelse (_, e1, Some e2)
    | Pexp_sequence (e1, e2) ->
        check_nested_match e1; check_nested_match e2

We should probably tweak these rules and be exhaustive in this pattern matching as suggested by @trefis.

Globally, the idea is to enforce that all paths between two nested matching constructions should traverse either some parenthesis/begin...end or other kinds of "good" edges (notion to be refined).

The idea is to teach users that they can reliably find the end of a match statement by looking for the begin/end or parens surrounding them.

Would you prefer to encourage/force people to use begin...end in

 function
    | Some x ->
        let y =
          begin match foo x with Some a -> a | None -> 0 end
        in
        ...

I think it's pretty unambiguous and I don't think requiring explicit delimitation in such cases would be a good idea.

FWIW, I'm not interested in putting too much effort in this proposal. I intended it more as a proof-of-concept for something I believe would be a more realistic alternative to #715 .

@yminsky
Copy link
Copy Markdown

yminsky commented Jul 27, 2016

Your semantics seem pretty good, actually, though it's not what I would have initially thought of. I'd be curious to see how it plays out in real code. One thing we might want to do is try out this rule internally, and see how it fares in non-trivial projects.

@alainfrisch
Copy link
Copy Markdown
Contributor Author

alainfrisch commented Jul 27, 2016

I actually tried to enable the warning on OCaml code base itself (stdlib, compiler, tools), and it was actually quite painful. A lot of

    match ... with
    | ... -> ...
    | ... ->
         try ...
         with Not_found -> ...

or

    match ... with
    | ... -> ...
    | ... ->
         match ... with
         | [] -> ...
         | hd::tl -> ...

If the goal is to report "suspicious" nesting rather than "dangerous" cases as above, one could add the heuristic that if the inner expression is a try...with with a single case or a match...with with two cases, then it is probably valid. This would reduce significantly the number of warnings fired for the core distribution.

With this goal ("suspicious" nesting), something based on indentation might also be worth investigating. But I'm not so sure it is the right approach to do that in the compiler. One could simply check with e.g. ocp-ident that the code is already fully well-indented (this would also address your case with if..then..else and bind), either at compilation time, commit time, or live in the editor.

@garrigue
Copy link
Copy Markdown
Contributor

I actually tried to enable the warning on OCaml code base itself (stdlib, compiler, tools), and it was actually quite painful.

I don't think you should be overly concerned about that.
The current syntax allows these kinds of cascading matches, and the compiler code uses and abuses of them, but this does not mean this is good practice to use them.
I'd be more than happy to fix my code if that ends up giving me more security (we're just talking about adding parentheses...)

@trefis
Copy link
Copy Markdown
Contributor

trefis commented Jul 28, 2016

Do you see any drawback in my approach?

Not really no. @damiendoligez or @lpw25 should perhaps comment, as they seemed to have a stronger opinion on the issue.

I do agree that keeping the information as attributes could be useful someday to some external tool.

@ghost
Copy link
Copy Markdown

ghost commented Mar 12, 2018

Doing this check in the parser as done in @trefis's patch seems more natural to me as well. I heard there might be a work-in-progress to revive the if-then-else check. I think we should consider this warning at the same time, given that the code to implement such warnings could be shared.

@alainfrisch
Copy link
Copy Markdown
Contributor Author

Doing this check in the parser as done in @trefis's patch seems more natural to me as well.

No strong opinion on that, but:

  • Keep in mind that it means that warning-controlling attributes don't work with warnings triggered by the parser. One possibility is to insert ppwarning attributes so that warnings are actually raised later during type-checking, even if the decision is done by the parser.

  • In general, I'm a bit reluctant to put too much logic in the parser, especially if this complexifies the grammar. Even if the grammar is not modified (but only "semantic actions"), any such logic makes it more difficult to be use a different concrete syntax (such as Reason).

Anyway, I don't intend to put more energy on this specific PR, so knowing other people are working on a similar goal, I prefer to close it for now.

@trefis
Copy link
Copy Markdown
Contributor

trefis commented Mar 12, 2018

any such logic makes it more difficult to be use a different concrete syntax (such as Reason).

It's not clear that you'd want to duplicate such a logic for a different concrete syntax though. I see the fact that the warning is raised by the parser as a sign that it is very tied to the concrete syntax; I wouldn't expect it to be necessary for another syntax.

@DemiMarie
Copy link
Copy Markdown
Contributor

I would like to see this reopened, since it is a major footgun.

@kit-ty-kate
Copy link
Copy Markdown
Member

Anyway, I don't intend to put more energy on this specific PR, so knowing other people are working on a similar goal, I prefer to close it for now.

I've asked around and Janestreet is not working on something similar anymore. Would anybody else be interested to take this long-standing issue back?

EduardoRFS pushed a commit to esy-ocaml/ocaml that referenced this pull request Dec 17, 2021
…turn

runtime: extern_free_position_table should return on extern_flags & NO_SHARING
@ghost
Copy link
Copy Markdown

ghost commented Jun 29, 2022

Anyway, I don't intend to put more energy on this specific PR, so knowing other people are working on a similar goal, I prefer to close it for now.

I've asked around and Janestreet is not working on something similar anymore. Would anybody else be interested to take this long-standing issue back?

I am interested, but I don't know what the next step(s) would be?

stedolan pushed a commit to stedolan/ocaml that referenced this pull request Sep 21, 2022
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants