[WIP] Warning for "syntactically dangerous" nested pattern matching#716
[WIP] Warning for "syntactically dangerous" nested pattern matching#716alainfrisch wants to merge 3 commits intoocaml:trunkfrom
Conversation
|
For what it's worth after the discussions of #278 I wrote something very similar to what you are doing here. My code was keep track of parentheses of if branches, I just made the result available here . My implementation is slightly different from yours, following @damiendoligez input (cf. #278 (comment) ) I implemented the check directly in the parser instead of adding information to the parsetree to allow doing the check at a later stage. We are still experimenting with this check internally (to see how useful it is, how heavy it makes the code, etc.) and I am not proposing it to be merged yet. However I thought you'd like to aware of the approach I linked since we basically perform the same check. |
I was thinking about that, but it's more complex and intrusive in the parser. Do you see any drawback in my approach? If you are concerned with extra attributes in the parsetree, they could easily be removed (from .cmt files; and one could also run the check and remove attributes before passing the AST to ppx). But keeping them could be useful for external tools as well (the check could be implemented purely like that, btw, but if it is not a built-in warning, beginners will not use it). One could also keep the kind of delimiter in the attribute payload (being..end or parenthesis) and use the info in pprintast.ml (to get printing closer to the source code formatting). |
|
Can you more precisely document under what circumstances the check fires? Is it only when a match is directly nested? e.g., how do these test cases fare? Or this one? I think that this patch should probably come along with a set of top-level expect tests that demonstrate when the check does and doesn't fire. For what it's worth, my preference would be to think about this the other way: the default should be to require delimiting around every match statement, with some carefully considered safe harbors when it can be dropped. The idea is to teach users that they can reliably find the end of a match statement by looking for the begin/end or parens surrounding them. This brings us closer to what I think the "right" syntax would have been be, which is: |
| -> check_nested_match e | ||
| | Pexp_ifthenelse (_, e1, Some e2) | ||
| | Pexp_sequence (e1, e2) -> | ||
| check_nested_match e1; check_nested_match e2 |
There was a problem hiding this comment.
Is it really necessary to recurse on e1 here?
There was a problem hiding this comment.
Probably not, considering the precedence in the parser.
The first would trigger the warning, not the second one. Basically, the current criteria can be described as: whenever we reach a pattern matching case (in a The traversal stops on | Pexp_let (_, _, e)
| Pexp_letmodule (_, _, e)
| Pexp_letexception (_, e)
| Pexp_ifthenelse (_, e, None)
-> check_nested_match e
| Pexp_ifthenelse (_, e1, Some e2)
| Pexp_sequence (e1, e2) ->
check_nested_match e1; check_nested_match e2We should probably tweak these rules and be exhaustive in this pattern matching as suggested by @trefis. Globally, the idea is to enforce that all paths between two nested matching constructions should traverse either some parenthesis/begin...end or other kinds of "good" edges (notion to be refined).
Would you prefer to encourage/force people to use function
| Some x ->
let y =
begin match foo x with Some a -> a | None -> 0 end
in
...I think it's pretty unambiguous and I don't think requiring explicit delimitation in such cases would be a good idea. FWIW, I'm not interested in putting too much effort in this proposal. I intended it more as a proof-of-concept for something I believe would be a more realistic alternative to #715 . |
|
Your semantics seem pretty good, actually, though it's not what I would have initially thought of. I'd be curious to see how it plays out in real code. One thing we might want to do is try out this rule internally, and see how it fares in non-trivial projects. |
|
I actually tried to enable the warning on OCaml code base itself (stdlib, compiler, tools), and it was actually quite painful. A lot of match ... with
| ... -> ...
| ... ->
try ...
with Not_found -> ...or match ... with
| ... -> ...
| ... ->
match ... with
| [] -> ...
| hd::tl -> ...If the goal is to report "suspicious" nesting rather than "dangerous" cases as above, one could add the heuristic that if the inner expression is a With this goal ("suspicious" nesting), something based on indentation might also be worth investigating. But I'm not so sure it is the right approach to do that in the compiler. One could simply check with e.g. ocp-ident that the code is already fully well-indented (this would also address your case with if..then..else and bind), either at compilation time, commit time, or live in the editor. |
I don't think you should be overly concerned about that. |
Not really no. @damiendoligez or @lpw25 should perhaps comment, as they seemed to have a stronger opinion on the issue. I do agree that keeping the information as attributes could be useful someday to some external tool. |
|
Doing this check in the parser as done in @trefis's patch seems more natural to me as well. I heard there might be a work-in-progress to revive the if-then-else check. I think we should consider this warning at the same time, given that the code to implement such warnings could be shared. |
No strong opinion on that, but:
Anyway, I don't intend to put more energy on this specific PR, so knowing other people are working on a similar goal, I prefer to close it for now. |
It's not clear that you'd want to duplicate such a logic for a different concrete syntax though. I see the fact that the warning is raised by the parser as a sign that it is very tied to the concrete syntax; I wouldn't expect it to be necessary for another syntax. |
|
I would like to see this reopened, since it is a major footgun. |
I've asked around and Janestreet is not working on something similar anymore. Would anybody else be interested to take this long-standing issue back? |
…turn runtime: extern_free_position_table should return on extern_flags & NO_SHARING
I am interested, but I don't know what the next step(s) would be? |
Follow up to #715 .
This is to address a common concern with OCaml syntax, namely the lack of a closing token for the pattern matching constructs. We want to detect cases such as:
The compiler considers that the final clause belongs to the nested pattern matching while the intent of the programmer is that it should belong to the outermost one. This situation can typically arise after moving pattern matching clauses, e.g. starting from the valid:
This PR adds a warning that would fire on the two cases above, telling the user that the inner pattern matching should be delimited (by parentheses or
begin...end). This is done by keeping in the Parsetree the info about which sub-expressions are delimited (using an attribute) and checking for nested pattern matchings by traversing the Parsetree before type-checking (reusing an existing traversal).As discussed in #715, this could be complemented/combined with another criteria based on badly aligned patterns in a given pattern matching, which would also indicate a suspicious case. This would be to really detect bad cases, while the current PR is more about encouraging a more robust style where nested pattern matching are explicitly delimited (even when not required because they are in the last clause of the surrounding pattern matching).