Ambiguous pattern variables: add negative rows#1552
Conversation
trefis
left a comment
There was a problem hiding this comment.
The first commit is correct and indeed makes the situation nicer for my PR interested in coherence of the first column.
The second commit however seems incorrect. I'm tempted to believe the general idea is sound and that you only need to be a bit more careful in the implementation. But we'll see.
typing/parmatch.ml
Outdated
| | { default; constrs = [] } -> | ||
| (* the first column contains no head constructor; | ||
| they are all _ after simplification, so it can be dropped *) | ||
| let default_ns = List.map snd ns in |
There was a problem hiding this comment.
That line looks fishy.
I spent a long time thinking about the coherence implications of that line, that is: clearly the default_ns (which seems like a bad name to me btw, default_ns is in no way the default matrix of ns) you produce can contain incoherences, but does it matter?
I'm still not quite sure but I believe it does and that we should care about the coherence of ns.
Here we probably want to build the specialized submatrices of ns and recurse on each of them (keep default for the other argument on each of the recursive call). This is just an intuition however and it will be easier to test out if this PR is rebased on top of mine.
In any case, I took some time to produce a convoluted example introducing incoherence in specific places and trying to see how the new implementation would behave on it. At which point I realized I could greatly simplify my test case to the following code (where all the columns are coherent at every step):
type a = A1 | A2
type 'a alg =
| Val of 'a
| Binop of 'a alg * 'a alg
let cmp (pred : a -> bool) (x : a alg) (y : a alg) =
match x, y with
| Val A1, Val A1 -> ()
| Val x, _
| _, Val x when pred x -> ()
| _, _ -> ()With your implementation the warning doesn't trigger on this example. I think it should.
I tried the idea I mentioned above (recursing on each of the specialized submatrices of ns) and it does warn on that specific example (make core then ./boot/ocamlrun ./ocamlc -nostdlib -nopervasives -w +57 simple_amb.ml) but make world.opt fails because of the assert (ns = []) a few lines above.
So my implementation is not quite right either, but you can probably get something along this line working.
There was a problem hiding this comment.
The diff of my attempt is the following btw:
diff --git a/typing/parmatch.ml b/typing/parmatch.ml
index 16949df57..3fbac8c43 100644
--- a/typing/parmatch.ml
+++ b/typing/parmatch.ml
@@ -2195,8 +2195,17 @@ let rec matrix_stable_vars ns rs = match ns, rs with
| { default; constrs = [] } ->
(* the first column contains no head constructor;
they are all _ after simplification, so it can be dropped *)
- let default_ns = List.map snd ns in
- matrix_stable_vars default_ns default
+ let q0_ns = discr_pat omega ns in
+ begin match build_specialized_submatrices ~extend_row:(@) q0_ns ns with
+ | { default = default_ns ; constrs = [] } ->
+ matrix_stable_vars default_ns default
+ | { default = _; constrs = constrs_ns } ->
+ let submat_stable =
+ List.map (fun (_, sub_ns) -> matrix_stable_vars sub_ns default)
+ constrs_ns
+ in
+ List.fold_left stable_inter All submat_stable
+ end
| { default = _; constrs } ->
(* A stable variable must be stable in each submatrix.
There was a problem hiding this comment.
(fun (_, sub_ns) -> matrix_stable_vars sub_ns default) does not respect arity, as sub_ns may in general have more columns than ns (while default always has one less). In the non-default case here it is safer build the specialized matrix of rs against the head of each constr_ns group (so to drop the decomposition of rs), which suggests that we are really trying to split on both matrices at once instead of decomposing them independently. Also it is not clear whether it is right to forget about default_ns in that case (when constrs_ns is non-empty): default_ns has less content than any constrs_ns, but precisely having less negative information can lead to accepting more positive rows, and thus finding unstable variables.
|
Thanks, indeed Regarding the first commit, that it is the only one that I suggested that you could cherry-pick into #1550 -- not the second commit which indeed is independent. This code is non-trivial to write because it is the only one where we have both a negative matrix and a positive matrix. In other functions (for example the The reason why the other functions get away with only a positive row (instead of the general case of a positive matrix) is that they have an easy operation to join the results of two "parallel" calls, the calls on the two sides of an or-pattern (for satisfiable the join is just I think I'll wait until #1550 is merged to make another iteration on this PR, taking your comments/suggestions into account. |
2186cb5 to
b8785f6
Compare
|
@trefis: I just rebased this PR, and I changed the algorithm to split on all rows (positive and negative) at the same time -- kind of cool. Let me know what you think. (I have left the handling of the default matrix in the non-empty-constrs case unchanged, while it is not completely clear to me that it is actually correct. But I think it is.) |
trefis
left a comment
There was a problem hiding this comment.
The code is surprisingly simpler than I would have expected! :)
However, trying it on the example I gave during the previous round of review (that github now hides) I still get no warning.
Can you confirm whether you agree or not that a warning should be emitted?
And regardless of the answer, do you mind adding it to the testsuite?
For reference, the test case was:
type a = A1 | A2
type 'a alg =
| Val of 'a
| Binop of 'a alg * 'a alg
let cmp (pred : a -> bool) (x : a alg) (y : a alg) =
match x, y with
| Val A1, Val A1 -> ()
| Val x, _
| _, Val x when pred x -> ()
| _, _ -> ()
typing/parmatch.ml
Outdated
| @@ -1558,21 +1558,14 @@ and push_no_or_column rs = List.map push_no_or rs | |||
| (* Those are adaptations of the previous homonymous functions that | |||
There was a problem hiding this comment.
That comment should be updated (or simply removed?), you're only "adapting" one function, and its name is different.
typing/parmatch.ml
Outdated
| To track this information, the matrices we analyze contain both *positive* rows, | ||
| that describe the rows currently being analyzed (of type amb_row, so that their | ||
| varsets are tracked) and *negative rows*, that describe the cases already | ||
| matched against. A variable is stable if there exist two values that |
There was a problem hiding this comment.
I think you meant "unstable" here?
|
OK, I think I know what the issue is: In the case where
But that's actually wrong. |
|
On the example I have given, the problem appears as follow Which will be split into: And then you recurse only on Which is further split into: And again: And one last time: At which point when you recurse on the only The thing that is a bit sad is that if we recurse on the default matrix as well as on the |
|
Also, I meant to say: How about introducing submodules which will contain a "row" type and the correspond simplification functions? |
Actually, that hints at the fact that we probably want to call |
|
Excellent observations and interesting feedback -- as I am getting used to. Thanks! |
b8785f6 to
a10bff1
Compare
|
I made a first batch of easy changes that addresses some of your comment (and optimizes the special case where there is no positive row left in the matrix). In particular, I believe that the |
|
(I'm also interested in playing with |
|
I just pushed a new commit that uses |
|
I think I agree with you that the current implementation is correct. I'll resume nitpicking then:
|
I think we are misunderstanding each other here and we should dig in. The comment is there (only) in the case where the rows are empty (a non-empty matrix with several empty rows). One or several empty positive rows matches exactly one vector of values (the empty vector), but the presence of a negative row means that this empty vector is rejected, so the matrix matches nothing.
Note that, in |
|
(Most other functions in Parmatch have this characteristic that they are insensitive to the row order. The order comes in in the choice of who is |
|
@trefis two points:
|
b843e58 to
3e2a642
Compare
Going back to the discussion about ordering and such: Now, I guess your interpretation of the polarity in some way leads you to saying that "if I have a negative row, then the matrix doesn't filter the empty vector": you are changing the definition of what it means for a matrix to filter a value in presence of polarity. That might be fine, but I wasn't reasoning about it in the same way and it is also never made explicit in your documentation/comments (but of course I might have missed the relevant comment). The way I was thinking about it was more in line with the other comments you've placed in the code (on the definition of
Indeed even though in the matrix the rows order is irrelevant (and not guaranteed) the guarantee you have is we consider the clauses of the pattern matching in order, and grow our set of negative rows as we go. I see this as coherent with the last part of your message above, i.e.
which is what I was roughly trying to express previously. Does that make sense? |
maranget
left a comment
There was a problem hiding this comment.
Fine idea to introduce negative rows, as it leads nice code sharing.
…nctions This is necessary to use several distinct simplify_first_col at once, as we wish to do to handle negative rows in matrix_stable_vars.
3e2a642 to
6e41d65
Compare
|
I made two small changes suggested by Luc during his review:
Edit: the plan is to merge the PR if the CI passes. @trefis, if you wish to tell Github that the changes you requested have been performed, now is the time. |
trefis
left a comment
There was a problem hiding this comment.
All the bugs/issues have been fixed and I think we're all happy with the code changes.
Personally I'd still change the comment we discussed in our last exchange as nothing in the file explains it, and I believe it could be formulated in another way which doesn't require extra knowledge.
That said, it's not really important and I wouldn't be very sad if you merged as is.
(N.B. I assumed we agreed on my last message which you haven't replied to)
| | [] -> All | ||
| | ((Positive {row = []; _} | Negative []) :: _) as empty_rows -> | ||
| let exception Negative_empty_row in | ||
| (* if at least one empty row is negative, the matrix matches no value *) |
There was a problem hiding this comment.
No change to that comment then?
|
@trefis thinking about this more, I think that it is correct to say explicitly that the semantics of a signed matrix is different from the semantics of a non-signed matrix: their set-of-values interpretation is not the same. To this end, I amended the comment introducing To track this information, the matrices we analyze contain both
*positive* rows, that describe the rows currently being analyzed
- (of type amb_row, so that their varsets are tracked) and *negative
- rows*, that describe the cases already matched against. A variable
- is stable if, for any value not matched by any of the negative
- rows, the environment captured by any of the matching positive rows
- is identical.
+ (of type Varsets.row, so that their varsets are tracked) and
+ *negative rows*, that describe the cases already matched against.
+
+ The values matched by a signed matrix are the values matched by
+ some of the positive rows but none of the negative rows. In
+ particular, a variable is stable if, for any value not matched by
+ any of the negative rows, the environment captured by any of the
+ matching positive rows is identical.
*)
type ('a, 'b) signed = Positive of 'a | Negative of 'bI believe that the comment in the function itself is now correct with respect to that interpretation. (Maybe it would be even clearer to explicitly distinguish the "semantics as a matrix" and the "semantics as a signed matrix" (the signed semantics), and say something like "the signed semantics of |
6e41d65 to
f0f985b
Compare
Yep, that's the other way to fix "the issue". Your new comment on |
Consider a set of pattern-matching clauses of the form
currently this code would raise a warning 57: the variable
xin the guarded pattern((Foo x, _) | (_, Foo x))is ambiguous. However, this ambiguity is spurious, as the values that expose the ambiguity are of the shape(Foo v, Foo v'), and would all have been caught by the first clause.This PR adds support for negative rows in the code computing stable pattern variables.
Note: Clauses that are guarded are not kept as negative clauses below (as we cannot reason on whether they catch values of the corresponding shape or not without analyzing the guard), so the following example would still raise a warning -- which I find slightly frustrating:
The first commit of this PR is a refactoring (that does not change the semantics) that I wrote as part of a larger PR to handle ill-typed columns in the ambiguous variable code (related to #1550, cc @trefis). In the inconsistent case, one needs to say that all variables are stable. We need to do the same in the case where we end up with an empty row to analyze but there remains a negative matrix -- filled with empty rows -- as this means that no value can ever reach this sub-matrix.