Parmatch column coherence by trefis · Pull Request #1550 · ocaml/ocaml

trefis · 2017-12-29T17:24:11Z

Spin off of #1518 following discussions with Gabriel and Luc.

This version of the patch introduce an explicit coherence check at various points in parmatch.
There is a somewhat lengthy description of the check in the code itself, so I won't go into the details here (improvements to that documentation are very welcome though!).

A perhaps more readable version of the diff (as it ignores whitespaces changes) on parmatch.ml can be seen here.

A note on pattern typing

Currently patterns are typechecked from "left to right", that is

type _ t = I : int t | C : char t

let f (type a) (t : a t) (a : a) =
  match t, a with
  | I, 3 -> ()
  | C, 'a' -> ()

is well typed, but the following is not:

let f (type a) (t : a t) (a : a) =
  match a, t with
  | 3, I -> ()
  | 'a', C -> ()

From the point of view of typing alone, one could imagine lifting that restriction at some point in the future and allowing both versions to typecheck.

In pratice however, in their current state, neither parmatch nor matching would survive such a change.

Incidentally Parmatch.every_satisfiable was reversing the order of columns between being called and calling satisfiable.
You'll notice the List.rev, hidden in the commit introducing the coherence check, fixing that problem.

Limitations of the current coherence check

In various places I give a brief explanation of what can happen when we accept a column as coherent while it is in fact ill-typed.
The potential outcomes include false positives and negatives.

However:

for the checks which have an impact on the soundness (i.e. exhaustivity and usefulness) we always error on the safe side: we never say something is exhaustive or unused when it in fact isn't.
the proposed coherence check catches all the cases (and some more) which resulted in assertion failures/fatal errors in parmatch. The issues that are not detected by the current check were already silently accepted before (see for example the test added by the first commit whose result is left unchanged by the other patches).

gasche · 2018-01-01T11:25:19Z

typing/parmatch.ml

+   when an "incoherence" is not detected by this check.
+*)
+
+let first_column_is_coherent matrix =


This may be a detail, but I think it would be slightly nicer to have a coherent_heads h1 h2 binary function that checks whether two discriminant heads are coherent with each other, and expose this function in the module -- I think it may be of use later. Then you can implement the coherence check with a fold_left, or with this fairly amusing trick

let coherent_column = function | [] -> true | r::rs -> List.for_all2 (fun (h1, _) (h2, _) -> coherent_heads h1 h2) (r::rs) (rs @ [r])

Your trick is cute, but:

I think mine is slightly more efficient

I don't think yours works: what happens if the column is (1 _ 'a' _)?

As for the coherent_heads function, you just want my local same_kind lifted out (making its "discr_pat" a parameter)?
There is no need for it yet, I'd rather keep things as is and make the function globally available only when (if ever) we happen to need it.

Indeed, the trick is wrong for the reason you give, the relation is not transitive. This raises a question: is it actually the case that only checking coherence of the first non-any pattern with all others is enough? Should we test coherence between all couples of patterns? Or is the relation transitive on all non-any patterns?

is it actually the case that only checking coherence of the first non-any pattern with all others is enough?

Yes.
(the proof is left as an exercise for the careful reviewer)

gasche · 2018-01-01T11:25:59Z

typing/parmatch.ml

+          | Const_nativeint _, Const_nativeint _
+          | Const_float _, Const_float _
+          | Const_string _, Const_string _ -> true
+          | _, _ -> false


Could this be made non-fragile, by listing the alternatives at least in the first one?

gasche · 2018-01-01T11:30:18Z

typing/parmatch.ml

+   incompatibility: clearly this is fine.
+   - if we end up returning [true] then we're saying that [qs] is useful while
+   it is not. This is sad but not the end of the world, we're just allowing dead
+   code to survive.


I think that the correct check is not of whether pss is coherent, but whether pss followed by qs is coherent. See remarks below.

That seems right. I changed all such places.

gasche · 2018-01-01T13:25:44Z

typing/parmatch.ml

-                  satisfiable pss (simple_match_args p omega @ qs))
-                constrs
-            else
+        if not (first_column_is_coherent pss) then (


Here q is omega, so coherent pss is the same as coherent (q :: pss), and this test is correct -- but a comment could point that out.

I'm starting to feel like we have enough comments that really simple ones (like this one would be) can actually be omitted (I believe that, as is, one can fairly quickly recover the reason for not checking q :: pss)

Your code, your choice.

gasche · 2018-01-01T13:26:55Z

typing/parmatch.ml

        let q0 = discr_pat q pss in
-        satisfiable (build_specialized_submatrix ~extend_row:(@) q0 pss)
-          (simple_match_args q0 q @ qs)
+        if not (first_column_is_coherent pss) then (


Here I think that we should check first_column_is_coherent (q0 :: pss) -- that's the idea anyway, I suspect that the types don't match and that the API would have to change slightly.

gasche · 2018-01-01T13:29:46Z

typing/parmatch.ml

-            (list_satisfying_vectors
-               (build_specialized_submatrix ~extend_row:(@) q0 pss)
-               (simple_match_args q0 q @ qs))
+          if not (first_column_is_coherent pss) then (


again we should check also if q0 is coherent with the rest.

gasche · 2018-01-01T13:34:43Z

typing/parmatch.ml

+         matrix are stable. *)
+      List.fold_left (fun acc (_, { varsets; _ }) ->
+        List.fold_left IdSet.union acc varsets
+      ) IdSet.empty rs


I have worked on a better implementation of this on my own (a distinguished All value), as I thought that this would be outside the scope of your PR. I hope to send a PR for this soon.

This is now pushed as 63ed2cc, please take a look, feel free to ask for changes, and eventually to integrate into the present PR if you wish to -- I can rebase my own on top of the final version of this commit.

The idea of your implementation seems nice indeed (I haven't actually reviewed the actual code just yet), but IMO they are independent (which is a good thing) and should be reviewed and merged separately (that is, I don't want to "merge" the GPRs together before actually merging into trunk).

gasche · 2018-01-01T13:35:25Z

Approved in general, but see comments.

As per Gabriel's remark on GPR ocaml#1550

gasche · 2018-01-02T10:59:27Z

typing/parmatch.ml

        (* The handling of incoherent matrices is kept in line with
           [satisfiable] *)
-        if not (first_column_is_coherent pss) then
+        if not (first_column_is_coherent ((uq, make_row rem) :: pss)) then


maybe we could write (all_coherent (first_column pss)) and (all_coherent (uq :: first_column pss)) to avoid having to come up with the rest of the row that we don't actually care about?

Ah, that's what you meant by "the API might need some adjusting"!

Yes, I too find my current implementation a bit distasteful and yours seems prettier. But I'm also worried that yours means iterating over the rows once more, and allocating a fresh list with as many elements as there are rows, whereas with the current implementation we don't need to iterate and don't allocate much.

I'm tempted to keep my implementation because of these (probably unwarranted) performance concerns. But I'll welcome your input on that issue and am ready to change my mind if you feel differently.

I do think that the performance concern is unwarranted (the compiler already allocates, this case of the function already allocates, and we have no latency constraints), but from your answers regarding matrix_stable_vars it looks like I should send a cleanup PR after this one is merged anyway, so I don't mind proposing some API changes at the same time.

Alright, sounds good to me :)
I will leave things as is for now then.

trefis · 2018-01-02T11:25:32Z

Btw @gasche , can you suggest/push a Changes entry?

gasche · 2018-01-02T14:47:25Z

Changes, in section "Working version > Internal/compiler-libs changes"

- GPR#1550: make pattern-matching warnings more robust to ill-typed columns
  (Thomas Refis, with help from Gabriel Scherer and Luc Maranget)

trefis · 2018-01-02T14:56:56Z

Thanks, I added the Changes entry (but I put it in the bugfixes section).

gasche · 2018-01-02T14:58:17Z

If you're not interested in (1) being perfectionist over the coherence API in this PR or (2) removing the matrix_stable_vars part, then I think there is nothing left to change/discuss for the PR. My approval still stand, please rebase as you feel is appropriate and merge (or ask me to if you prefer).

trefis · 2018-01-03T11:48:11Z

I rebased, squashed a few of the commits together, and added a "style commit" on top doing the API changes proposed (i.e. the split of first_column_coherence into first_column and all_coherent).
I also renamed my local same_kind (in all_coherent) to coherent_heads which you seemed to like better.

As for the matrix_stable_vars part, I didn't change it, I still think it's better left as part of your GPR.

N.B. I left the style commit on its own so it's easier to review, but I think it should eventually be squash into the "explicitly check coherence of first column" one. I'll be happy to squash it and merge the PR in trunk once you've read it and confirmed one last time.

gasche · 2018-01-03T13:23:34Z

Yes, I think the API change makes for nicer code, thanks!

One minor question: all_coherent (q :: first_column pss) raises the question of what is the domain of all_coherent (and thus of the coherent_heads function): does it expected discriminators, as always returned by simplify_first_col, or arbitrary "simplified patterns" (without Tpar_var and Tpat_or)? If I read the code correctly, q above may be a general simple pattern rather than discriminators (its sub-arguments are non-omegas), so I think the natural answer is "simplified patterns" -- which suggests that the name suggestions I made is not the best, but oh well.

trefis · 2018-01-03T13:29:34Z

does it expected discriminators, as always returned by simplify_first_col, or arbitrary "simplified patterns" (without Tpar_var and Tpat_or)? If I read the code correctly, q above may be a general simple pattern rather than discriminators (its sub-arguments are non-omegas), so I think the natural answer is "simplified patterns" -- which suggests that the name suggestions I made is not the best, but oh well.

Indeed just "simplified" patterns, as documented.
Note: I didn't know about the name "discriminators", we usually use "normalized" in parmatch (and I believe matching) to mean "where the arguments have been replaced by omega".
In any case, I think the name of the helper function is fine: it only looks at the head of the patterns.

gasche · 2018-01-03T13:30:44Z

Feel free to merge.

Also extends the testsuite.

trefis · 2018-01-03T13:33:46Z

Thanks, I squashed the style commit and merged.

gasche · 2018-01-21T20:55:16Z

We may need to get this in 4.06 anyway, see MPR#7713.

gasche · 2018-01-26T14:01:45Z

typing/parmatch.ml

  | Tpat_constant(c1), Tpat_constant(c2) -> const_compare c1 c2 = 0
-  | Tpat_tuple _, Tpat_tuple _ -> true
  | Tpat_lazy _, Tpat_lazy _ -> true
  | Tpat_record _ , Tpat_record _ -> true


Note: we could check Array.length lbl1.lbl_all = Array.length lbl2.lbl_all here as you do for the coherence check (or some richer check in general). (Currently the tuples are checked but not the records.) If one of the two pattern is empty (no label), they may match.

gasche · 2018-01-26T14:03:16Z

typing/parmatch.ml

+      end
+    | Tpat_tuple l1, Tpat_tuple l2 -> List.length l1 = List.length l2
+    | Tpat_record ((_, lbl1, _) :: _, _), Tpat_record ((_, lbl2, _) :: _, _) ->
+      Array.length lbl1.lbl_all = Array.length lbl2.lbl_all


By the way, why are we checking the length only and not the lbl_all value itself?

gasche · 2018-01-26T14:04:40Z

typing/parmatch.ml

+      Array.length lbl1.lbl_all = Array.length lbl2.lbl_all
+    | Tpat_any, _
+    | _, Tpat_any
+    | Tpat_record ([], _), Tpat_record ([], _)


Actually I think that this line is buggy: an empty record pattern is coherent with any other record pattern, not just empty ones.

trefis · 2018-01-26T14:31:39Z

The check could be made stricter in various places, and it would probably be better. It wasn't done because it wasn't "necessary". There is a lot of room for improvement.

For the empty record pattern, I disagree with you.
Currently an empty record pattern won't typecheck anyway, so the question is hypothetical. But assuming we allow them, I would expect an empty record pattern to match values of an empty record type, and that's it.
Perhaps that's not how things will happen, perhaps it will indeed mean "I know I'm getting a record, but I don't care about its content". But saying "this line is buggy" seems a bit strong given the current situation.
If you want, you could map Tpat_record ([], _), _ | _, Tpat_record ([], _) to assert false and we'll revisit in due time. But I think the current situation is probably fine/better.

gasche · 2018-01-26T14:34:45Z

typing/parmatch.ml

+    (* only omegas on the column: the column is coherent. *)
+    true
+  | discr_pat ->
+    List.for_all (coherent_heads discr_pat) column


Note: this line is wrong if we accept the empty record pattern at any record type (because then the relation is not transitive anymore).

gasche · 2018-01-26T15:21:26Z

We did another review of the PR with @maranget, with the merge to the 4.06 branch in mind. We have carefully reviewed and believe that the PR is correct. The review turned out two things:

Matching and coherence checks on (non-empty) records could be refined. This does not require any change for 4.06, but it is a change we could consider for trunk
The handling of empty record patterns in the coherence check is wrong. This is of no consequence for the compiler given that empty record patterns are rejected by the type-checker anyway, so the PR is safe, but it would still be nicer to remove the faulty line or fix it ((Tpat_record ([], _), Tpat_record _) | (Tpat_record _, Tpat_record ([], _))). It probably wasn't a very good idea to introduce empty records in the code in the first place.

gasche · 2018-01-26T15:24:17Z

@trefis, would you consider submitting a rebased version of this PR against the 4.06 branch?

trefis · 2018-01-26T15:41:12Z

@trefis, would you consider submitting a rebased version of this PR against the 4.06 branch?

Will do.
Of course this GPR is based on top of the big rewrite of parmatch, which I assume we're not going to backport, so it'll have to be adapted. You'll have to carefully review one more time.

gasche · 2018-01-26T16:31:22Z

I'm not sure which way is best, but it may be necessary to try to rebase to see how well it goes before making a decision...

cherry-picked from b573eaa (ocaml#1550)

cherry-picked from f5619dc (ocaml#1550)

cherry-picked from ad91c46 (ocaml#1550)

Also extends the testsuite. originally cherry-picked from fa5a4bb (ocaml#1550), with a careful rebase by Thomas Refis to account for the fact that the refactoring GPR#1448, which the original patch relied on, is not present in the 4.06 branch.

Co-authored-by: Paul-Elliot <peada@free.fr> Co-authored-by: Christine Rose <christinerose@users.noreply.github.com>

gasche approved these changes Jan 1, 2018

View reviewed changes

gasche mentioned this pull request Jan 1, 2018

Ambiguous pattern variables: add negative rows #1552

Merged

trefis added a commit to trefis/ocaml that referenced this pull request Jan 2, 2018

parmatch: in satisfiable check coherence of (pss + qs)

2915db2

As per Gabriel's remark on GPR ocaml#1550

gasche reviewed Jan 2, 2018

View reviewed changes

trefis added 3 commits January 3, 2018 11:23

parmatch: add a test showing omissions of the fragile warning

b573eaa

parmatch: more robust compatibility check

f5619dc

parmatch: more robust tuple matching

ad91c46

trefis force-pushed the parmatch-column-coherence branch from ea42840 to 83a4701 Compare January 3, 2018 11:44

trefis added a commit to trefis/ocaml that referenced this pull request Jan 3, 2018

Changes for GPR ocaml#1550

f8454e6

trefis added 2 commits January 3, 2018 13:32

parmatch: explicitely check coherence of first column

fa5a4bb

Also extends the testsuite.

Changes for GPR ocaml#1550

9185e77

trefis force-pushed the parmatch-column-coherence branch from 83a4701 to 9185e77 Compare January 3, 2018 13:32

trefis merged commit 64fb4f5 into ocaml:trunk Jan 3, 2018

trefis deleted the parmatch-column-coherence branch January 3, 2018 13:33

This was referenced Jan 3, 2018

parmatch.ml: refactor matrix_stable_vars #1555

Merged

Pattern matching refactoring #1560

Merged

gasche reviewed Jan 26, 2018

View reviewed changes

trefis mentioned this pull request Jan 29, 2018

Backport GPR#1550 on 4.06 #1587

Closed

gasche pushed a commit to gasche/ocaml that referenced this pull request Feb 2, 2018

parmatch: add a test showing omissions of the fragile warning

957916f

cherry-picked from b573eaa (ocaml#1550)

gasche pushed a commit to gasche/ocaml that referenced this pull request Feb 2, 2018

parmatch: more robust compatibility check

128275f

cherry-picked from f5619dc (ocaml#1550)

gasche pushed a commit to gasche/ocaml that referenced this pull request Feb 2, 2018

parmatch: more robust tuple matching

e6b827b

cherry-picked from ad91c46 (ocaml#1550)

vicuna mentioned this pull request Mar 14, 2019

assertion failure exclusively with 4.06.1+rc1 #7713

Closed

EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024

Add changelog for Odoc 2.3.0 (ocaml#1550)

3fe2b31

Co-authored-by: Paul-Elliot <peada@free.fr> Co-authored-by: Christine Rose <christinerose@users.noreply.github.com>

Conversation

trefis commented Dec 29, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

A note on pattern typing

Limitations of the current coherence check

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gasche Jan 1, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gasche commented Jan 1, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

trefis commented Jan 2, 2018

Uh oh!

gasche commented Jan 2, 2018

Uh oh!

trefis commented Jan 2, 2018

Uh oh!

gasche commented Jan 2, 2018

Uh oh!

trefis commented Jan 3, 2018

Uh oh!

gasche commented Jan 3, 2018

Uh oh!

trefis commented Jan 3, 2018

Uh oh!

gasche commented Jan 3, 2018

Uh oh!

trefis commented Jan 3, 2018

Uh oh!

gasche commented Jan 21, 2018

Uh oh!

gasche Jan 26, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

trefis commented Jan 26, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

trefis commented Dec 29, 2017 •

edited

Loading

gasche Jan 1, 2018 •

edited

Loading

gasche Jan 26, 2018 •

edited

Loading

gasche commented Jan 26, 2018 •

edited

Loading

trefis commented Jan 26, 2018 •

edited

Loading