Skip to content

Parmatch column coherence#1550

Merged
trefis merged 5 commits intoocaml:trunkfrom
trefis:parmatch-column-coherence
Jan 3, 2018
Merged

Parmatch column coherence#1550
trefis merged 5 commits intoocaml:trunkfrom
trefis:parmatch-column-coherence

Conversation

@trefis
Copy link
Copy Markdown
Contributor

@trefis trefis commented Dec 29, 2017

Spin off of #1518 following discussions with Gabriel and Luc.

This version of the patch introduce an explicit coherence check at various points in parmatch.
There is a somewhat lengthy description of the check in the code itself, so I won't go into the details here (improvements to that documentation are very welcome though!).

A perhaps more readable version of the diff (as it ignores whitespaces changes) on parmatch.ml can be seen here.

A note on pattern typing

Currently patterns are typechecked from "left to right", that is

type _ t = I : int t | C : char t

let f (type a) (t : a t) (a : a) =
  match t, a with
  | I, 3 -> ()
  | C, 'a' -> ()

is well typed, but the following is not:

let f (type a) (t : a t) (a : a) =
  match a, t with
  | 3, I -> ()
  | 'a', C -> ()

From the point of view of typing alone, one could imagine lifting that restriction at some point in the future and allowing both versions to typecheck.

In pratice however, in their current state, neither parmatch nor matching would survive such a change.

Incidentally Parmatch.every_satisfiable was reversing the order of columns between being called and calling satisfiable.
You'll notice the List.rev, hidden in the commit introducing the coherence check, fixing that problem.

Limitations of the current coherence check

In various places I give a brief explanation of what can happen when we accept a column as coherent while it is in fact ill-typed.
The potential outcomes include false positives and negatives.

However:

  • for the checks which have an impact on the soundness (i.e. exhaustivity and usefulness) we always error on the safe side: we never say something is exhaustive or unused when it in fact isn't.
  • the proposed coherence check catches all the cases (and some more) which resulted in assertion failures/fatal errors in parmatch. The issues that are not detected by the current check were already silently accepted before (see for example the test added by the first commit whose result is left unchanged by the other patches).

when an "incoherence" is not detected by this check.
*)

let first_column_is_coherent matrix =
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a detail, but I think it would be slightly nicer to have a coherent_heads h1 h2 binary function that checks whether two discriminant heads are coherent with each other, and expose this function in the module -- I think it may be of use later. Then you can implement the coherence check with a fold_left, or with this fairly amusing trick

let coherent_column = function
| [] -> true
| r::rs ->
  List.for_all2 (fun (h1, _) (h2, _) -> coherent_heads h1 h2)
    (r::rs) (rs @ [r])

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your trick is cute, but:

  • I think mine is slightly more efficient
  • I don't think yours works: what happens if the column is (1 _ 'a' _)?

As for the coherent_heads function, you just want my local same_kind lifted out (making its "discr_pat" a parameter)?
There is no need for it yet, I'd rather keep things as is and make the function globally available only when (if ever) we happen to need it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the trick is wrong for the reason you give, the relation is not transitive. This raises a question: is it actually the case that only checking coherence of the first non-any pattern with all others is enough? Should we test coherence between all couples of patterns? Or is the relation transitive on all non-any patterns?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it actually the case that only checking coherence of the first non-any pattern with all others is enough?

Yes.
(the proof is left as an exercise for the careful reviewer)

| Const_nativeint _, Const_nativeint _
| Const_float _, Const_float _
| Const_string _, Const_string _ -> true
| _, _ -> false
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be made non-fragile, by listing the alternatives at least in the first one?

incompatibility: clearly this is fine.
- if we end up returning [true] then we're saying that [qs] is useful while
it is not. This is sad but not the end of the world, we're just allowing dead
code to survive.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the correct check is not of whether pss is coherent, but whether pss followed by qs is coherent. See remarks below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems right. I changed all such places.

satisfiable pss (simple_match_args p omega @ qs))
constrs
else
if not (first_column_is_coherent pss) then (
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here q is omega, so coherent pss is the same as coherent (q :: pss), and this test is correct -- but a comment could point that out.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to feel like we have enough comments that really simple ones (like this one would be) can actually be omitted (I believe that, as is, one can fairly quickly recover the reason for not checking q :: pss)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your code, your choice.

let q0 = discr_pat q pss in
satisfiable (build_specialized_submatrix ~extend_row:(@) q0 pss)
(simple_match_args q0 q @ qs)
if not (first_column_is_coherent pss) then (
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I think that we should check first_column_is_coherent (q0 :: pss) -- that's the idea anyway, I suspect that the types don't match and that the API would have to change slightly.

(list_satisfying_vectors
(build_specialized_submatrix ~extend_row:(@) q0 pss)
(simple_match_args q0 q @ qs))
if not (first_column_is_coherent pss) then (
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again we should check also if q0 is coherent with the rest.

matrix are stable. *)
List.fold_left (fun acc (_, { varsets; _ }) ->
List.fold_left IdSet.union acc varsets
) IdSet.empty rs
Copy link
Copy Markdown
Member

@gasche gasche Jan 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have worked on a better implementation of this on my own (a distinguished All value), as I thought that this would be outside the scope of your PR. I hope to send a PR for this soon.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now pushed as 63ed2cc, please take a look, feel free to ask for changes, and eventually to integrate into the present PR if you wish to -- I can rebase my own on top of the final version of this commit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of your implementation seems nice indeed (I haven't actually reviewed the actual code just yet), but IMO they are independent (which is a good thing) and should be reviewed and merged separately (that is, I don't want to "merge" the GPRs together before actually merging into trunk).

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 1, 2018

Approved in general, but see comments.

trefis added a commit to trefis/ocaml that referenced this pull request Jan 2, 2018
(* The handling of incoherent matrices is kept in line with
[satisfiable] *)
if not (first_column_is_coherent pss) then
if not (first_column_is_coherent ((uq, make_row rem) :: pss)) then
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could write (all_coherent (first_column pss)) and (all_coherent (uq :: first_column pss)) to avoid having to come up with the rest of the row that we don't actually care about?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's what you meant by "the API might need some adjusting"!

Yes, I too find my current implementation a bit distasteful and yours seems prettier. But I'm also worried that yours means iterating over the rows once more, and allocating a fresh list with as many elements as there are rows, whereas with the current implementation we don't need to iterate and don't allocate much.

I'm tempted to keep my implementation because of these (probably unwarranted) performance concerns. But I'll welcome your input on that issue and am ready to change my mind if you feel differently.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think that the performance concern is unwarranted (the compiler already allocates, this case of the function already allocates, and we have no latency constraints), but from your answers regarding matrix_stable_vars it looks like I should send a cleanup PR after this one is merged anyway, so I don't mind proposing some API changes at the same time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, sounds good to me :)
I will leave things as is for now then.

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Jan 2, 2018

Btw @gasche , can you suggest/push a Changes entry?

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 2, 2018

Changes, in section "Working version > Internal/compiler-libs changes"

- GPR#1550: make pattern-matching warnings more robust to ill-typed columns
  (Thomas Refis, with help from Gabriel Scherer and Luc Maranget)

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Jan 2, 2018

Thanks, I added the Changes entry (but I put it in the bugfixes section).

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 2, 2018

If you're not interested in (1) being perfectionist over the coherence API in this PR or (2) removing the matrix_stable_vars part, then I think there is nothing left to change/discuss for the PR. My approval still stand, please rebase as you feel is appropriate and merge (or ask me to if you prefer).

@trefis trefis force-pushed the parmatch-column-coherence branch from ea42840 to 83a4701 Compare January 3, 2018 11:44
trefis added a commit to trefis/ocaml that referenced this pull request Jan 3, 2018
@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Jan 3, 2018

I rebased, squashed a few of the commits together, and added a "style commit" on top doing the API changes proposed (i.e. the split of first_column_coherence into first_column and all_coherent).
I also renamed my local same_kind (in all_coherent) to coherent_heads which you seemed to like better.

As for the matrix_stable_vars part, I didn't change it, I still think it's better left as part of your GPR.

N.B. I left the style commit on its own so it's easier to review, but I think it should eventually be squash into the "explicitly check coherence of first column" one. I'll be happy to squash it and merge the PR in trunk once you've read it and confirmed one last time.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 3, 2018

Yes, I think the API change makes for nicer code, thanks!

One minor question: all_coherent (q :: first_column pss) raises the question of what is the domain of all_coherent (and thus of the coherent_heads function): does it expected discriminators, as always returned by simplify_first_col, or arbitrary "simplified patterns" (without Tpar_var and Tpat_or)? If I read the code correctly, q above may be a general simple pattern rather than discriminators (its sub-arguments are non-omegas), so I think the natural answer is "simplified patterns" -- which suggests that the name suggestions I made is not the best, but oh well.

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Jan 3, 2018

does it expected discriminators, as always returned by simplify_first_col, or arbitrary "simplified patterns" (without Tpar_var and Tpat_or)? If I read the code correctly, q above may be a general simple pattern rather than discriminators (its sub-arguments are non-omegas), so I think the natural answer is "simplified patterns" -- which suggests that the name suggestions I made is not the best, but oh well.

Indeed just "simplified" patterns, as documented.
Note: I didn't know about the name "discriminators", we usually use "normalized" in parmatch (and I believe matching) to mean "where the arguments have been replaced by omega".
In any case, I think the name of the helper function is fine: it only looks at the head of the patterns.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 3, 2018

Feel free to merge.

@trefis trefis force-pushed the parmatch-column-coherence branch from 83a4701 to 9185e77 Compare January 3, 2018 13:32
@trefis trefis merged commit 64fb4f5 into ocaml:trunk Jan 3, 2018
@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Jan 3, 2018

Thanks, I squashed the style commit and merged.

@trefis trefis deleted the parmatch-column-coherence branch January 3, 2018 13:33
@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 21, 2018

We may need to get this in 4.06 anyway, see MPR#7713.

| Tpat_constant(c1), Tpat_constant(c2) -> const_compare c1 c2 = 0
| Tpat_tuple _, Tpat_tuple _ -> true
| Tpat_lazy _, Tpat_lazy _ -> true
| Tpat_record _ , Tpat_record _ -> true
Copy link
Copy Markdown
Member

@gasche gasche Jan 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: we could check Array.length lbl1.lbl_all = Array.length lbl2.lbl_all here as you do for the coherence check (or some richer check in general). (Currently the tuples are checked but not the records.) If one of the two pattern is empty (no label), they may match.

end
| Tpat_tuple l1, Tpat_tuple l2 -> List.length l1 = List.length l2
| Tpat_record ((_, lbl1, _) :: _, _), Tpat_record ((_, lbl2, _) :: _, _) ->
Array.length lbl1.lbl_all = Array.length lbl2.lbl_all
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, why are we checking the length only and not the lbl_all value itself?

Array.length lbl1.lbl_all = Array.length lbl2.lbl_all
| Tpat_any, _
| _, Tpat_any
| Tpat_record ([], _), Tpat_record ([], _)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think that this line is buggy: an empty record pattern is coherent with any other record pattern, not just empty ones.

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Jan 26, 2018

The check could be made stricter in various places, and it would probably be better. It wasn't done because it wasn't "necessary". There is a lot of room for improvement.

For the empty record pattern, I disagree with you.
Currently an empty record pattern won't typecheck anyway, so the question is hypothetical. But assuming we allow them, I would expect an empty record pattern to match values of an empty record type, and that's it.
Perhaps that's not how things will happen, perhaps it will indeed mean "I know I'm getting a record, but I don't care about its content". But saying "this line is buggy" seems a bit strong given the current situation.
If you want, you could map Tpat_record ([], _), _ | _, Tpat_record ([], _) to assert false and we'll revisit in due time. But I think the current situation is probably fine/better.

(* only omegas on the column: the column is coherent. *)
true
| discr_pat ->
List.for_all (coherent_heads discr_pat) column
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this line is wrong if we accept the empty record pattern at any record type (because then the relation is not transitive anymore).

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 26, 2018

We did another review of the PR with @maranget, with the merge to the 4.06 branch in mind. We have carefully reviewed and believe that the PR is correct. The review turned out two things:

  • Matching and coherence checks on (non-empty) records could be refined. This does not require any change for 4.06, but it is a change we could consider for trunk
  • The handling of empty record patterns in the coherence check is wrong. This is of no consequence for the compiler given that empty record patterns are rejected by the type-checker anyway, so the PR is safe, but it would still be nicer to remove the faulty line or fix it ((Tpat_record ([], _), Tpat_record _) | (Tpat_record _, Tpat_record ([], _))). It probably wasn't a very good idea to introduce empty records in the code in the first place.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 26, 2018

@trefis, would you consider submitting a rebased version of this PR against the 4.06 branch?

@trefis
Copy link
Copy Markdown
Contributor Author

trefis commented Jan 26, 2018

@trefis, would you consider submitting a rebased version of this PR against the 4.06 branch?

Will do.
Of course this GPR is based on top of the big rewrite of parmatch, which I assume we're not going to backport, so it'll have to be adapted. You'll have to carefully review one more time.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 26, 2018

I'm not sure which way is best, but it may be necessary to try to rebase to see how well it goes before making a decision...

gasche pushed a commit to gasche/ocaml that referenced this pull request Feb 2, 2018
gasche pushed a commit to gasche/ocaml that referenced this pull request Feb 2, 2018
gasche pushed a commit to gasche/ocaml that referenced this pull request Feb 2, 2018
gasche pushed a commit to gasche/ocaml that referenced this pull request Feb 2, 2018
Also extends the testsuite.

originally cherry-picked from fa5a4bb
(ocaml#1550), with a careful rebase by Thomas Refis to account for the fact
that the refactoring GPR#1448, which the original patch relied on, is
not present in the 4.06 branch.
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
Co-authored-by: Paul-Elliot <peada@free.fr>
Co-authored-by: Christine Rose <christinerose@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants