matching: use polymorphic variants to classify general/half_simple/simple patterns#9361

Merged
trefis merged 1 commit into ocaml:trunk from trefis:rematch-poly-vars
Mar 20, 2020

Conversation

@trefis
Contributor

@trefis trefis commented Mar 11, 2020

Another one-commit PR extracted from #9321.

Note: we have tried an alternative to this, which was to make the "intermediary types" (i.e. the ones representing half-simplified and simplified patterns) abstract.
But it didn't turn out as nice as this.

@trefis
Contributor Author

trefis commented Mar 17, 2020

I discussed this offline with @Octachron last week, and he does not seem particularly excited by the use of polymorphic variants.
He's implemented an alternative to this PR, which seems to handle our needs.

Having just read over both versions of matching.ml (Octachron's, and the one in this PR), I have a slight preference for the version with polymorphic variants; I might of course be biased, but I do not think so:

  • I was initially reluctant myself when @gasche proposed it
  • polymorphic variants are perfectly adapted to the problem at hand (more so than phantom encodings)
  • reading precompile_or written using polymorphic variants gives me a weirdly satisfying feeling
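For readers outside the PR, here is a toy sketch of why polymorphic variants fit the problem (the type and constructor names below are illustrative, not the PR's actual definitions): each classification level is a row that extends the more precise one, so the simple/half-simple/general hierarchy is expressed directly as subtyping.

```ocaml
(* Toy classification: each level's row includes the previous one,
   so simple_view <: half_simple_view <: general_view by construction. *)
type simple_view =
  [ `Constant of int | `Tuple of simple_view list | `Any ]

type half_simple_view =
  [ simple_view | `Or of half_simple_view * half_simple_view ]

type general_view =
  [ half_simple_view | `Var of string | `Alias of general_view * string ]

(* A function over general views accepts the finer views through an
   explicit coercion, with no rebuilding of the value. *)
let rec size : general_view -> int = function
  | `Constant _ | `Any | `Var _ -> 1
  | `Tuple ps ->
      List.fold_left (fun acc p -> acc + size (p :> general_view)) 1 ps
  | `Or (p1, p2) -> 1 + size (p1 :> general_view) + size (p2 :> general_view)
  | `Alias (p, _) -> 1 + size p
```

The coercions `(p :> general_view)` are the free subtyping alluded to above: a simple or half-simple view flows into general-view code without any conversion function.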

That said, I will let @Octachron present his grievances, and defend the merits of his approach.

@Octachron
Member

My main grievance is that the polymorphic variant types are ultimately not used that much in this PR.

My view of this PR is that it refines the pattern type family in order to both expose the subtyping relationship between the family members and make their "decomposition" (for lack of a better word) precise.

The first part can be done without polymorphic variants (as seen in my toy implementation), so the real advantages of polymorphic variants come from the second part.

However, most of the PR's improvement comes from exposing the first part: the subtyping relationship makes it possible to have a generic General.erase that works on all types in the family.
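That first point can be sketched with toy types (illustrative only; the PR's actual General.erase operates on Typedtree data): one erase function typed at the top of the family serves every member, because the finer views coerce into it.

```ocaml
(* Toy concrete pattern type, standing in for Typedtree patterns. *)
type concrete = Constant of int | Or of concrete * concrete | Var of string

(* Toy view family: simple is a sub-row of general. *)
type simple = [ `Constant of int ]
type general = [ simple | `Or of general * general | `Var of string ]

(* One generic erase for the whole family. *)
let rec erase : general -> concrete = function
  | `Constant n -> Constant n
  | `Or (p, q) -> Or (erase p, erase q)
  | `Var x -> Var x

(* A value statically known to be simple erases via a coercion,
   with no per-level erase function. *)
let simple_pat : simple = `Constant 3
let erased = erase (simple_pat :> general)
```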

The precise decomposition of polymorphic variants is used exactly thrice in the PR. Two of those uses are an effect of making try_not_or unnecessary, which is certainly nice. Unfortunately, everywhere else we end up with a pattern of first refining a pattern, doing a small amount of work on the refined pattern, and then being blocked because the next step cannot handle the refined types, so we need to undo our refining work.

I ended up looking at the overview pattern-matching PR to see whether this is only a short growing pain. However, even there, it is not that clear: the pattern of refine/erase is still present.

In other words, I am paradoxically stating that the lack of ambition in this PR makes it harder to review. And I am not fully convinced that this back-and-forth is only a growing pain, even if polymorphic variants do seem to be a natural fit.

Overall, I think that the balance still tips in favor of this PR, but by a much thinner margin than I expected when I started the review.

@gasche
Member

gasche commented Mar 17, 2020

I don't like @Octachron's proposal for two reasons:

  1. It feels more complex. We are encoding a subtyping relation using private synonyms within phantom type parameters. We know that this corner of the type system can express rich subtyping relations, but it does not mean that the result is pretty. I personally find simple half t or non_simple half t to be on the confusing side.

  2. Most importantly, it provides less static guarantees. As with all solutions based on phantom types, the safety comes from the module type boundary: code outside the boundary is safe provided that the interface invariants are properly satisfied, and code inside the boundary has to be trusted, it is never checked for safety. On the contrary, our polymorphic-variant approach performs a precise static analysis in the language directly, instead of being encoded in an axiomatized form in the module interface. (This is the same reason why GADTs are typically safer and better than phantom types.)
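To make point 2 concrete, here is a minimal sketch of the phantom-type style under discussion (module, type, and function names are illustrative, not Octachron's actual code, and it uses a single phantom parameter rather than his two). All the safety lives in the signature: inside the structure every 'a t is the same type, so a misclassification in is_simple would typecheck silently.

```ocaml
module Pat : sig
  type simple                                (* phantom tags *)
  type half
  type 'a t
  val constant : int -> simple t
  val or_ : half t -> half t -> half t
  val erase : 'a t -> half t                 (* axiom: everything is half-simple *)
  val is_simple : half t -> simple t option  (* trusted classification *)
end = struct
  type simple
  type half
  type concrete = Const of int | Or of concrete * concrete
  (* the phantom parameter is erased: internally all ['a t] coincide *)
  type 'a t = concrete
  let constant n = Const n
  let or_ p q = Or (p, q)
  let erase p = p
  let is_simple p = match p with Const _ -> Some p | Or _ -> None
end
```

Note that nothing checks is_simple: replacing its body with `Some p` for every input would still compile, which is exactly the "silent breach of static enforcement" described below.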

Compare for example:

val explode : Half_simple.pattern -> Ident.t list -> Simple.clause list -> Simple.clause list

(* polymorphic variant version -- slightly reworded from the PR *)
let rec explode p aliases rem =
  let explode_p p aliases rem = explode (General.view p) aliases rem in
  match p.pat_desc with
  | `Or (p1, p2, _) -> explode_p p1 aliases (explode_p p2 aliases rem)
  | `Alias (p, id, _) -> explode_p p (id :: aliases) rem
  | `Var (id, _) ->
      let env = mk_alpha_env arg (id :: aliases) vars in
      ((omega, patl), mk_action ~vars:(List.map snd env)) :: rem
  | #simple_view as view ->
      let p = { p with pat_desc = view } in
      let env = mk_alpha_env arg aliases vars in
      ((alpha env p, patl), mk_action ~vars:(List.map snd env)) :: rem

(* phantom type version *)
let rec explode p aliases rem =
  match p.pat_desc with
  | Tpat_or (p1, p2, _) -> explode p1 aliases (explode p2 aliases rem)
  | Tpat_alias (p, id, _) -> explode p (id :: aliases) rem
  | Tpat_var (x, _) ->
      let env = mk_alpha_env arg (x :: aliases) vars in
      ((omega, patl), mk_action ~vars:(List.map snd env)) :: rem
  | _ ->
      let env = mk_alpha_env arg aliases vars in
      ((alpha_pat env p, patl), mk_action ~vars:(List.map snd env)) :: rem

In the polymorphic version, we know statically that the #simple_view as view matches only simple pattern descriptions, and we retype p with this finer-grained type information. If we had forgotten one of the non-simple cases above, we would get a type error. With the phantom type version, we have to trust this code (which sits within the abstraction boundary), and forgetting a case would result in a silent breach of static enforcement.

There are more such code traversals in matching.ml and parmatch.ml that can benefit from the static safety of the polymorphic-variant version. Most of these functions should not be pushed within the abstraction boundary. One could define a generic "non-simple decomposition" higher-order function that would sit inside the abstraction, to be used by those traversals outside, but then again this would result in more complex code.

I ended up looking at the overview pattern-matching PR to see whether this is only a short growing pain. However, even there, it is not that clear: the pattern of refine/erase is still present.

Sure, there is a frontier between the static-discipline-aware code (which knows which patterns are (half-)simple in its inputs and outputs) and the unaware code. Whenever we call an unaware helper function (for example Typedtree.alpha_pat), there is some unpleasant ceremony to cross the frontier: typically an erasing coercion on the way out and an unsafe cast on the way back in, or duplication of the code within the frontier.

Your approach slightly reduces this ceremony because fine-grained patterns are subtypes of ordinary patterns, but it does not remove it. We could also reduce it by pushing more pattern-processing code to be discipline-aware, or by changing the Typedtree pattern definition to be a polymorphic variant from the start (so that we regain the erasure-is-subtyping convenience). I am not in favor of the latter (I would rather keep the entry cost of polymorphic variants only in the places where they are really useful), and I think that the former will come over time.

Still, during the transition period (which is, indeed, not finished by the end of our current PRs), I prefer our approach to yours, because it provides more type-safety.
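The frontier-crossing ceremony described above can be sketched as follows, with toy types (view and erase stand in for the PR's General.view and General.erase; map_constants stands in for a discipline-unaware helper like Typedtree.alpha_pat).

```ocaml
(* Toy ordinary pattern type and its polymorphic-variant view. *)
type pattern = Tpat_constant of int | Tpat_or of pattern * pattern
type view = [ `Constant of int | `Or of view * view ]

let rec erase : view -> pattern = function
  | `Constant n -> Tpat_constant n
  | `Or (p, q) -> Tpat_or (erase p, erase q)

let rec view : pattern -> view = function
  | Tpat_constant n -> `Constant n
  | Tpat_or (p, q) -> `Or (view p, view q)

(* A discipline-unaware helper, defined on ordinary patterns only. *)
let rec map_constants f = function
  | Tpat_constant n -> Tpat_constant (f n)
  | Tpat_or (p, q) -> Tpat_or (map_constants f p, map_constants f q)

(* Crossing the frontier: erase on the way out, re-view on the way in. *)
let map_constants_view f (v : view) : view =
  view (map_constants f (erase v))
```

In the toy version the round trip is total and safe; in the real code the way back in may need an unsafe cast when the caller knows more about the pattern's shape than the helper's type records, which is the ceremony being weighed here.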

Member

@Octachron Octachron left a comment


It seems indeed more natural to leave the type-level computations to the type checker rather than to encode them as admitted axioms in the module interface.

My remaining remarks are completely minor.

Like I said, the subtyping and the removal of try_no_or are certainly nice, thus the PR seems good to go for me.

@gasche gasche force-pushed the rematch-poly-vars branch from 49dfa67 to 33546df on March 19, 2020 16:52
…mple patterns

Also: these types are now instances of Typedtree.pattern_data.
@gasche gasche force-pushed the rematch-poly-vars branch from 33546df to bc0206b on March 19, 2020 17:24
@gasche
Member

gasche commented Mar 19, 2020

@trefis I rebased this PR on top of trunk (most of the work was coming from changes to previous PRs in the series). It's now up to you to decide when to merge (assuming CI is green), and to shoot the next round.

@trefis trefis merged commit 3196a70 into ocaml:trunk Mar 20, 2020
@trefis
Contributor Author

trefis commented Mar 20, 2020

Thanks!

I'll probably send the next PR early next week.
