The Pattern-Matching Bug: introduce a temporality heuristic to de-pessimize certain programs by gasche · Pull Request #13154 · ocaml/ocaml

gasche · 2024-05-07T15:31:03Z

(This PR sits on top of #13152 and #13153; for now it is best reviewed by looking only at the latest commit.)

#13152 pessimizes Total matches in transitively mutable positions. It gives two examples of programs for which the compiler now generates worse code:

(* case (A) *)
let f : bool option ref -> ... = function
| { contents = None } -> ...
| { contents = Some true } -> ...
| _ when guard () -> ...
| { contents = Some false } (* here *) -> ...

type _ gadt = Int : int gadt | Bool : bool gadt

(* case (B) *)
let g : int gadt ref -> ... = function
| { contents = Int } (* here *) -> ...

In case (A), the generated code first checks something about the argument root.0, then checks a guard, then goes back to the argument root.0. (We say that there are two "splits", resulting in three submatrices that are checked in sequence.) When it goes back to root.0, its value could have changed in the meantime, so assuming that root.0.0 can only be false is wrong, and we really did need to make the code worse.

In case (B), the generated code reads the mutable field root.0, then immediately checks all possible cases for this constructor. Here there cannot have been any mutation "in the meantime": this is the first time that we check this value. So generating a Match_failure case was not necessary, and we are generating worse code than we could.
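The difference between the two cases can be observed from user code. The following is a minimal sketch, not code from the PR: the `guard` parameter and the helpers `pure_guard`, `mutating_guard` and `observe` are hypothetical names, and the mutating guard overwrites the reference on purpose, which is the situation that case (A) has to defend against.

```ocaml
(* Hypothetical variant of case (A): the guard is passed as a parameter
   so that the caller chooses whether it mutates the scrutinee. *)
let f ~(guard : bool option ref -> bool) (r : bool option ref) : string =
  match r with
  | { contents = None } -> "none"
  | { contents = Some true } -> "some true"
  | _ when guard r -> "guard"
  | { contents = Some false } -> "some false"

(* A guard that fails without touching the reference. *)
let pure_guard (_ : bool option ref) : bool = false

(* A guard that fails after overwriting the reference. *)
let mutating_guard (r : bool option ref) : bool =
  r := None;
  false

(* Run [f] and report a Match_failure instead of propagating it. *)
let observe guard r =
  match f ~guard r with
  | s -> s
  | exception Match_failure _ -> "Match_failure"
```

With `pure_guard`, `observe pure_guard (ref (Some false))` returns `"some false"`. With `mutating_guard` on the same input, the result depends on the compiler version: a compiler that includes #13152 is expected to reach the added Match_failure case, so no expected output is given for that call.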

In general it is hard to know for sure that "this is the first time we do something on this value". Tracking this precisely might be done with more context information, but it would be a tricky change with quickly diminishing returns. There is however one case where it is very easy to see that we are in this situation: when we have not generated any "split", that is, any logic of the form "check this about the first value, and if that fails check that".

The present PR introduces a new piece of static context information used by the pattern-matching compiler to track this criterion. In code:

type temporality =
  | First
  | Following
(** The [temporality] information tracks information about the
    placement of the current submatrix within the
    whole pattern-matching.

    - [First]: this is the first submatrix on this position seen by values
      that flow into the submatrix.
    - [Following]: there was a split, some other submatrix was tried first
      and failed, and the control jumped to the current submatrix.

    This information is used in {!compute_arg_partial}.
*)

When deciding whether a switch on a given position should be pessimized, we now use a refined criterion: we are in a transitively mutable position and the current temporality is Following, meaning that we may have already accessed this position before.
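The refined criterion can be written as a small decision function. This is a toy model under stated assumptions, not the code in lambda/matching.ml: the types `mutability` and `partial` and the function `refine` are hypothetical names chosen for illustration, and the real compute_arg_partial may be structured differently.

```ocaml
(* Toy model of the criterion; all names below are illustrative. *)
type temporality = First | Following

(* Whether the argument position is transitively mutable. *)
type mutability = Immutable | Mutable

(* What the type-checker said about the match. *)
type partial = Partial | Total

(* Decide which partiality the match compiler should assume for a switch. *)
let refine ~(checker : partial) ~(mutability : mutability)
    ~(temporality : temporality) : partial =
  match checker, mutability, temporality with
  | Partial, _, _ -> Partial
  (* Pessimize only when the position is mutable AND a previous
     submatrix may already have looked at it. *)
  | Total, Mutable, Following -> Partial
  | Total, _, _ -> Total
```

In this model, case (B) corresponds to `refine ~checker:Total ~mutability:Mutable ~temporality:First`, which stays `Total`, while the last clause of case (A) corresponds to `~temporality:Following` and becomes `Partial`.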

In particular, the example g above now produces good code again, as well as any reasonable usage of GADTs in mutable positions that I can think of.

It is, of course, possible to build examples where the heuristic is too coarse-grained and which will generate worse code than they could. For example:

type _ gadt = Int : int gadt | Bool : bool gadt

let test : int gadt ref -> unit = function
  | _ when Random.bool () -> ()
  | { contents = Int } -> ()
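For comparison, here is a sketch of one way such a function could be restructured so that the match no longer inspects a mutable position. This is an illustration, not a suggestion made in the PR: `test_read_once` is a hypothetical name, and the sketch assumes that reading the reference once, after the random test, is acceptable for the program at hand.

```ocaml
type _ gadt = Int : int gadt | Bool : bool gadt

(* Hypothetical rewrite: dereference first, then match on the value
   that was read. The scrutinee of the match is then an ordinary
   immutable value rather than a field of a reference. *)
let test_read_once (r : int gadt ref) : unit =
  if Random.bool () then ()
  else
    match !r with
    | Int -> ()
```

Since the inner match is on a value rather than on the reference itself, it should not fall under the "transitively mutable position" criterion described above.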

ncik-roberts

The approach looks correct. In a survey of a large codebase, I found only one instance where this PR de-pessimizes a program -- I would consider arguing against this change if it significantly complicated the matching.ml code, but given that the change is clear and well-structured, I'm in favor of merging it.

lambda/matching.ml

gasche · 2024-07-31T19:32:39Z

One reason I like this heuristic is that it helps me reason about "structural matches" that do not contain splits -- all possible cases are explored in full. For example:

function
| None -> 0
| Some {contents = A} -> 1
| Some {contents = B _} -> 2
| Some {contents = C _} -> 3

(assuming the type-checker finds this example exhaustive: A, B of ..., C are the only valid constructors at the specific type involved -- it could be a GADT with other, incompatible constructors.)

For this fragment of pattern-matching programs, the type-checker exhaustiveness analysis is always correct, and the temporality heuristic guarantees that we generate good code.
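To make the fragment above concrete, here is a self-contained version with an ordinary variant type. The type `t` and the function name `classify` are assumptions made for this sketch; the original comment leaves the type unspecified.

```ocaml
(* Illustrative type: A, B and C are its only constructors. *)
type t = A | B of int | C of string

(* A "structural" match: no guards and no overlapping clauses that
   would force a split, so each position is inspected only once. *)
let classify : t ref option -> int = function
  | None -> 0
  | Some { contents = A } -> 1
  | Some { contents = B _ } -> 2
  | Some { contents = C _ } -> 3
```

For example, `classify (Some (ref (B 7)))` evaluates to `2`.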

gasche · 2024-07-31T20:06:51Z

Rebasing this on top of #13338 (transitively) made the implementation quite a bit less invasive, because I can add the temporality as an extra field to the partiality record instead of passing it along as before.

gasche · 2024-07-31T20:33:48Z

testsuite/tests/match-side-effects/check_partial.ml

+                          (opaque *match*/298))
+                        *match*/298)))
+                *match*/305 =o (field_mut 0 (field_imm 1 param/297)))
+               (if (isint *match*/305) (if *match*/305 12 (exit 3)) (exit 3)))))


I did not understand why this test case changed -- there was a Match_failure case before and there is still one after, so what did the temporality change in the code? This is much easier to see in the -drawlambda output for this test:

 (setglobal Test!
   (let
     (lazy_needs_partial/274 =
        (function param/276 : int
          (catch
            (let (*match*/279 =a (field_imm 0 param/276))
              (catch
                (let
                  (*match*/280 =a (field_imm 1 param/276)
                   *match*/281 =o (field_mut 0 *match*/280))
-                 (if (isint *match*/281) (if *match*/281 (exit 2) 0)
-                   (exit 1)))
+                 (switch* *match*/281 case int 0: 0
+                  case int 1: (exit 2)))
               with (2)
                (let
                  (*match*/288 =
                     (let
                       (lzarg/282 = *match*/279
                        tag/283 =a (caml_obj_tag lzarg/282))
                       (if (== tag/283 250) (field_mut 0 lzarg/282)
                         (if (|| (== tag/283 246) (== tag/283 244))
                           (apply (field_imm 1 (global CamlinternalLazy!))
                             (opaque lzarg/282))
                           lzarg/282)))
                   *match*/289 =a (field_imm 1 param/276)
                   *match*/290 =o (field_mut 0 *match*/289))
                  (if (isint *match*/290) (if *match*/290 12 (exit 1))
                    (exit 1)))))
           with (1)
            (raise (makeblock 0 (global Match_failure/20!) [0: "test.ml" 6 49])))))
     (makeblock 0 lazy_needs_partial/274)))

This is about the way that the check (_, {contents = True}) is compiled in the first clause. The GADT type declaration is as follows:

type _ t = | Int : int -> int t | True : bool t | False : bool t

Before, the compiler would first do an isint test to rule out Int, and then an if test to distinguish True from False. With this commit, the compiler now trusts the totality information that tells it that the Int case is impossible, and generates a simpler check to differentiate True from False. (There is a similar check further down, after the lazy () pattern, and that one is pessimized, as it should be.)
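The isint test and the integer cases in the lambda output follow from the runtime representation of the constructors. The sketch below inspects that representation with the standard Obj module; the helper name `immediate_value` is hypothetical, and Obj is used here only for illustration.

```ocaml
type _ t = Int : int -> int t | True : bool t | False : bool t

(* Return [Some n] if the value is represented as the immediate
   integer [n], and [None] if it is a heap block. *)
let immediate_value (type a) (v : a t) : int option =
  let r = Obj.repr v in
  if Obj.is_int r then Some (Obj.obj r : int) else None

let () =
  (* Constant constructors are numbered in declaration order. *)
  assert (immediate_value True = Some 0);
  assert (immediate_value False = Some 1);
  (* A constructor with an argument is a block, which is what the
     isint test rules out. *)
  assert (immediate_value (Int 42) = None)
```

This matches the switch in the output above: `case int 0` is the True clause and `case int 1` is False, which exits to the next submatrix.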

lpw25

Approving on the basis of @ncik-roberts review

gasche · 2024-08-22T14:45:19Z

Thanks! The present PR sits on top of #13341, so I should wait for that PR to be merged before merging the present one.

gasche · 2024-09-12T15:20:40Z

This PR can be merged now that #13341 has been approved and merged. I am planning to do this after/if the CI passes.

The Pattern-Matching Bug: introduce a temporality heuristic to de-pessimize certain programs (cherry picked from commit 9e72506)

gasche mentioned this pull request May 7, 2024

Pattern matching with mutable and lazy patterns is unsound #7241

Closed

gasche force-pushed the matching-bug-temporality-heuristic branch 4 times, most recently from 4dbadd4 to a3020d5 Compare May 13, 2024 15:31

gasche mentioned this pull request May 13, 2024

The Pattern-Matching Bug: propagate mutability of argument positions #13138

Merged

ncik-roberts mentioned this pull request Jun 13, 2024

The Pattern-Matching Bug: fix totality information #13152

Merged

gasche self-assigned this Jun 14, 2024

gasche mentioned this pull request Jul 30, 2024

a simple, short refactoring of lambda/Matching.do_compile_matching #13342

Merged

ncik-roberts approved these changes Jul 31, 2024

gasche force-pushed the matching-bug-temporality-heuristic branch 2 times, most recently from d727011 to 47c16ae Compare July 31, 2024 20:05

gasche force-pushed the matching-bug-temporality-heuristic branch 2 times, most recently from 73fef1d to 522abfb Compare July 31, 2024 20:26

gasche force-pushed the matching-bug-temporality-heuristic branch 3 times, most recently from 64e24b5 to e5e2b95 Compare August 1, 2024 07:52

gasche added maintainer-approval-needed pattern-matching enhancement labels Aug 1, 2024

lpw25 approved these changes Aug 22, 2024

gasche force-pushed the matching-bug-temporality-heuristic branch 2 times, most recently from 29a02bb to 9392e14 Compare August 24, 2024 09:15

gasche mentioned this pull request Sep 12, 2024

The Pattern-Matching Bug: a disabled-by-default warning on unexpectedly-partial matches #13341

Merged

matching: introduce a 'temporality' heuristic to de-pessimize some GA…

94a8158

…DT+mutable combinations

gasche force-pushed the matching-bug-temporality-heuristic branch from 9392e14 to 94a8158 Compare September 12, 2024 15:18

gasche added the merge-me label Sep 12, 2024

gasche merged commit 9e72506 into ocaml:trunk Sep 12, 2024

gasche added a commit that referenced this pull request Sep 12, 2024

Merge pull request #13154 from gasche/matching-bug-temporality-heuristic

4e79646
