Skip to content

non-exhaustive fix for the Pattern Matching Bug, take two#12556

Closed
gasche wants to merge 8 commits intoocaml:trunkfrom
gasche:matching-bug-strict-bindings-pr
Closed

non-exhaustive fix for the Pattern Matching Bug, take two#12556
gasche wants to merge 8 commits intoocaml:trunkfrom
gasche:matching-bug-strict-bindings-pr

Conversation

@gasche
Copy link
Copy Markdown
Member

@gasche gasche commented Sep 14, 2023

This PR implements the "alternative approach of binding mutable fields" earlier to fix the "incorrect context information" half of the Pattern Matching Bug #7241. This approach was jointly designed with @trefis. See the detailed description of the situation in #12555, which implements another fix for the same issue which I prefer.

This PR is larger than #12555, it contains the following:

  • a new testsuite file specifically designed to study the impact on generated code of this change
  • the bugfix itself, which works by special-casing intermediate arguments that have "let strictness" Strict or StrictOpt
  • refactorings to the pattern-matching compiler to remove some other sources of Strict and StrictOpt bindings, to clarify and simplify the codebase ; without this simplification, I personally find hard to convince myself that "the bugfix itself" actually fixes all instance of the issue.
  • finally, an optimization to push the "strict bindings" inside the branches in some cases where detect that it can avoid useless reads ; this reuses existing lower_bind logic from the pattern-matching compiler. (I explicitly decided against being more aggressive than the current optimization.)

The testsuite is updated at each commit, and looking at its diff for a given commit can be a good way to see the impact of code-generation changes in the commit.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Sep 14, 2023

As I explain in #12555, these two PRs are alternative approaches to fix the same bug, and I prefer #12555. (I was not planning to submit this one but @smuenzel expressed interest.)

In particular, my intuition is that #12555 could be merged without specific benchmarking, because it does not change the generated code in the most common cases. This one changes the generated code for any usage of records with mutable fields, in a way that could be beneficial, neglectible or negative. I don't have the energy right now to run an experimental evaluation of this.

I would personally be in favor of going with #12555, and then fixing the "partiality information" issue (my fix for that is more complex than the two we are currently discussing, and will require more review time and energy) to fix #7241 for good. Then, if there is still appetite for fine-grained discussions of the code generation strategy for records with mutable fields, we can revisit the present PR and merge it later.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Sep 14, 2023

In #12555 (comment), @smuenzel expressed himself in favor of the present approach ("strict bindings" rather than "context erasure"):

I think [useless mutable reads are] not as bad as it sounds. The read is not required to be completed until we actually need the data (in this case we don't), so, assuming a normal out-of-order processor, we only consume memory bandwidth, but do not block subsequent operations. We would only be limited once we hit either our maximum memory bandwidth, or the maximum number of reads we can track. (we could also end up using extra registers/stack space, but I think the user would expect that if they have very complex matches)

The "Bind mutable fields earlier" feels a little more principled -- we're examining a "snapshot" of the variables taken at the time the match scrutinee is evaluated. This means it's much easier to reason about the results -- no "Match_failure" is required, and we can ensure that we retain the full power of exhaustiveness analysis, allowing even strangely written code to prevent having additional exception paths.

A note on the second paragraph: I like the idea of the "snapshot" semantics, as discussed earlier (in another monster comment from last June, #7241 (comment) ) -- the idea is that the pattern-matching code behaves as this the value was read in full at some point, and no part were re-evaluated after that. But it is important to note that the present PR only gets us closer to this snapshot semantics in some cases, and does not provide it in general. In particular, consider the two programs below, the first one will behave according to the "snapshot semantics", but the second will not -- it is currently unsound, but even with that other bug fixed it will raise Match_failure.

match x with
| { a = false } -> 0
| { a = true; b = None } -> 1
| { a = _; b = _ } when guard () -> 2
| { a = _; b = Some n } -> n

match x with
| { a = false } -> 0
| { a = true; b = None } -> 1
| _ when guard () -> 2
| { a = _; b = Some n } -> n

The difference between the two programs is that, in the first case, all accesses to b are under the scope of the same binding (when deconstructing the record pattern shared by all clauses), while in the second program the clauses are split in three separate submatrices, and matrices 1 and 3 will both read b independently.

In the general case my intuition is that we probably do not want to provide the snapshot semantics. The rule of the game of pattern-matching compilation is to avoid code-size explosion -- if implemented naively, it can be exponential in the source code. The "matrix-based" algorithm used by the OCaml compiler is designed on the idea that we avoid code explosion by introducing code sharing at the expense of redundant reads. We could decide to try to share code for immutable reads and accept code blowups for mutable reads, but (1) I have no idea how to implement this and it could be very very invasive, and (2) this means that we would behave terribly on complex clauses with only mutable fields.

Given that I don't think we can provide the snapshot semantics in general, I don't think that providing it in particular cases is an important goal. In fact, I would rather punish side-effects during matching with a Match_failure exception than magically saving the day in a weird subset of cases.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Dec 8, 2023

I looked at the two PRs #12555 and #12556 again. They conflict and fix the same issue, and I still think that #12555 (which erases more context information) is the better choice -- if only because it is simpler to implement. I am closing this one (which binds mutable fields earlier) to help move forward. Feel free to keep discussing and/or ask to reopen if you think differently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant