Skip to content

Fix MPR7241: Pattern matching with mutable and lazy patterns is unsound#717

Closed
maranget wants to merge 15 commits intoocaml:trunkfrom
maranget:PR7241
Closed

Fix MPR7241: Pattern matching with mutable and lazy patterns is unsound#717
maranget wants to merge 15 commits intoocaml:trunkfrom
maranget:PR7241

Conversation

@maranget
Copy link
Copy Markdown
Contributor

@maranget maranget commented Jul 26, 2016

Correct PR#7241 --- side effect in guard segfaults because of PM compiler assuming that subject value does not change.

Correction consists in avoiding remembering what was matched. More precisely the compiler "forgets" what it learned during matching. This is performed by the various "rshift" functions.

Correction applies when

  1. Some pattern has a mutable field.
  2. And there is a guard or a lazy pattern.

Caveats
PM may now fail (with Match_failure) although no warning has been given (better than segfault),
see example in test suite.
Suboptimal, one may think confining the memory erasure to mutable subterms.

@alainfrisch
Copy link
Copy Markdown
Contributor

Can values be mutated without guards nor lazy patterns? I was thinking about patterns that allocate (reading from float records or arrays => boxing) and could trigger finalizers or signal handlers, or yield control to a different thread. Or are all allocations delayed until the guard/action?

@alainfrisch
Copy link
Copy Markdown
Contributor

Also, what do you think about introducing a warning about patterns that deconstruct mutable fields and arrays? This PR makes it less unsafe, but it's still not ideal to get a Match_failure and there does not seem to be any way to address that fully.

@maranget
Copy link
Copy Markdown
Contributor Author

On your first comment I have no idea especially as regards allocating pattern matching (is there such a thing ?). As regards concurrent execution. if the thread that performs pattern matching can yield (can it ?), anything can happen.

On your second comment, I think shuch a warning would be obnoxious, being triggered much more often than useful. However it would be very easy to implement...

@gasche gasche self-assigned this Jul 26, 2016
@alainfrisch
Copy link
Copy Markdown
Contributor

(is there such a thing ?)

Yes, patterns that read floats from unboxed representations (float arrays, float records) allocate. At least in native code, for capture variables which are known to be of type float (and which can potentially come from an unboxed representation), one could capture the unboxed version instead (i.e. simply read from unboxed representation, or unbox if the value comes from a boxed representation -- the two cases can be combined in a or-pattern). This might be completely straightforward since unboxing is currently done much later than the compilation of pattern matching (and bytecode would not be supported). But there will remain the case of arrays which can potentially hold floats (e.g. function [|x|] -> Some x | _ -> None). Here, the pattern allocates only when the array hold floats, but the trick above would not work.

if the thread that performs pattern matching can yield (can it ?)

Yes, allocation can give control to another thread (or execute code through finalizers or signal handlers).

anything can happen.

I think it would be safer to apply the same defensive mechanism that you suggest in this PR for any pattern that can possibly allocate, even with no guard or lazy pattern.

I think shuch a warning would be obnoxious, being triggered much more often than useful. However it would be very easy to implement...

If the warning was restricted to cases that (i) read mutable fields and (ii) can execute code (lazy/guard/allocation), do you think it would be triggered too much?

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Jul 27, 2016

I think it would be safer to apply the same defensive mechanism that you suggest in this PR for any pattern that can possibly allocate, even with no guard or lazy pattern.

I think that we should aim to treat these patterns correctly for the cases where we know unboxing is happening, as we can always inspect the float itself for the pattern and then only allocate when the pattern has been successfully matched.

As you say, this doesn't work for polymorphic patterns in arrays, so I guess being defensive is the only real option in that case. (Add that to the list of reasons why the float array hack should be removed).

@alainfrisch
Copy link
Copy Markdown
Contributor

then only allocate when the pattern has been successfully matched.

Note that to support or-pattern, we'd need to unbox boxed representations, as in:

  function (Left [| x |] | Right x) -> x +. 1

and it will then be better to keep the unboxed binding (boxing only on demand).

I don't think this could be made to work easily for bytecode, though.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Jul 27, 2016

Note that to support or-pattern, we'd need to unbox boxed representations

I don't think OR-patterns make a difference, because you only need to extract the field if there is a pattern to match it against. So:

type t = { x : float }
let f = function { x } -> x

doesn't need to extract the field until after the match, whereas:

type t = { x : float }
let f = function
| { x = 0.0 } -> false
| _ -> true

does.

Actually, now that I think about it I suspect that float's in arrays are also fine. Since there are no polymorphic patterns which range over float and non-float, you can always delay extracting an item from a polymorphic array until after the match has completed.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 27, 2016

I thought about this "only after matching" argument, but there is also the when guard, can't it result in matching more patterns in the future if it fails?

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Jul 27, 2016

can't it result in matching more patterns in the future if it fails?

Yes, but when guards can allocate anyway so you don't care about allocating before them.

When I say "after matching" I mean the point after a pattern has been decided as matched, rather than the point after the (optional) when guard has been considered as well.

@alainfrisch
Copy link
Copy Markdown
Contributor

@lpw25 How would you deal with:

  function ((Left [| x |] | Right x),  (Left [| y ||] | Right y))-> x +. y

After detecting that the or-pattern succeeds, you need to remember which branches were taken (to know where to load x/y from and whether to allocate). Or does the compilation of patterns "explodes" the four possible cases?

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Jul 27, 2016

How would you deal with

Oh I see what you mean. Although that still seems resolvable. You could store an object and index for fetching the unboxed float, rather than a pointer to the boxed float.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Jul 27, 2016

Except that only works if you know about the boxing.

@lpw25
Copy link
Copy Markdown
Contributor

lpw25 commented Jul 27, 2016

So it seems that whilst or-patterns are fine and float arrays are fine, the combination of or-patterns and float arrays is problematic.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 27, 2016

@alainfrisch I think that you would get four exits in this example (at least this is what compilation-by-matrices gives).

@alainfrisch
Copy link
Copy Markdown
Contributor

I think that you would get four exits in this example (at least this is what compilation-by-matrices gives).

Experiment shows it's not the case. (The size of the lambda size grows linearly in the number of tuple components.)

the combination of or-patterns and float arrays is problematic.

Do you see any drawback in unboxing all capture variables of type float coming from at least one unboxed representation (in native code at least)? So in:

 function ((Left [| x |] | Right x),  (Left [| y |] | Right y))-> x +. y

one would unbox in the right branch of or-patterns.

This does not solve the problem of generic (float/non-float) arrays, but we know the real solution for that.

@maranget
Copy link
Copy Markdown
Contributor Author

I would now tend to err on the safe side: cancel PM optimisation whenever a mutable field or an array is present in the patterns. It will then remain two issues

  1. Confine the anti-optimisaton to subterm of the subject value that may change.
  2. Warn users (should we?).

(*
If there is a guard in a matching or a lazy pattern,
then set exhaustiveness info to Partial.
If there is a mutable pattern set exhiastiveness to partial
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo.

@alainfrisch
Copy link
Copy Markdown
Contributor

Ok, so the de-optimization is triggered by mutability even if no code can be executed given the current system (lazy/guards/allocations). A typical example would be patterns such as {desc = ..} in the type-checker. Do you believe this de-optimization could impact performance in a significant way?

@maranget
Copy link
Copy Markdown
Contributor Author

I do not think performance will be affected significantly. If any performance problem, it can be alleviated by confining de-optimisation to subterms that can change, as I wrote above. In any case, correctness first.

@maranget maranget closed this Jul 29, 2016
@maranget maranget reopened this Jul 29, 2016
@xavierleroy
Copy link
Copy Markdown
Contributor

@maranget : as much as I trust your 6AM intuitions, I'd like to see some measurement of performance impact, e.g. on the benchmarks that were used for flambda.

@maranget
Copy link
Copy Markdown
Contributor Author

maranget commented Jul 29, 2016

Fair enough. Has anybody a pointer to those "benchmarks that were used for flambda". ?
CC @chambart

@chambart
Copy link
Copy Markdown
Contributor

chambart commented Sep 2, 2016

@maranget some of the informations are here. Sadly some benchmarks relying on core will have to be updated for the new version since the old ones can't compiler on 4.04.
https://www.typerex.org/operf-macro.html

@maranget
Copy link
Copy Markdown
Contributor Author

maranget commented Sep 2, 2016

Thanks,

--Luc

@maranget some of the informations are here. Sadly some benchmarks relying on core will have to be updated for the new version since the old ones can't compiler on 4.04.
https://www.typerex.org/operf-macro.html

You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#717 (comment)

@maranget
Copy link
Copy Markdown
Contributor Author

maranget commented Sep 6, 2016

I have solved issue 1. above. That is "Confine the anti-optimisaton to subterm of the subject value that may change.". Unfortunately I could not install operf-macro (command "opam repo add operf-macro git:" does not work), so as to systematically test the performance degradation....
@chambart

Immutable -> Alias
| Mutable -> StrictOpt in
Immutable -> Alias,mut
| Mutable -> StrictOpt,true in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any idea why this variable is named "str"?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea :)

…mitted

as an extra information in argument "args" of the main PM compilation functions
(do_compile_matching etc.)
@trefis
Copy link
Copy Markdown
Contributor

trefis commented Apr 20, 2018

Btw, @mshinwell rerun the benchmark during the night, the results are better than previously (i.e. they are fine now).
Note: the "mutmatch" version is based on a rebased version of this PR + some trivial commits, available here.

@maranget
Copy link
Copy Markdown
Contributor Author

I have integrated "trivial" cleanups (or argued why I did not integrate them).

I still have to look at the more general comments.

@trefis
Copy link
Copy Markdown
Contributor

trefis commented Apr 20, 2018

The embryo of a discussion we just had about choosing "the right branch" makes me wonder whether the following comment, which I made earlier, is still valid:

Apart from these remarks I think the code does what it claims: it disables optimizations which rely on informations that might change during matching.

That's not quite right, in the sense that the compatibility check we use to decide whether we can reorder rows or not doesn't care about mutability. And one could argue that you should not allow reordering rows which look at the same mutable cell when mutation can happen.
To be clear: I don't think there is a soundness issue in allowing it (though I'll try and think about it a bit more), just that the behavior might surprise the user, and that we might want to change it at some point.

@trefis
Copy link
Copy Markdown
Contributor

trefis commented Apr 20, 2018

And one could argue that you should not allow reordering rows which look at the same mutable cell when mutation can happen.

That of course seems impossible to track. Fortunately, I believe the condition should actually be "you never reorder rows looking at mutable cells when mutation can happen", which seems easier to enforce.

@xavierleroy
Copy link
Copy Markdown
Contributor

What's the status on this PR? Was the problem solved in one way or another? Is the PR still relevant?

@trefis
Copy link
Copy Markdown
Contributor

trefis commented Oct 14, 2019

The problem is still present in trunk today. @gasche and I answered some of the questions that I asked during code review, while we were working on the pattern matching earlier this year.

That work is still ongoing, and we definitely hope to fix #7241 as a result of it. It's unclear to me whether our final fix will be a refresh of this PR, or something different, but what is clear is that this PR is unlikely to be merged.
I'll let @gasche close it if he agrees with this assessment.

@gasche
Copy link
Copy Markdown
Member

gasche commented Oct 15, 2019

I would vote for either "keep this one open as a reminder" or "close this one but open an issue". The first one doesn't require any work, and that is nice.

EduardoRFS pushed a commit to esy-ocaml/ocaml that referenced this pull request Dec 17, 2021
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Sep 21, 2022
@gasche
Copy link
Copy Markdown
Member

gasche commented Mar 10, 2023

@maranget should we have a meeting to discuss the state of this PR? (cc maybe @trefis, @Octachron)

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 4, 2023

(A couple weeks ago I posted my current thoughts about this as a comment on the original issue: #7241 (comment) )

@gasche gasche mentioned this pull request Sep 13, 2023
@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 20, 2024

We are not going to merge the present PR, but there is a lot of progress on other PRs to fix the same issue -- see #7241 for full details. Closing here.

@gasche gasche closed this Jul 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants