Fix MPR7241: Pattern matching with mutable and lazy patterns is unsound by maranget · Pull Request #717 · ocaml/ocaml

maranget · 2016-07-26T14:23:58Z

Fixes PR#7241: a side effect in a guard can cause a segfault because the pattern-matching (PM) compiler assumes that the subject value does not change.
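To make the hazard concrete, here is a minimal sketch of a guard that mutates the value being matched. The type `cell` and the function `guard_mutates` are hypothetical names chosen for illustration; they do not come from the PR or its test suite. The example is deliberately written so that no later clause re-inspects the mutated field, so it should behave the same whether or not the fix is applied.

```ocaml
(* Hypothetical record type, for illustration only. *)
type cell = { mutable v : int option }

(* Returns true when the field changed while the match was running. *)
let guard_mutates (c : cell) : bool =
  let before = c.v in
  match c with
  (* The guard overwrites the field and then fails, so this clause is never taken. *)
  | { v = Some _ } when (c.v <- None; false) -> false
  (* The wildcard does not inspect the field again; we only compare values. *)
  | _ -> c.v <> before
```

Here `guard_mutates { v = Some 1 }` evaluates to `true`: the subject changed between the first clause and the second. The unsound situation described in this PR arises when a later clause relies on what an earlier test learned about such a field.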

The fix consists in not remembering what was matched. More precisely, the compiler "forgets" what it learned during matching. This is performed by the various "rshift" functions.

The fix applies when:

Some pattern has a mutable field,
and there is a guard or a lazy pattern.

Caveats
PM may now fail (with Match_failure) although no warning has been given (better than a segfault); see the example in the test suite.
Suboptimal: one could confine the memory erasure to mutable subterms.
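The first caveat can be modelled at source level. The sketch below is a hand-written analogy, not the code the compiler generates: the names `cell` and `defensive` are hypothetical, and the explicit second read of the field stands in for the compiler "forgetting" an earlier test. When the guard has changed the field, the re-read finds no matching case and raises `Match_failure` instead of trusting stale information.

```ocaml
(* Hypothetical record type, for illustration only. *)
type cell = { mutable b : int option }

let defensive (x : cell) : int =
  match x.b with
  | None -> 1
  (* The guard clears the field only for negative payloads, and always fails. *)
  | Some n when n < 0 && (x.b <- None; false) -> 2
  | Some _ ->
    (* Do not trust the earlier test: read the field again. *)
    (match x.b with
     | Some y -> y
     | None -> raise (Match_failure ("example.ml", 0, 0)))
```

With this definition, `defensive { b = Some 5 }` returns `5`, while `defensive { b = Some (-1) }` raises `Match_failure` even though the clauses look exhaustive.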

alainfrisch · 2016-07-26T14:30:09Z

Can values be mutated without guards nor lazy patterns? I was thinking about patterns that allocate (reading from float records or arrays => boxing) and could trigger finalizers or signal handlers, or yield control to a different thread. Or are all allocations delayed until the guard/action?

alainfrisch · 2016-07-26T14:32:18Z

Also, what do you think about introducing a warning about patterns that deconstruct mutable fields and arrays? This PR makes it less unsafe, but it's still not ideal to get a Match_failure and there does not seem to be any way to address that fully.

maranget · 2016-07-26T19:11:46Z

On your first comment I have no idea, especially as regards allocating pattern matching (is there such a thing ?). As regards concurrent execution, if the thread that performs pattern matching can yield (can it ?), anything can happen.

On your second comment, I think such a warning would be obnoxious, being triggered much more often than useful. However it would be very easy to implement...

alainfrisch · 2016-07-27T07:54:38Z

(is there such a thing ?)

Yes, patterns that read floats from unboxed representations (float arrays, float records) allocate. At least in native code, for capture variables which are known to be of type float (and which can potentially come from an unboxed representation), one could capture the unboxed version instead (i.e. simply read from the unboxed representation, or unbox if the value comes from a boxed representation; the two cases can be combined in an or-pattern). This might be completely straightforward since unboxing is currently done much later than the compilation of pattern matching (and bytecode would not be supported). But there will remain the case of arrays which can potentially hold floats (e.g. function [|x|] -> Some x | _ -> None). Here, the pattern allocates only when the array holds floats, but the trick above would not work.
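For readers less familiar with the float representations mentioned here, the following sketch shows the two kinds of pattern under discussion. The names `point`, `get_x` and `only_element` are hypothetical. The comments describe the usual behaviour of the standard runtime (all-float records and float arrays store their elements unboxed); whether a given compiler version actually allocates at these points depends on later optimisation passes, so treat them as assumptions.

```ocaml
(* An all-float record: its fields are stored unboxed. *)
type point = { x : float; y : float }

(* Binding x to a field of an unboxed record may require boxing a fresh float. *)
let get_x = function { x; _ } -> x

(* A generic array pattern: it may box only when the array happens to hold floats. *)
let only_element = function
  | [| x |] -> Some x
  | _ -> None
```

The second function is the polymorphic case from the comment above: its type is `'a array -> 'a option`, so the compiler cannot know statically whether the element is a float.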

if the thread that performs pattern matching can yield (can it ?)

Yes, allocation can give control to another thread (or execute code through finalizers or signal handlers).

anything can happen.

I think it would be safer to apply the same defensive mechanism that you suggest in this PR for any pattern that can possibly allocate, even with no guard or lazy pattern.

I think such a warning would be obnoxious, being triggered much more often than useful. However it would be very easy to implement...

If the warning was restricted to cases that (i) read mutable fields and (ii) can execute code (lazy/guard/allocation), do you think it would be triggered too much?

lpw25 · 2016-07-27T13:34:28Z

I think it would be safer to apply the same defensive mechanism that you suggest in this PR for any pattern that can possibly allocate, even with no guard or lazy pattern.

I think that we should aim to treat these patterns correctly for the cases where we know unboxing is happening, as we can always inspect the float itself for the pattern and then only allocate when the pattern has been successfully matched.

As you say, this doesn't work for polymorphic patterns in arrays, so I guess being defensive is the only real option in that case. (Add that to the list of reasons why the float array hack should be removed).

alainfrisch · 2016-07-27T13:54:11Z

then only allocate when the pattern has been successfully matched.

Note that to support or-pattern, we'd need to unbox boxed representations, as in:

  function (Left [| x |] | Right x) -> x +. 1.

and it will then be better to keep the unboxed binding (boxing only on demand).

I don't think this could be made to work easily for bytecode, though.
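A self-contained version of this or-pattern may help. The `either` type is defined locally as an assumption (the thread does not say which `Left`/`Right` type is meant), and `succ_float` and its fallback clause are added only so the function is total.

```ocaml
(* Local sum type, assumed for illustration. *)
type ('a, 'b) either = Left of 'a | Right of 'b

(* x comes from an unboxed float array on the left, and from a boxed float on the right. *)
let succ_float : (float array, float) either -> float = function
  | Left [| x |] | Right x -> x +. 1.
  | Left _ -> 0.  (* arrays of any other length *)
```

Both `succ_float (Left [| 1. |])` and `succ_float (Right 1.)` evaluate to `2.`; the point of the discussion is that the single variable `x` has two different origins with different representations.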

lpw25 · 2016-07-27T14:15:28Z

Note that to support or-pattern, we'd need to unbox boxed representations

I don't think OR-patterns make a difference, because you only need to extract the field if there is a pattern to match it against. So:

type t = { x : float }
let f = function { x } -> x

doesn't need to extract the field until after the match, whereas:

type t = { x : float }
let f = function
| { x = 0.0 } -> false
| _ -> true

does.

Actually, now that I think about it I suspect that floats in arrays are also fine. Since there are no polymorphic patterns which range over float and non-float, you can always delay extracting an item from a polymorphic array until after the match has completed.

gasche · 2016-07-27T14:18:40Z

I thought about this "only after matching" argument, but there is also the when guard, can't it result in matching more patterns in the future if it fails?

lpw25 · 2016-07-27T14:22:03Z

can't it result in matching more patterns in the future if it fails?

Yes, but when guards can allocate anyway so you don't care about allocating before them.

When I say "after matching" I mean the point after a pattern has been decided as matched, rather than the point after the (optional) when guard has been considered as well.

alainfrisch · 2016-07-27T14:24:06Z

@lpw25 How would you deal with:

  function ((Left [| x |] | Right x),  (Left [| y |] | Right y))-> x +. y

After detecting that the or-pattern succeeds, you need to remember which branches were taken (to know where to load x/y from and whether to allocate). Or does the compilation of patterns "explode" the four possible cases?
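To spell out what "exploding" would mean at source level, the sketch below writes the same function twice: once with nested or-patterns and once with the four combinations listed by hand. The `either` type, the alias `src`, the function names and the fallback clauses are assumptions added for illustration; nothing here claims to show what the compiler emits.

```ocaml
(* Local sum type, assumed for illustration. *)
type ('a, 'b) either = Left of 'a | Right of 'b
type src = (float array, float) either

(* Compact form: two independent or-patterns. *)
let add_compact : src * src -> float = function
  | (Left [| x |] | Right x), (Left [| y |] | Right y) -> x +. y
  | _ -> 0.

(* Exploded form: one clause per combination of branches. *)
let add_exploded : src * src -> float = function
  | Left [| x |], Left [| y |] -> x +. y
  | Left [| x |], Right y -> x +. y
  | Right x, Left [| y |] -> x +. y
  | Right x, Right y -> x +. y
  | _ -> 0.
```

In the exploded form each clause knows exactly where `x` and `y` come from, at the cost of a number of clauses that grows exponentially with the number of tuple components.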

lpw25 · 2016-07-27T14:28:09Z

How would you deal with

Oh I see what you mean. Although that still seems resolvable. You could store an object and index for fetching the unboxed float, rather than a pointer to the boxed float.

lpw25 · 2016-07-27T14:30:00Z

Except that only works if you know about the boxing.

lpw25 · 2016-07-27T14:30:54Z

So it seems that whilst or-patterns are fine and float arrays are fine, the combination of or-patterns and float arrays is problematic.

gasche · 2016-07-27T14:32:45Z

@alainfrisch I think that you would get four exits in this example (at least this is what compilation-by-matrices gives).

alainfrisch · 2016-07-27T17:03:35Z

I think that you would get four exits in this example (at least this is what compilation-by-matrices gives).

Experiment shows it's not the case. (The size of the lambda code grows linearly in the number of tuple components.)

the combination of or-patterns and float arrays is problematic.

Do you see any drawback in unboxing all capture variables of type float coming from at least one unboxed representation (in native code at least)? So in:

 function ((Left [| x |] | Right x),  (Left [| y |] | Right y))-> x +. y

one would unbox in the right branch of or-patterns.

This does not solve the problem of generic (float/non-float) arrays, but we know the real solution for that.

maranget · 2016-07-28T09:42:42Z

I would now tend to err on the safe side: cancel PM optimisation whenever a mutable field or an array is present in the patterns. Two issues will then remain:

Confine the anti-optimisation to subterms of the subject value that may change.
Warn users (should we?).

alainfrisch · 2016-07-28T20:38:08Z

bytecomp/matching.ml

 (*
-   If there is a guard in a matching or a lazy pattern,
-   then set exhaustiveness info to Partial.
+   If there is a mutable pattern set exhiastiveness to partial


alainfrisch · 2016-07-28T20:42:57Z

Ok, so the de-optimization is triggered by mutability even if no code can be executed given the current system (lazy/guards/allocations). A typical example would be patterns such as {desc = ..} in the type-checker. Do you believe this de-optimization could impact performance in a significant way?
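As an illustration of the kind of pattern meant here, consider a simplified stand-in for a type representation with a mutable `desc` field. The types `ty` and `desc` and the function `is_arrow` are hypothetical and much smaller than the type-checker's real definitions.

```ocaml
(* Simplified, hypothetical type representation with a mutable description. *)
type ty = { mutable desc : desc; id : int }
and desc = Tvar | Tarrow of ty * ty

(* No guard, no lazy pattern, no float: only a read of a mutable field. *)
let is_arrow = function
  | { desc = Tarrow _; _ } -> true
  | _ -> false
```

Matches of this shape execute no user code while matching, which is why the question of whether they need to be de-optimised at all is relevant for performance.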

maranget · 2016-07-29T04:40:57Z

I do not think performance will be affected significantly. If there is any performance problem, it can be alleviated by confining de-optimisation to subterms that can change, as I wrote above. In any case, correctness first.

xavierleroy · 2016-07-29T08:46:47Z

@maranget : as much as I trust your 6AM intuitions, I'd like to see some measurement of performance impact, e.g. on the benchmarks that were used for flambda.

maranget · 2016-07-29T12:14:07Z

Fair enough. Does anybody have a pointer to those "benchmarks that were used for flambda"?
CC @chambart

chambart · 2016-09-02T12:36:27Z

@maranget some of the information is here. Sadly some benchmarks relying on core will have to be updated for the new version since the old ones can't compile on 4.04.
https://www.typerex.org/operf-macro.html

maranget · 2016-09-02T12:55:21Z

Thanks,

--Luc

maranget · 2016-09-06T09:39:33Z

I have solved issue 1 above, that is, "Confine the anti-optimisation to subterms of the subject value that may change." Unfortunately I could not install operf-macro (command "opam repo add operf-macro git:" does not work), so I could not systematically test the performance degradation.
@chambart

gasche · 2016-09-07T14:10:55Z

bytecomp/matching.ml

-              Immutable -> Alias
-            | Mutable -> StrictOpt in
+              Immutable -> Alias,mut
+            | Mutable -> StrictOpt,true in


any idea why this variable is named "str"?

trefis · 2018-04-20T13:51:54Z

Btw, @mshinwell reran the benchmark during the night; the results are better than previously (i.e. they are fine now).
Note: the "mutmatch" version is based on a rebased version of this PR + some trivial commits, available here.

maranget · 2018-04-20T15:02:21Z

I have integrated "trivial" cleanups (or argued why I did not integrate them).

I still have to look at the more general comments.

trefis · 2018-04-20T16:17:00Z

The embryo of a discussion we just had about choosing "the right branch" makes me wonder whether the following comment, which I made earlier, is still valid:

Apart from these remarks I think the code does what it claims: it disables optimizations which rely on information that might change during matching.

That's not quite right, in the sense that the compatibility check we use to decide whether we can reorder rows or not doesn't care about mutability. And one could argue that you should not allow reordering rows which look at the same mutable cell when mutation can happen.
To be clear: I don't think there is a soundness issue in allowing it (though I'll try and think about it a bit more), just that the behavior might surprise the user, and that we might want to change it at some point.

trefis · 2018-04-20T16:21:57Z

And one could argue that you should not allow reordering rows which look at the same mutable cell when mutation can happen.

That of course seems impossible to track. Fortunately, I believe the condition should actually be "you never reorder rows looking at mutable cells when mutation can happen", which seems easier to enforce.

xavierleroy · 2019-10-14T18:40:35Z

What's the status on this PR? Was the problem solved in one way or another? Is the PR still relevant?

trefis · 2019-10-14T19:50:11Z

The problem is still present in trunk today. @gasche and I answered some of the questions that I asked during code review, while we were working on the pattern matching earlier this year.

That work is still ongoing, and we definitely hope to fix #7241 as a result of it. It's unclear to me whether our final fix will be a refresh of this PR, or something different, but what is clear is that this PR is unlikely to be merged.
I'll let @gasche close it if he agrees with this assessment.

gasche · 2019-10-15T07:26:40Z

I would vote for either "keep this one open as a reminder" or "close this one but open an issue". The first one doesn't require any work, and that is nice.

gasche · 2023-03-10T12:38:19Z

@maranget should we have a meeting to discuss the state of this PR? (cc maybe @trefis, @Octachron)

gasche · 2023-07-04T10:23:09Z

(A couple weeks ago I posted my current thoughts about this as a comment on the original issue: #7241 (comment) )

gasche · 2024-07-20T10:12:36Z

We are not going to merge the present PR, but there is a lot of progress on other PRs to fix the same issue -- see #7241 for full details. Closing here.

gasche self-assigned this Jul 26, 2016

alainfrisch reviewed Jul 28, 2016

maranget closed this Jul 29, 2016

maranget reopened this Jul 29, 2016

gasche reviewed Sep 7, 2016

maranget added 3 commits April 20, 2018 15:33

non-optimisation is now directed by compiled pattern matrix and trans…

1e23dc5

…mitted as an extra information in argument "args" of the main PM compilation functions (do_compile_matching etc.)

A few comments

b62fe54

avoid generating failure info for mutable subterms.

93d890c

maranget added 8 commits April 20, 2018 16:54

New warning for compiler added match failure.

06f2cde

comments

b720eb9

Restrict de-optimisation to cases where matching may call some arbitr…

6d72f51

…ary code.

Show warning 63 (ie extra match failure) only when mutable de-optimim…

bb128d4

…ization is active.

More code sharing.

643b67f

Supress occurrence of new warning 63

09c59e3

New warnings induce new reference files. Notice that compiler console

6555362

output of ocamlc and ocamlopt are the same.

Trivial cleanup, (from trefis observations)

8e0c32d

damiendoligez removed this from the consider-for-4.07 milestone Jun 4, 2018

vicuna mentioned this pull request Mar 14, 2019

Pattern matching with mutable and lazy patterns is unsound #7241

Closed

EduardoRFS pushed a commit to esy-ocaml/ocaml that referenced this pull request Dec 17, 2021

Merge pull request ocaml#717 from ctk21/clarify_minor_gc_ephe

ba50413

Tighten code comments in minor_gc.c

gasche mentioned this pull request Jan 25, 2022

Safety of some non-atomic loads generated by the compiler #10944

Closed

stedolan pushed a commit to stedolan/ocaml that referenced this pull request Sep 21, 2022

Rename perm -> renaming (ocaml#717)

561d612

gasche mentioned this pull request Mar 23, 2023

Syntactic function arity ocaml/RFCs#32

Merged

gasche removed the high-priority label Jul 4, 2023

gasche mentioned this pull request Sep 13, 2023

Test cases for 7241 #12548

Merged

gasche closed this Jul 20, 2024
