zero_alloc: fix treatment of caml_flambda2_invalid by gretay-js · Pull Request #2112 · oxcaml/oxcaml

gretay-js · 2023-11-30T17:30:36Z

Fixes a soundness problem in zero_alloc checking.

It turns out that zero_alloc annotations are ignored in some cases on functions that became dead because all calls to them were inlined (the functions themselves are not deleted, thanks to #1378). The culprit is returns=false field of the extcall to caml_flambda2_invalid, which is sometimes inserted by flambda2 in dead code. It allows selectgen to eliminate the rest of the code of the function, including any allocations, so zero_alloc check that runs later on does not report an error. However, the inlined copy of these allocatons may not have been eliminated. This is not sound: the user expects that if the original function is zero_alloc then its inlined copies have no allocations.

The fix is to pretend that a call to caml_flambda2_invalid returns when it appears inside a function annotated with [@zero_alloc]. This is done in selectgen but can probably be done earlier.

Here is an example (can probably be simplied):

let f x len pos =
  let[@zero_alloc][@inline] unsafe_set_int64 ba pos =
    Bigarray.Array1.unsafe_set ba pos (Int64.of_int x)  
  in
  let ba = Bigarray.Array1.create Int64 C_layout (Iarray.length len) in
  unsafe_set_int64 ba pos;
  ba

The inner function unsafe_set_int64 allocates (it contains an alloc and an extcall to caml_ba_set_1 that is not marked as @noalloc):

(function{dead.ml:7,45-157} assert_zero_alloc
 camlDead__unsafe_set_int64_1_3_code
     (ba/525: val pos/526: int my_closure/527: val)
 (extcall "caml_ba_set_1"{dead.ml:10,4-54} ba/525 pos/526
   (alloc{dead.ml:10,38-54} 2303 G:"caml_int64_ops"
     (>>s (extcall "caml_flambda2_invalid" L:"camlDead__invalid116" int->.)
       1))
   int,int,int->val))

but zero_alloc check used to succeed because the above cmm code was compiled to:

camlDead__unsafe_set_int64_1_3_code(R/0[%rax] R/1[%rbx] R/2[%rdi]) {dead.ml:7,45-157} assert_zero_alloc
  { + R/0[%rax]}
  ba/45 := R/0[%rax]
  { + R/1[%rbx]}
  pos/46 := R/1[%rbx]
  { + R/2[%rdi]}
  my_closure/47 := R/2[%rdi]
  {}
  I/48 := "camlDead__invalid116"
  { + I/48}
  R/2[%rdi] := I/48
  { + R/2[%rdi]}
  extcall "caml_flambda2_invalid" R/2[%rdi] (noalloc)

but will hopefully prevent us from running into this bug again when we add new properties.

for zero_alloc checking, pretend this extcall returns, otherwise it gets eliminated and the check is not sound.

Gbury · 2023-12-01T12:47:25Z

It doesn't seem that this is the right fix: if the simplified body of the function contains a flambda2 Invalid then that means the code should be unreachable, and crucially, all instances of the inlined function should also contain that Invalid (and therefore their allocations simplified away for the same reason that they have been in function body).

Considering that an flambda2 Invalid is essentially an abort, I'm not sure what semantics one would want to give to the zero alloc check in such a case. In any case, we're looking at why an invalid is generated in that case.

gretay-js · 2023-12-01T13:18:19Z

It doesn't seem that this is the right fix: if the simplified body of the function contains a flambda2 Invalid then that means the code should be unreachable,

even if the function is unreachable/dead/inlined away, then we still want to check its (original) body if the function was unnotated with [@zero_alloc], so I guess this PR is pretending that "caml_flambda2_invalid" is a NOP.

and crucially, all instances of the inlined function should also contain that Invalid (and therefore their allocations simplified away for the same reason that they have been in function body).

Considering that an flambda2 Invalid is essentially an abort, I'm not sure what semantics one would want to give to the zero alloc check in such a case. In any case, we're looking at why an invalid is generated in that case.

thanks! @mshinwell had an explanation, which I don't fully understand.

Gbury · 2023-12-01T15:39:51Z

Ok, so after a bit more debugging, the main problem is the following:

We explicitly keep the code of now-unused functions (e.g. functions for which all calls have been inlined) alive for the zero_alloc check, however, that does not prevent the various flamdba2 analysis from detecting such code as dead and unused.
In particular, the code that analyses usage of sets of closures slots rightfully detects that the various slots for unsafe_set_int64 are not actually used (because the code is not reachable anymore), and when assigning them an offset, it records them as slots that only occur in dead code.
Afterwards, during the to_cmm translation, when generating cmm for unsafe_set_int64, we see a projection from a dead value slot, and generate a call to caml_flamdba2_invalid since it is supposed to be deadcode.
We get to the problematic point of the zero_alloc check seeing an invalid in the code for unsafe_set_int64 even though that invalid is actually an unfortunate result of the "hack" we currently use to maintain the code of the function alive.

My opinion on all that is that we should try and find a more proper way to keep the code alive (since there will always be the possibility of a new analysis that could see the code as dead and make optimisations based on that) that would also be easily erasable to as to not have any runtime cost in the generated program.

gretay-js · 2023-12-01T15:58:32Z

@Gbury You analysis sounds similar to what Mark said. We should consider moving zero_alloc check earlier (to flambda2) after removing flambda1 and closure. In the meantime, I believe this PR improves the usability of zero_alloc analysis and the code seems no worse than the existing hack.

Gbury · 2023-12-01T16:23:29Z

If moving the analysis to operate on flambda2 is possible (now that flambda1 and closure are no more), that would certainly avoid having to hack things in flambda2 to keep those alive.

In the meantime, the fix should be okay, at least concerning the invalids generated by to_cmm, after all it only affects generated but dead function code (which might get a little bigger). I'm still a bit worried about what that means for genuine invalids generated by flambda2 though.

lpw25 · 2023-12-01T17:59:54Z

we should try and find a more proper way to keep the code alive

Can we do the equivalent of ignore (Sys.opque_identity foo) on all the functions explicitly marked zero_alloc? It will probably affect the code gen a bit, but probably only in module entry functions.

gretay-js · 2023-12-05T10:52:07Z

@Gbury said

It doesn't seem that this is the right fix: if the simplified body of the function contains a flambda2 Invalid then that means the code should be unreachable, and crucially, all instances of the inlined function should also contain that Invalid (and therefore their allocations simplified away for the same reason that they have been in function body). Considering that an flambda2 Invalid is essentially an abort, I'm not sure what semantics one would want to give to the zero alloc check in such a case.

This is a nice property, with it we don't need this PR at all and we don't need to fix anything, but I'm not sure if it does what users expect in the presence of code changes. Consider a situation where the check starts failing on an annotated function whose code hasn't changed but a new call to this function was added or even worse - completely unrelated new code was added that uses a previously-unused slot in the sets of closures. It would make the check fail (correctly) but can be hard to debug for the user

Gbury · 2023-12-05T11:14:35Z

This is a nice property, with it we don't need this PR at all and we don't need to fix anything, but I'm not sure if it does what users expect in the presence of code changes. Consider a situation where the check starts failing on an annotated function whose code hasn't changed but a new call to this function was added or even worse - completely unrelated new code was added that uses a previously-unused slot in the sets of closures. It would make the check fail (correctly) but can be hard to debug for the user

I don't know if it makes the debugging harder, but it sure makes the appearance of the warning/error quite surprising/puzzling. That being said, I don't expect that to happen in real cases, because to get to that situation, you have to have a function for which every use is dead code, so the function is completely dead: not exposed in any interface and every use of it is also dead code. In such a case, it could even make sense to emit a dedicated dead code warning, because that seems very much like a programming error (or a compiler error).

Also, note that the invariant I describe above is true when considering the flambda2 code, but since to_cmm can introduce some invalids (as is the case in your example), that invariant is not true for the generated cmm code, at least right now. However, such invalids generated by to_cmm only occurs inside of dead code, so if we stop artificially maintaining things alive, then there should be no such invalids generated by to_cmm. Said in another way: if the flambda2 code guarantees that it contains no dead code/all dead code has been replaced by Invalid, then we get the invariant we want.

gretay-js · 2023-12-05T11:30:09Z

However, such invalids generated by to_cmm only occurs inside of dead code, so if we stop artificially maintaining things alive, then there should be no such invalids generated by to_cmm. Said in another way: if the flambda2 code guarantees that it contains no dead code/all dead code has been replaced by Invalid, then we get the invariant we want.

This hack was introduced before we started propagating "zero_alloc assume" information as part of Debuginfo from Lambda all the way down to Mach. If we can reliably propagate "zero_alloc assert" on debug info in a similar way, we won't need to keep functions alive just for the check. Instead, we can check the inlined body and report an error if an annotated expression that may allocate made it all the way to mach. This is independent from whether the check is performed on mach or flambda2. It doesn't help with puzzling behavior upon code changes, as above.

gretay-js · 2023-12-11T11:02:54Z

@Gbury if you are okay with this approach for now, can you please review and approve the PR?

xclerc

I agree with others that this does
not appear like a good long-term
solution. For this particular check,
I think my preference would be to
have a principled way of keeping
the function closer to its original
form until we perform the check.
Approving as a temporary workaround.

backend/selectgen.ml

gretay-js added flambda2 Prerequisite for, or part of, flambda2 backend middle end labels Nov 30, 2023

gretay-js requested review from mshinwell and xclerc November 30, 2023 17:30

gretay-js added 3 commits November 30, 2023 17:31

Remove unused Selectgen.reset function

19a72ce

Add Cmm.all_properties function (currently trivial)

a1cd8f2

but will hopefully prevent us from running into this bug again when we add new properties.

Special handling of "caml_flambda2_invalid" in selectgen

d1db635

for zero_alloc checking, pretend this extcall returns, otherwise it gets eliminated and the check is not sound.

gretay-js force-pushed the zero_alloc_flambda2_invalid branch 2 times, most recently from 3965232 to 45591db Compare December 1, 2023 11:52

Add a test

8ec6f17

gretay-js force-pushed the zero_alloc_flambda2_invalid branch from 45591db to 8ec6f17 Compare December 1, 2023 16:42

gretay-js requested a review from Gbury December 11, 2023 11:08

xclerc approved these changes Dec 11, 2023

View reviewed changes

backend/selectgen.ml Outdated Show resolved Hide resolved

gretay-js added 2 commits December 12, 2023 11:27

Add Cmm.caml_flambda2_invalid symbolic constant

5b2d6fe

use symbolic constant in selectgen

54dcdc9

xclerc approved these changes Dec 12, 2023

View reviewed changes

gretay-js merged commit 0751332 into oxcaml:main Dec 13, 2023

gretay-js mentioned this pull request Jul 17, 2024

Zero alloc fix flambda2 invalid no return take2 #2810

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

zero_alloc: fix treatment of caml_flambda2_invalid#2112

zero_alloc: fix treatment of caml_flambda2_invalid#2112
gretay-js merged 6 commits intooxcaml:mainfrom
gretay-js:zero_alloc_flambda2_invalid

gretay-js commented Nov 30, 2023

Uh oh!

Gbury commented Dec 1, 2023

Uh oh!

gretay-js commented Dec 1, 2023

Uh oh!

Gbury commented Dec 1, 2023 •

edited

Loading

Uh oh!

gretay-js commented Dec 1, 2023

Uh oh!

Gbury commented Dec 1, 2023

Uh oh!

lpw25 commented Dec 1, 2023

Uh oh!

gretay-js commented Dec 5, 2023

Uh oh!

Gbury commented Dec 5, 2023 •

edited

Loading

Uh oh!

gretay-js commented Dec 5, 2023

Uh oh!

gretay-js commented Dec 11, 2023 •

edited

Loading

Uh oh!

xclerc left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

gretay-js commented Nov 30, 2023

Uh oh!

Gbury commented Dec 1, 2023

Uh oh!

gretay-js commented Dec 1, 2023

Uh oh!

Gbury commented Dec 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gretay-js commented Dec 1, 2023

Uh oh!

Gbury commented Dec 1, 2023

Uh oh!

lpw25 commented Dec 1, 2023

Uh oh!

gretay-js commented Dec 5, 2023

Uh oh!

Gbury commented Dec 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gretay-js commented Dec 5, 2023

Uh oh!

gretay-js commented Dec 11, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xclerc left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Gbury commented Dec 1, 2023 •

edited

Loading

Gbury commented Dec 5, 2023 •

edited

Loading

gretay-js commented Dec 11, 2023 •

edited

Loading