Skip to content

zero_alloc: fix treatment of caml_flambda2_invalid#2112

Merged
gretay-js merged 6 commits intooxcaml:mainfrom
gretay-js:zero_alloc_flambda2_invalid
Dec 13, 2023
Merged

zero_alloc: fix treatment of caml_flambda2_invalid#2112
gretay-js merged 6 commits intooxcaml:mainfrom
gretay-js:zero_alloc_flambda2_invalid

Conversation

@gretay-js
Copy link
Copy Markdown
Contributor

Fixes a soundness problem in zero_alloc checking.

It turns out that zero_alloc annotations are ignored in some cases on functions that became dead because all calls to them were inlined (the functions themselves are not deleted, thanks to #1378). The culprit is returns=false field of the extcall to caml_flambda2_invalid, which is sometimes inserted by flambda2 in dead code. It allows selectgen to eliminate the rest of the code of the function, including any allocations, so zero_alloc check that runs later on does not report an error. However, the inlined copy of these allocatons may not have been eliminated. This is not sound: the user expects that if the original function is zero_alloc then its inlined copies have no allocations.

The fix is to pretend that a call to caml_flambda2_invalid returns when it appears inside a function annotated with [@zero_alloc]. This is done in selectgen but can probably be done earlier.

Here is an example (can probably be simplied):

let f x len pos =
  let[@zero_alloc][@inline] unsafe_set_int64 ba pos =
    Bigarray.Array1.unsafe_set ba pos (Int64.of_int x)  
  in
  let ba = Bigarray.Array1.create Int64 C_layout (Iarray.length len) in
  unsafe_set_int64 ba pos;
  ba

The inner function unsafe_set_int64 allocates (it contains an alloc and an extcall to caml_ba_set_1 that is not marked as @noalloc):

(function{dead.ml:7,45-157} assert_zero_alloc
 camlDead__unsafe_set_int64_1_3_code
     (ba/525: val pos/526: int my_closure/527: val)
 (extcall "caml_ba_set_1"{dead.ml:10,4-54} ba/525 pos/526
   (alloc{dead.ml:10,38-54} 2303 G:"caml_int64_ops"
     (>>s (extcall "caml_flambda2_invalid" L:"camlDead__invalid116" int->.)
       1))
   int,int,int->val))

but zero_alloc check used to succeed because the above cmm code was compiled to:

camlDead__unsafe_set_int64_1_3_code(R/0[%rax] R/1[%rbx] R/2[%rdi]) {dead.ml:7,45-157} assert_zero_alloc
  { + R/0[%rax]}
  ba/45 := R/0[%rax]
  { + R/1[%rbx]}
  pos/46 := R/1[%rbx]
  { + R/2[%rdi]}
  my_closure/47 := R/2[%rdi]
  {}
  I/48 := "camlDead__invalid116"
  { + I/48}
  R/2[%rdi] := I/48
  { + R/2[%rdi]}
  extcall "caml_flambda2_invalid" R/2[%rdi] (noalloc)

@gretay-js gretay-js added flambda2 Prerequisite for, or part of, flambda2 backend middle end labels Nov 30, 2023
but will hopefully prevent us from running into this bug again
when we add new properties.
for zero_alloc checking, pretend this extcall returns, otherwise
it gets eliminated and the check is not sound.
@gretay-js gretay-js force-pushed the zero_alloc_flambda2_invalid branch 2 times, most recently from 3965232 to 45591db Compare December 1, 2023 11:52
@Gbury
Copy link
Copy Markdown
Contributor

Gbury commented Dec 1, 2023

It doesn't seem that this is the right fix: if the simplified body of the function contains a flambda2 Invalid then that means the code should be unreachable, and crucially, all instances of the inlined function should also contain that Invalid (and therefore their allocations simplified away for the same reason that they have been in function body).

Considering that an flambda2 Invalid is essentially an abort, I'm not sure what semantics one would want to give to the zero alloc check in such a case. In any case, we're looking at why an invalid is generated in that case.

@gretay-js
Copy link
Copy Markdown
Contributor Author

It doesn't seem that this is the right fix: if the simplified body of the function contains a flambda2 Invalid then that means the code should be unreachable,

even if the function is unreachable/dead/inlined away, then we still want to check its (original) body if the function was unnotated with [@zero_alloc], so I guess this PR is pretending that "caml_flambda2_invalid" is a NOP.

and crucially, all instances of the inlined function should also contain that Invalid (and therefore their allocations simplified away for the same reason that they have been in function body).

Considering that an flambda2 Invalid is essentially an abort, I'm not sure what semantics one would want to give to the zero alloc check in such a case. In any case, we're looking at why an invalid is generated in that case.

thanks! @mshinwell had an explanation, which I don't fully understand.

@Gbury
Copy link
Copy Markdown
Contributor

Gbury commented Dec 1, 2023

Ok, so after a bit more debugging, the main problem is the following:

  • We explicitly keep the code of now-unused functions (e.g. functions for which all calls have been inlined) alive for the zero_alloc check, however, that does not prevent the various flamdba2 analysis from detecting such code as dead and unused.
  • In particular, the code that analyses usage of sets of closures slots rightfully detects that the various slots for unsafe_set_int64 are not actually used (because the code is not reachable anymore), and when assigning them an offset, it records them as slots that only occur in dead code.
  • Afterwards, during the to_cmm translation, when generating cmm for unsafe_set_int64, we see a projection from a dead value slot, and generate a call to caml_flamdba2_invalid since it is supposed to be deadcode.
  • We get to the problematic point of the zero_alloc check seeing an invalid in the code for unsafe_set_int64 even though that invalid is actually an unfortunate result of the "hack" we currently use to maintain the code of the function alive.

My opinion on all that is that we should try and find a more proper way to keep the code alive (since there will always be the possibility of a new analysis that could see the code as dead and make optimisations based on that) that would also be easily erasable to as to not have any runtime cost in the generated program.

@gretay-js
Copy link
Copy Markdown
Contributor Author

@Gbury You analysis sounds similar to what Mark said. We should consider moving zero_alloc check earlier (to flambda2) after removing flambda1 and closure. In the meantime, I believe this PR improves the usability of zero_alloc analysis and the code seems no worse than the existing hack.

@Gbury
Copy link
Copy Markdown
Contributor

Gbury commented Dec 1, 2023

If moving the analysis to operate on flambda2 is possible (now that flambda1 and closure are no more), that would certainly avoid having to hack things in flambda2 to keep those alive.

In the meantime, the fix should be okay, at least concerning the invalids generated by to_cmm, after all it only affects generated but dead function code (which might get a little bigger). I'm still a bit worried about what that means for genuine invalids generated by flambda2 though.

@gretay-js gretay-js force-pushed the zero_alloc_flambda2_invalid branch from 45591db to 8ec6f17 Compare December 1, 2023 16:42
@lpw25
Copy link
Copy Markdown
Collaborator

lpw25 commented Dec 1, 2023

we should try and find a more proper way to keep the code alive

Can we do the equivalent of ignore (Sys.opque_identity foo) on all the functions explicitly marked zero_alloc? It will probably affect the code gen a bit, but probably only in module entry functions.

@gretay-js
Copy link
Copy Markdown
Contributor Author

@Gbury said

It doesn't seem that this is the right fix: if the simplified body of the function contains a flambda2 Invalid then that means the code should be unreachable, and crucially, all instances of the inlined function should also contain that Invalid (and therefore their allocations simplified away for the same reason that they have been in function body). Considering that an flambda2 Invalid is essentially an abort, I'm not sure what semantics one would want to give to the zero alloc check in such a case.

This is a nice property, with it we don't need this PR at all and we don't need to fix anything, but I'm not sure if it does what users expect in the presence of code changes. Consider a situation where the check starts failing on an annotated function whose code hasn't changed but a new call to this function was added or even worse - completely unrelated new code was added that uses a previously-unused slot in the sets of closures. It would make the check fail (correctly) but can be hard to debug for the user

@Gbury
Copy link
Copy Markdown
Contributor

Gbury commented Dec 5, 2023

This is a nice property, with it we don't need this PR at all and we don't need to fix anything, but I'm not sure if it does what users expect in the presence of code changes. Consider a situation where the check starts failing on an annotated function whose code hasn't changed but a new call to this function was added or even worse - completely unrelated new code was added that uses a previously-unused slot in the sets of closures. It would make the check fail (correctly) but can be hard to debug for the user

I don't know if it makes the debugging harder, but it sure makes the appearance of the warning/error quite surprising/puzzling. That being said, I don't expect that to happen in real cases, because to get to that situation, you have to have a function for which every use is dead code, so the function is completely dead: not exposed in any interface and every use of it is also dead code. In such a case, it could even make sense to emit a dedicated dead code warning, because that seems very much like a programming error (or a compiler error).

Also, note that the invariant I describe above is true when considering the flambda2 code, but since to_cmm can introduce some invalids (as is the case in your example), that invariant is not true for the generated cmm code, at least right now. However, such invalids generated by to_cmm only occurs inside of dead code, so if we stop artificially maintaining things alive, then there should be no such invalids generated by to_cmm. Said in another way: if the flambda2 code guarantees that it contains no dead code/all dead code has been replaced by Invalid, then we get the invariant we want.

@gretay-js
Copy link
Copy Markdown
Contributor Author

However, such invalids generated by to_cmm only occurs inside of dead code, so if we stop artificially maintaining things alive, then there should be no such invalids generated by to_cmm. Said in another way: if the flambda2 code guarantees that it contains no dead code/all dead code has been replaced by Invalid, then we get the invariant we want.

This hack was introduced before we started propagating "zero_alloc assume" information as part of Debuginfo from Lambda all the way down to Mach. If we can reliably propagate "zero_alloc assert" on debug info in a similar way, we won't need to keep functions alive just for the check. Instead, we can check the inlined body and report an error if an annotated expression that may allocate made it all the way to mach. This is independent from whether the check is performed on mach or flambda2. It doesn't help with puzzling behavior upon code changes, as above.

@gretay-js
Copy link
Copy Markdown
Contributor Author

gretay-js commented Dec 11, 2023

@Gbury if you are okay with this approach for now, can you please review and approve the PR?

@gretay-js gretay-js requested a review from Gbury December 11, 2023 11:08
Copy link
Copy Markdown
Contributor

@xclerc xclerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with others that this does
not appear like a good long-term
solution. For this particular check,
I think my preference would be to
have a principled way of keeping
the function closer to its original
form until we perform the check.
Approving as a temporary workaround.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend flambda2 Prerequisite for, or part of, flambda2 middle end

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants