Fix signals_alloc test on PPC by stedolan · Pull Request #9814 · ocaml/ocaml

stedolan · 2020-07-30T16:36:59Z

The PowerPC ports (at least) ignore signals arriving during a @noalloc C call until the next non-@noalloc C call because it caches young_limit in a register. See this comment from signals.c:

  /* When this function is called without [caml_c_call] (e.g., in
     [caml_modify]), this is only moderately effective on ports that cache
     [Caml_state->young_limit] in a register, so it may take a while before the
     register is reloaded from [Caml_state->young_limit]. */
  Caml_state->young_limit = Caml_state->young_alloc_end;

This means that #9802 broke the signals_alloc test on this architecture, by marking a signal-raising function noalloc, and checking that signals were handled quickly afterwards.

cc @nojb

stedolan · 2020-07-30T16:38:01Z

The signal caching behaviour is pretty suspicious for multicore as well, which relies on being able to interrupt another domain by poking young_limit. I wonder whether the new efficient access to runtime variables via Caml_state makes this caching unnecessary?

stedolan · 2020-07-30T16:39:12Z

(precheck 474 now running)

nojb · 2020-07-30T17:02:12Z

Thanks! Indeed, arm64 also caches the allocation pointer in a register, and so does riscv (where I imitated arm64).

xavierleroy · 2020-07-30T17:02:46Z

Note that handle_signal in runtime/signals_nat.c updates the register that caches young_limit when it is safe to do so, i.e. when running OCaml compiled code. So, the problem is only with signals that arrive (or are generated) while running C code.

The signal caching behaviour is pretty suspicious for multicore as well, which relies on being able to interrupt another domain by poking young_limit.

That's something to discuss. I thought it was using... signals!

I wonder whether the new efficient access to runtime variables via Caml_state makes this caching unnecessary?

On RISC processors the caching makes the allocation sequence one instruction shorter, besides saving one load, of course. This is good for performance. I guess some benchmarking is in order.

xavierleroy · 2020-07-30T17:45:25Z

Indeed, arm64 also caches the allocation pointer in a register, and so does riscv (where I imitated arm64).

When the processor gives you 32 or more integer registers, I think it's worth (performance wise) dedicating one register to the allocation limit. That's a judgment call, not supported by actual benchmarks.

jhjourdan

Well spotted, I should have noticed that in #9802. This looks good to me.

nojb · 2020-07-31T14:39:03Z

https://ci.inria.fr/ocaml/job/precheck/475/

avsm · 2020-08-02T19:13:04Z

Can confirm this also fixes the signals_alloc failure on arm64 for me, and it didn't occur on arm32 before either (since that doesn't have a spare register for the allocation pointer).

nojb · 2020-08-02T19:30:46Z

There seems to be a problem with INRIA CI's arm machines, but I would like to go ahead and merge this, based on PR review, confirmation by @avsm, and the fact that INRIA CI passes for ppc.

stedolan · 2020-08-03T09:21:13Z

The signal caching behaviour is pretty suspicious for multicore as well, which relies on being able to interrupt another domain by poking young_limit.

That's something to discuss. I thought it was using... signals!

At the moment, it just pokes young_limit, without actually sending a Unix signal. (Signals are slow and don't work on Windows). We could easily change this to send an actual signal, perhaps only on those ports that cache young_limit. Benchmarking needed!

I wonder whether the new efficient access to runtime variables via Caml_state makes this caching unnecessary?

On RISC processors the caching makes the allocation sequence one instruction shorter, besides saving one load, of course. This is good for performance. I guess some benchmarking is in order.

When the processor gives you 32 or more integer registers, I think it's worth (performance wise) dedicating one register to the allocation limit. That's a judgment call, not supported by actual benchmarks.

As an equally unsupported guess: the load, compare and branch of the allocation sequence are off the critical path, and the branch is easily predicted, so on an out-of-order processor they won't slow anything down. (The code size of the allocation sequence seems important, though)

lpw25 · 2020-08-03T09:45:30Z

Perhaps another possibility on the architectures that cache the young limit: rely on the polling point insertion that is already going to be used to ensure that non-allocating sections can still be interupted.

stedolan · 2020-08-03T09:50:20Z

That doesn't help here: if the polling points are checking a value cached in a register, we'd need to somehow update that value when the domain is interrupted. Only signal handlers give us a way to poke another thread's registers. (Well, that or ptrace nonsense).

xavierleroy · 2020-08-03T09:53:53Z

A possible compromise: allocation and polling points reload the allocation limit from the state into the dedicated register by default, but allocations that are dominated (in CFG parlance) by another allocation or polling point don't, because the dedicated register already contains a fresh enough value for the allocation limit. Not sure this is worth the extra complexity, though.

lpw25 · 2020-08-03T10:03:02Z

What I meant was really Xavier's suggestion. The polling points insertion code gives us logic to ensure that you don't go unbounded time without hitting such a point -- we can use that logic to make sure to check the global often enough and otherwise use a register.

Fix signals_alloc test

0a1e5b6

stedolan added the no-change-entry-needed label Jul 30, 2020

stedolan mentioned this pull request Jul 30, 2020

Ensure signals are handled before Unix.kill returns #9802

Merged

jhjourdan approved these changes Jul 31, 2020

View reviewed changes

nojb closed this Jul 31, 2020

nojb reopened this Jul 31, 2020

nojb merged commit 0bf255c into ocaml:trunk Aug 2, 2020

Conversation

stedolan commented Jul 30, 2020

Uh oh!

stedolan commented Jul 30, 2020

Uh oh!

stedolan commented Jul 30, 2020

Uh oh!

nojb commented Jul 30, 2020

Uh oh!

xavierleroy commented Jul 30, 2020

Uh oh!

xavierleroy commented Jul 30, 2020

Uh oh!

jhjourdan left a comment

Choose a reason for hiding this comment

Uh oh!

nojb commented Jul 31, 2020

Uh oh!

avsm commented Aug 2, 2020

Uh oh!

nojb commented Aug 2, 2020

Uh oh!

stedolan commented Aug 3, 2020

Uh oh!

lpw25 commented Aug 3, 2020

Uh oh!

stedolan commented Aug 3, 2020

Uh oh!

xavierleroy commented Aug 3, 2020

Uh oh!

lpw25 commented Aug 3, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants