Skip to content

Conversation

@stedolan
Copy link
Contributor

The patch

The bytecode VM uses a fixed 7-instruction sequence of bytecode to set up a call from C into OCaml. This bytecode sequence requires initialisation, to register the code fragment (and thread the bytecode, if not a debug build).

Currently, this code sequence is defined per-thread and initialised on first use. This patch changes it to be defined globally and initialised at startup. (The main thread's copy was already initialised at startup through caml_init_callbacks. The change here is to make that the only copy).

Using a code sequence per thread instead of one global one is somewhat less efficient but not by itself a bug. The problem is that the code fragments leaked, as they were not unregistered at thread termination. This causes #13512.

The bug

With the old code, the following sequence of unfortunate events can occur, if you are very, very unlucky (or you are @jmid with multicoretests, manufacturing bad luck on an industrial scale):

  1. The program uses Domain.spawn f, which creates a new domain and uses caml_callback_res to invoke f.
  2. When caml_callback_res is called, the callback initialisation logic runs, creating a 28-byte code fragment. This fragment covers the 7-instruction callback sequence stored in thread-local storage.
  3. Later, the domain terminates. The C library / OS frees resources used by its threads, including the memory for thread-local storage. The 28-byte code fragment remains registered.
  4. Later, another domain does a minor GC, and runs out of space on the major heap to promote live blocks to. It requests more memory from the OS, which happens to return a 32K pool that includes the 28 bytes that are still registered as a code fragment. A particularly unlucky value is promoted into those 28 bytes.
  5. This unlucky value remains live, but happens not to be pointed to by any other heap value: all references to it are from local variables (so, on the bytecode stack).
  6. At the start of the next major GC cycle, the runtime scans the stack. The bytecode stack contains an arbitrary mix of values and code pointers (return addresses & exception handlers), so the GC skips over any pointers it finds that lie inside code fragments. So, it does not see that the unlucky value is live.
  7. The unlucky value gets collected, despite being in use, causing chaos when it's next used.

In other words: dangling code fragments that are not unregistered are not just a resource leak but a soundness issue. They leave blindspots in the address space that the GC cannot perceive and cause weird bugs later on.

The callback thunk code is now initialised once, not per-thread,
avoiding leaking of code fragments.
@ghost
Copy link

ghost commented Oct 14, 2024

This diff appears to solve indeed the bytecode issues (#13402 and probably others) for me.

However, are you sure it is safe to use a single copy for the whole program, as two elements of callback_code get modified everytime caml_callbackN_exn is invoked? In other words, are you sure there is no risk of two domains attempting to modify and use callback_code at the same time?

@stedolan
Copy link
Contributor Author

Urgh, well spotted, I'd forgotten it actually mutates it after initialisation. I'll update this with a better fix.

Instead of mutating bytecode, this version uses the same mechanism
in native code and bytecode: separate code exists for 1, 2 and 3
arguments, and more arguments are handled by iterating callback3.
@stedolan
Copy link
Contributor Author

Just pushed a version that (hopefully) fixes the bug without introducing a new one.

It removes the optimised caml_callbackN for bytecode, and instead uses fixed code sequences for caml_callback1/2/3, and iterates caml_callback3 for more arguments. (This is the mechanism used by native code callbacks).

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is smart and looks correct to me. And it still appears to work.

@kayceesrk
Copy link
Contributor

This looks correct to me. However, I'm not an expert on this part of the runtime. Perhaps a review from someone who understands this better would be useful.

@Octachron, we should consider this for 5.3.

@gasche
Copy link
Member

gasche commented Oct 15, 2024

My impression is that @xavierleroy would be the right person to look at this.

@Octachron Octachron added this to the 5.3 milestone Oct 15, 2024
@xavierleroy
Copy link
Contributor

Thanks for pinging me. From a quick look, I agree we have a problem here, but I'm a bit sad about mirroring the native-code behavior (using "slices" of at most 3 arguments for callbacks) in the bytecode behavior. My feeling is that it's time to change a bit the API of caml_interprete to make bytecode callbacks easier. Can I have a couple of weeks to look into this before this PR is merged?

@xavierleroy
Copy link
Contributor

Also: just exposing init_callback and uninit_callback functions, and calling them when threads/domains start/stop would solve the issue with fewer changes to the existing code.

xavierleroy added a commit to xavierleroy/ocaml that referenced this pull request Oct 15, 2024
- Use the new `caml_bytecode_interpreter` API to jump straight to the
  function's code.
- Avoid modifying bytecode in place.
- Avoid per-thread bytecode.
- Register bytecode early and only once.

Fixes:  ocaml#13512
Closes: ocaml#13549
@xavierleroy
Copy link
Contributor

That was easier than I thought. See #13553 for my alternate proposal.

@dra27 dra27 linked an issue Oct 16, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GC crashes under bytecode

5 participants