Skip to content

Document gc_regs and gc_regs_buckets#11437

Merged
xavierleroy merged 2 commits intoocaml:trunkfrom
gasche:document-gc-regs
Aug 28, 2022
Merged

Document gc_regs and gc_regs_buckets#11437
xavierleroy merged 2 commits intoocaml:trunkfrom
gasche:document-gc-regs

Conversation

@gasche
Copy link
Copy Markdown
Member

@gasche gasche commented Jul 15, 2022

This PR was motivated by the discussion of #11406. cc @fabbing @Engil @Octachron @xavierleroy.

I realized while writing the documentation that there are things I don't understand.

  • @xavierleroy asks why the gc_regs buckets are now allocated on the C stack, rather than being stack-allocated as with the 4.x runtime. I don't know! I thought of the following possible explanations:
    • maybe it's easier to write the runtime code if the size of the "stack metadata" (that previously contained gc_regs) has the same size on all architectures?
    • maybe there is a small performance advantage in not growing the stack by so many registers? (the stack is copied by the realloc_stack logic at the beginning, and it starts small at 64 words)
  • I'm not sure which functions need to save all registers in gc_regs and why. Obviously caml_call_gc does, because the GC needs to walk the OCaml registers for roots. But if I read the assembly code correctly, the (only) other runtime function doing this is caml_call_realloc_stack, and I don't know why. As far as I can tell, this function never transfers control back to OCaml code, and it does not call the GC (stacks are allocated by fiber.c:alloc_for_stack on the C heap).

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Jul 15, 2022

Note: my assembler-fu is pretty weak. It's easy to see where Caml_state(gc_regs_bucket) is popped (and then pushed again) when saving all registers from OCaml. On the other hand, it's not easy for me to understand where Caml_state(gc_regs) is saved on the stack when calling OCaml code from C. I understand that it happens at

ocaml/runtime/amd64.S

Lines 696 to 698 in 182fd48

/* Store the gc_regs for callbacks during a GC */
movq Caml_state(gc_regs), %r11
movq %r11, 8(%r10)

but I would expect that the structure that is in %r10 at this point is also modelled or at least documented somewhere in the C code of the runtime, and I would expect to find documentation that explains that the previous value of gc_regs is saved as the second word in this block. I was not able to find such a reference.

@xavierleroy
Copy link
Copy Markdown
Contributor

I would expect that the structure that is in %r10 at this point is also modelled or at least documented somewhere in the C code of the runtime

Fat chance. But here is where it is used on the C side:

ocaml/runtime/fiber.c

Lines 272 to 277 in 182fd48

/* This marks the top of an ML stack chunk. Move sp to the previous
* stack chunk. */
sp += 3 * sizeof(value); /* trap frame & DWARF pointer */
regs = *(value**)sp; /* update gc_regs */
sp += 1 * sizeof(value); /* gc_regs */
goto next_chunk;

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Jul 15, 2022

Thanks the pointer! I was expecting to find documentation in

/* Stack layout for native code. Stack grows downwards.
*
* +------------------------+
* | struct stack_handler |
* +------------------------+ <--- Stack_high
* | caml_runstack / |
* | caml_start_program |
* +------------------------+
* | |
* . OCaml frames . <--- sp
* | |
* +------------------------+ <--- Stack_threshold
* | |
* . Red Zone .
* | |
* +------------------------+ <--- Stack_base
* | struct stack_info |
* +------------------------+ <--- Caml_state->current_stack
*/

I realize now that gc_regs is stored on the stack precisely inside the caml_runstack / caml_start_program stack frame that is depicted in this figure. This is clearer on the Multicore Wiki, but my impression is that the Wiki documentation is partly outdated (I don't see any trace of last_retaddr in the current implementation, for example.)

The stack-walking code that you point out (and some variants of it in backtrace_nat.c and frame_descriptors.c) mention a notion of "stack chunk" that is not explained in the documentation. I tried to piece it back together from the code but it will need another try before I can come up with a coherent description. (I think that an OCaml "stack" is made of several "chunks", one per transfer from C to OCaml, which are adjacent to each other and each start with a caml_start_program frame. But is the diagram of fiber.h the depiction of a single stack chunk, or are there implicitly several chunks in the diagram? This is all still mysterious to me.)

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Jul 15, 2022

(cc @kayceesrk who I guess worked quite a bit on this code, and could probably answer my questions above.)

@xavierleroy
Copy link
Copy Markdown
Contributor

a notion of "stack chunk" that is not explained in the documentation

I think this comes from the OCaml 4 code. In OCaml 4 the native stack contains both OCaml function frames and C function frames. A "stack chunk" is a set of contiguous OCaml frames that can be scanned linearly. Such a chunk terminates at a C function call, and if the C code calls back into OCaml, a new chunk starts. The stack scanning code needs to jump from the end of the first chunk to the beginning of the second chunk¸ jumping over the C stack frames.

In OCaml 5, OCaml frames and C frames are on different stacks. However, callbacks from C to OCaml still need special treatment when scanning the OCaml stacks.

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Jul 21, 2022

Thanks for the clarification. Here is my current understanding:

  1. The diagram in caml/fiber.h

/* Stack layout for native code. Stack grows downwards.
 *
 * +------------------------+
 * |  struct stack_handler  |
 * +------------------------+ <--- Stack_high
 * |    caml_runstack /     |
 * |   caml_start_program   |
 * +------------------------+
 * |                        |
 * .      OCaml frames      . <--- sp
 * |                        |
 * +------------------------+ <--- Stack_threshold
 * |                        |
 * .        Red Zone        .
 * |                        |
 * +------------------------+ <--- Stack_base
 * |   struct stack_info    |
 * +------------------------+ <--- Caml_state->current_stack
 */

describes a "stack".

  1. In general with effects there are many stacks, each created by an effect handler and "switched out" when performing an effect.

  2. Calls to OCaml functions never create a new stack, instead they dynamically resize the current stack if they need more space.

  3. C calls are made on a separate "C stack": control may switch back and forth between the C stack and the "current stack" (for OCaml function calls).

  4. A stack is formed of several consecutive "stack chunks" that are located between Stack_high and Stack_threshold. A new stack chunk is started when calling an OCaml function from the C stack. Each chunk starts with a caml_start_program frame that stores enough information to walk the stack or transfer control back to C on return.

I welcome feedback if there is something very wrong in the description above. Otherwise I may send a PR later to extend the documentation comments of fiber.h to contain these details. (Let's avoid feature-creep on this one PR which should stay focused on gc_regs.)

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Jul 21, 2022

git blame points to 46fe3a0 (by @stedolan in 2018) as the source of the current gc_regs logic. The diff of this commits answer my two questions I believe.

  1. Before this commit, the gc_regs were stored on the (OCaml) call stack just like in 4.x, not mallocated.
  2. Before this commit, caml_try_realloc_stack was previously saving some registers only (I guess the caller-saves registers?), not all registers.

My impression of the commit is that the answer to both questions is "it's just simpler to do it that way" (to malloc instead of stack-allocating the gc_regs, to save all registers instead of having a slightly different macro to save less registers).

@xavierleroy
Copy link
Copy Markdown
Contributor

I completely agree that it's simpler (and OK) to save and restore all registers (and not just caller-save registers) in caml_try_realloc_stack: it's good to share the corresponding assembly code with caml_call_gc, which does need to save all registers for GC interface.

I'd still like to be reminded the reason why gc_regs is allocated with malloc. I think it was mentioned during the Multicore integration working groups, and I think it goes beyond "it makes the code simpler", but I can't remember the details...

@fabbing
Copy link
Copy Markdown
Contributor

fabbing commented Jul 29, 2022

I won't be able to provide a definite response to why it's malloc-ed but I'll try to provide some arguments for the simplification of the code:

If a fiber registers were saved into its stack and the fiber stack needed to grow, the whole fiber needing to be reallocated, the saved registers would also have to be moved in memory.
This would have the cost of copying all the saved registers from the old fiber structure to the new one and would also require updating gc_regs to the new bucket address (which was the purpose of the removed caml_update_gc_regs_slot function).

Currently caml_call_realloc_stack doesn't even bother to use gc_regs, it picks the available bucket from gc_regs_buckets, SAVE_ALL_REGS pops its address in r15 and uses it to save registers in the bucket. r15 being a callee-save register in x86-64, it directly calls RESTORE_ALL_REGS, after the stack reallocation attempt, to restore the registers from the gc_regs bucket address still in r15.
Having registers saved on the fiber would make this impossible without using caml_state->gc_regs, updating it during the stack reallocation and re-reading it to get the bucket address from which to restore the registers.

Interestingly (caml_call_gc) actually uses caml_state->gc_regs to save and restore the gc_regs bucket address...

@gasche
Copy link
Copy Markdown
Member Author

gasche commented Aug 25, 2022

Current status of the present PR:

  1. We haven't received a clarification by @stedolan, but my non-expert understanding of @fabbing's response above (thanks!) is that it noticeably simplifies the code to use malloc here. I don't know if @xavierleroy is fully satisfied, but in any case I would suggest to not delay arbitrarily the present PR, which documents the current mechanism, with a question about the relation to previous code that I cannot answer myself.

  2. My own questions about the difference between a "stack", as documented in fiber.h, and a "stack chunk" (as not documented yet) suggest that additional documentation on this point could be useful for the runtime code. Ideally I would like to add such a clarification to the present PR, but I am not sure that I will actually get to doing it (I'm in holidays this week still and expect to be busy with many things when I get back). So I would be in favor of not taking this extra task as a hard requirement for the present PR. What do people think?

Copy link
Copy Markdown
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the extra documentation, which is an improvement.

I'm still disappointed that the original authors of this code did not reply to this PR by explaining why they chose to allocate gc_regs with malloc and not on the stack, as in OCaml 4. I'm pretty sure this has to do with keeping the minimal size of a stack / a fiber small, and not having to copy the gc_regs arrays when growing the stack / the fiber.

@xavierleroy xavierleroy merged commit 6a0471e into ocaml:trunk Aug 28, 2022
@gasche
Copy link
Copy Markdown
Member Author

gasche commented Aug 29, 2022

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants