This repository was archived by the owner on Jun 21, 2024. It is now read-only.

Thread local allocation buffers#484

Closed
abbysmal wants to merge 9 commits into ocaml-multicore:parallel_minor_gc from abbysmal:thread_local_allocation_buffers

Conversation

Collaborator

@abbysmal abbysmal commented Mar 5, 2021

(See #398 and the draft on the wiki)
This PR introduce thread local allocation buffers. (or Domain local allocation buffers.)

With this PR, we change the way we currently allocate minor heaps segment for each domains.

In the current setting, a minor heap segment is given to each domain as we initialize the Max_domains slots:

caml_minor_heaps_start and caml_minor_heaps_end delimit the minor heap space, which contains exactly Max_domains segments, one assigned to each domain's slot.

When any domain's minor heap is full, we perform a minor collection and reallocate each individual minor heap segment.

This means that each domain currently possesses its own address range in the minor heap area, and it keeps the same range for its lifetime.

This PR introduces a new mechanism to allocate minor heaps segments, by using an allocation pointer in the global minor heap segment:

  • Domains won't have a dedicated address range in the global minor heap; instead they bump this allocation pointer to reserve a new minor heap segment from the global heap.
  • When an individual minor heap runs out, the domain is allowed to fetch a new minor heap segment from the global_minor_heap instead.
  • The new condition for organically triggering a minor collection (outside of a finish_major_gc cycle) is for global_minor_heap_ptr to grow past the end of the global minor heap segment.
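The bump-pointer reservation described above can be sketched as follows. This is a minimal sketch, not the PR's code: `global_minor_heap_ptr` matches the variable name used in the PR, but `reserve_minor_segment`, `segment_size`, and `global_minor_heap_end` are illustrative names.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef uintptr_t uintnat;

/* Next free position in the global minor heap area (bumped atomically). */
static _Atomic uintnat global_minor_heap_ptr;
/* End of the reserved global minor heap area. */
static uintnat global_minor_heap_end;

/* Try to reserve one segment; returns its base address, or 0 when the
   global area is exhausted (the caller must then trigger a minor
   collection, which resets the allocation pointer). */
uintnat reserve_minor_segment(uintnat segment_size)
{
  uintnat base = atomic_fetch_add(&global_minor_heap_ptr, segment_size);
  if (base + segment_size > global_minor_heap_end) {
    /* Area exhausted: this is the organic minor-collection trigger. */
    return 0;
  }
  return base;
}
```

Because the bump is a single atomic fetch-and-add, multiple domains can reserve segments concurrently without a lock; only the reset of the pointer requires a stop-the-world minor collection.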

3526033 addresses an existing issue where the minor heap space is not properly sized.

The current state of this PR:

  • The bump pointer allocation logic is implemented
  • Greedily reallocating minor heap segments (i.e. one domain can request another segment when it runs out of minor heap space) is done

Missing parts:

  • Terminology adjustment: for the sake of keeping the core logic easy to review, renaming has not yet taken place. (We may want to rename caml_reallocate_minor_heaps, caml_minor_heaps_base, caml_minor_heaps_end, etc.)
  • Testsuite should be extended to include:
    • Spawning up to Max_domains domains and checking that the minor heap math works out under various circumstances
    • More organic versions of the spawn_burn tests that do not manually trigger the GC, but rather let it happen naturally.
  • Benchmarking and size tweaking: this PR does not take time to revisit the current sizes: a minor heap segment is equal to the current default size, which means a single busy domain can right now use minor heap space up to Max_domains * Max_minor_heap_size (so 128 * 256k words).
    • Related point: dynamically resizing the minor heap segment size?

@abbysmal abbysmal force-pushed the thread_local_allocation_buffers branch 2 times, most recently from 2574006 to 76a6013 Compare March 5, 2021 13:28
@abbysmal abbysmal force-pushed the thread_local_allocation_buffers branch from 76a6013 to b6872b9 Compare March 5, 2021 13:31
Collaborator Author

abbysmal commented Mar 8, 2021

The pr7798 failure is related to one statistic breaking (minor_words).
https://github.com/ocaml-multicore/ocaml-multicore/blob/parallel_minor_gc/runtime/minor_gc.c#L530
The computation for this statistic is no longer reliable, as a domain may have cycled through several minor heap segments before it is collected.
One way to handle it could be to store it in the domain state (and update it each time we cycle through a new minor heap).
I do not quite like this, as it adds a requirement to identify why a minor heap segment was fetched (domain spawn vs. a domain consuming all of its minor heap vs. the end of a minor collection).
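The domain-state accumulator suggested above could look something like this. This is a hypothetical sketch: `struct domain_stats`, `minor_words_acc`, and `on_segment_retired` are illustrative names, not identifiers from the PR.

```c
#include <stdint.h>

/* Hypothetical per-domain statistic, updated each time a domain retires
   a minor heap segment, so that minor_words survives segment cycling. */
struct domain_stats {
  uint64_t minor_words_acc; /* words allocated in retired segments */
};

/* Called when a domain abandons a used segment. Minor allocation bumps
   young_ptr downward from young_end, so the allocated portion is
   [young_ptr, young_end). */
void on_segment_retired(struct domain_stats *s,
                        uintptr_t young_end, uintptr_t young_ptr)
{
  s->minor_words_acc += (young_end - young_ptr) / sizeof(uintptr_t);
}
```

The stat read would then report the accumulator plus the allocation in the current (live) segment.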

Collaborator

@ctk21 ctk21 left a comment


I think the design (nearly) works. There is a possible race that can lead to us losing interrupts (and so deadlock). We have to be extremely careful whenever we touch young_limit, because other domains may try to interrupt us using that variable. If we blat the young_limit pointer in caml_reallocate_minor_heap without first checking there isn't an interrupt race, then we are in trouble. Since we are going to need to change young_limit from the mutator to get new heap segments, I think we just need a CAS that ensures we only update young_limit if it isn't INTERRUPT_MAGIC.
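The CAS suggested here could be sketched as follows. This is an illustration of the idea, not the runtime's actual code: the value of `INTERRUPT_MAGIC` and the function name are assumptions for the example.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative sentinel another domain writes into young_limit to
   request an interrupt. */
#define INTERRUPT_MAGIC ((uintptr_t)-1)

/* Attempt to move young_limit to the new segment. Returns 1 on success;
   returns 0 and leaves young_limit untouched if another domain changed
   it in the meantime (e.g. to INTERRUPT_MAGIC), so the pending
   interrupt is not lost. */
int try_update_young_limit(_Atomic uintptr_t *young_limit,
                           uintptr_t old_limit, uintptr_t new_limit)
{
  uintptr_t expected = old_limit;
  /* The CAS fails if *young_limit no longer holds the value we read,
     which covers the interrupt race described above. */
  return atomic_compare_exchange_strong(young_limit, &expected, new_limit);
}
```

On failure, the mutator would service the pending interrupt first and only then retry the segment switch.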

We should also clean up the tests and make sure things work. With the major_wait_gc_backup I wonder if it is naturally fixed by getting the flow control right in caml_poll_gc_work. An alternative would be to alter the test by changing the assert to look at minor_collections (and if needed upping the make 22 to force a minor collection). I'm keen we have at least one test with multiple domains that exhausts the whole minor heap bump area naturally with multiple domains going hard at their minor heaps. A lot of our tests in tests/parallel are triggering forced collections, which might not exercise the code paths in caml_reallocate_minor_heap that trigger a minor collection.

I have some niggles around safety with commit and exceptions. We also need to make sure we get the flow control right in caml_poll_gc_work and ensure that marking is not starved.

Comment on lines 208 to 209
caml_mem_decommit((void*)domain_self->minor_heap_area,
domain_self->minor_heap_area_end - domain_self->minor_heap_area);
Collaborator


I'm not sure this is safe. Are we really allowed to decommit an area which we might then go and read when doing the minor collection?

Also, what is going to happen for the decommit on the domain that hits the REALLOCATE_HEAP_FULL codepath above?

Collaborator Author


I think this works out. This borrows the same logic as the current reallocate code, which is that a segment is decommitted before reallocation.
Before, this worked because there was full ownership of a specific minor heap segment (one segment per domain, always the same).

In this branch, every segment that a domain gets is in a rightful position to decommit its memory: everything beyond the global allocation pointer is unused memory (because resetting the global allocation pointer always involves a minor collection).
So everything we decommit has supposedly been collected in the previous minor cycle (and values promoted if need be).
I don't think there is a path where we would touch values in to-be-decommitted areas (beyond the allocation pointer), but I may be missing a subtlety.

As for REALLOCATE_HEAP_FULL: hitting that codepath means there is no segment available, so there is nothing to decommit.

I think the confusing part of this logic is: do we actually need to decommit every segment we fetch?
I kept it the same as our current branch (plus accounting for the fact that there is no per-domain ownership of individual minor heap segments). The current code mentions this:
https://github.com/ocaml-multicore/ocaml-multicore/blob/parallel_minor_gc/runtime/domain.c#L174

 /* free old minor heap.
     instead of unmapping the heap, we decommit it, so there's
     no race whereby other code could attempt to reuse the memory. */

I'm slightly confused by this comment: is there a scenario where other code could attempt to reuse the memory after performing a minor collection (which is the prerequisite for resetting the global allocation pointer)?

Collaborator Author


(or maybe it is a concurrent minor quirk?)

uintnat global_minor_heap_ptr;
uintnat new_alloc_ptr;

caml_ev_begin("reallocate");
Collaborator


This ev event isn't being closed on every exit from the function.

the global minor heap segment. */
/* TODO(engil): rework logic, double checking requested_minor_gc is odd */
if (!Caml_state->requested_minor_gc &&
(atomic_load_explicit(&caml_global_minor_heap_ptr, memory_order_acquire)
Collaborator


Do we need this guard here? Can't we just let caml_reallocate_minor_heap deal with all of the logic and parallelism around caml_global_minor_heap_ptr?

break;

/* TODO(engil): same as earlier, rework, because in case we did reallocate
we skip the requested_major_slice branch this time around. */
Collaborator


We need to get the control flow to work here. Maybe we do the 'try to get a new heap slice' before the existing code and have it always run through the rest of the function, with it tripping the young_ptr/young_start test if getting a new slice fails because the heap is exhausted. That will then also check the request_major_slice condition and any further conditions or functionalities that we add in the future.

to trigger minor_heaps reallocation on GC poll */
domain_state->young_start = (char*)caml_minor_heaps_base;
domain_state->young_end = (char*)caml_minor_heaps_base;
domain_state->young_limit = (uintnat) caml_minor_heaps_base;
Collaborator


We need to have a way to avoid losing interrupts if another domain attempts to signal us via young_limit. I think we can do this by checking that young_limit hasn't been set to INTERRUPT_MAGIC and then atomically updating young_limit. If it has been set to INTERRUPT_MAGIC, then we leave young_limit alone to allow the signal to propagate.

@sadiqj sadiqj self-requested a review March 16, 2021 12:43
Collaborator

ctk21 commented Apr 14, 2021

Closing as superseded by #508
