Thread local allocation buffers #484
abbysmal wants to merge 9 commits into ocaml-multicore:parallel_minor_gc from
Conversation
Force-pushed from 2574006 to 76a6013
… shared minor heaps segment
…and there's still space available on the global minor heap
Force-pushed from 76a6013 to b6872b9
…ntexts where this is not the case.
ctk21 left a comment:
I think the design (nearly) works, but there is a possible race that can lead to us losing interrupts (and so deadlock). We have to be extremely careful whenever we touch young_limit, because other domains may try to interrupt us using that variable. If we blat the young_limit pointer in caml_reallocate_minor_heap without first checking there isn't an interrupt race, then we are in trouble. Since we are going to need to change young_limit from the mutator to get new heap segments, I think we just need a CAS that ensures we only update young_limit if it isn't INTERRUPT_MAGIC.
We should also clean up the tests and make sure things work. With the major_wait_gc_backup, I wonder if it is naturally fixed by getting the flow control right in caml_poll_gc_work. An alternative would be to alter the test by changing the assert to look at minor_collections (and, if needed, upping the make 22 to force a minor collection). I'm keen that we have at least one test with multiple domains that exhausts the whole minor heap bump area naturally, with multiple domains going hard at their minor heaps. A lot of our tests in tests/parallel trigger forced collections, which might not exercise the code paths in caml_reallocate_minor_heap that trigger a minor collection.
I have some niggles around safety with commit and exceptions. We also need to make sure we get the flow control right in caml_poll_gc_work and ensure that marking is not starved.
    caml_mem_decommit((void*)domain_self->minor_heap_area,
                      domain_self->minor_heap_area_end - domain_self->minor_heap_area);
I'm not sure this is safe. Are we really allowed to decommit an area which we might then go and read when doing the minor collection?
Also, what is going to happen for the decommit on the domain that hits the REALLOCATE_HEAP_FULL codepath above?
I think this works out. This borrows the same logic as the current reallocation code, which is that a segment is decommitted before reallocation.
Before, this worked because there was full ownership of a specific minor heap segment (one segment per domain, always the same).
In this branch, every segment that a domain gets is rightfully in a position to decommit the memory: everything beyond the global allocation pointer is unused memory (because resetting the global allocation pointer always involves a minor collection).
So everything we decommit has supposedly been collected in the previous minor cycle (with values promoted if need be).
I don't think there is a path where we would touch values in to-be-decommitted areas (beyond the allocation pointer), but I may be missing a subtlety.
As for REALLOCATE_HEAP_FULL, hitting that codepath means there is no segment available, so there is nothing to decommit.
I think the confusing part of this logic is: do we actually need to decommit every segment we fetch?
I kept it the same as our current branch (plus accounting for the fact that there is no per-domain ownership of individual minor heap segments). The current code mentions this:
https://github.com/ocaml-multicore/ocaml-multicore/blob/parallel_minor_gc/runtime/domain.c#L174
    /* free old minor heap.
       instead of unmapping the heap, we decommit it, so there's
       no race whereby other code could attempt to reuse the memory. */
I'm slightly confused by this comment: is there a scenario where other code could attempt to reuse the memory after performing a minor collection (which is the prerequisite for resetting the global allocation pointer)?
(or maybe it is a concurrent minor quirk?)
    uintnat global_minor_heap_ptr;
    uintnat new_alloc_ptr;

    caml_ev_begin("reallocate");
This ev event isn't being closed on every exit from the function.
       the global minor heap segment. */
    /* TODO(engil): rework logic, double checking requested_minor_gc is odd */
    if (!Caml_state->requested_minor_gc &&
        (atomic_load_explicit(&caml_global_minor_heap_ptr, memory_order_acquire)
Do we need this guard here? Can't we just let caml_reallocate_minor_heap deal with all of the logic and parallelism of caml_global_minor_heap_ptr?
      break;

    /* TODO(engil): same as earlier, rework, because in case we did reallocate
       we skip the requested_major_slice branch this time around. */
We need to get the control flow to work here. Maybe we do the 'try to get a new heap slice' before the existing code and have it always run through the rest of the function, with it tripping the young_ptr/young_start test if getting a new slice fails because the heap is exhausted. That will then also check the request_major_slice condition and any further conditions or functionalities that we add in the future.
       to trigger minor_heaps reallocation on GC poll */
    domain_state->young_start = (char*)caml_minor_heaps_base;
    domain_state->young_end = (char*)caml_minor_heaps_base;
    domain_state->young_limit = (uintnat)caml_minor_heaps_base;
We need to have a way to avoid losing interrupts if another domain attempts to signal us via young_limit. I think we can do this by checking that young_limit hasn't been set to INTERRUPT_MAGIC and then atomically updating young_limit. If it has been set to INTERRUPT_MAGIC, then we leave young_limit alone to allow the signal to propagate.
Closing as superseded by #508
(See #398 and the draft on the wiki)
This PR introduces thread-local allocation buffers (or domain-local allocation buffers).
With this PR, we change the way we currently allocate minor heap segments for each domain.
In the current setting, a minor heap segment is given to each domain as we initialize the `Max_domains` slots: `caml_minor_heaps_start` and `caml_minor_heaps_end` delimit the minor heaps space, which contains exactly `Max_domains` segments, one assigned to each domain slot. When any domain's minor heap is full, we proceed to do a minor collection and reallocate each individual minor heap segment.
This means that each domain currently possesses its own address range in the minor heaps area, and it keeps the same range during its lifetime.
This PR introduces a new mechanism to allocate minor heap segments, by using an allocation pointer in the global minor heap segment:
… `finish_major_gc` cycle) is for the `global_minor_heap_ptr` to grow past the end of the global minor heaps segment.

3526033 addresses one existing issue where the current sizing for the minor heaps space is not proper.
The current state of this PR:
Missing parts:
- `caml_reallocate_minor_heaps`, `caml_minor_heaps_base`, `caml_minor_heaps_end`... etc.
- `Max_domains`, and see if the minor heaps math checks out under various circumstances
- `spawn_burn` tests that do not manually trigger the GC, but rather let it happen
- `Max_domains * Max_minor_heap_size` (so 128 * 256k words)