Only add big objarray to remset once#49315
Conversation
|
What's the performance impact of this PR on #49205? |
|
Makes it faster than on 1.8 |
d-netto
left a comment
There was a problem hiding this comment.
The logic for decide whether we need to chunk or push the array into the remset could be a bit cleaner. I'd refactor:
size_t too_big = (obj_end - obj_begin) / MAX_REFS_AT_ONCE > step; // use this order of operations to avoid idiv
jl_value_t **scan_end = obj_end;
if (too_big) {
scan_end = obj_begin + step * MAX_REFS_AT_ONCE;
if ((nptr & 0x3) != 0x2) {
jl_gc_chunk_t c = {GC_objary_chunk, obj_parent, scan_end,
obj_end, NULL, NULL,
step, nptr};
gc_chunkqueue_push(mq, &c);
obj_end = scan_end;
}
}
for (; obj_begin < scan_end; obj_begin += step) {
new_obj = *obj_begin;
if (new_obj != NULL) {
verify_parent2("obj array", obj_parent, obj_begin, "elem(%d)",
gc_slot_to_arrayidx(obj_parent, obj_begin));
gc_try_claim_and_push(mq, new_obj, &nptr);
gc_heap_snapshot_record_array_edge(obj_parent, &new_obj);
}
}
if (too_big) {
if (obj_end != scan_end) {
jl_gc_chunk_t c = {GC_objary_chunk, obj_parent, scan_end,
obj_end, NULL, NULL,
step, nptr};
gc_chunkqueue_push(mq, &c);
}
}
else {
gc_mark_push_remset(ptls, obj_parent, nptr);
}into:
size_t too_big = (obj_end - obj_begin) / MAX_REFS_AT_ONCE > step; // use this order of operations to avoid idiv
jl_value_t **scan_end = obj_end;
if (too_big) {
scan_end = obj_begin + step * MAX_REFS_AT_ONCE;
}
for (; obj_begin < scan_end; obj_begin += step) {
new_obj = *obj_begin;
if (new_obj != NULL) {
verify_parent2("obj array", obj_parent, obj_begin, "elem(%d)",
gc_slot_to_arrayidx(obj_parent, obj_begin));
gc_try_claim_and_push(mq, new_obj, &nptr);
gc_heap_snapshot_record_array_edge(obj_parent, &new_obj);
}
}
if (too_big) {
jl_gc_chunk_t c = {GC_objary_chunk, obj_parent, scan_end,
obj_end, NULL, NULL,
step, nptr};
gc_chunkqueue_push(mq, &c);
}
else {
gc_mark_push_remset(ptls, obj_parent, nptr);
}Same change for gc_mark_array8/16.
|
As long as you don't care that your change eliminates the possibility of intra-array thread-stealing and thus slightly reduces parallelism of threaded GC, yeah, that rearrangement is fine. |
This should be fine. If an array is large enough to be chunked then it will expose a lot of |
|
Also, in #48600 we prioritize stealing pointers before chunks. |
src/gc.c
Outdated
| if (too_big) { | ||
| if (ary8_end != scan_end) { | ||
| jl_gc_chunk_t c = {GC_objary_chunk, ary8_parent, scan_end, | ||
| ary8_end, NULL, NULL, |
There was a problem hiding this comment.
Replace NULL, NULL here with elem_begin, elem_end. Same for array16.
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
Fixes #49205