Skip to content

[Bug #18119] Fix crash when instantiating classes in Ractors#13284

Merged
peterzhu2118 merged 1 commit intoruby:masterfrom
peterzhu2118:ractor-crash-bug-18119
May 9, 2025
Merged

[Bug #18119] Fix crash when instantiating classes in Ractors#13284
peterzhu2118 merged 1 commit intoruby:masterfrom
peterzhu2118:ractor-crash-bug-18119

Conversation

@peterzhu2118
Copy link
Member

When we create classes, it pushes the class to the subclass list of the superclass. This access needs to be synchronized because multiple Ractors may be creating classes with the same superclass, which would cause race conditions and cause the linked list to be corrupted.

For example, we can reproduce with this script crashing:

workers = (0...8).map do
  Ractor.new do
    loop do
      100.times.map { Class.new }
      Ractor.yield nil
    end
  end
end

100.times { Ractor.select(*workers) }

With ASAN enabled, we can see that there are use-after-free errors:

==176013==ERROR: AddressSanitizer: heap-use-after-free on address 0x5030000974f0 at pc 0x62f9e56f892d bp 0x7a503f1ffd90 sp 0x7a503f1ffd88
WRITE of size 8 at 0x5030000974f0 thread T4
    #0 0x62f9e56f892c in rb_class_remove_from_super_subclasses class.c:149:24
    #1 0x62f9e58c9dd2 in rb_gc_obj_free gc.c:1262:9
    #2 0x62f9e58f6e19 in gc_sweep_plane gc/default/default.c:3450:21
    #3 0x62f9e58f686a in gc_sweep_page gc/default/default.c:3535:13
    #4 0x62f9e58f12b4 in gc_sweep_step gc/default/default.c:3810:9
    #5 0x62f9e58ed2a7 in gc_sweep gc/default/default.c:4058:13
    #6 0x62f9e58fac93 in gc_start gc/default/default.c:6402:13
    #7 0x62f9e58e8b69 in heap_prepare gc/default/default.c:2032:13
    #8 0x62f9e58e8b69 in heap_next_free_page gc/default/default.c:2255:9
    #9 0x62f9e58e8b69 in newobj_cache_miss gc/default/default.c:2362:38
...
0x5030000974f0 is located 16 bytes inside of 24-byte region [0x5030000974e0,0x5030000974f8)
freed by thread T4 here:
    #0 0x62f9e562f28a in free (miniruby+0x1fd28a) (BuildId: 5ad6d9e7cec8318df6726ea5ce34d3c76d0d0233)
    #1 0x62f9e58ca2ab in rb_gc_impl_free gc/default/default.c:8102:9
    #2 0x62f9e58ca2ab in ruby_sized_xfree gc.c:5029:13
    #3 0x62f9e58ca2ab in ruby_xfree gc.c:5040:5
    #4 0x62f9e56f88e6 in rb_class_remove_from_super_subclasses class.c:152:9
    #5 0x62f9e58c9dd2 in rb_gc_obj_free gc.c:1262:9
    #6 0x62f9e58f6e19 in gc_sweep_plane gc/default/default.c:3450:21
    #7 0x62f9e58f686a in gc_sweep_page gc/default/default.c:3535:13
    #8 0x62f9e58f12b4 in gc_sweep_step gc/default/default.c:3810:9
    #9 0x62f9e58ed2a7 in gc_sweep gc/default/default.c:4058:13
...
previously allocated by thread T5 here:
    #0 0x62f9e562f70d in calloc (miniruby+0x1fd70d) (BuildId: 5ad6d9e7cec8318df6726ea5ce34d3c76d0d0233)
    #1 0x62f9e58c8e1a in calloc1 gc/default/default.c:1472:12
    #2 0x62f9e58c8e1a in rb_gc_impl_calloc gc/default/default.c:8138:5
    #3 0x62f9e58c8e1a in ruby_xcalloc_body gc.c:4964:12
    #4 0x62f9e58c8e1a in ruby_xcalloc gc.c:4958:34
    #5 0x62f9e56f906e in push_subclass_entry_to_list class.c:88:13
    #6 0x62f9e56f906e in rb_class_subclass_add class.c:111:38
    #7 0x62f9e56f906e in RCLASS_SET_SUPER internal/class.h:257:9
    #8 0x62f9e56fca7a in make_metaclass class.c:786:5
    #9 0x62f9e59db982 in rb_class_initialize object.c:2101:5

@peterzhu2118 peterzhu2118 force-pushed the ractor-crash-bug-18119 branch from 45a5604 to 21e079e Compare May 8, 2025 20:25
class.c Outdated
}
entry->next = head->next;
entry->prev = head;
RB_VM_LOCK_ENTER_NO_BARRIER();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is RB_VM_LOCK_ENTER_NO_BARRIER() correct or it should be RB_VM_LOCK_ENTER()? I'm not clear on the difference, after taking a quick look at the sources.
It seems most places use RB_VM_LOCK_ENTER()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RB_VM_LOCK_ENTER will trigger a barrier which stops other Ractors. For correctness here, we just need to ensure that two Ractors do not modify the same superclass at the same time, so we can use the no barrier variant to ensure that.

However, I suspect that reading the subclass list is not synchronized. So by not having a barrier here we run the risk of race conditions in other Ractors that may read an incorrect list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's right. RB_VM_LOCK_ENTER should not trigger a barrier, however it will join a barrier if another Ractor has requested one.

IMO we should use RB_VM_LOCK_ENTER where possible (it's better to have the barrier complete sooner rather than later), and use RB_VM_LOCK_ENTER_NO_BARRIER where we can't allow GC to happen (where the original commit introducing it provides a bit of context: 0ebf6bd).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok yeah I'm convinced that RB_VM_LOCK_ENTER doesn't actually trigger a barrier. I switched it to that 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad I asked, this "barrier" is not exactly trivial or the usual meaning of barrier :)
It would be nice to document what RB_VM_LOCK_ENTER_NO_BARRIER is for/when it should be used in the source code, cc @ko1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RB_VM_LOCK_ENTER get lock and if another Ractor is acquiring the barrier wait, the ractor join this barrier (wait for the barrier locking of another Ractor is finished, maybe by GC).

RB_VM_LOCK_ENTER_NO_BARRIER doesn't join this barrier waiting.

[Bug #18119]

When we create classes, it pushes the class to the subclass list of the
superclass. This access needs to be synchronized because multiple Ractors
may be creating classes with the same superclass, which would cause race
conditions and cause the linked list to be corrupted.

For example, we can reproduce with this script crashing:

    workers = (0...8).map do
      Ractor.new do
        loop do
          100.times.map { Class.new }
          Ractor.yield nil
        end
      end
    end

    100.times { Ractor.select(*workers) }

With ASAN enabled, we can see that there are use-after-free errors:

    ==176013==ERROR: AddressSanitizer: heap-use-after-free on address 0x5030000974f0 at pc 0x62f9e56f892d bp 0x7a503f1ffd90 sp 0x7a503f1ffd88
    WRITE of size 8 at 0x5030000974f0 thread T4
        #0 0x62f9e56f892c in rb_class_remove_from_super_subclasses class.c:149:24
        #1 0x62f9e58c9dd2 in rb_gc_obj_free gc.c:1262:9
        ruby#2 0x62f9e58f6e19 in gc_sweep_plane gc/default/default.c:3450:21
        ruby#3 0x62f9e58f686a in gc_sweep_page gc/default/default.c:3535:13
        ruby#4 0x62f9e58f12b4 in gc_sweep_step gc/default/default.c:3810:9
        ruby#5 0x62f9e58ed2a7 in gc_sweep gc/default/default.c:4058:13
        ruby#6 0x62f9e58fac93 in gc_start gc/default/default.c:6402:13
        ruby#7 0x62f9e58e8b69 in heap_prepare gc/default/default.c:2032:13
        ruby#8 0x62f9e58e8b69 in heap_next_free_page gc/default/default.c:2255:9
        ruby#9 0x62f9e58e8b69 in newobj_cache_miss gc/default/default.c:2362:38
    ...
    0x5030000974f0 is located 16 bytes inside of 24-byte region [0x5030000974e0,0x5030000974f8)
    freed by thread T4 here:
        #0 0x62f9e562f28a in free (miniruby+0x1fd28a) (BuildId: 5ad6d9e7cec8318df6726ea5ce34d3c76d0d0233)
        #1 0x62f9e58ca2ab in rb_gc_impl_free gc/default/default.c:8102:9
        ruby#2 0x62f9e58ca2ab in ruby_sized_xfree gc.c:5029:13
        ruby#3 0x62f9e58ca2ab in ruby_xfree gc.c:5040:5
        ruby#4 0x62f9e56f88e6 in rb_class_remove_from_super_subclasses class.c:152:9
        ruby#5 0x62f9e58c9dd2 in rb_gc_obj_free gc.c:1262:9
        ruby#6 0x62f9e58f6e19 in gc_sweep_plane gc/default/default.c:3450:21
        ruby#7 0x62f9e58f686a in gc_sweep_page gc/default/default.c:3535:13
        ruby#8 0x62f9e58f12b4 in gc_sweep_step gc/default/default.c:3810:9
        ruby#9 0x62f9e58ed2a7 in gc_sweep gc/default/default.c:4058:13
    ...
    previously allocated by thread T5 here:
        #0 0x62f9e562f70d in calloc (miniruby+0x1fd70d) (BuildId: 5ad6d9e7cec8318df6726ea5ce34d3c76d0d0233)
        #1 0x62f9e58c8e1a in calloc1 gc/default/default.c:1472:12
        ruby#2 0x62f9e58c8e1a in rb_gc_impl_calloc gc/default/default.c:8138:5
        ruby#3 0x62f9e58c8e1a in ruby_xcalloc_body gc.c:4964:12
        ruby#4 0x62f9e58c8e1a in ruby_xcalloc gc.c:4958:34
        ruby#5 0x62f9e56f906e in push_subclass_entry_to_list class.c:88:13
        ruby#6 0x62f9e56f906e in rb_class_subclass_add class.c:111:38
        ruby#7 0x62f9e56f906e in RCLASS_SET_SUPER internal/class.h:257:9
        ruby#8 0x62f9e56fca7a in make_metaclass class.c:786:5
        ruby#9 0x62f9e59db982 in rb_class_initialize object.c:2101:5
@peterzhu2118 peterzhu2118 force-pushed the ractor-crash-bug-18119 branch from 21e079e to 74a5df8 Compare May 8, 2025 21:34
@peterzhu2118 peterzhu2118 merged commit f30f0f0 into ruby:master May 9, 2025
83 checks passed
@peterzhu2118 peterzhu2118 deleted the ractor-crash-bug-18119 branch May 9, 2025 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants