
Always take th->interrupt_lock in ubf_clear#16362

Merged
luke-gru merged 1 commit into ruby:master from luke-gruber:ubf_clear_fix
Mar 11, 2026

luke-gruber commented Mar 10, 2026

Patch 0837263 fixed a race condition on ubfs (unblock functions), but that fix is only valid if the ubf function is guaranteed not to be in the middle of running once ubf_clear returns. This patch removes an optimization in ubf_clear that violates that guarantee. In short, ubf_clear needs to take th->interrupt_lock unconditionally, both to avoid deadlocks and to make it possible to reason about when ubfs can run.
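
To illustrate the guarantee described above, here is a minimal Ruby sketch, not the C code touched by this patch. `ModelThread`, `interrupt`, and `run_ubf_clear_demo` are hypothetical names; a plain `Mutex` stands in for th->interrupt_lock. Because `ubf_clear` always takes the lock, it cannot return while another thread is still running the ubf.

```ruby
# Hypothetical model: ModelThread and its methods are illustrative names,
# not CRuby's implementation.
class ModelThread
  attr_reader :events

  def initialize
    @interrupt_lock = Mutex.new # stands in for th->interrupt_lock
    @ubf = nil                  # currently registered unblock function
    @events = Queue.new         # thread-safe event log
  end

  # Register an unblock function under the lock.
  def ubf_set(&ubf)
    @interrupt_lock.synchronize { @ubf = ubf }
  end

  # Always take the lock: if another thread is running the ubf right now,
  # this blocks until that call has returned.
  def ubf_clear
    @interrupt_lock.synchronize { @ubf = nil }
    @events << :cleared
  end

  # Called from another thread; the ubf runs with the lock held.
  def interrupt
    @interrupt_lock.synchronize do
      next if @ubf.nil?
      @events << :ubf_start
      @ubf.call
      @events << :ubf_end
    end
  end
end

def run_ubf_clear_demo
  th = ModelThread.new
  started = Queue.new
  th.ubf_set { started << true; sleep 0.05 }

  interrupter = Thread.new { th.interrupt }
  started.pop   # the ubf is now mid-execution on the other thread
  th.ubf_clear  # blocks until the ubf has finished
  interrupter.join

  Array.new(th.events.size) { th.events.pop }
end

p run_ubf_clear_demo # => [:ubf_start, :ubf_end, :cleared]
```

In this model, a `ubf_clear` that skipped the lock could log `:cleared` before `:ubf_end`, which is the situation the patch rules out.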

This should fix CI errors like https://ci.rvm.jp/results/trunk-jemalloc@ruby-sp2-noble-docker/6242153. The error was in test_timeout.rb, which had a deadlock during VM shutdown.

r = Ractor.new do
  begin
    Timeout.timeout(0.1) { sleep }
  rescue Timeout::Error
    :ok
  end
end.value

assert_equal :ok, r

The deadlock happened during rb_ractor_terminate_interrupt_main_thread with 2 ractors:

  1. r1 t1: calls the ubf while holding t2->interrupt_lock (ubf = ubf_waiting)
  2. r2 t2: clears the ubf from the previous thread_sched_wait_events_call (no lock taken, because of the optimization)
  3. r2 t2: thread_sched_wait_events: acquires thread_sched_lock(t2) (the caller calls native_sleep() in a loop)
  4. r2 t2: ubf_set: tries to acquire t2->interrupt_lock [blocks]
  5. r1 t1: tries to acquire thread_sched_lock(t2) [blocks, deadlock]

t2 needs to block on t2->interrupt_lock in step 2 until the ubf has completed. Only then can it register a new ubf in the next native_sleep iteration.
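
The five steps above form a lock-order cycle, which can be replayed with two plain Ruby mutexes. This is only a sketch under assumptions: `replay_lock_cycle` is a hypothetical helper, the mutexes stand in for t2->interrupt_lock and thread_sched_lock(t2), and `try_lock` is used instead of `lock` in steps 4 and 5 so the replay reports the cycle rather than hanging.

```ruby
# Hypothetical replay of the five steps; not CRuby's scheduler code.
def replay_lock_cycle
  interrupt_lock = Mutex.new # stands in for t2->interrupt_lock
  sched_lock     = Mutex.new # stands in for thread_sched_lock(t2)
  to_t2 = Queue.new          # t1 -> t2 hand-off
  to_t1 = Queue.new          # t2 -> t1 hand-off
  done  = Queue.new          # t1 tells t2 it has finished step 5
  result = {}

  t1 = Thread.new do
    interrupt_lock.lock      # step 1: running the ubf with the lock held
    to_t2 << :go
    to_t1.pop                # wait until t2 has reached step 4
    # step 5: the ubf needs the scheduler lock, which t2 holds
    result[:t1_gets_sched_lock] = sched_lock.try_lock
    done << :go
    interrupt_lock.unlock
  end

  t2 = Thread.new do
    to_t2.pop
    # step 2: the optimized ubf_clear returned without taking interrupt_lock
    sched_lock.lock          # step 3
    # step 4: ubf_set needs interrupt_lock, which t1 holds
    result[:t2_gets_interrupt_lock] = interrupt_lock.try_lock
    to_t1 << :go
    done.pop                 # keep sched_lock held until t1 has tried
    sched_lock.unlock
  end

  [t1, t2].each(&:join)
  result
end

p replay_lock_cycle
# => {:t2_gets_interrupt_lock=>false, :t1_gets_sched_lock=>false} (Ruby 3.4+ prints the keys as `name: false`)
```

Both attempts fail, so with blocking `lock` calls neither thread could make progress. With the fix, step 2 would instead wait on the interrupt lock until t1 had finished running the ubf, so t2 would never reach step 3 while t1 still needed the scheduler lock.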

launchable-app bot commented Mar 10, 2026

1/68676 Tests Failed

test/ruby/test_gc.rb#test_stat_heap_constraints
Failure:
TestGc#test_stat_heap_constraints [/Users/runner/work/ruby/ruby/src/test/ruby/test_gc.rb:292]:
<362064> expected but was
<362065>.

luke-gru merged commit d72a0fe into ruby:master Mar 11, 2026
94 of 96 checks passed