Use ec->interrupt_mask to prevent interrupts. #14588

Merged
ioquatix merged 2 commits into ruby:master from ioquatix:ruby-unblock-interrupt-handling
Sep 18, 2025

@ioquatix (Member) commented Sep 18, 2025

The following program can segfault without this change:

```ruby
require 'async/scheduler'

scheduler = Async::Scheduler.new
Fiber.set_scheduler(scheduler)

Signal.trap(:USR1) do
end

q = Thread::Queue.new

Thread.new do
  loop do
    Ractor.new do
      Process.kill(:USR1, $$)
    end.join
  end
end

Fiber.schedule do
  Fiber.schedule do
    1.upto(1000000) do |i|
      sleep 0.01
      q.pop
      q.push(1)
      puts "1 iter push/pop"
    end
  end
  Fiber.schedule do
    1.upto(1000000) do |i|
      sleep 0.01
      q.push(i)
      q.pop
      puts "1 iter push/pop#2"
    end
  end
  Fiber.schedule do
    gets
    exit!
  end
end
```

The crash is deliberate: it comes from rb_bug. Without that check, the program could probably hang indefinitely instead. I tried to write a test for this, but it's too hard.

cc @luke-gruber

@ioquatix force-pushed the ruby-unblock-interrupt-handling branch 2 times, most recently from efbc4b2 to 950dbe9, September 18, 2025 00:59
@samuel-williams-shopify (Contributor) commented:

Example failure:

```
supervisor (ruby) : [BUG] rb_fiber_scheduler_unblock called with pending interrupt
Unhandled signal SIGABRT in /usr/local/ruby/bin/ruby
```

@samuel-williams-shopify (Contributor) commented:

(Original analysis)

The rb_fiber_scheduler_unblock function is vulnerable to a critical race condition when called with pending interrupts (signals). This can result in fibers being left permanently blocked, causing deadlocks and system hangs.

Root Cause

When rb_fiber_scheduler_unblock executes with ec->interrupt_flag set, the following sequence can occur:

```
Thread A (Signaling)              Thread B (Blocked Fiber)
--------------------              ------------------------
calls unblock(fiber_x)
├─ enters scheduler user code     fiber_x sleeping in scheduler.block()
├─ ⚠️ SIGNAL INTERRUPT ⚠️
├─ raises SignalException
├─ unblock operation aborted
├─ scheduler state inconsistent
└─ returns believing success      └─ fiber_x waits forever ⚠️
```

Why This Is Critical

  1. Lost Wakeup: The signaling thread believes it successfully unblocked the fiber, but the target fiber remains blocked
  2. Inconsistent Scheduler State: The fiber scheduler's internal data structures may be left in a partially updated state
  3. Deadlock Potential: The blocked fiber may be holding resources or conditions that other code depends on
  4. Cross-Thread Communication Failure: The fundamental mechanism for fiber coordination becomes unreliable

The Assertion

```c
#ifdef RUBY_DEBUG
    rb_execution_context_t *ec = GET_EC();
    if (ec->interrupt_flag) {
        rb_bug("rb_fiber_scheduler_unblock called with interrupt flags set");
    }
#endif
```

This assertion detects the race condition by catching cases where unblock is called unsafely.
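To make the relationship between `interrupt_flag` and `interrupt_mask` concrete, here is a toy C model, not CRuby's code: the `toy_ec` struct, the `TOY_*` bit values and both helpers are hypothetical. It assumes, as we understand CRuby's own interrupt checks to work, that an interrupt is only serviced when its bit is set in the flag and clear in the mask. Under that assumption, a raw flag test like the assertion above and a masked "is anything pending?" test give different answers once a mask is in play.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; CRuby defines its own
 * *_INTERRUPT_MASK constants. */
enum {
    TOY_PENDING_INTERRUPT = 0x02, /* e.g. a queued Thread#raise */
    TOY_TRAP_INTERRUPT    = 0x08, /* a signal trap handler needs to run */
};

/* Toy stand-in for the two execution-context fields involved. */
typedef struct {
    uint32_t interrupt_flag; /* bits set by signal delivery / other threads */
    uint32_t interrupt_mask; /* bits this context refuses to service now */
} toy_ec;

/* Raw test, like the debug assertion: any flag bit counts. */
static bool flag_set(const toy_ec *ec) {
    return ec->interrupt_flag != 0;
}

/* Masked test: only bits that are set and not masked would be serviced
 * at the next interrupt check. */
static bool interrupt_pending(const toy_ec *ec) {
    return (ec->interrupt_flag & ~ec->interrupt_mask) != 0;
}
```

With `interrupt_flag = TOY_TRAP_INTERRUPT` and an empty mask, both helpers return true; adding the same bit to `interrupt_mask` leaves `flag_set` true but makes `interrupt_pending` false.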

Safe Usage Pattern

The correct approach is to call rb_fiber_scheduler_unblock from the ensure-like region of an EC_TAG push/pop sequence, i.e. after EC_POP_TAG() and before EC_JUMP_TAG():

```c
// Declared outside the tag: EC_PUSH_TAG/EC_POP_TAG open and close a block,
// and `state` is still needed after EC_POP_TAG.
enum ruby_tag_type state;

EC_PUSH_TAG(ec);
if ((state = EC_EXEC_TAG()) == TAG_NONE) {
    // Normal execution - may raise exceptions
    some_blocking_operation();
}
EC_POP_TAG(); // Tag popped; any exception is recorded in `state`

// SAFE ZONE - no new interrupts processed
if (fiber_needs_unblocking) {
    rb_fiber_scheduler_unblock(scheduler, blocker, fiber); // ✅ Safe here
}

if (state != TAG_NONE) {
    EC_JUMP_TAG(ec, state); // Re-raise exception
}
```
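The ordering in this pattern (push, run, pop, unblock, re-raise) can be modeled with plain `setjmp`/`longjmp`. This is a self-contained toy, not CRuby's macros: `toy_tag`, `toy_jump_tag`, `call_with_rescue` and the other names are hypothetical, and the real EC_* macros also save and restore VM state. It only shows that the unblock step runs on both the normal and the exceptional path before the exception continues outward.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Toy model of EC_PUSH_TAG / EC_EXEC_TAG / EC_POP_TAG / EC_JUMP_TAG built
 * on setjmp/longjmp. All names here are hypothetical. */
enum { TOY_TAG_NONE = 0, TOY_TAG_RAISE = 1 };

typedef struct toy_tag {
    jmp_buf buf;
    struct toy_tag *prev; /* enclosing tag */
} toy_tag;

static toy_tag *tag_top = NULL; /* innermost active tag */
static int raised_state;        /* state carried by the current jump */
static int unblock_calls;       /* how often the unblock step ran */

/* Like EC_JUMP_TAG: unwind to the innermost tag, carrying `state`. */
static void toy_jump_tag(int state) {
    raised_state = state;
    longjmp(tag_top->buf, 1);
}

/* Hypothetical operation that "raises" when asked to fail. */
static void some_blocking_operation(int fail) {
    if (fail) toy_jump_tag(TOY_TAG_RAISE);
}

static void toy_unblock(void) { unblock_calls++; }

static void operation_then_unblock(int fail) {
    toy_tag tag;
    volatile int state = TOY_TAG_NONE;

    tag.prev = tag_top;              /* EC_PUSH_TAG */
    tag_top = &tag;
    if (setjmp(tag.buf) == 0) {      /* EC_EXEC_TAG, first return */
        some_blocking_operation(fail);
    } else {
        state = raised_state;        /* arrived here via a jump */
    }
    tag_top = tag.prev;              /* EC_POP_TAG */

    toy_unblock();                   /* runs on both paths */

    if (state != TOY_TAG_NONE) {
        toy_jump_tag(state);         /* EC_JUMP_TAG: re-raise outward */
    }
}

/* Outermost "rescue": returns the state that reached it. */
static int call_with_rescue(int fail) {
    toy_tag outer;
    volatile int state = TOY_TAG_NONE;

    outer.prev = tag_top;
    tag_top = &outer;
    if (setjmp(outer.buf) == 0) {
        operation_then_unblock(fail);
    } else {
        state = raised_state;
    }
    tag_top = outer.prev;
    return state;
}
```

The jump state travels through a global rather than through `setjmp`'s return value so that every `setjmp` call sits in a context the C standard permits (the controlling expression of an `if`).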

Why EC_TAG Context Is Safe

The code after EC_POP_TAG() but before EC_JUMP_TAG() runs like an ensure block, creating a signal-safe critical section in which:

  1. Exception handling state is controlled and stable
  2. No new interrupts are processed until after re-raise
  3. Critical cleanup operations can complete atomically
  4. The unblock operation is guaranteed to run to completion

Impact

Without this protection, Ruby applications using fiber schedulers can experience:

  • Random deadlocks that are difficult to reproduce
  • Fibers hanging indefinitely under load
  • Unpredictable behavior in signal-heavy environments
  • Silent failures in cross-thread fiber coordination

Solution Requirements

Code calling rb_fiber_scheduler_unblock must do one of the following:

  1. Ensure there are no pending interrupts before calling (process ec->interrupt_flag first)
  2. Use the EC_TAG pattern to call unblock in a protected context
  3. Never call unblock from contexts where signals may be pending

This ensures that fiber unblocking is an atomic, all-or-nothing operation that cannot be interrupted mid-execution.

@samuel-williams-shopify (Contributor) commented Sep 18, 2025

In addition, it turns out the above analysis is insufficient: signals delivered during unblock can also cause interrupts. That is what this fix handles, by using ec->interrupt_mask so that pending interrupts are not checked while FiberScheduler#unblock runs.
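The effect of masking can be pictured with another toy C model. Names and bit values are hypothetical, and which mask bits CRuby actually sets is not shown on this page. A signal lands while the scheduler's unblock hook is running. Without a mask, the next interrupt check inside the hook "raises" (modeled with `longjmp`) and the fiber is never enqueued. With the bit masked for the duration of the hook, the fiber is enqueued and the signal is serviced at the first check after the mask is restored.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not CRuby's code: names and bit values are hypothetical. */
enum { TOY_TRAP_INTERRUPT = 0x08 };

typedef struct {
    uint32_t interrupt_flag;
    uint32_t interrupt_mask;
} toy_ec;

static toy_ec ec;
static jmp_buf rescue;   /* where a "SignalException" lands */
static bool fiber_ready; /* has the scheduler enqueued the fiber? */

/* Service any unmasked pending interrupt: clear its bit and "raise". */
static void check_ints(void) {
    uint32_t pending = ec.interrupt_flag & ~ec.interrupt_mask;
    if (pending) {
        ec.interrupt_flag &= ~pending;
        longjmp(rescue, 1);
    }
}

/* Scheduler#unblock user code: a signal arrives midway and the code
 * reaches an interrupt check before enqueueing the fiber. */
static void scheduler_unblock_hook(void) {
    ec.interrupt_flag |= TOY_TRAP_INTERRUPT; /* signal delivered now */
    check_ints();
    fiber_ready = true;
}

/* Call the hook, optionally masking the trap bit while it runs. */
static void fiber_scheduler_unblock(bool mask) {
    uint32_t saved_mask = ec.interrupt_mask;
    if (mask) ec.interrupt_mask |= TOY_TRAP_INTERRUPT;
    scheduler_unblock_hook();
    /* On the unmasked path a jump skips this line, which is harmless
     * here because the mask was never changed on that path. */
    ec.interrupt_mask = saved_mask;
}

/* Returns whether the fiber was woken; *raised reports whether the
 * signal was eventually serviced. */
static bool run(bool mask, bool *raised) {
    ec = (toy_ec){0};
    fiber_ready = false;
    *raised = false;
    if (setjmp(rescue) == 0) {
        fiber_scheduler_unblock(mask);
        check_ints(); /* next check point after unblock returns */
    } else {
        *raised = true;
    }
    return fiber_ready;
}
```

`run(false, &raised)` returns false with `raised` set (the lost wakeup), while `run(true, &raised)` returns true and still sets `raised`: masking defers the interrupt rather than discarding it.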

ioquatix and others added 2 commits September 18, 2025 13:40
Ractors can send signals at any time, so the previous debug assertion
can fail if a Ractor sends a signal.

```ruby
require 'async/scheduler'

scheduler = Async::Scheduler.new
Fiber.set_scheduler(scheduler)

Signal.trap(:INT) do
end

q = Thread::Queue.new

Thread.new do
  loop do
    Ractor.new do
      Process.kill(:INT, $$)
    end.value
  end
end

Fiber.schedule do
  Fiber.schedule do
    1.upto(1000000) do |i|
      sleep 0.01
      q.pop
      q.push(1)
      puts "1 iter push/pop"
    end
  end
  Fiber.schedule do
    1.upto(1000000) do |i|
      sleep 0.01
      q.push(i)
      q.pop
      puts "1 iter push/pop#2"
    end
  end
end
```
@samuel-williams-shopify force-pushed the ruby-unblock-interrupt-handling branch from c721a95 to dfe9072, September 18, 2025 01:40
@ioquatix merged commit e687940 into ruby:master Sep 18, 2025
84 checks passed
@ioquatix deleted the ruby-unblock-interrupt-handling branch September 18, 2025 02:24
ioquatix added a commit to ioquatix/ruby that referenced this pull request Sep 18, 2025
Disallow pending interrupts to be checked during `FiberScheduler#unblock`.

Ractors can send signals at any time, so the previous debug assertion can fail if a Ractor sends a signal.

Co-authored-by: Luke Gruber <luke.gruber@shopify.com>
3 participants