In ruby/net-http#197 I patched an issue in the net-http tests where a server thread could get stuck in a loop by being "double-interrupted": it was killed, and then the IO it was blocked on was closed before the kill could start to propagate. The issue is that JRuby runs threads in parallel, so internally the following can happen:
- The main thread kills the blocked thread.
- The blocked thread wakes up and processes the kill request, clearing its interrupt queue but not yet removing itself from the IO's list of blocked threads.
- The main thread proceeds to close the IO, which sees that the thread is still blocked and issues a second interrupt to raise IOError.
- As the thread propagates the kill, it may run Ruby code and check for interrupts again; it finds the pending raise and propagates that instead of the kill.
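The interleaving above can be sketched as a deterministic toy model. This is purely illustrative: plain arrays stand in for the thread's interrupt queue and the IO's list of blocked threads, and the method name `simulate_jruby_interleaving` and the `:raise_ioerror` symbol are hypothetical, not JRuby internals.

```ruby
# Hypothetical toy model of the interleaving; not JRuby's real implementation.
# Arrays stand in for the target thread's interrupt queue and the IO's
# list of threads blocked on it.
def simulate_jruby_interleaving
  interrupts = []        # target thread's pending interrupts
  blocked_on_io = [:t]   # threads the IO believes are blocked on it
  trace = []

  # 1. Main thread kills the blocked thread.
  interrupts << :kill

  # 2. Target wakes, takes the kill and clears its queue, but has not yet
  #    removed itself from the IO's blocked list.
  handling = interrupts.shift
  interrupts.clear
  trace << "handling #{handling}"

  # 3. Main closes the IO. The target still looks blocked, so it is sent a
  #    second interrupt that would raise IOError.
  interrupts << :raise_ioerror if blocked_on_io.include?(:t)

  # 4. While propagating the kill, the target checks for interrupts again,
  #    and the pending raise replaces the kill.
  unless interrupts.empty?
    handling = interrupts.shift
    trace << "replaced by #{handling}"
  end
  blocked_on_io.delete(:t)

  [handling, trace]
end

outcome, trace = simulate_jruby_interleaving
puts trace
puts "thread dies via #{outcome}"
```

Running it prints the two-step trace and shows the IOError raise, not the kill, as the interrupt the thread ends up propagating.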
These are tricky things to coordinate because there's a lot of shared state here: the interrupt queue and test bits, the IO's list of blocked threads, and the cleanup logic after the blocked IO call gets interrupted.
I am unsure if this is a bug exactly.
CRuby does not hit this race when closing the IO: the kill and the close happen in quick succession, so by the time the thread acquires the GVL both interrupts are already queued. The kill is seen first, the queue is cleared, and the raise never happens.
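For contrast, the CRuby ordering can be modeled the same way. Again this is a hypothetical sketch rather than CRuby's actual implementation (`simulate_cruby_ordering` is an illustrative name): both interrupts are already queued when the thread acquires the GVL, so the kill is taken first and clearing the queue discards the raise.

```ruby
# Hypothetical toy model of the CRuby ordering; not CRuby's real code.
def simulate_cruby_ordering
  # Kill is queued first, then the raise issued by closing the IO.
  interrupts = [:kill, :raise_ioerror]

  # The thread acquires the GVL and sees the kill first...
  outcome = interrupts.shift
  # ...and processing the kill clears the rest of the queue.
  interrupts.clear

  # Any later interrupt check during the kill finds nothing pending,
  # so the kill is what the thread propagates.
  later = interrupts.shift
  outcome = later unless later.nil?
  outcome
end

puts "thread dies via #{simulate_cruby_ordering}" # prints "thread dies via kill"
```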
But a similar case can be simulated in CRuby by putting a sleep in an ensure block, since ensure blocks are run when a thread is killed:
```ruby
in_ensure = false
t = Thread.new do
  sleep
ensure
  in_ensure = true
  sleep
end
Thread.pass until t.status == "sleep"
t.kill
Thread.pass until in_ensure
t.raise
t.join
```
When run on CRuby, the raise interrupts the sleep in the ensure block (simulating the interrupt checks that may happen as the kill propagates), and the thread ends up dying with an exception rather than quietly being killed.
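A variant of the example above can report which way the thread died instead of letting `t.join` raise. This is a sketch under two assumptions not in the original: the helper name `kill_then_raise_outcome` is hypothetical, and the sleep in the ensure block is bounded to one second so that `join` cannot hang on a Ruby implementation that ignores the second interrupt. Given the CRuby behavior described above, it would be expected to report `:raised` there.

```ruby
# Runs the kill-then-raise sequence and reports how the thread died.
# The ensure-block sleep is bounded (an assumption added here) so join
# cannot hang if the second interrupt is ignored.
def kill_then_raise_outcome
  in_ensure = false
  t = Thread.new do
    Thread.current.report_on_exception = false # keep stderr quiet
    sleep
  ensure
    in_ensure = true
    sleep 1
  end

  Thread.pass until t.status == "sleep"
  t.kill
  Thread.pass until in_ensure
  t.raise # no arguments: raises RuntimeError in the target thread

  begin
    t.join
    :killed # join returns normally for a thread that was killed
  rescue RuntimeError
    :raised # join re-raises the exception that ended the thread
  end
end

puts "thread died: #{kill_then_raise_outcome}"
```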
This example forces the race on CRuby, but it remains an open question whether a thread that has already been killed or "raised" can be forced to die in a different way by a second "raise" or "kill" call that arrives before it dies.