MpscBlockingConsumerArrayQueue poll with timeout fix#316
Conversation
Reorder offerAndWakeup() to fix race. Previously the following
sequence was possible:
Consumer Provider
poll(timeout)
offerAndWakeup()
time out
soRefElement(, offset,)
consume from offset
offset -> offset'
poll()
soBlocked(null)
unpark(consumerThread)
// Spin for ever
spinWaitForElement(, offset')
Pull Request Test Coverage Report for Build 699
💛 - Coveralls |
|
I don't understand why the last poll should spin wait for element TBH, given that there is no new element on offset' beside the one already consumed by take(timeout), need to look better into this |
|
Look at Or try reverting the fix and running the test. It failed consistently for me, even with additional logging so I could figure out what was happening. |
|
What it looks strange to me is the check on https://github.com/pushtechnology/JCTools/blob/865a75f12cfa3d63029f16979dc8ee2af7e4a5b1/jctools-core/src/main/java/org/jctools/queues/MpscBlockingConsumerArrayQueue.java#L577 : I will add some logs to understand why it isn't returning null instead |
|
Oh, right. I was using |
|
Ah ok so the issue happens with 2 subsequent poll(timeout) and the expectation would be that:
Or
|
|
Yes, at least two It reproduced for me fairly readily once I had the test balanced to ensure a mostly empty queue. I'd be interested to know if the test fails for you (with the fix reverted). |
|
Hi @philipa ! long time no see :-) Thanks for the find + PR, this is an interesting one. The thinking behind the original ordering is that we want to avoid the spin when we wake the consumer from an unbounded |
|
Not sure CAS on blocked works alone, since the producer would need to distinguish between the consumer waiting for a value at the original offset vs it having woken up early, sneakily read the element, and now waiting for an element at the next offset. |
|
Here is an alternative fix to consumer-side which should not affect the existing happy paths at all: 3f82f1c. The existing cas in the finally block already detects this race and so it's just a matter of handling it appropriately (I removed the try/finally and now have separate checks for interrupt and timeout cases). wdyt? |
|
Looks good. I prefer the revised code flow. It would make sense to restructure I did like e8e0839 though. |
|
+100 for the changes of @njhill |
|
Thanks @philipa @franz1981, agree about the restructure of |
|
@philipa thanks for the good work! I'll do a final pass today |
|
@philipa @njhill See minor changes: diffusiondata@c3ea433 If you can squash it all to one nice commit I think we're done :-) |
|
Refactoring looks good. Excuse my GH ignorance, but don't you have a "squash and merge" button for PRs? |
|
Great work team :-) |
Reorder offerAndWakeup() to fix race. Previously the following sequence was possible: