Fix reordering/reentrancy bug in NIOAsyncWriter + NIOAsyncChannel #2587

FranzBusch merged 2 commits into apple:main
3609bb1 to b0e3e99
  /// - ``NIOAsyncWriter/Sink/finish()`` or ``NIOAsyncWriter/Sink/finish(error:)`` is called.
  ///
- /// - Note: This is guaranteed to be called _exactly_ once.
+ /// - Note: This is guaranteed to be called _at most_ once.
I'm a bit concerned about this change in behaviour; users could be relying on it as a signal to, e.g., tear some things down.
The behaviour here changed so that we only call `didTerminate` when somebody calls `writer.finish()`, and not on `sink.finish()` anymore. The logic before was quite weird, and it caused problems with overlapping exclusive access that I fixed in previous PRs. Now the logic is that a call to `didTerminate` should trigger termination of the underlying resource, and a subsequent call to `sink.finish()` then indicates that this is done.
I am on the fence with this change so happy to discuss this more.
Do you think anyone outside of NIO actually implements this? Might be worth having a quick search.
I just did a quick GitHub search and couldn't find anyone besides grpc-swift using this, and there we only rely on `didTerminate` being called from `writer.finish()` and not from `sink.finish()`, which is the intention of this PR.
  case .retry:
      // The yield has to be retried. We are recursively calling yield here
      // but don't expect to blow the stack.
Why do we think this won't happen?
I changed this to a while loop now
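The recursion-to-iteration change discussed here can be illustrated with a small standalone sketch. Note that `RetryOutcome` and `yieldWithRetry` are hypothetical names used purely for illustration, not NIO's actual types:

```swift
// Hypothetical sketch of the pattern: instead of recursively re-entering
// yield on a .retry outcome, loop until the state machine says we are done.
enum RetryOutcome {
    case done
    case retry
}

func yieldWithRetry(attempt: () -> RetryOutcome) -> Int {
    var attempts = 0
    // A while loop keeps the stack flat no matter how many retries the
    // state machine requests, unlike a recursive call into yield.
    while true {
        attempts += 1
        switch attempt() {
        case .done:
            return attempts
        case .retry:
            continue
        }
    }
}

var remainingRetries = 2
let attempts = yieldWithRetry {
    if remainingRetries > 0 {
        remainingRetries -= 1
        return .retry
    }
    return .done
}
print(attempts)  // 3: two retries, then success
```

The observable behaviour is the same as the recursive version, but stack depth stays constant regardless of how often the state machine asks for a retry.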
  inDelegateOutcall: Bool,
  suspendedYields: _TinyArray<SuspendedYield>,
  cancelledYields: [YieldID],
  knownYieldIDs: _TinyArray<YieldID>,
What are the known yield IDs?
I added a comment to clarify this, but in short: they are the IDs of the yields that were buffered before we got the finish.
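As a rough standalone illustration of that idea (toy types, not the actual `_TinyArray`-based state machine): when the finish arrives, the state machine snapshots the IDs of the yields buffered so far, and only those are still allowed to complete.

```swift
// Toy model (not NIO's implementation) of "known yield IDs": on finish()
// we snapshot the IDs of the yields buffered so far; only those may still
// be delivered, and anything yielded afterwards is rejected.
struct ToyFinishingState {
    private var nextID = 0
    private var bufferedIDs: [Int] = []
    private var knownYieldIDs: [Int]? = nil  // nil until finish() was called

    mutating func bufferYield() -> Int {
        defer { nextID += 1 }
        bufferedIDs.append(nextID)
        return nextID
    }

    mutating func finish() {
        // Remember exactly which yields were in flight when finish arrived.
        knownYieldIDs = bufferedIDs
    }

    func mayComplete(_ id: Int) -> Bool {
        guard let known = knownYieldIDs else { return true }
        return known.contains(id)
    }
}

var finishing = ToyFinishingState()
let first = finishing.bufferYield()   // buffered before the finish
let second = finishing.bufferYield()  // buffered before the finish
finishing.finish()
let late = finishing.bufferYield()    // arrives after the finish
print(finishing.mayComplete(first), finishing.mayComplete(second), finishing.mayComplete(late))
// true true false
```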
# Motivation

While testing the latest async interfaces we found a potential reordering/reentrancy bug in the `NIOAsyncWriter`. This was caused by our latest performance changes, where we fast-pathed calls in `didYield` to avoid hopping. The problem was in the following flow:

1. Task 1: calls `outbound.write()`, which led to an `EventLoop` enqueue with `didYield`.
2. Task 1: calls `outbound.write()`, which led to an `EventLoop` enqueue with `didYield`.
3. EventLoop: while processing the write from 1., the channel became **not** writable.
4. Task 1: calls `outbound.write()`, which led to buffering the write in the writer's state machine since we are **not** writable.
5. EventLoop: while still processing the write from 1., the channel became writable again. We informed the `NIOAsyncWriter` about this, which unbuffered the write from 4. that was stored in the state machine and called `didYield`. Since we are on the EventLoop already, we processed the write right away.

The above flow shows how the writes from 2. and 4. were reordered.

# Modification

This PR fixes the above issue while upholding a few constraints:

1. Produce as few context switches as possible.
2. Minimize allocations.

I tried different approaches but in the end decided to do the following:

1. Make sure to never call `didYield`/`didTerminate` from calls on the `NIOAsyncWriter.Sink`.
2. Don't coalesce the elements of different writes in the `NIOAsyncWriter`, but rather use the suspended tasks to retry a write after they were suspended. I chose this because I wanted to avoid any allocation (remember, writers are `some Sequence`) and because we assume that continuous contention in a multi-producer pattern is low.
3. Make sure that `Sink.finish()` is terminal and does not lead to a `didTerminate` event. This is in line with 1.

One important thing to call out: since our `writer.finish()` method is not `async`, we have to buffer the finish event and deliver it with the yield that got buffered before we transitioned to `writerFinished`.

# Result

No more reordering/reentrancy problems in `NIOAsyncWriter` or `NIOAsyncChannel`.
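The writability flow from the motivation can be modelled with a toy (non-NIO) state machine. The key invariant the fix restores is that a fast-path write must never overtake writes that are already buffered:

```swift
// Toy model, not NIO code: writes fast-path straight through only while the
// channel is writable AND nothing is buffered ahead of them; otherwise they
// are buffered and later drained in FIFO order, preserving write order.
struct ToyWriterState {
    private(set) var written: [Int] = []
    private var buffered: [Int] = []
    var isWritable = true

    mutating func write(_ value: Int) {
        if isWritable && buffered.isEmpty {
            // Fast path: nothing can be reordered ahead of us.
            written.append(value)
        } else {
            // Slow path: queue behind any earlier buffered writes.
            buffered.append(value)
        }
    }

    mutating func becameWritable() {
        isWritable = true
        // Drain in FIFO order before any new fast-path write can happen.
        written.append(contentsOf: buffered)
        buffered.removeAll()
    }
}

var channel = ToyWriterState()
channel.write(1)          // writable: goes straight through
channel.isWritable = false
channel.write(2)          // not writable: buffered
channel.write(3)          // not writable: buffered behind 2
channel.becameWritable()  // drains 2 then 3, in order
channel.write(4)          // fast path again
print(channel.written)    // [1, 2, 3, 4]
```

The buggy behaviour corresponds to a fast path that ignores the `buffered.isEmpty` check, which would let a later write land before earlier buffered ones.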
05e8ce4 to 09c595c
  switch action {
  case .callDidYield(let delegate):
      // We are allocating a new Deque for every write here
Ouch. I didn't realise we did this.
Yeah, this was one of the early design choices: users can yield us any kind of sequence, and we have to normalise that into a Deque. Right now we provide a fast path for a single element; we ought to be able to do the same for Deque/Array and more.
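A possible shape for such a single-element fast path (a sketch with toy names, not NIO's implementation; NIO's real delegate takes a `Deque`, here a plain `Array` stands in): probe the sequence's iterator and only allocate a collection when there is more than one element.

```swift
// Sketch (not NIO's implementation): avoid allocating a collection for the
// common single-element write by probing the iterator before normalising.
enum YieldedElements {
    case single(Int)
    case many([Int])
}

func normalise<S: Sequence>(_ sequence: S) -> YieldedElements? where S.Element == Int {
    var iterator = sequence.makeIterator()
    guard let first = iterator.next() else {
        return nil  // empty yield, nothing to deliver
    }
    guard let second = iterator.next() else {
        // Fast path: exactly one element, no Array/Deque allocation.
        return .single(first)
    }
    // Slow path: two or more elements, normalise into a collection.
    var elements = [first, second]
    while let next = iterator.next() {
        elements.append(next)
    }
    return .many(elements)
}

// Single-element writes take the allocation-free path:
if case .single(let value)? = normalise(CollectionOfOne(7)) {
    print(value)  // 7
}
if case .many(let values)? = normalise([1, 2, 3]) {
    print(values)  // [1, 2, 3]
}
```

The same probing trick would extend to recognising `Deque` and `Array` inputs directly, as the comment suggests.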