Fix CoW performance bug in NIOThreadPool work queue #2669
Conversation
/// Since `WorkItems` are embedded in `State` enum, it's hard to reliably avoid CoW otherwise.
///
/// Also the closures are wrapped in a struct, to avoid the cost of allocation as discussed on https://bugs.swift.org/browse/SR-15872
final class WorkItems {
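The allocation-avoidance trick mentioned in the doc comment can be sketched roughly as follows (a hypothetical minimal version for illustration, not the actual NIOThreadPool code):

```swift
// Minimal sketch of the SR-15872 workaround: storing a bare closure in an
// enum payload or generic context can force an extra heap allocation, while
// wrapping the closure in a struct first avoids that cost.
// `WorkItem` and `State` here are illustrative names only.
struct WorkItem {
    let task: () -> Void
}

enum State {
    case running([WorkItem])   // closures travel inside the struct wrapper
    case stopped
}
```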
Instead of boxing the buffer here with a class, we should be able to avoid the CoW by introducing a new `modifying` state into the `State` enum: before mutating the queue we transition to that state, and then back to the `running` state afterwards. We have used that approach in most other places recently, and it avoids this class entirely.
The other upside of this pattern is that once the language gets mutating/inout switches in the future, we can easily transition over to that and remove the `modifying` case again.
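The `.modifying` pattern described above can be sketched like this (a hypothetical minimal state machine; the real NIOThreadPool has more states and a different queue type):

```swift
// Hypothetical sketch of the `.modifying` pattern. Moving the buffer out of
// the enum payload before mutating it leaves the local binding as the only
// reference, so `append` doesn't trigger a copy-on-write.
struct WorkItem { let task: () -> Void }

enum State {
    case running([WorkItem])   // pending work queue lives in the payload
    case modifying             // transient: queue has been moved out
    case shutDown
}

struct PoolStateMachine {
    private(set) var state: State = .running([])

    mutating func enqueue(_ item: WorkItem) {
        switch state {
        case .running(var queue):
            state = .modifying     // drop the enum's reference to `queue`
            queue.append(item)     // uniquely referenced: no CoW here
            state = .running(queue)
        case .modifying:
            preconditionFailure("concurrent mutation")
        case .shutDown:
            break                  // dropped; real code would surface an error
        }
    }

    var pendingCount: Int {
        if case .running(let queue) = state { return queue.count }
        return 0
    }
}
```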
good catch! shoot, missed the class, yes, let's do this properly
The `.modifying` solution is 60% slower than the class. @FranzBusch @weissi do you still want me to do that?
The 60% is for the benchmarks, going from ~140ms per run to ~230ms.
Where does the slowdown come from, and does sprinkling some `@inlinable` + `@usableFromInline` make it go faster?
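For context, the annotations mentioned work roughly like this (a hypothetical sketch, not the NIO code):

```swift
// Hypothetical sketch of @inlinable/@usableFromInline. Without these
// annotations, cross-module calls can't be inlined or specialised; with
// them the optimiser can inline `push` into the caller and see the storage
// directly, which can eliminate retain/release traffic and CoW checks.
public struct IntQueue {
    @usableFromInline
    internal var storage: [Int] = []

    public init() {}

    @inlinable
    public mutating func push(_ value: Int) {
        storage.append(value)   // body is visible to clients for inlining
    }

    @inlinable
    public var count: Int { storage.count }
}
```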
Also consider moving from CircularBuffer to Deque from swift-collections.
Ah, this confused me also.
Turns out I had a different testing environment. When I was developing the original solution and performance testing it, I was sitting on top of #2645, which also has an effect, as ELFs are used to notify completions.
In the "this is 60% slower" result, I was sitting on top of the now-rebased branch from this PR.
My bad. I'll update the results in the PR description to give the numbers for this branch alone (as they should have been). And then we'll see the 60% perf bump once #2645 goes in.
@dnadoba I checked `Deque` and it didn't show any noticeable performance difference. Is it supposed to? And/or are we attempting to move away from `CircularBuffer`? Pushing the `Deque` version, so that we have all the different variants on the PR :).
We are moving away from CircularBuffer. We are using Deque in all new code e.g. in our new custom AsyncSequences.
Both are heavily optimised but Deque is the way forward. This isn't urgent and we will keep CircularBuffer around but it isn't likely to be updated. All new performance optimisations, if any are possible, should be done on top of Deque as it is independent of NIO and therefore more generally useful.
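For reference, switching a work queue to `Deque` is mostly mechanical; a minimal sketch, assuming a dependency on the swift-collections package (the `WorkItem` type is illustrative only):

```swift
import DequeModule  // from apple/swift-collections; an external dependency

// Deque supports amortised O(1) insertion and removal at both ends, which
// fits a FIFO work queue.
struct WorkItem { let task: () -> Void }

var workQueue: Deque<WorkItem> = []
workQueue.append(WorkItem(task: { print("first") }))   // enqueue at the back
workQueue.append(WorkItem(task: { print("second") }))

// Dequeue from the front; `popFirst()` returns nil once the queue is empty.
while let item = workQueue.popFirst() {
    item.task()
}
```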
@dnadoba got it, thanks. Will keep this in mind for the future. This code could be trivially switched to Deque so it's already in.
…dPool changes reduced the allocations required.
Fix CoW performance bug in `NIOThreadPool` work queue.

Motivation:

`NIOThreadPool` uses an enum to track its state. If the pool is running, it'll be in `State.running(CircularBuffer<WorkItem>)`. However, this is hard to mutate in place reliably. The current code certainly doesn't achieve that, which leads to CoW on each enqueue and dequeue.

Modifications:

- Added `RunIfActiveBenchmark`. It's run in 1 & 8 thread variants, with 100k tasks.
- Wrapped `CircularBuffer<WorkItem>` in a class, thus providing a layer of indirection during mutations, which avoids CoW.
- Added `WorkItem`, which is a closure wrapped in a struct to avoid the cost of allocations discussed in: https://bugs.swift.org/browse/SR-15872

Result:

Performance is orders of magnitude greater. The "after" (measured for the `Deque` version, on my own laptop [given the current perf-testing CI infra breakage]) is:

The before takes way too long to run (hours).