Skip to content

Fix CoW performance bug in NIOThreadPool work queue#2669

Merged
gmilos merged 7 commits intoapple:mainfrom
gmilos:gm-nio-thread-pool-queueing-opitimisations
Feb 28, 2024
Merged

Fix CoW performance bug in NIOThreadPool work queue#2669
gmilos merged 7 commits intoapple:mainfrom
gmilos:gm-nio-thread-pool-queueing-opitimisations

Conversation

@gmilos
Copy link
Copy Markdown
Contributor

@gmilos gmilos commented Feb 28, 2024

Fix CoW performance bug in NIOThreadPool work queue.

Motivation:

NIOThreadPool uses an enum to track it's state. If the pool is running, it'll be in State.running(CircularBuffer<WorkItem>). However this hard to mutate in place reliably. The current code certainly doesn't achieve that, which leads to CoW on each enqueue and dequeue.

Modifications:

  • I added a test to measure the behaviour before and after. It's called RunIfActiveBenchmark. It's run in 1 & 8 threads variants, with 100k tasks.
  • I encapsulated the CircularBuffer<WorkItem> in a class, thus providing a layer of indirection during mutations, which avoid CoW
  • I wrapped WorkItem which is a closure in a struct wrapped to avoid the cost of allocations discussed in: https://bugs.swift.org/browse/SR-15872

Result:

Performance is orders of magnitudes greater. The "after" (measured for Deque version, on my own laptop [given current perf testing CI infra breakage]), is:

measuring: runIfActive_1_thread_100k_tasks: 0.251138375, 0.247212625, 0.257556917, 0.2396585, 0.245091042, 0.240477125, 0.246961292, 0.244519042, 0.240759375, 0.240233208,
measuring: runIfActive_8_threads_100k_tasks: 0.26803725, 0.262704083, 0.263101083, 0.266763792, 0.269062791, 0.267278875, 0.265546208, 0.266732625, 0.265480375, 0.271898916,

The before takes way too long to run (hours).

@gmilos gmilos requested review from Lukasa and weissi February 28, 2024 12:02
Copy link
Copy Markdown
Member

@weissi weissi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! LGTM, letting @Lukasa merge this if he's happy

Comment thread Sources/NIOPosix/NIOThreadPool.swift Outdated
/// Since `WorkItems` are embedded in `State` enum, it's hard to reliably avoid CoW otherwise.
///
/// Also the closures are wrapped in a struct, to avoid the cost of allocation as discussed on https://bugs.swift.org/browse/SR-15872
final class WorkItems {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of boxing the buffer here with a class we should be able to avoid the CoW by introducing a new state called modifying into the State enum and before mutating the queue we transition to that state and then back to the running state. We have used that approach in most other places recently and it avoids this class entirely.

The other upside of this pattern is that once we language gets mutating/inout switches in the future we can easily transition over to that and remove the modifying case again.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! shoot, missed the class, yes, let's do this properly

Copy link
Copy Markdown
Contributor Author

@gmilos gmilos Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .modifying solution is 60% slower than the class. @FranzBusch @weissi do you still want me to do that?

60% is for the benchmarks, going from ~140ms per run, to ~230ms.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the slow down come from and is sprinkling some @inlinable + @usableFromInline making it go faster?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also consider moving from CircularBuffer to Deque from swift-collections.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this confused me also.

Turns out I had a different testing environment. When I was developing the original solution, and performance testing it. I was sitting on top of #2645, which also has an effect, as ELFs are used to notify completions.

In the "this is 60% slower" result, I was sitting on top of now rebased branch from this PR.

My bad. I'll update the results in the PR description to give the numbers for this branch alone (as they should have been). And then we'll see the 60% perf bump on once #2645 goes in.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dnadoba I checked Deque and it didn't have any noticeable performance difference. Is it supposed to? And/or are we attempting to move away from CircularBuffer? Pushing Deque-s version, so that we have all the different paints on the PR :).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are moving away from CircularBuffer. We are using Deque in all new code e.g. in our new custom AsyncSequences.
Both are heavily optimised but Deque is the way forward. This isn't urgent and we will keep CircularBuffer around but it isn't likely to be updated. All new performance optimisations, if any are possible, should be done on top of Deque as it is independent of NIO and therefore more generally useful.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dnadoba got it, thanks. Will keep this in mind for the future. This code could be trivially switched to Deque so it's already in.

…dPool changes reduced the allocations required.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🔨 semver/patch No public API change.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants