Optimize lock contention in the Finisher thread#60174
Merged
Conversation
6196a78 to
4da39b0
Compare
4da39b0 to
4e4c836
Compare
batrick
approved these changes
Oct 9, 2024
Member
|
@MaxKellermann many build errors, please check. |
Member
Author
|
Aye. This didn't fail for me because the faulty forward declaration was not used, due to different indirect header includes. I have lots of patches to detangle the bloated header dependencies in Ceph, in order to verify correctness of include lines and improve build times. Will submit them later. |
This merges some duplicate code. Only two overloads remain: one for single `Context` pointers and one for all containers. I tried to merge the former into the same template but that led to a larger binary (+7kB) because many pointer overloads were instantiated. This patch (with two overloads) only increases the binary by 8 bytes which is acceptable. Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
`std::lock_guard` is all we need here, and the added complexity of `std::unique_lock` is not used and is usually optimized away by the compiler. Using `std::lock_guard` directly will reduce the amount of work that the optimizer needs to do and saves some build CPU cycles. Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
The `PerfCounters::inc()` method acquires another lock which can block the calling thread while holding the `finisher_lock` which can cause a lot of lock contention. This can be avoided easily by moving the `PerfCounters::inc()` call out of the protected code block. Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
If `finisher_running` is set, then the `Finisher` thread will automatically pick up new items queued by other threads. It is therefore not needed to wake it up and we can eliminate one system call. Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Pushing to the queue may take a long time when the `std::vector` needs to allocate more memory. We should wake up the `Finisher` thread only right before unlocking the `finisher_mutex` to reduce lock contention, because it is the more likely that the mutex can really be acquired when the thread really wakes up. This imitates how commit cc7ec3e did it - it refactored only one of the `queue()` overloads, leaving less-than-optimal copies of this piece code in all other overloads. Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
As noted in commit cc7ec3e, there is only ever a single `Finisher` thread, therefore the overhead for `notify_all()` can be eliminated. Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
These are never changed; `const` prevents accidental changes and allows further compiler optimizations. Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
This aims to speed up compile times because constructor and destructor contain a lot of code that would be compiled in sources that do not call them. Also this allows removing the "common/perf_counters.h" include. Since there is now only one instantiation of these for all call sites, the binary size shrinks by nearly 1 kB. Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
This is not only more efficient at runtime, but also shaves 500 bytes code off the binary. Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
4e4c836 to
9947a72
Compare
This impliciy heap allocation is unnecessary. Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
This allows eliminating the copy in `ActivePyModule`. Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
This eliminates a temporary `std::string`. Additionally, convert the `tn` parameter to an rvalue reference and move it to `thread_name`. Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
9947a72 to
efc96a2
Compare
Member
Author
|
Looks like I fixed all build failures now. |
Member
|
This PR is under test in https://tracker.ceph.com/issues/68629. |
batrick
added a commit
to batrick/ceph
that referenced
this pull request
Oct 19, 2024
* refs/pull/60174/head: common/Finisher: pass name as std::string_view to ctor common/Finisher: add method get_thread_name() mgr/ActivePyModule: build thread name with fmt mgr/ActivePyModule: return std::string_view instead of std::string copy common/Finisher: use fmt to build strings common/Finisher: un-inline ctor and dtor common/Finisher: add `const` to several fields common/Finisher: merge duplicate field initializers common/Finisher: call notify_one() instead of notify_all() common/Finisher: wake up after pushing to the queue common/Finisher: do not wake up the thread if already running common/Finisher: call logger without holding the lock common/Finisher: use `std::lock_guard` instead of `std::unique_lock` common/Finisher: merge all queue() container methods into one template
Member
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Several patches which reduce lock contention and eliminate system calls in the
Finisherthread, similar to what commit cc7ec3e did five years ago.Checklist