Skip to content

Optimize lock contention in the Finisher thread#60174

Merged
batrick merged 14 commits intoceph:mainfrom
MaxKellermann:optimize_finisher
Oct 22, 2024
Merged

Optimize lock contention in the Finisher thread#60174
batrick merged 14 commits intoceph:mainfrom
MaxKellermann:optimize_finisher

Conversation

@MaxKellermann
Copy link
Member

Several patches which reduce lock contention and eliminate system calls in the Finisher thread, similar to what commit cc7ec3e did five years ago.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@github-actions github-actions bot added the common label Oct 7, 2024
Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise lgtm

Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise LGTM

@batrick
Copy link
Member

batrick commented Oct 10, 2024

@MaxKellermann many build errors, please check.

@MaxKellermann
Copy link
Member Author

Aye. This didn't fail for me because the faulty forward declaration was not used, due to different indirect header includes.

I have lots of patches to detangle the bloated header dependencies in Ceph, in order to verify correctness of include lines and improve build times. Will submit them later.

This merges some duplicate code.

Only two overloads remain: one for single `Context` pointers and one
for all containers.  I tried to merge the former into the same
template but that led to a larger binary (+7kB) because many pointer
overloads were instantiated.  This patch (with two overloads) only
increases the binary by 8 bytes which is acceptable.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
`std::lock_guard` is all we need here, and the added complexity of
`std::unique_lock` is not used and is usually optimized away by the
compiler.  Using `std::lock_guard` directly will reduce the amount of
work that the optimizer needs to do and saves some build CPU cycles.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
The `PerfCounters::inc()` method acquires another lock which can block
the calling thread while holding the `finisher_lock` which can cause a
lot of lock contention.  This can be avoided easily by moving the
`PerfCounters::inc()` call out of the protected code block.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
If `finisher_running` is set, then the `Finisher` thread will
automatically pick up new items queued by other threads.  It is
therefore not needed to wake it up and we can eliminate one system
call.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Pushing to the queue may take a long time when the `std::vector` needs
to allocate more memory.  We should wake up the `Finisher` thread only
right before unlocking the `finisher_mutex` to reduce lock contention,
because it is the more likely that the mutex can really be acquired
when the thread really wakes up.

This imitates how commit cc7ec3e did it - it refactored only
one of the `queue()` overloads, leaving less-than-optimal copies of
this piece code in all other overloads.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
As noted in commit cc7ec3e, there is only ever a single `Finisher`
thread, therefore the overhead for `notify_all()` can be eliminated.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
These are never changed; `const` prevents accidental changes and
allows further compiler optimizations.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
This aims to speed up compile times because constructor and destructor
contain a lot of code that would be compiled in sources that do not
call them.  Also this allows removing the "common/perf_counters.h"
include.

Since there is now only one instantiation of these for all call sites,
the binary size shrinks by nearly 1 kB.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
This is not only more efficient at runtime, but also shaves 500 bytes
code off the binary.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
This impliciy heap allocation is unnecessary.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
This allows eliminating the copy in `ActivePyModule`.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
This eliminates a temporary `std::string`.

Additionally, convert the `tn` parameter to an rvalue reference and
move it to `thread_name`.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
@MaxKellermann
Copy link
Member Author

Looks like I fixed all build failures now.

@batrick
Copy link
Member

batrick commented Oct 19, 2024

This PR is under test in https://tracker.ceph.com/issues/68629.

batrick added a commit to batrick/ceph that referenced this pull request Oct 19, 2024
* refs/pull/60174/head:
	common/Finisher: pass name as std::string_view to ctor
	common/Finisher: add method get_thread_name()
	mgr/ActivePyModule: build thread name with fmt
	mgr/ActivePyModule: return std::string_view instead of std::string copy
	common/Finisher: use fmt to build strings
	common/Finisher: un-inline ctor and dtor
	common/Finisher: add `const` to several fields
	common/Finisher: merge duplicate field initializers
	common/Finisher: call notify_one() instead of notify_all()
	common/Finisher: wake up after pushing to the queue
	common/Finisher: do not wake up the thread if already running
	common/Finisher: call logger without holding the lock
	common/Finisher: use `std::lock_guard` instead of `std::unique_lock`
	common/Finisher: merge all queue() container methods into one template
@batrick
Copy link
Member

batrick commented Oct 22, 2024

@batrick batrick merged commit 4c38857 into ceph:main Oct 22, 2024
@MaxKellermann MaxKellermann deleted the optimize_finisher branch October 22, 2024 05:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants