msg: Code cleanup and optimizations #60220
Conversation
batrick
left a comment
The locking changes are the ones I'm least certain about. I'd request those go in a separate PR too since they will require more rigorous review and broad QA. The other changes are uncontroversial and could be merged quickly.
@@ -305,12 +305,12 @@ ssize_t AsyncConnection::read_bulk(char *buf, unsigned len)
ssize_t AsyncConnection::write(ceph::buffer::list &bl,
                               std::function<void(ssize_t)> callback,
-               std::function<void(ssize_t)> callback,
+               std::function<void(ssize_t)>&& callback,

Would `&&` also be appropriate here?
Would it be better?
Without `&&`, the `std::function` is constructed on the caller's side and copied (or moved) into the parameter.
With `&&`, the `std::function` is still constructed on the caller's side, but only a reference to it is passed to the method.
That is only useful if you already have an rvalue to forward, so no additional object needs to be constructed.
Just conventional, I suppose. I don't think it matters much.
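For concreteness, the two signatures under discussion can be sketched as follows. This is a minimal illustration of the pass-by-value vs. pass-by-rvalue-reference trade-off; the function names are made up, not Ceph's actual API:

```cpp
#include <functional>
#include <sys/types.h>
#include <utility>

// Pass by value: the argument is copied into the parameter
// (or moved, if the caller passes an rvalue).
inline ssize_t write_by_value(std::function<void(ssize_t)> callback) {
    callback(42);
    return 0;
}

// Pass by rvalue reference: no std::function is constructed at the call
// boundary; the callee decides when (and whether) to move from it.
inline ssize_t write_by_rref(std::function<void(ssize_t)> &&callback) {
    auto stored = std::move(callback);  // take ownership explicitly
    stored(42);
    return 0;
}
```

Note that the by-value signature already lets a caller write `write_by_value(std::move(cb))`, which moves instead of copying; the `&&` version merely saves one move, which matches the conclusion above that it does not matter much.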
src/msg/async/ProtocolV1.cc
Outdated
do {
  if (connection->is_queued()) {
    if (r = connection->_try_send(); r != 0) {
      connection->write_lock.unlock();
Re "msg/async/ProtocolV[12]: unlock `write_lock` before calling `_try_send()`": please expand the commit message to explain why this is appropriate/safe.
That's going to be difficult, because I have to explain the absence of something - `_try_send()` does not access anything that needs `write_lock` protection. If somebody thinks otherwise, I'd like to hear about it.
But anyway, the code doesn't make much sense - it calls dispatch_event_external() which is a function designed to be called from outside the I/O thread, but by definition, we must be inside the I/O thread (or else we wouldn't be allowed to access outgoing_bl). This is confusing, but I guess I'm here to clean up the mess.
> That's going to be difficult, because I have to explain the absence of something - `_try_send()` does not access anything that needs `write_lock` protection. If somebody thinks otherwise, I'd like to hear about it.
I think that's what makes this challenging to review too. We would want to be certain your supposition is correct.
You won't hear much argument from here that this code can be messy. I'm glad you're here!
The problem is that while locks are extremely important, documentation on what they protect is nowhere to be found. My theory is that `write_lock` really protects nothing at all; at least I couldn't figure out anything. So I'm trying to remove it one-by-one from calls that look obviously OK without that lock, most importantly from calls that do I/O - and you should never do I/O while holding locks.
My profiler says that the MDS spends 20% of its CPU time in sendmsg() alone - and that's actual CPU time, without the off-CPU time that must be added on top. That means `write_lock` is held at least 20% of the time, for no reason at all.
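The pattern being argued for here - don't hold a mutex across a blocking system call - can be sketched as follows. All names are illustrative, not Ceph's actual code; the shared queue is drained under the lock, and the lock is released before the (potentially blocking) send:

```cpp
#include <mutex>
#include <string>
#include <vector>

struct SketchConnection {
    std::mutex write_lock;
    std::vector<std::string> out_queue;  // protected by write_lock
    std::string wire;                    // touched only by the I/O thread

    // stand-in for the sendmsg() system call; must not hold write_lock
    void send(const std::string &buf) { wire += buf; }

    void flush() {
        std::string pending;
        {
            std::lock_guard<std::mutex> l(write_lock);
            for (const auto &m : out_queue)
                pending += m;                 // copy out under the lock
            out_queue.clear();
        }  // write_lock released here, *before* the I/O
        send(pending);
    }
};
```

With this shape, other threads can keep appending to `out_queue` while the I/O thread sits in the send call, instead of blocking on the mutex for the duration of the system call.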
I forgot: AsyncConnection::write_lock does protect something - just not in class AsyncConnection. It protects ProtocolV[12]::out_queue (and others in these two classes). This mutex should probably be moved there, out of AsyncConnection, to avoid confusion.
@MaxKellermann your profile of sendmsg() is correct, at least from the profiling I've done. It's a major issue and one of the reasons I was looking at reducing the number of async messenger threads talking to the kernel.
I have several patches which reduce the number of sendmsg() system calls, but I can only submit them after some of the other PRs are merged, because they build on top of them.
Anyway, the number of I/O threads isn't a performance problem; the real performance problem is all the bouncing between workers, dispatcher, finisher, and submit thread. A single request engages all of them, and that causes a lot of inter-thread communication overhead which is very visible in my profiler.
Reducing lock contention is really important, and several of my PRs do that; but it would be much better to stay in one thread for handling a request.
Force-pushed a2d32e2 to a1c70b8
Dropped for now. (These two patches are part of the patch set that makes the MDS faster; reducing lock contention is one important piece.)
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
This allows eliminating one lookup in `_get_next_outgoing()` because we can pass the iterator instead of the key to `erase()`. Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
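The optimization this commit message describes can be sketched as follows: when a lookup already produced an iterator, `erase(iterator)` reuses it, whereas `erase(key)` would walk the tree a second time. The container and function name here are illustrative, not Ceph's actual `_get_next_outgoing()`:

```cpp
#include <map>

inline int pop_next_outgoing(std::map<int, int> &queue) {
    auto it = queue.begin();   // the "next outgoing" entry
    int value = it->second;
    queue.erase(it);           // erase by iterator: no second lookup
    return value;
}
```

`std::map::erase(iterator)` is amortized constant time, while `erase(key)` costs another O(log n) search.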
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Since `std::function` is nullable and has an `operator bool()`, we can easily eliminate the `std::optional` overhead. Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
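The simplification can be sketched like this: a default-constructed `std::function` is already "empty" and contextually converts to `false`, so wrapping it in `std::optional` duplicates that state. The struct below is illustrative, not Ceph's actual `AsyncConnection`:

```cpp
#include <functional>
#include <utility>

struct SketchWriter {
    std::function<void(long)> writeCallback;  // empty == "no callback"

    void complete(long r) {
        if (writeCallback) {                  // operator bool(): is it set?
            auto cb = std::move(writeCallback);
            writeCallback = nullptr;          // std::function is nullable
            cb(r);
        }
    }
};
```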
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Force-pushed a1c70b8 to 425fc4d
jenkins test make check

jenkins test make check arm64
This PR is under test in https://tracker.ceph.com/issues/68629.
* refs/pull/60220/head:
	msg/async/AsyncConnection: move the writeCallback instead of copying it
	msg/async/AsyncConnection: do not wrap writeCallback in `std::optional`
	msg/async/frames_v2: use zero-initialization instead of memset()
	msg/async/Event: use zero-initialization instead of memset()
	msg/Message: use zero-initialization instead of memset()
	msg/async/ProtocolV2: eliminate redundant std::map lookups
	msg/async/ProtocolV[12]: reverse the std::map sort order
	msg/async/ProtocolV[12]: use `auto`
	msg/async/ProtocolV[12]: use range-based `for`
	msg/async/ProtocolV1: use zero-initialization instead of memset()
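Several of the commits listed above replace memset() with zero-initialization. The idea can be sketched as follows; the struct is illustrative, not one of Ceph's actual message headers:

```cpp
#include <cstring>

struct SketchHeader {
    unsigned type;
    unsigned len;
    char tag[8];
};

inline SketchHeader make_header_memset() {
    SketchHeader h;
    std::memset(&h, 0, sizeof h);  // the old pattern
    return h;
}

inline SketchHeader make_header_zeroinit() {
    return SketchHeader{};         // the new pattern: value-initialization
}
```

For an aggregate like this, `SketchHeader{}` value-initializes every member to zero, so the separate memset() call is redundant.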
Checklist