
msg/async: round timeouts up (fix busy loops)#60265

Merged
yuriw merged 2 commits into ceph:main from MaxKellermann:timeout_round_up
Jan 16, 2025

Conversation

@MaxKellermann (Member)

Currently, we always round down, which has a bad side effect: as a timer's deadline approaches, we get lots of early wakeups, and eventually we run into a busy loop (timeout=0) when the remaining time is less than one millisecond; the process remains in this busy loop for up to one millisecond, wasting lots of CPU time.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@rzarzynski (Contributor) left a comment
Good catch! The comments are just minor nits.


#ifndef CEPH_MSG_TIMEOUT_H
@rzarzynski (Contributor)
Do we need a dedicated header? There is already common/ceph_time.h.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't strictly need one, but I thought this was a special function tailored for converting timeouts and nothing else, only ever used by msg/async/Event.cc. And ceph_time.h comes with heavy dependencies on <string>, fmt, <ostream> and even <fstream> (via fmt/ostream.h).
I have a passionate hatred for fat headers with deep dependency trees, and I long for the day C++20 modules solve this problem once and for all; Ceph could switch to them any day if you want to. (cmake supports modules; I prefer Meson over cmake, but unfortunately Meson won't support modules for a long time.)

@MaxKellermann (Member, Author)
Force-pushed with amendments; I haven't moved to ceph_time.h for now, but will do so if that's your preference.

This is duplicate code, and it's buggy, but I want to fix only one
copy.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Currently, we always round down, which has a bad side effect: as a
timer's deadline approaches, we get lots of early wakeups, and
eventually we run into a busy loop (timeout=0) when the remaining
time is less than one millisecond; the process remains in this busy
loop for up to one millisecond, wasting lots of CPU time.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
@idryomov (Contributor) left a comment

Overall LGTM.

Do you have any numbers, flamegraphs, etc. to demonstrate the effect? You said that in your testing there were no sub-second timers; I wonder if we aren't missing something else, especially given that this PR has the performance label.

@MaxKellermann (Member, Author)

Do you have any numbers, flamegraphs, etc to demonstrate the effect?

No, I only observed in an strace that there were numerous spikes of epoll_wait() calls with timeout=0 running in a busy loop every time a scheduled timer approached. Its due time had not yet been reached, so nothing was done in that EventCenter iteration, and it called epoll_wait(timeout=0) again and again until the busy loop had run long enough that the timer's due time was eventually reached; then everything looked normal again, until the next time a timer approached.

You said that in your testing there were no sub-second timers

Initially those times were not sub-second, but eventually they became sub-second, of course.
My second patch resolved the problem, and my third patch introduced another problem which added up to one second to all timers. But:

  1. if you schedule a timer in 5 seconds, you don't notice if it fires only after 6 seconds; but if you schedule a timer in 10 ms and it doesn't fire until after a whole second, you will notice. My point was that such timers do not exist in Ceph, which is why my bug went unnoticed
  2. this is only about the epoll_wait() timeout calculation; if epoll_wait() wakes up earlier (because there's traffic on a socket), the timer callback gets invoked soon enough; it is extremely unlikely that there's no traffic at all for one second

Those are two reasons why the bug you found in my third patch didn't have any noticeable effect. But of course it was wrong, and I removed it.

@MaxKellermann (Member, Author)

By the way, this is where I should have put the rounding code:

timeout_microseconds = std::chrono::duration_cast<std::chrono::microseconds>(end_time - now).count();

A duration_cast() rounds down, and this can trigger the same bug.

Unfortunately the whole EventCenter infrastructure is a lot of inefficient spaghetti code. process_events() rounds down, and then EpollDriver::event_wait() rounds down again.
Just look at the type conversions:

  1. EventCenter::create_time_event() takes a raw integer parameter denoting microseconds
  2. EventCenter::process_events() takes a raw integer parameter denoting microseconds again
  3. timer events are stored as ceph::coarse_mono_clock = NANOseconds
  4. those NANOseconds are then converted to microseconds, always rounding down
  5. those microseconds are then converted to struct timeval, split to seconds and microseconds
  6. finally, EpollDriver::event_wait() reconstructs the struct timeval back to a single integer, but MILLIseconds this time, again rounding down

All of this could be so much easier, (type-) safer AND faster by consistently using one single std::chrono::duration type alias globally. This is what I do in my projects: https://github.com/CM4all/libcommon/blob/master/src/event/Chrono.hxx
All timeouts can be constexpr and never need to be converted until epoll_wait() is called, but then I round up: https://github.com/CM4all/libcommon/blob/master/src/event/Loop.cxx#L215

Right now, I'm integrating io_uring into EventCenter to offload all sendmsg() calls to kernel threads and eliminate the latency from Ceph, but it's a bit painful. I might soon send patches to improve the whole EventCenter class (and the rest of msg/async).

@idryomov (Contributor)

Unfortunately the whole EventCenter infrastructure is a lot of inefficient spaghetti code.

I don't disagree ;)

@MaxKellermann (Member, Author)

@yuriw, how did your testing go?

@MaxKellermann (Member, Author)

@idryomov, is there anything I can do to get this merged?

@idryomov (Contributor) commented Dec 9, 2024

We need @yuriw's response here, I have pinged him.

@yuriw (Contributor) commented Dec 9, 2024

@MaxKellermann (Member, Author)

@yuriw, how did the review go? I don't understand the tracker ticket.

@MaxKellermann (Member, Author)

@yuriw?

@ceph-jenkins (Collaborator)

Can one of the admins verify this patch?

@idryomov (Contributor)

No idea what the previous comment from @ceph-jenkins is about, but this PR appears to be pending on @ljflores reviewing the results captured in https://tracker.ceph.com/issues/68795. It's out of @yuriw's hands.

@ljflores (Member)

QA results looked good. Rados approved: https://tracker.ceph.com/projects/rados/wiki/MAIN#httpstrackercephcomissues68795

Apologies for the delay; there were some issues with this batch coming from other PRs.

@yuriw yuriw merged commit 70ed84b into ceph:main Jan 16, 2025
@MaxKellermann MaxKellermann deleted the timeout_round_up branch January 17, 2025 09:44