Cumulative metric for active main thread usage#2931
Conversation
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
|
This address #2065 part for tracking the main thread utilization |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #2931 +/- ##
============================================
+ Coverage 74.16% 74.38% +0.22%
============================================
Files 129 129
Lines 70988 71007 +19
============================================
+ Hits 52649 52822 +173
+ Misses 18339 18185 -154
🚀 New features to boost your workflow:
|
|
@dvkashapov how much of a difference is this compared to using the serverCron taking periodic measurements of CLOCK_THREAD_CPUTIME_ID? |
Compared locally with I think the main benefit of this approach is close to 0 overhead, compared to |
Not sure how you compared the 2 options. I was mainly pointing out that having a single cron point where we calculate and update the engine CPU time, might simplify this feature (maybe at a slight loss of precision) |
Ah, I thought that you were concerned about accuracy of percentage value of this approach. |
It's a percentage of the total time since the server started? We need to be able to observe changes over time. The current metrics show server uptime in seconds and CPU time in seconds ( I think we should expose the CPU utilization field in the same way, i.e. as number of useful seconds since server start, as suggested in the issue #2065. Here's an example: Time 1: Time 2: With these two samples, we can calculate the delta between time 1 and 2. The delta for used CPU time is 50 seconds and for busy time it's 5 seconds, so for this period, the CPU utilization is 10%. It's possible to draw a graph over time and we don't need to hard-code the sampling interval in the server. I don't know if "busy" or "useful" is a better prefix. Maybe "useful" (suggested in the issue) is too similar to the exiting fields prefix "used", so maybe busy is better. 🤔 |
|
I see now that you implemented it as an instantaneous metric. It works by calculating the utilization over the last 1.6 seconds. (It makes a sample every 100 milliseconds, it has a ring of 16 samples and each sample overwrite the oldest one.) IMO this kind of metrics are less useful for monitoring the system. If you monitor a system with an INFO call every minute or every 5 minutes, you can easily miss a spike of a few seconds high CPU utilization, etc. |
|
Yes, exactly, I'll expose cumulative metric then |
|
"busy" can be confusing because it sounds like busy-waiting, which is the non-useful time. We are interested in the useful time. New suggestion: "active" We can use it as
If we pick the latter, we'll have these three which looks pretty nice: |
…ain-thread-cpu Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
zuiderkwast
left a comment
There was a problem hiding this comment.
Looks great. Since we measure the duration from the beginning of afterSleep to the end of beforeSleep, we automatically exclude any time we're "sleeping" in a blocking epoll, which is good.
I added a few comments.
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
|
Clang Format failed at setup, not on check itself, checked locally - everything should be OK |
zuiderkwast
left a comment
There was a problem hiding this comment.
The current implementation and INFO fielda look good. My only thoughts now is if we cover all active work or if we should consider something more.
What about this in beforeSleep, is it active work?
/* Run a fast expire cycle (the called function will return
* ASAP if a fast cycle is not needed). */
if (server.active_expire_enabled && !server.import_mode && iAmPrimary()) activeExpireCycle(ACTIVE_EXPIRE_CYCLE_FAST);
if (moduleCount()) {
moduleFireServerEvent(VALKEYMODULE_EVENT_EVENTLOOP, VALKEYMODULE_SUBEVENT_EVENTLOOP_BEFORE_SLEEP, NULL);
}If we want to count it as active, we could take the monotime before and after this...
Makes sense to mark this as active work but should we check that this expire cycle took more than some noticeable amount of time, so that we won't accidentally count all loops to be active. What about 250-500 us? |
A threshold is not perfect. I have an idea: This function internally already keeps track of the elapsed time spent on active expiration. We could change the function so it returns the elapsed time and count it as active CPU time even during an inactive iteration. Does it make sense? |
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Definitely makes sense, I added a comment to indicate that we count expiry time as active CPU time for all event loops. |
|
@ranshid I'm not sure I understand how your suggestion can solve this. If we're asking the clock or the kernel about CPU time, it will return 100% CPU usage (or close to) because when IO threads are used, the main thread does a busy loop with non-blocking epoll, i.e. it busy-loops beforeSleep, epoll, afterSleep. The purpose of the metric is to know what margin the server has, i.e. how much more work the thread can handle, which we get if we excude the CPU time that consists of only busy-looping. Am I missing anything? |
Oh @zuiderkwast thank you for putting me back on the relevant motivation :) for some reason I only thought you wanted to provide a percentage calculation from the server (which has it's own benefits). I guess I am fine with that direction then. |
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
madolson
left a comment
There was a problem hiding this comment.
Major decision approved here, just change the two usages of monotonic clocks.
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
zuiderkwast
left a comment
There was a problem hiding this comment.
Yeah, this looks very good now.
I checked how the monotonic clock works in src/monotonic.c and I see that we use clock_gettime(CLOCK_MONOTONIC, &ts); by default. It's a syscall (or vDSO on Linux but a real syscall on other OSes) so I think we should really start using processor clock without syscall now in the same version, so this metric doesn't add a lot of new syscalls.
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
|
Update. We have been chatting about this in other channels. Thanks also for feedback from @JimB123. We should measure the times intervals using
I believe we can get away with only a single |
Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
|
Update again. I've been talking to various people about measuring CPU time vs monotonic time. Going back to the purpose of this metric. The purpose is to see how much spare capacity the thread has to handle more work. If we measure CPU time and it shows 90% utilization, it doesn't necessarily mean that the thread has capacity to handle more work. The remaining 10% can be that the thread is waiting, not for more work but waiting for other reasons:
On the other hand, if we measure these active work time in monotonic time and the metric shows 90% of the time the thread is working, it means that the remaining 10% of the time, it is waiting for work. Thus, measuring monotonic time is more useful for this metric. We'll keep measuring active time using monotonic clock and we'll change the name of the INFO field so that it doesn't include "cpu". New name of the field is |
The metric tracks the number of seconds the main thread spends doing active work as supposed to waiting for work. The active time is exposed as the INFO field `used_active_time_main_thread` in the CPU section. When I/O threads are used, the main thread uses a busy-loop when it's waiting for work, so the reported CPU usage is near 100% even if the thread has capacity to handle more work. This new metric attempts to provide a useful metric of how loaded the main thread is by excluding the time the thread is just waiting for work. The busy loop consists of the cycle (beforeSleep, non-blocking epoll, afterSleep). Only the duration of the loops that handle at least one event loop event (network, file or timer event) or some work from I/O threads (execution of commands received from clients by I/O threads) is counted as active time. Implements the main thread part of valkey-io#2065. --------- Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: arshidkv12 <arshidkv12@gmail.com>
The metric tracks the number of seconds the main thread spends doing active work as supposed to waiting for work. The active time is exposed as the INFO field `used_active_time_main_thread` in the CPU section. When I/O threads are used, the main thread uses a busy-loop when it's waiting for work, so the reported CPU usage is near 100% even if the thread has capacity to handle more work. This new metric attempts to provide a useful metric of how loaded the main thread is by excluding the time the thread is just waiting for work. The busy loop consists of the cycle (beforeSleep, non-blocking epoll, afterSleep). Only the duration of the loops that handle at least one event loop event (network, file or timer event) or some work from I/O threads (execution of commands received from clients by I/O threads) is counted as active time. Implements the main thread part of valkey-io#2065. --------- Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
The metric tracks the number of seconds the main thread spends doing active work as supposed to waiting for work. The active time is exposed as the INFO field `used_active_time_main_thread` in the CPU section. When I/O threads are used, the main thread uses a busy-loop when it's waiting for work, so the reported CPU usage is near 100% even if the thread has capacity to handle more work. This new metric attempts to provide a useful metric of how loaded the main thread is by excluding the time the thread is just waiting for work. The busy loop consists of the cycle (beforeSleep, non-blocking epoll, afterSleep). Only the duration of the loops that handle at least one event loop event (network, file or timer event) or some work from I/O threads (execution of commands received from clients by I/O threads) is counted as active time. Implements the main thread part of valkey-io#2065. --------- Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
Documents the following features: * valkey-io/valkey#2931 * valkey-io/valkey#2463 Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
The metric tracks the number of seconds the main thread spends doing active work as supposed to waiting for work. The active time is exposed as the INFO field
used_active_time_main_threadin the CPU section.When I/O threads are used, the main thread uses a busy-loop when it's waiting for work, so the reported CPU usage is near 100% even if the thread has capacity to handle more work. This new metric attempts to provide a useful metric of how loaded the main thread is by excluding the time the thread is just waiting for work.
The busy loop consists of the cycle (beforeSleep, non-blocking epoll, afterSleep). Only the duration of the loops that handle at least one event loop event (network, file or timer event) or some work from I/O threads (execution of commands received from clients by I/O threads) is counted as active time.
Implements the main thread part of #2065.