Cumulative metric for active main thread usage by dvkashapov · Pull Request #2931 · valkey-io/valkey

dvkashapov · 2025-12-13T21:49:27Z

The metric tracks the number of seconds the main thread spends doing active work as opposed to waiting for work. The active time is exposed as the INFO field used_active_time_main_thread in the CPU section.

When I/O threads are used, the main thread busy-loops while it's waiting for work, so the reported CPU usage is near 100% even if the thread has capacity to handle more work. This new metric indicates how loaded the main thread really is by excluding the time the thread is just waiting for work.

The busy loop consists of the cycle (beforeSleep, non-blocking epoll, afterSleep). Only the duration of the loops that handle at least one event loop event (network, file or timer event) or some work from I/O threads (execution of commands received from clients by I/O threads) is counted as active time.
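The accounting rule described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `main_thread_stats`, `account_cycle`, and `active_time_seconds` are hypothetical names, and the timestamps are assumed to come from a monotonic clock in microseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accumulator; the names are illustrative, not Valkey's. */
typedef struct {
    uint64_t active_time_us; /* cumulative active time in microseconds */
} main_thread_stats;

/* Account for one (beforeSleep, non-blocking epoll, afterSleep) cycle.
 * The duration of the cycle is added only if the cycle handled at least
 * one event loop event or some work handed over by the I/O threads. */
void account_cycle(main_thread_stats *stats,
                   uint64_t cycle_start_us,
                   uint64_t cycle_end_us,
                   int events_processed,
                   int io_thread_jobs_processed) {
    assert(cycle_end_us >= cycle_start_us);
    bool active = events_processed > 0 || io_thread_jobs_processed > 0;
    if (active) stats->active_time_us += cycle_end_us - cycle_start_us;
}

/* The cumulative value expressed in seconds, as an INFO field could show it. */
double active_time_seconds(const main_thread_stats *stats) {
    return (double)stats->active_time_us / 1000000.0;
}
```

A cycle that only spins (zero events, zero I/O thread jobs) leaves the counter unchanged, so the counter grows more slowly than wall-clock time whenever the thread has spare capacity.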

Implements the main thread part of #2065.

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>

dvkashapov · 2025-12-13T21:52:48Z

This addresses the part of #2065 that tracks main thread utilization.

codecov · 2025-12-13T22:08:36Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.38%. Comparing base (33a1b51) to head (8de3456).
⚠️ Report is 6 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #2931      +/-   ##
============================================
+ Coverage     74.16%   74.38%   +0.22%     
============================================
  Files           129      129              
  Lines         70988    71007      +19     
============================================
+ Hits          52649    52822     +173     
+ Misses        18339    18185     -154

Files with missing lines	Coverage Δ
src/expire.c	`97.31% <100.00%> (+<0.01%)`	⬆️
src/server.c	`89.59% <100.00%> (+0.14%)`	⬆️
src/server.h	`100.00% <ø> (ø)`

... and 22 files with indirect coverage changes

ranshid · 2025-12-14T16:40:44Z

@dvkashapov how much of a difference is this compared to using the serverCron taking periodic measurements of CLOCK_THREAD_CPUTIME_ID?

dvkashapov · 2025-12-14T18:31:47Z

how much of a difference is this compared to using the serverCron taking periodic measurements of CLOCK_THREAD_CPUTIME_ID?

Compared locally with clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts), diff_perc = busy_time_perc - thread_cputime_perc was from 0 to -7%, as load approaches 100% diff is getting smaller which kinda makes sense.

I think the main benefit of this approach is close to 0 overhead, compared to clock_gettime() WDYT?

ranshid · 2025-12-15T08:10:52Z

how much of a difference is this compared to using the serverCron taking periodic measurements of CLOCK_THREAD_CPUTIME_ID?

Compared locally with clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts), diff_perc = busy_time_perc - thread_cputime_perc was from 0 to -7%, as load approaches 100% diff is getting smaller which kinda makes sense.

I think the main benefit of this approach is close to 0 overhead, compared to clock_gettime() WDYT?

Not sure how you compared the 2 options. I was mainly pointing out that having a single cron point where we calculate and update the engine CPU time, might simplify this feature (maybe at a slight loss of precision)

dvkashapov · 2025-12-15T08:17:54Z

I was mainly pointing out that having a single cron point where we calculate and update the engine CPU time, might simplify this feature (maybe at a slight loss of precision)

Ah, I thought that you were concerned about accuracy of percentage value of this approach.
Both are probably the same in terms of complexity, but this one is just lightweight and uses everything we already have.

zuiderkwast · 2025-12-15T11:45:57Z

The metric tracks percentage of time the main thread spends in busy event loop cycles (those with file events, IO responses, or client writes), exposed as main_thread_utilization_perc in INFO CPU section.

It's a percentage of the total time since the server started?

We need to be able to observe changes over time. The current metrics show server uptime in seconds and CPU time in seconds (used_cpu_user_main_thread). To see this over time, you would call INFO CPU every "monitoring period" (e.g. one minute) and compare the current to the previous number. With this information, you can calculate the CPU usage per monitoring period.

I think we should expose the CPU utilization field in the same way, i.e. as number of useful seconds since server start, as suggested in the issue #2065.

Here's an example:

Time 1:

used_cpu_user_main_thread:100.000000
busy_cpu_user_main_thread:50.000000

Time 2:

used_cpu_user_main_thread:150.000000
busy_cpu_user_main_thread:55.000000

With these two samples, we can calculate the delta between time 1 and 2. The delta for used CPU time is 50 seconds and for busy time it's 5 seconds, so for this period, the CPU utilization is 10%. It's possible to draw a graph over time and we don't need to hard-code the sampling interval in the server.
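The delta calculation a monitoring client would do between two INFO samples can be sketched like this. It is only an illustration: `cpu_sample` and `utilization_percent` are hypothetical names, the sample values in the usage note are made up, and returning -1.0 when the counters did not advance (for example after a server restart) is an assumption of this sketch.

```c
#include <assert.h>

/* One reading of the two cumulative counters, in seconds. */
typedef struct {
    double used_seconds; /* total CPU time, e.g. used_cpu_user_main_thread */
    double busy_seconds; /* cumulative time spent on useful work */
} cpu_sample;

/* Utilization in percent for the period between two samples.
 * Returns -1.0 if the used-time counter did not advance, which would
 * happen after a restart or if both samples were taken at the same time. */
double utilization_percent(cpu_sample prev, cpu_sample cur) {
    assert(prev.used_seconds >= 0.0 && cur.used_seconds >= 0.0);
    double used_delta = cur.used_seconds - prev.used_seconds;
    double busy_delta = cur.busy_seconds - prev.busy_seconds;
    if (used_delta <= 0.0) return -1.0;
    return 100.0 * busy_delta / used_delta;
}
```

For instance, samples of (200 used, 80 busy) followed by (260 used, 95 busy) give deltas of 60 and 15 seconds, i.e. 25% utilization for that period. The sampling interval is chosen by the client, not by the server.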

I don't know if "busy" or "useful" is a better prefix. Maybe "useful" (suggested in the issue) is too similar to the existing fields' prefix "used", so maybe busy is better. 🤔

zuiderkwast · 2025-12-15T12:22:31Z

I see now that you implemented it as an instantaneous metric. It works by calculating the utilization over the last 1.6 seconds. (It takes a sample every 100 milliseconds, it has a ring of 16 samples, and each sample overwrites the oldest one.) IMO this kind of metric is less useful for monitoring the system. If you monitor a system with an INFO call every minute or every 5 minutes, you can easily miss a spike of a few seconds of high CPU utilization, etc.
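To illustrate why a short sample ring forgets spikes, here is a small sketch of such a ring, assuming 16 slots of 100 ms each (16 x 100 ms = 1.6 s). The names `util_ring`, `ring_add`, and `ring_average` are hypothetical and the code is not taken from the PR.

```c
#include <assert.h>
#include <stddef.h>

#define SAMPLE_COUNT 16 /* 16 samples x 100 ms = 1.6 s window */

typedef struct {
    double samples[SAMPLE_COUNT]; /* utilization per 100 ms slot, 0..100 */
    size_t next;                  /* slot the next sample overwrites */
    size_t filled;                /* number of valid slots, up to SAMPLE_COUNT */
} util_ring;

/* Store a new sample, overwriting the oldest one once the ring is full. */
void ring_add(util_ring *r, double utilization) {
    assert(utilization >= 0.0 && utilization <= 100.0);
    r->samples[r->next] = utilization;
    r->next = (r->next + 1) % SAMPLE_COUNT;
    if (r->filled < SAMPLE_COUNT) r->filled++;
}

/* Average over the valid samples, i.e. over at most the last 1.6 s. */
double ring_average(const util_ring *r) {
    if (r->filled == 0) return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < r->filled; i++) sum += r->samples[i];
    return sum / (double)r->filled;
}
```

After 1.6 seconds at 100% followed by 1.6 seconds of idle time, the average is back at 0, so a client polling once per minute would most likely never observe the spike.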

dvkashapov · 2025-12-15T12:24:09Z

Yes, exactly, I'll expose cumulative metric then

zuiderkwast · 2025-12-15T14:53:26Z

"busy" can be confusing because it sounds like busy-waiting, which is the non-useful time. We are interested in the useful time.

New suggestion: "active"

We can use it as

a prefix as in active_cpu_user_main_thread or
instead of the sys/user part as in used_cpu_active_main_thread

If we pick the latter, we'll have these three which looks pretty nice:

used_cpu_sys_main_thread:123.123456
used_cpu_user_main_thread:567.123456
used_cpu_active_main_thread:345.123456

zuiderkwast

Looks great. Since we measure the duration from the beginning of afterSleep to the end of beforeSleep, we automatically exclude any time we're "sleeping" in a blocking epoll, which is good.

I added a few comments.

dvkashapov · 2025-12-19T14:47:10Z

Clang Format failed during setup, not in the check itself. I checked locally and everything should be OK.

zuiderkwast

The current implementation and INFO fields look good. My only thought now is whether we cover all active work or if we should consider something more.

What about this in beforeSleep, is it active work?

    /* Run a fast expire cycle (the called function will return
     * ASAP if a fast cycle is not needed). */
    if (server.active_expire_enabled && !server.import_mode && iAmPrimary()) activeExpireCycle(ACTIVE_EXPIRE_CYCLE_FAST);
    if (moduleCount()) {
        moduleFireServerEvent(VALKEYMODULE_EVENT_EVENTLOOP, VALKEYMODULE_SUBEVENT_EVENTLOOP_BEFORE_SLEEP, NULL);
    }

If we want to count it as active, we could take the monotime before and after this...

dvkashapov · 2025-12-26T07:03:34Z

activeExpireCycle()

Makes sense to mark this as active work but should we check that this expire cycle took more than some noticeable amount of time, so that we won't accidentally count all loops to be active. What about 250-500 us?

zuiderkwast · 2025-12-26T22:52:45Z

activeExpireCycle()

Makes sense to mark this as active work but should we check that this expire cycle took more than some noticeable amount of time, so that we won't accidentally count all loops to be active. What about 250-500 us?

A threshold is not perfect. I have an idea: This function internally already keeps track of the elapsed time spent on active expiration. We could change the function so it returns the elapsed time and count it as active CPU time even during an inactive iteration. Does it make sense?
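A rough sketch of that idea, under the assumption that the expire cycle reports its own elapsed time to the caller. The struct `loop_iteration` and the function `active_time_for_iteration` are hypothetical, and counting the expire time only once in an already active iteration is a choice made in this sketch, not something confirmed by the PR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of one event loop iteration. */
typedef struct {
    uint64_t loop_duration_us;  /* duration of the whole iteration */
    uint64_t expire_elapsed_us; /* elapsed time reported by the expire cycle */
    int did_other_work;         /* events or I/O thread jobs were handled */
} loop_iteration;

/* Active time to add to the cumulative counter for this iteration. */
uint64_t active_time_for_iteration(loop_iteration it) {
    assert(it.expire_elapsed_us <= it.loop_duration_us);
    /* An active iteration is counted in full; its duration already
     * contains the expire time, so it is not added a second time. */
    if (it.did_other_work) return it.loop_duration_us;
    /* An otherwise idle iteration contributes only the expire time. */
    return it.expire_elapsed_us;
}
```

With this approach no threshold is needed: an idle iteration that spends 40 microseconds on expiration adds 40 microseconds, and one that spends nothing adds nothing.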

dvkashapov · 2025-12-27T09:40:01Z

I have an idea: This function internally already keeps track of the elapsed time spent on active expiration. We could change the function so it returns the elapsed time and count it as active CPU time even during an inactive iteration. Does it make sense?

Definitely makes sense, I added a comment to indicate that we count expiry time as active CPU time for all event loops.

zuiderkwast · 2026-01-02T16:44:55Z

@ranshid I'm not sure I understand how your suggestion can solve this.

If we're asking the clock or the kernel about CPU time, it will return 100% CPU usage (or close to) because when IO threads are used, the main thread does a busy loop with non-blocking epoll, i.e. it busy-loops beforeSleep, epoll, afterSleep.

The purpose of the metric is to know what margin the server has, i.e. how much more work the thread can handle, which we get if we exclude the CPU time that consists of only busy-looping.

Am I missing anything?

ranshid · 2026-01-03T07:38:20Z

@ranshid I'm not sure I understand how your suggestion can solve this.

If we're asking the clock or the kernel about CPU time, it will return 100% CPU usage (or close to) because when IO threads are used, the main thread does a busy loop with non-blocking epoll, i.e. it busy-loops beforeSleep, epoll, afterSleep.

The purpose of the metric is to know what margin the server has, i.e. how much more work the thread can handle, which we get if we exclude the CPU time that consists of only busy-looping.

Am I missing anything?

Oh @zuiderkwast thank you for putting me back on the relevant motivation :) For some reason I thought you only wanted to provide a percentage calculation from the server (which has its own benefits). I guess I am fine with that direction then.

madolson

Major decision approved here, just change the two usages of monotonic clocks.

zuiderkwast

Yeah, this looks very good now.

I checked how the monotonic clock works in src/monotonic.c and I see that we use clock_gettime(CLOCK_MONOTONIC, &ts); by default. It's a syscall (or vDSO on Linux but a real syscall on other OSes) so I think we should really start using processor clock without syscall now in the same version, so this metric doesn't add a lot of new syscalls.

#2597

zuiderkwast · 2026-01-14T08:42:50Z

Update. We have been chatting about this in other channels. Thanks also for feedback from @JimB123.

We should measure the time intervals using clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) instead of monotonic wall clock time, because:

CPU time excludes time that the thread is not allowed to run because of preemption.
CPU time excludes time spent on blocking calls to epoll, disk IO, etc.
used_cpu_active_main_thread is not expected to be higher than the CPU time reported in used_cpu_user_main_thread. That would be quite confusing. It's more expected that the "active" time is always equal to or lower than used_cpu_user_main_thread.
The name of the field used_cpu_active_main_thread and the INFO section # CPU hint that this metric is all about CPU time, minus the busy spinning.

I believe we can get away with only a single clock_gettime call per event loop iteration. If we do it only in beforeSleep, we can use the delta between one and the next. The delta then includes the exact total CPU time of the cycle.
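The single-call idea can be sketched as a small tracker that is invoked once per iteration and returns the CPU time consumed since the previous invocation. `cpu_time_tracker` and `cpu_time_since_last_call` are hypothetical names; only `clock_gettime` and `CLOCK_THREAD_CPUTIME_ID` are real POSIX interfaces. Returning 0 on the first call is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical state kept between event loop iterations. */
typedef struct {
    uint64_t last_ns; /* thread CPU time seen at the previous call */
    int has_last;     /* 0 until the first call has been made */
} cpu_time_tracker;

/* CPU time consumed so far by the calling thread, in nanoseconds. */
static uint64_t thread_cpu_time_ns(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    assert(rc == 0);
    (void)rc; /* silences the unused warning when NDEBUG is defined */
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Call once per iteration (e.g. from the beforeSleep step). Returns the
 * CPU time used since the previous call, or 0 on the very first call. */
uint64_t cpu_time_since_last_call(cpu_time_tracker *t) {
    uint64_t now = thread_cpu_time_ns();
    uint64_t delta = t->has_last ? now - t->last_ns : 0;
    t->last_ns = now;
    t->has_last = 1;
    return delta;
}
```

Because each reading serves both as the end of one cycle and the start of the next, the deltas add up to the total CPU time of the thread without gaps.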

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>

zuiderkwast · 2026-01-15T10:13:27Z

Update again. I've been talking to various people about measuring CPU time vs monotonic time.

Going back to the purpose of this metric. The purpose is to see how much spare capacity the thread has to handle more work.

If we measure CPU time and it shows 90% utilization, it doesn't necessarily mean that the thread has capacity to handle more work. The remaining 10% can be that the thread is waiting, not for more work but waiting for other reasons:

preemption, i.e. the thread is not allowed to run full time
waiting for blocking IO or syscalls (CPU time spent by kernel, not userspace)

On the other hand, if we measure the active work time in monotonic time and the metric shows that the thread is working 90% of the time, it means that the remaining 10% of the time it is waiting for work. Thus, measuring monotonic time is more useful for this metric.
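The difference between the two clocks can be observed with a small experiment in which a sleep stands in for time the thread is blocked or not scheduled. This is only a demonstration of the clocks' behavior, not code from the PR; `elapsed_pair` and `measure_sleep` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* How far each clock advanced over the same interval. */
typedef struct {
    uint64_t wall_ns; /* CLOCK_MONOTONIC delta */
    uint64_t cpu_ns;  /* CLOCK_THREAD_CPUTIME_ID delta */
} elapsed_pair;

static uint64_t clock_ns(clockid_t id) {
    struct timespec ts;
    int rc = clock_gettime(id, &ts);
    assert(rc == 0);
    (void)rc; /* silences the unused warning when NDEBUG is defined */
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Sleep for sleep_ms milliseconds and report both clock deltas. */
elapsed_pair measure_sleep(long sleep_ms) {
    assert(sleep_ms >= 0 && sleep_ms < 1000);
    struct timespec req = {.tv_sec = 0, .tv_nsec = sleep_ms * 1000000L};
    uint64_t wall0 = clock_ns(CLOCK_MONOTONIC);
    uint64_t cpu0 = clock_ns(CLOCK_THREAD_CPUTIME_ID);
    nanosleep(&req, NULL); /* the thread is off the CPU during this call */
    uint64_t cpu1 = clock_ns(CLOCK_THREAD_CPUTIME_ID);
    uint64_t wall1 = clock_ns(CLOCK_MONOTONIC);
    elapsed_pair result = {wall1 - wall0, cpu1 - cpu0};
    return result;
}
```

The monotonic delta covers the whole sleep while the CPU time delta stays close to zero. An active interval measured in CPU time would therefore leave out any time spent blocked or preempted, and the remainder could no longer be read as "waiting for work".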

We'll keep measuring active time using monotonic clock and we'll change the name of the INFO field so that it doesn't include "cpu". New name of the field is used_active_time_main_thread. (Ack by @madolson.)

Documents the following features: * valkey-io/valkey#2931 * valkey-io/valkey#2463 Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

main thread cpu util

8afb817

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>

dvkashapov requested a review from zuiderkwast December 13, 2025 21:49

github-actions Bot assigned dvkashapov Dec 13, 2025

dvkashapov added this to Valkey 9.1 Dec 13, 2025

dvkashapov moved this to Todo in Valkey 9.1 Dec 13, 2025

zuiderkwast mentioned this pull request Dec 17, 2025

[New] IO threads utilization INFO field #2065

Closed

dvkashapov added 5 commits December 19, 2025 16:23

Merge branch 'unstable' of https://github.com/valkey-io/valkey into m…

49dbbc9

…ain-thread-cpu Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>

Move uptime to mono and used_cpu_active_main_thread to cumulative

8e60ea5

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>

Move stat_starttime_mono

eaa219b

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>

Del old starttime and rename new one

44daa3a

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>

Fix formatting

96aa229

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>

zuiderkwast reviewed Dec 19, 2025

Comment thread src/server.h Outdated

Comment thread src/server.c

Comment thread src/server.c Outdated

Apply review suggestions

94465fd

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>

dvkashapov changed the title ~~Metric for main thread CPU utilization in percentages~~ Cumulative metric for active main thread usage Dec 19, 2025

zuiderkwast reviewed Dec 26, 2025

Comment thread src/server.c Outdated

Comment thread src/server.c Outdated

Apply Viktor's suggestions

e478d1f

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>

ranshid self-requested a review January 1, 2026 17:27

Do not account expire_cycle_time when ProcessingEventsWhileBlocked

425b383

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>

madolson reviewed Jan 12, 2026

Comment thread src/server.c Outdated

Comment thread src/server.c Outdated

madolson added the major-decision-approved label Jan 12, 2026

dvkashapov added 2 commits January 13, 2026 10:55

Merge remote-tracking branch 'upstream/unstable' into main-thread-cpu

d039bb1

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>

revert uptime

3bf5f20

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>

zuiderkwast reviewed Jan 13, 2026

Comment thread src/expire.c Outdated

Comment thread src/server.c Outdated

Comment thread src/server.c

Comment thread tests/unit/info-command.tcl Outdated

Apply review suggestions

b8e48e2

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>

zuiderkwast mentioned this pull request Jan 13, 2026

Set the default to enable CPU clock for monotonic time tracking #2597

Closed

zuiderkwast approved these changes Jan 13, 2026

Comment thread src/server.c Outdated

Update src/server.c

86ce9d8

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>

zuiderkwast mentioned this pull request Jan 13, 2026

Cumulative metrics for active I/O threads usage #2463

Merged

remove cpu from field name

8de3456

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>

zuiderkwast merged commit 4524c23 into valkey-io:unstable Jan 16, 2026
36 of 37 checks passed

github-project-automation Bot moved this from Todo to Done in Valkey 9.1 Jan 16, 2026

dvkashapov deleted the main-thread-cpu branch January 16, 2026 10:56

dvkashapov added the needs-doc-pr and release-notes labels Jan 16, 2026

madolson mentioned this pull request Jan 26, 2026

Track request payload distribution in INFO stats #3106

Closed

zuiderkwast mentioned this pull request Jan 28, 2026

Redesign IO threading communication model #2909

Closed

dvkashapov mentioned this pull request Mar 9, 2026

Redesign IO threading communication model #3324

Merged

zuiderkwast mentioned this pull request May 4, 2026

Document INFO fields used_active_time_{main_thread,io_thread_N} valkey-io/valkey-doc#437

Merged
