Optimize ustime() with monotonic delta #3193

Merged
zuiderkwast merged 2 commits into valkey-io:unstable from zuiderkwast:ustime-mono-delta on Feb 15, 2026

@zuiderkwast zuiderkwast commented Feb 12, 2026

Only call gettimeofday() periodically; between calls, add the monotonic clock delta to the cached value. This avoids a syscall when a fast monotonic clock is available, which is now the default on most platforms.

ustime() is called once per command execution. In a busy server running 1M RPS, ustime() is called every microsecond. By calling gettimeofday() only once every millisecond, we save 99.9% of these syscalls.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

@dvkashapov dvkashapov left a comment

Thank you! Really clever!

@zuiderkwast

I'll change to a 1ms caching time to reduce the risk of drift. I'll do it when the benchmark is complete.

When we calibrate the hardware clock frequency, we measure it in ticks per microsecond. The calibration is quite exact, but the value is rounded to an integer number of ticks per microsecond. The worst-case error due to this rounding is 0.5 ticks per microsecond. Commonly, a TSC clock has a frequency of ~2500 ticks per us, which gives a relative error of at most 0.5/2500 = 1/5000. If we cache gettimeofday() for 100ms (100,000us), we get a maximum error of 20us. With 1ms, we get a maximum error of 0.2us (rounded to 0).

Given that ustime() is called every microsecond, we can still save a lot by calling gettimeofday() only once per thousand calls.
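
The drift arithmetic above can be checked with a small helper. This is only a sketch of the calculation in this comment; `worst_case_drift_us()` is a hypothetical name, and the 0.5-tick bound is the rounding assumption stated above.

```c
#include <assert.h>

/* Worst-case error, in microseconds, accumulated over one caching interval
 * when the calibrated frequency is rounded to a whole number of ticks per
 * microsecond (rounding error of at most 0.5 ticks per microsecond). */
double worst_case_drift_us(double ticks_per_us, double cache_interval_us) {
    assert(ticks_per_us > 0.0);
    const double max_rounding_error_ticks = 0.5;
    /* Relative error is 0.5 / ticks_per_us; scale it by the interval. */
    return max_rounding_error_ticks * cache_interval_us / ticks_per_us;
}
```

With 2500 ticks per us, `worst_case_drift_us(2500.0, 100000.0)` gives about 20us for a 100ms interval, and `worst_case_drift_us(2500.0, 1000.0)` gives about 0.2us for a 1ms interval.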

@github-actions

Benchmark ran on this commit: 2ff52b9

Benchmark Comparison: unstable vs c0e16ef (averaged) - rps metrics

Run Summary:

  • unstable: 80 total runs, 16 configurations (avg 5.00 runs per config)
  • c0e16ef: 80 total runs, 16 configurations (avg 5.00 runs per config)

Statistical Notes:

  • CI99%: 99% Confidence Interval - range where the true population mean is likely to fall
  • PI99%: 99% Prediction Interval - range where a single future observation is likely to fall
  • CV: Coefficient of Variation - relative variability (σ/μ × 100%)

Note: Values with (n=X, σ=Y, CV=Z%, CI99%=±W%, PI99%=±V%) indicate averages from X runs with standard deviation Y, coefficient of variation Z%, 99% confidence interval margin of error ±W% of the mean, and 99% prediction interval margin of error ±V% of the mean. CI bounds [A, B] and PI bounds [C, D] show the actual interval ranges.

Configuration:

  • architecture: aarch64
  • benchmark_mode: duration
  • clients: 1600
  • cluster_mode: False
  • data_size: 16
  • duration: 180
  • tls: False
  • valkey_benchmark_threads: 90
  • warmup: 30
| Command | Metric | Pipeline | io_threads | unstable | c0e16ef | Diff | % Change |
|---|---|---|---|---|---|---|---|
| GET | rps | 1 | 1 | 228929.146 (n=5, σ=1634.592, CV=0.71%, CI99%=±1.470%, PI99%=±3.601%, CI[225563.498, 232294.794], PI[220685.025, 237173.267]) | 228499.606 (n=5, σ=2363.358, CV=1.03%, CI99%=±2.130%, PI99%=±5.216%, CI[223633.419, 233365.793], PI[216579.931, 240419.281]) | -429.540 | -0.188% |
| GET | rps | 1 | 9 | 1498061.522 (n=5, σ=24810.916, CV=1.66%, CI99%=±3.410%, PI99%=±8.353%, CI[1446975.503, 1549147.541], PI[1372926.842, 1623196.202]) | 1516739.952 (n=5, σ=15761.046, CV=1.04%, CI99%=±2.140%, PI99%=±5.241%, CI[1484287.741, 1549192.163], PI[1437248.595, 1596231.309]) | 18678.430 | +1.247% |
| GET | rps | 10 | 1 | 1193404.176 (n=5, σ=4892.678, CV=0.41%, CI99%=±0.844%, PI99%=±2.068%, CI[1183330.084, 1203478.268], PI[1168727.791, 1218080.561]) | 1228115.748 (n=5, σ=9222.564, CV=0.75%, CI99%=±1.546%, PI99%=±3.787%, CI[1209126.362, 1247105.134], PI[1181601.442, 1274630.054]) | 34711.572 | +2.909% |
| GET | rps | 10 | 9 | 2598875.900 (n=5, σ=8444.567, CV=0.32%, CI99%=±0.669%, PI99%=±1.639%, CI[2581488.420, 2616263.380], PI[2556285.446, 2641466.354]) | 2786868.050 (n=5, σ=14842.048, CV=0.53%, CI99%=±1.097%, PI99%=±2.686%, CI[2756308.067, 2817428.033], PI[2712011.686, 2861724.414]) | 187992.150 | +7.234% |
| SET | rps | 1 | 1 | 219500.076 (n=5, σ=1696.877, CV=0.77%, CI99%=±1.592%, PI99%=±3.899%, CI[216006.183, 222993.969], PI[210941.821, 228058.331]) | 222193.262 (n=5, σ=921.273, CV=0.41%, CI99%=±0.854%, PI99%=±2.091%, CI[220296.349, 224090.175], PI[217546.793, 226839.731]) | 2693.186 | +1.227% |
| SET | rps | 1 | 9 | 1475641.624 (n=5, σ=7013.810, CV=0.48%, CI99%=±0.979%, PI99%=±2.397%, CI[1461200.092, 1490083.156], PI[1440267.240, 1511016.008]) | 1474146.054 (n=5, σ=11573.920, CV=0.79%, CI99%=±1.617%, PI99%=±3.960%, CI[1450315.192, 1497976.916], PI[1415772.602, 1532519.506]) | -1495.570 | -0.101% |
| SET | rps | 10 | 1 | 1030528.876 (n=5, σ=1864.610, CV=0.18%, CI99%=±0.373%, PI99%=±0.913%, CI[1026689.618, 1034368.134], PI[1021124.654, 1039933.098]) | 1036045.114 (n=5, σ=3825.596, CV=0.37%, CI99%=±0.760%, PI99%=±1.862%, CI[1028168.158, 1043922.070], PI[1016750.591, 1055339.637]) | 5516.238 | +0.535% |
| SET | rps | 10 | 9 | 1909346.526 (n=5, σ=15532.601, CV=0.81%, CI99%=±1.675%, PI99%=±4.103%, CI[1877364.686, 1941328.366], PI[1831007.337, 1987685.715]) | 1928861.350 (n=5, σ=26482.105, CV=1.37%, CI99%=±2.827%, PI99%=±6.924%, CI[1874334.330, 1983388.370], PI[1795297.975, 2062424.725]) | 19514.824 | +1.022% |

Configuration:

  • architecture: aarch64
  • benchmark_mode: duration
  • clients: 1600
  • cluster_mode: False
  • data_size: 96
  • duration: 180
  • tls: False
  • valkey_benchmark_threads: 90
  • warmup: 30
| Command | Metric | Pipeline | io_threads | unstable | c0e16ef | Diff | % Change |
|---|---|---|---|---|---|---|---|
| GET | rps | 1 | 1 | 220817.654 (n=5, σ=3086.405, CV=1.40%, CI99%=±2.878%, PI99%=±7.049%, CI[214462.703, 227172.605], PI[205251.268, 236384.040]) | 220260.892 (n=5, σ=1954.975, CV=0.89%, CI99%=±1.828%, PI99%=±4.477%, CI[216235.571, 224286.213], PI[210400.909, 230120.875]) | -556.762 | -0.252% |
| GET | rps | 1 | 9 | 1463142.474 (n=5, σ=11415.055, CV=0.78%, CI99%=±1.606%, PI99%=±3.935%, CI[1439638.719, 1486646.229], PI[1405570.267, 1520714.681]) | 1453714.252 (n=5, σ=9033.879, CV=0.62%, CI99%=±1.280%, PI99%=±3.134%, CI[1435113.370, 1472315.134], PI[1408151.583, 1499276.921]) | -9428.222 | -0.644% |
| GET | rps | 10 | 1 | 1129929.348 (n=5, σ=3657.580, CV=0.32%, CI99%=±0.667%, PI99%=±1.633%, CI[1122398.340, 1137460.356], PI[1111482.221, 1148376.475]) | 1159364.702 (n=5, σ=9418.641, CV=0.81%, CI99%=±1.673%, PI99%=±4.097%, CI[1139971.590, 1178757.814], PI[1111861.474, 1206867.930]) | 29435.354 | +2.605% |
| GET | rps | 10 | 9 | 2134575.400 (n=5, σ=30138.159, CV=1.41%, CI99%=±2.907%, PI99%=±7.121%, CI[2072520.514, 2196630.286], PI[1982572.593, 2286578.207]) | 2239762.050 (n=5, σ=25759.222, CV=1.15%, CI99%=±2.368%, PI99%=±5.801%, CI[2186723.456, 2292800.644], PI[2109844.557, 2369679.543]) | 105186.650 | +4.928% |
| SET | rps | 1 | 1 | 212256.250 (n=5, σ=1913.884, CV=0.90%, CI99%=±1.857%, PI99%=±4.548%, CI[208315.537, 216196.963], PI[202603.514, 221908.986]) | 213415.120 (n=5, σ=1849.952, CV=0.87%, CI99%=±1.785%, PI99%=±4.372%, CI[209606.044, 217224.196], PI[204084.827, 222745.413]) | 1158.870 | +0.546% |
| SET | rps | 1 | 9 | 1447471.176 (n=5, σ=13491.004, CV=0.93%, CI99%=±1.919%, PI99%=±4.701%, CI[1419693.012, 1475249.340], PI[1379428.849, 1515513.503]) | 1462414.372 (n=5, σ=10941.320, CV=0.75%, CI99%=±1.540%, PI99%=±3.773%, CI[1439886.043, 1484942.701], PI[1407231.461, 1517597.283]) | 14943.196 | +1.032% |
| SET | rps | 10 | 1 | 1019681.450 (n=5, σ=6252.577, CV=0.61%, CI99%=±1.263%, PI99%=±3.093%, CI[1006807.308, 1032555.592], PI[988146.372, 1051216.528]) | 1023880.576 (n=5, σ=7302.930, CV=0.71%, CI99%=±1.469%, PI99%=±3.597%, CI[1008843.743, 1038917.409], PI[987048.008, 1060713.144]) | 4199.126 | +0.412% |
| SET | rps | 10 | 9 | 1817022.950 (n=5, σ=29478.287, CV=1.62%, CI99%=±3.340%, PI99%=±8.182%, CI[1756326.751, 1877719.149], PI[1668348.232, 1965697.668]) | 1855739.748 (n=5, σ=35197.731, CV=1.90%, CI99%=±3.905%, PI99%=±9.566%, CI[1783267.134, 1928212.362], PI[1678218.824, 2033260.672]) | 38716.798 | +2.131% |

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
@codecov

codecov Bot commented Feb 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.98%. Comparing base (c2aca72) to head (24b2d14).
⚠️ Report is 8 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3193      +/-   ##
============================================
+ Coverage     74.87%   74.98%   +0.10%     
============================================
  Files           129      129              
  Lines         71328    71337       +9     
============================================
+ Hits          53407    53490      +83     
+ Misses        17921    17847      -74     
| Files with missing lines | Coverage Δ |
|---|---|
| src/util.c | 66.92% <100.00%> (+0.26%) ⬆️ |

... and 27 files with indirect coverage changes

@dvkashapov dvkashapov left a comment

Great results! With a 1ms interval we reduce the number of syscalls by 99.9% rather than by 99.999% as with 100ms, but the expected benefit is close to the one shown by the benchmark above.
I think we're good to go with the current implementation!
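
The two percentages compared above follow from how many ustime() calls fall into one refresh interval. A rough sketch of that arithmetic, assuming calls arrive at a steady rate and exactly one gettimeofday() call per interval (`syscall_fraction_saved()` is a hypothetical helper, not part of the PR):

```c
#include <assert.h>

/* Fraction of ustime() calls that skip gettimeofday(), assuming a steady
 * call rate and exactly one wall-clock refresh per interval. */
double syscall_fraction_saved(double calls_per_sec, double refresh_interval_ms) {
    assert(calls_per_sec > 0.0 && refresh_interval_ms > 0.0);
    double calls_per_interval = calls_per_sec * refresh_interval_ms / 1000.0;
    if (calls_per_interval <= 1.0) {
        return 0.0; /* every call triggers a refresh, nothing is saved */
    }
    return 1.0 - 1.0 / calls_per_interval;
}
```

At 1M calls per second, a 1ms interval yields about 0.999 (99.9%) and a 100ms interval about 0.99999 (99.999%). At low request rates the saving shrinks toward zero, because most calls then find the cached value expired.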

@enjoy-binbin enjoy-binbin left a comment

Looks Good

@zuiderkwast zuiderkwast merged commit e0a467e into valkey-io:unstable Feb 15, 2026
24 checks passed
@zuiderkwast zuiderkwast deleted the ustime-mono-delta branch February 15, 2026 22:53
harrylin98 pushed a commit to harrylin98/valkey_forked that referenced this pull request Feb 19, 2026
@zuiderkwast zuiderkwast linked an issue Feb 24, 2026 that may be closed by this pull request
hpatro pushed a commit to hpatro/valkey that referenced this pull request Mar 5, 2026
Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Unnecessary call to ustime

3 participants