Add warning for suspected slow system clocksource setting #10636

yoav-steinberg · 2022-04-25T15:24:40Z

This PR does 2 main things:

Add warning for suspected slow system clocksource setting. This is Linux specific.
Add a --check-system argument to redis which runs all system checks and prints a report.

System checks

Add a command line option --check-system which runs all known system checks and provides a report to stdout of which systems checks have failed with details on how to reconfigure the system for optimized redis performance.
The --check-system mode exists with an appropriate error code after running all the checks.

Slow clocksource details

We check the system's clocksource performance by running clock_gettime() in a loop and then checking how much time was spent in a system call (via getrusage()). If we spend more than 10% of the time in the kernel then we print a warning. I verified that using the slow clock sources: acpi_pm (~90% in the kernel on my laptop) and xen (~30% in the kernel on an ec2 m4.large) we get this warning.

The check runs 5 system ticks so we can detect time spent in kernel at 20% jumps (0%,20%,40%...). Anything more accurate will require the test to run longer. Typically 5 ticks are 50ms. This means running the test on startup will delay startup by 50ms. To avoid this we make sure the test is only executed in the --check-system mode.

For a quick startup check, we specifically warn if the we see the system is using the xen clocksource which we know has bad performance and isn't recommended (at least on ec2). In such a case the user should manually rung redis with --check-system to force the slower clocksource test described above.

Other changes in the PR

All the system checks are now implemented as functions in syscheck.c. They are implemented using a standard interface (see details in syscheck.c). To do this I moved the checking functions linuxOvercommitMemoryValue(), THPIsEnabled(), linuxMadvFreeForkBugCheck() out of server.c and latency.c and into the new syscheck.c. When moving these functions I made sure they don't depend on other functionality provided in server.c and made them use a standard "check functions" interface. Specifically:
- I removed all logging out of linuxMadvFreeForkBugCheck(). In case there's some unexpected error during the check aborts as before, but without any logging. It returns an error code 0 meaning the check didn't not complete.
- All these functions now return 1 on success, -1 on failure, 0 in case the check itself cannot be completed.
- The linuxMadvFreeForkBugCheck() function now internally calls exit() and not exitFromChild() because the latter is only available in server.c and I wanted to remove that dependency. This isn't an because we don't need to worry about the child process created by the test doing anything related to the rdb/aof files which is why exitFromChild() was created.
This also fixes parsing of other /proc/<pid>/stat fields to correctly handle spaces in the process name and be more robust in general. Not that before this fix the rss info in INFO memory was corrupt in case of spaces in the process name. To recreate just rename redis-server to redis server, start it, and run INFO memory.

This also fixes parsing of other /proc/<pid>/stat fields to correctly handle spaces in the process name and be more robust in general.

oranagra

reviewed the part in zmalloc.c and the linuxTimeWarnings function.
missing warning suppression, and extraction of the log highlighting thing to a different PR.

src/server.c

src/zmalloc.c

This is to accommodate a xen cloucksource on ec2 spending about 30% of the time in system calls.

Some cleanups.

src/server.c

Comparing process time to wall clock is apples and oranges and should be avoided as best practice even though in this case we are single threaded.

src/server.c

* clocksource test is now part of `--check-system`. * added explicit `xen` clocksource warning check on startup. * created `--check-system` with clocksource, xen, thp and overcommit tests.

src/latency.c

src/server.c

src/syscheck.c

src/server.c

src/syscheck.c

src/server.c

src/syscheck.c

src/zmalloc.c

oranagra · 2022-05-11T12:19:35Z

@redis/core-team please approve. see top comment.

src/syscheck.c

madolson

Seems reasonable.

src/syscheck.c

madolson · 2022-05-15T04:33:04Z

src/syscheck.c

+/* Make sure transparent huge pages aren't always enabled. When they are this can cause copy-on-write logic
+ * to consume much more memory and reduce performance during forks. */


If someone is running a non-durable variant of Redis, THP can be a good thing since it reduce TLB misses. It seems like the warning message doesn't really address that situation very clearly.

I think there might be other instances where a fork might cause THP issues. Redis might fork for module processing (CHILD_TYPE_MODULE) or for disk based replication. I was under the impression that this should always be turned off. But you are right that there are other cases.
I have a feeling that too much detail here will become cumbersome. For the general use case we should recommend turning this off.
It might be wise to add a link to the docs website for a detailed explanation, and perhaps do the same for all the other warnings. The downside being that we'll have to keep the link valid.

i don't think a link to the website is suitable, might get outdated.
i think we need to provide enough information so that people can google it, or find an answer in stack overflow.
i'd prefer to avoid providing incorrect or misleading information, so in this context maybe we should replace This will create latency and memory usage issues with Redis with This may ....
But i'll leave that for another opportunity (this PR just relocates that text, it didn't introduce it).

oranagra · 2022-05-16T06:07:30Z

approved in a core-team meeting.

sundb · 2022-05-17T02:58:51Z

src/server.c

    #ifdef __linux__
        linuxMemoryWarnings();
+        sds err_msg;
+        if (checkXenClocksource(&err_msg) < 0) {


Maybe we can change linuxMemoryWarnings() to int checkLinuxMemoryWarnings(sds *err_msg), so we don't need to repeat serverLog(LL_WARNING, ...) and sdsfree(...) multiple times.

if (checkLinuxMemoryWarnings(&err_msg) < 0 || checkXenClocksource(&err_msg) < 0) { serverLog(LL_WARNING, "WARNING %s", err_msg); sdsfree(err_msg); }

But we might want to print both warnings.

Indeed, I was wrong.
BTW, When I try to add madv DONTNEED detection based on this pr and output warnings, I can only copy most of the warning ouput code similar to checkLinuxMadvFreeForkBug, which is not very comfortable.

Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>

When `/proc/sys/vm/overcommit_memory` is inaccessible, fp is NULL. `checkOvercommit` will return -1 without setting err_msg, and then the err_msg is used to print the log, crash the server. Set the err_msg variables to Null when declaring it, seems safer. And the overcommit_memory error log will print two "WARNING", like `WARNING WARNING overcommit_memory is set to 0!`, this PR also removes the second WARNING in `checkOvercommit`. Reported in #10846. Fixes #10846. Introduced in #10636 (7.0.1)

A regression caused by #10636 (released in 7.0.1) causes Redis startup warning about overcommit to be missing when needed and printed when not. Also, using atoi() to convert the string's value to an integer cannot discern between an actual 0 (zero) having been read, or a conversion error.

This PR does 2 main things: 1) Add warning for suspected slow system clocksource setting. This is Linux specific. 2) Add a `--check-system` argument to redis which runs all system checks and prints a report. ## System checks Add a command line option `--check-system` which runs all known system checks and provides a report to stdout of which systems checks have failed with details on how to reconfigure the system for optimized redis performance. The `--system-check` mode exists with an appropriate error code after running all the checks. ## Slow clocksource details We check the system's clocksource performance by running `clock_gettime()` in a loop and then checking how much time was spent in a system call (via `getrusage()`). If we spend more than 10% of the time in the kernel then we print a warning. I verified that using the slow clock sources: `acpi_pm` (~90% in the kernel on my laptop) and `xen` (~30% in the kernel on an ec2 `m4.large`) we get this warning. The check runs 5 system ticks so we can detect time spent in kernel at 20% jumps (0%,20%,40%...). Anything more accurate will require the test to run longer. Typically 5 ticks are 50ms. This means running the test on startup will delay startup by 50ms. To avoid this we make sure the test is only executed in the `--check-system` mode. For a quick startup check, we specifically warn if the we see the system is using the `xen` clocksource which we know has bad performance and isn't recommended (at least on ec2). In such a case the user should manually rung redis with `--check-system` to force the slower clocksource test described above. ## Other changes in the PR * All the system checks are now implemented as functions in _syscheck.c_. They are implemented using a standard interface (see details in _syscheck.c_). To do this I moved the checking functions `linuxOvercommitMemoryValue()`, `THPIsEnabled()`, `linuxMadvFreeForkBugCheck()` out of _server.c_ and _latency.c_ and into the new _syscheck.c_. When moving these functions I made sure they don't depend on other functionality provided in _server.c_ and made them use a standard "check functions" interface. Specifically: * I removed all logging out of `linuxMadvFreeForkBugCheck()`. In case there's some unexpected error during the check aborts as before, but without any logging. It returns an error code 0 meaning the check didn't not complete. * All these functions now return 1 on success, -1 on failure, 0 in case the check itself cannot be completed. * The `linuxMadvFreeForkBugCheck()` function now internally calls `exit()` and not `exitFromChild()` because the latter is only available in _server.c_ and I wanted to remove that dependency. This isn't an because we don't need to worry about the child process created by the test doing anything related to the rdb/aof files which is why `exitFromChild()` was created. * This also fixes parsing of other /proc/\<pid\>/stat fields to correctly handle spaces in the process name and be more robust in general. Not that before this fix the rss info in `INFO memory` was corrupt in case of spaces in the process name. To recreate just rename `redis-server` to `redis server`, start it, and run `INFO memory`.

When `/proc/sys/vm/overcommit_memory` is inaccessible, fp is NULL. `checkOvercommit` will return -1 without setting err_msg, and then the err_msg is used to print the log, crash the server. Set the err_msg variables to Null when declaring it, seems safer. And the overcommit_memory error log will print two "WARNING", like `WARNING WARNING overcommit_memory is set to 0!`, this PR also removes the second WARNING in `checkOvercommit`. Reported in redis#10846. Fixes redis#10846. Introduced in redis#10636 (7.0.1)

A regression caused by redis#10636 (released in 7.0.1) causes Redis startup warning about overcommit to be missing when needed and printed when not. Also, using atoi() to convert the string's value to an integer cannot discern between an actual 0 (zero) having been read, or a conversion error.

Add warning for suspected slow system clocksource setting.

96c9326

This also fixes parsing of other /proc/<pid>/stat fields to correctly handle spaces in the process name and be more robust in general.

yoav-steinberg mentioned this pull request Apr 25, 2022

Add runtime check to warn when the kernel's clock source may be wrong #10604

Closed

oranagra linked an issue Apr 26, 2022 that may be closed by this pull request

Add runtime check to warn when the kernel's clock source may be wrong #10604

Closed

oranagra added the 7.0-must-have label Apr 26, 2022

yoav-steinberg marked this pull request as ready for review April 26, 2022 07:17

oranagra reviewed Apr 26, 2022

View reviewed changes

src/server.c Outdated Show resolved Hide resolved

src/server.c Outdated Show resolved Hide resolved

src/server.c Outdated Show resolved Hide resolved

src/zmalloc.c Outdated Show resolved Hide resolved

src/zmalloc.c Outdated Show resolved Hide resolved

yoav-steinberg added 4 commits April 27, 2022 11:04

Detailed warning for slow clocksource

923035c

Lower thershold for clocksource warning print.

8e64305

This is to accommodate a xen cloucksource on ec2 spending about 30% of the time in system calls.

Cleanups

b585ae1

Added ability to skip slow clocksource warning check (faster startup).

1e0f676

Some cleanups.

yoav-steinberg force-pushed the clocksource_test branch from f5eabd1 to 1e0f676 Compare April 27, 2022 08:05

yoav-steinberg added 3 commits April 27, 2022 11:32

cleanup

38d9626

mac build issue

d3bc427

fix case of ')' in process name

69eb449

oranagra reviewed Apr 27, 2022

View reviewed changes

src/server.c Outdated Show resolved Hide resolved

yoav-steinberg added 4 commits April 28, 2022 08:35

close file on read err

a4421dc

Add fix suggestion to error message.

38c87b8

Compare stime to utime instead of wall clock

dd49cda

Comparing process time to wall clock is apples and oranges and should be avoided as best practice even though in this case we are single threaded.

dummy

aaee209

yoav-steinberg requested review from oranagra and yossigo May 3, 2022 15:41

oranagra reviewed May 4, 2022

View reviewed changes

src/server.c Outdated Show resolved Hide resolved

added --check-system facility.

61d431a

* clocksource test is now part of `--check-system`. * added explicit `xen` clocksource warning check on startup. * created `--check-system` with clocksource, xen, thp and overcommit tests.

yoav-steinberg requested a review from oranagra May 9, 2022 12:32

yoav-steinberg added 2 commits May 9, 2022 15:43

cleanup, bugfix

8c81379

fixes

98d79bb

yoav-steinberg added the state:needs-doc-pr requires a PR to redis-doc repository label May 9, 2022

oranagra reviewed May 9, 2022

View reviewed changes

yoav-steinberg added 2 commits May 10, 2022 10:45

CR fixes.

2f0ee54

CR cleanups.

82f99c8

oranagra reviewed May 10, 2022

View reviewed changes

src/server.c Show resolved Hide resolved

docs

6088254

sundb reviewed May 11, 2022

View reviewed changes

src/syscheck.c Outdated Show resolved Hide resolved

src/zmalloc.c Outdated Show resolved Hide resolved

yoav-steinberg added 2 commits May 11, 2022 11:46

cr

1cb5096

cr

4f46b02

oranagra approved these changes May 11, 2022

View reviewed changes

oranagra added release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus approval-needed Waiting for core team approval to be merged labels May 11, 2022

madolson reviewed May 15, 2022

View reviewed changes

src/syscheck.c Outdated Show resolved Hide resolved

madolson approved these changes May 15, 2022

View reviewed changes

sundb reviewed May 17, 2022

View reviewed changes

yoav-steinberg and others added 2 commits May 17, 2022 09:17

Update src/syscheck.c

24ee7aa

Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>

cr

e876f28

oranagra merged commit 843a4cd into redis:unstable May 22, 2022

This was referenced Jun 8, 2022

Release 7.0.1 #10829

Merged

Correctly check for vm.overcommit_memory == 0 #10841

Merged

[CRASH] redis 7.0.1 crashes on start after upgrading from 7.0.0 #10846

Closed

enjoy-binbin mentioned this pull request Jun 10, 2022

Fix crash when overcommit_memory is inaccessible #10848

Merged

enjoy-binbin mentioned this pull request Jul 11, 2022

Add --check-system in redis-server usage #10960

Merged

		/* Make sure transparent huge pages aren't always enabled. When they are this can cause copy-on-write logic
		* to consume much more memory and reduce performance during forks. */

Add warning for suspected slow system clocksource setting #10636

Add warning for suspected slow system clocksource setting #10636

Uh oh!

Conversation

yoav-steinberg commented Apr 25, 2022 • edited by oranagra Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

System checks

Slow clocksource details

Other changes in the PR

Uh oh!

oranagra left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

oranagra commented May 11, 2022

Uh oh!

Uh oh!

madolson left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

madolson May 15, 2022

Choose a reason for hiding this comment

Uh oh!

yoav-steinberg May 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

oranagra May 22, 2022

Choose a reason for hiding this comment

Uh oh!

oranagra commented May 16, 2022

Uh oh!

sundb May 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yoav-steinberg May 17, 2022

Choose a reason for hiding this comment

Uh oh!

sundb May 17, 2022

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

yoav-steinberg commented Apr 25, 2022 •

edited by oranagra

Loading

yoav-steinberg May 17, 2022 •

edited

Loading

sundb May 17, 2022 •

edited

Loading