Skip to content

Conversation

@yoav-steinberg
Copy link
Contributor

@yoav-steinberg yoav-steinberg commented Apr 25, 2022

This PR does 2 main things:

  1. Add warning for suspected slow system clocksource setting. This is Linux specific.
  2. Add a --check-system argument to redis which runs all system checks and prints a report.

System checks

Add a command line option --check-system which runs all known system checks and provides a report to stdout of which systems checks have failed with details on how to reconfigure the system for optimized redis performance.
The --check-system mode exists with an appropriate error code after running all the checks.

Slow clocksource details

We check the system's clocksource performance by running clock_gettime() in a loop and then checking how much time was spent in a system call (via getrusage()). If we spend more than 10% of the time in the kernel then we print a warning. I verified that using the slow clock sources: acpi_pm (~90% in the kernel on my laptop) and xen (~30% in the kernel on an ec2 m4.large) we get this warning.

The check runs 5 system ticks so we can detect time spent in kernel at 20% jumps (0%,20%,40%...). Anything more accurate will require the test to run longer. Typically 5 ticks are 50ms. This means running the test on startup will delay startup by 50ms. To avoid this we make sure the test is only executed in the --check-system mode.

For a quick startup check, we specifically warn if the we see the system is using the xen clocksource which we know has bad performance and isn't recommended (at least on ec2). In such a case the user should manually rung redis with --check-system to force the slower clocksource test described above.

Other changes in the PR

  • All the system checks are now implemented as functions in syscheck.c. They are implemented using a standard interface (see details in syscheck.c). To do this I moved the checking functions linuxOvercommitMemoryValue(), THPIsEnabled(), linuxMadvFreeForkBugCheck() out of server.c and latency.c and into the new syscheck.c. When moving these functions I made sure they don't depend on other functionality provided in server.c and made them use a standard "check functions" interface. Specifically:

    • I removed all logging out of linuxMadvFreeForkBugCheck(). In case there's some unexpected error during the check aborts as before, but without any logging. It returns an error code 0 meaning the check didn't not complete.
    • All these functions now return 1 on success, -1 on failure, 0 in case the check itself cannot be completed.
    • The linuxMadvFreeForkBugCheck() function now internally calls exit() and not exitFromChild() because the latter is only available in server.c and I wanted to remove that dependency. This isn't an because we don't need to worry about the child process created by the test doing anything related to the rdb/aof files which is why exitFromChild() was created.
  • This also fixes parsing of other /proc/<pid>/stat fields to correctly handle spaces in the process name and be more robust in general. Not that before this fix the rss info in INFO memory was corrupt in case of spaces in the process name. To recreate just rename redis-server to redis server, start it, and run INFO memory.

This also fixes parsing of other /proc/<pid>/stat fields to correctly handle spaces in the process name and be more robust in general.
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewed the part in zmalloc.c and the linuxTimeWarnings function.
missing warning suppression, and extraction of the log highlighting thing to a different PR.

Comparing process time to wall clock is apples and oranges and
should be avoided as best practice even though in this case we are single
threaded.
@yoav-steinberg yoav-steinberg requested review from oranagra and yossigo May 3, 2022 15:41
* clocksource test is now part of `--check-system`.
* added explicit `xen` clocksource warning check on startup.
* created `--check-system` with clocksource, xen, thp and overcommit tests.
@yoav-steinberg yoav-steinberg requested a review from oranagra May 9, 2022 12:32
@yoav-steinberg yoav-steinberg added the state:needs-doc-pr requires a PR to redis-doc repository label May 9, 2022
@oranagra
Copy link
Member

@redis/core-team please approve. see top comment.

@oranagra oranagra added release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus approval-needed Waiting for core team approval to be merged labels May 11, 2022
Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable.

Comment on lines +161 to +162
/* Make sure transparent huge pages aren't always enabled. When they are this can cause copy-on-write logic
* to consume much more memory and reduce performance during forks. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone is running a non-durable variant of Redis, THP can be a good thing since it reduce TLB misses. It seems like the warning message doesn't really address that situation very clearly.

Copy link
Contributor Author

@yoav-steinberg yoav-steinberg May 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be other instances where a fork might cause THP issues. Redis might fork for module processing (CHILD_TYPE_MODULE) or for disk based replication. I was under the impression that this should always be turned off. But you are right that there are other cases.
I have a feeling that too much detail here will become cumbersome. For the general use case we should recommend turning this off.
It might be wise to add a link to the docs website for a detailed explanation, and perhaps do the same for all the other warnings. The downside being that we'll have to keep the link valid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think a link to the website is suitable, might get outdated.
i think we need to provide enough information so that people can google it, or find an answer in stack overflow.
i'd prefer to avoid providing incorrect or misleading information, so in this context maybe we should replace This will create latency and memory usage issues with Redis with This may ....
But i'll leave that for another opportunity (this PR just relocates that text, it didn't introduce it).

@oranagra
Copy link
Member

approved in a core-team meeting.

#ifdef __linux__
linuxMemoryWarnings();
sds err_msg;
if (checkXenClocksource(&err_msg) < 0) {
Copy link
Collaborator

@sundb sundb May 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can change linuxMemoryWarnings() to int checkLinuxMemoryWarnings(sds *err_msg), so we don't need to repeat serverLog(LL_WARNING, ...) and sdsfree(...) multiple times.

if (checkLinuxMemoryWarnings(&err_msg) < 0 ||
    checkXenClocksource(&err_msg) < 0)
{
    serverLog(LL_WARNING, "WARNING %s", err_msg);
    sdsfree(err_msg);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we might want to print both warnings.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I was wrong.
BTW, When I try to add madv DONTNEED detection based on this pr and output warnings, I can only copy most of the warning ouput code similar to checkLinuxMadvFreeForkBug, which is not very comfortable.

yoav-steinberg and others added 2 commits May 17, 2022 09:17
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
@oranagra oranagra merged commit 843a4cd into redis:unstable May 22, 2022
oranagra pushed a commit that referenced this pull request Jun 11, 2022
When `/proc/sys/vm/overcommit_memory` is inaccessible, fp is NULL.
`checkOvercommit` will return -1 without setting err_msg, and then
the err_msg is used to print the log, crash the server.
Set the err_msg variables to Null when declaring it, seems safer.

And the overcommit_memory error log will print two "WARNING",
like `WARNING WARNING overcommit_memory is set to 0!`, this PR
also removes the second WARNING in `checkOvercommit`.

Reported in #10846. Fixes #10846. Introduced in #10636 (7.0.1)
oranagra pushed a commit that referenced this pull request Jun 12, 2022
A regression caused by #10636 (released in 7.0.1) causes Redis startup warning about overcommit to be missing when needed and printed when not. 

Also, using atoi() to convert the string's value to an integer cannot discern
between an actual 0 (zero) having been read, or a conversion error.
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
This PR does 2 main things:
1) Add warning for suspected slow system clocksource setting. This is Linux specific.
2) Add a `--check-system` argument to redis which runs all system checks and prints a report.

## System checks
Add a command line option `--check-system` which runs all known system checks and provides
a report to stdout of which systems checks have failed with details on how to reconfigure the
system for optimized redis performance.
The `--system-check` mode exists with an appropriate error code after running all the checks.

## Slow clocksource details
We check the system's clocksource performance by running `clock_gettime()` in a loop and then
checking how much time was spent in a system call (via `getrusage()`). If we spend more than
10% of the time in the kernel then we print a warning. I verified that using the slow clock sources:
`acpi_pm` (~90% in the kernel on my laptop) and `xen` (~30% in the kernel on an ec2 `m4.large`)
we get this warning.

The check runs 5 system ticks so we can detect time spent in kernel at 20% jumps (0%,20%,40%...).
Anything more accurate will require the test to run longer. Typically 5 ticks are 50ms. This means
running the test on startup will delay startup by 50ms. To avoid this we make sure the test is only
executed in the `--check-system` mode.

For a quick startup check, we specifically warn if the we see the system is using the `xen` clocksource
which we know has bad performance and isn't recommended (at least on ec2). In such a case the
user should manually rung redis with `--check-system` to force the slower clocksource test described
above.

## Other changes in the PR

* All the system checks are now implemented as functions in _syscheck.c_.
  They are implemented using a standard interface (see details in _syscheck.c_).
  To do this I moved the checking functions `linuxOvercommitMemoryValue()`,
  `THPIsEnabled()`, `linuxMadvFreeForkBugCheck()` out of _server.c_ and _latency.c_
  and into the new _syscheck.c_. When moving these functions I made sure they don't
  depend on other functionality provided in _server.c_ and made them use a standard
  "check functions" interface. Specifically:
  * I removed all logging out of `linuxMadvFreeForkBugCheck()`. In case there's some
    unexpected error during the check aborts as before, but without any logging.
    It returns an error code 0 meaning the check didn't not complete.
  * All these functions now return 1 on success, -1 on failure, 0 in case the check itself
    cannot be completed.
  * The `linuxMadvFreeForkBugCheck()` function now internally calls `exit()` and not
    `exitFromChild()` because the latter is only available in _server.c_ and I wanted to
    remove that dependency. This isn't an because we don't need to worry about the
    child process created by the test doing anything related to the rdb/aof files which
    is why `exitFromChild()` was created.

* This also fixes parsing of other /proc/\<pid\>/stat fields to correctly handle spaces
  in the process name and be more robust in general. Not that before this fix the rss
  info in `INFO memory` was corrupt in case of spaces in the process name. To
  recreate just rename `redis-server` to `redis server`, start it, and run `INFO memory`.
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
When `/proc/sys/vm/overcommit_memory` is inaccessible, fp is NULL.
`checkOvercommit` will return -1 without setting err_msg, and then
the err_msg is used to print the log, crash the server.
Set the err_msg variables to Null when declaring it, seems safer.

And the overcommit_memory error log will print two "WARNING",
like `WARNING WARNING overcommit_memory is set to 0!`, this PR
also removes the second WARNING in `checkOvercommit`.

Reported in redis#10846. Fixes redis#10846. Introduced in redis#10636 (7.0.1)
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
A regression caused by redis#10636 (released in 7.0.1) causes Redis startup warning about overcommit to be missing when needed and printed when not. 

Also, using atoi() to convert the string's value to an integer cannot discern
between an actual 0 (zero) having been read, or a conversion error.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus state:needs-doc-pr requires a PR to redis-doc repository

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Add runtime check to warn when the kernel's clock source may be wrong

5 participants