-
Notifications
You must be signed in to change notification settings - Fork 24.4k
Add warning for suspected slow system clocksource setting #10636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This also fixes parsing of other /proc/<pid>/stat fields to correctly handle spaces in the process name and be more robust in general.
oranagra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewed the part in zmalloc.c and the linuxTimeWarnings function.
missing warning suppression, and extraction of the log highlighting thing to a different PR.
This is to accommodate a xen cloucksource on ec2 spending about 30% of the time in system calls.
f5eabd1 to
1e0f676
Compare
Comparing process time to wall clock is apples and oranges and should be avoided as best practice even though in this case we are single threaded.
* clocksource test is now part of `--check-system`. * added explicit `xen` clocksource warning check on startup. * created `--check-system` with clocksource, xen, thp and overcommit tests.
|
@redis/core-team please approve. see top comment. |
madolson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable.
| /* Make sure transparent huge pages aren't always enabled. When they are this can cause copy-on-write logic | ||
| * to consume much more memory and reduce performance during forks. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone is running a non-durable variant of Redis, THP can be a good thing since it reduce TLB misses. It seems like the warning message doesn't really address that situation very clearly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there might be other instances where a fork might cause THP issues. Redis might fork for module processing (CHILD_TYPE_MODULE) or for disk based replication. I was under the impression that this should always be turned off. But you are right that there are other cases.
I have a feeling that too much detail here will become cumbersome. For the general use case we should recommend turning this off.
It might be wise to add a link to the docs website for a detailed explanation, and perhaps do the same for all the other warnings. The downside being that we'll have to keep the link valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think a link to the website is suitable, might get outdated.
i think we need to provide enough information so that people can google it, or find an answer in stack overflow.
i'd prefer to avoid providing incorrect or misleading information, so in this context maybe we should replace This will create latency and memory usage issues with Redis with This may ....
But i'll leave that for another opportunity (this PR just relocates that text, it didn't introduce it).
|
approved in a core-team meeting. |
| #ifdef __linux__ | ||
| linuxMemoryWarnings(); | ||
| sds err_msg; | ||
| if (checkXenClocksource(&err_msg) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can change linuxMemoryWarnings() to int checkLinuxMemoryWarnings(sds *err_msg), so we don't need to repeat serverLog(LL_WARNING, ...) and sdsfree(...) multiple times.
if (checkLinuxMemoryWarnings(&err_msg) < 0 ||
checkXenClocksource(&err_msg) < 0)
{
serverLog(LL_WARNING, "WARNING %s", err_msg);
sdsfree(err_msg);
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we might want to print both warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I was wrong.
BTW, When I try to add madv DONTNEED detection based on this pr and output warnings, I can only copy most of the warning ouput code similar to checkLinuxMadvFreeForkBug, which is not very comfortable.
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
When `/proc/sys/vm/overcommit_memory` is inaccessible, fp is NULL. `checkOvercommit` will return -1 without setting err_msg, and then the err_msg is used to print the log, crash the server. Set the err_msg variables to Null when declaring it, seems safer. And the overcommit_memory error log will print two "WARNING", like `WARNING WARNING overcommit_memory is set to 0!`, this PR also removes the second WARNING in `checkOvercommit`. Reported in #10846. Fixes #10846. Introduced in #10636 (7.0.1)
A regression caused by #10636 (released in 7.0.1) causes Redis startup warning about overcommit to be missing when needed and printed when not. Also, using atoi() to convert the string's value to an integer cannot discern between an actual 0 (zero) having been read, or a conversion error.
This PR does 2 main things:
1) Add warning for suspected slow system clocksource setting. This is Linux specific.
2) Add a `--check-system` argument to redis which runs all system checks and prints a report.
## System checks
Add a command line option `--check-system` which runs all known system checks and provides
a report to stdout of which systems checks have failed with details on how to reconfigure the
system for optimized redis performance.
The `--system-check` mode exists with an appropriate error code after running all the checks.
## Slow clocksource details
We check the system's clocksource performance by running `clock_gettime()` in a loop and then
checking how much time was spent in a system call (via `getrusage()`). If we spend more than
10% of the time in the kernel then we print a warning. I verified that using the slow clock sources:
`acpi_pm` (~90% in the kernel on my laptop) and `xen` (~30% in the kernel on an ec2 `m4.large`)
we get this warning.
The check runs 5 system ticks so we can detect time spent in kernel at 20% jumps (0%,20%,40%...).
Anything more accurate will require the test to run longer. Typically 5 ticks are 50ms. This means
running the test on startup will delay startup by 50ms. To avoid this we make sure the test is only
executed in the `--check-system` mode.
For a quick startup check, we specifically warn if the we see the system is using the `xen` clocksource
which we know has bad performance and isn't recommended (at least on ec2). In such a case the
user should manually rung redis with `--check-system` to force the slower clocksource test described
above.
## Other changes in the PR
* All the system checks are now implemented as functions in _syscheck.c_.
They are implemented using a standard interface (see details in _syscheck.c_).
To do this I moved the checking functions `linuxOvercommitMemoryValue()`,
`THPIsEnabled()`, `linuxMadvFreeForkBugCheck()` out of _server.c_ and _latency.c_
and into the new _syscheck.c_. When moving these functions I made sure they don't
depend on other functionality provided in _server.c_ and made them use a standard
"check functions" interface. Specifically:
* I removed all logging out of `linuxMadvFreeForkBugCheck()`. In case there's some
unexpected error during the check aborts as before, but without any logging.
It returns an error code 0 meaning the check didn't not complete.
* All these functions now return 1 on success, -1 on failure, 0 in case the check itself
cannot be completed.
* The `linuxMadvFreeForkBugCheck()` function now internally calls `exit()` and not
`exitFromChild()` because the latter is only available in _server.c_ and I wanted to
remove that dependency. This isn't an because we don't need to worry about the
child process created by the test doing anything related to the rdb/aof files which
is why `exitFromChild()` was created.
* This also fixes parsing of other /proc/\<pid\>/stat fields to correctly handle spaces
in the process name and be more robust in general. Not that before this fix the rss
info in `INFO memory` was corrupt in case of spaces in the process name. To
recreate just rename `redis-server` to `redis server`, start it, and run `INFO memory`.
When `/proc/sys/vm/overcommit_memory` is inaccessible, fp is NULL. `checkOvercommit` will return -1 without setting err_msg, and then the err_msg is used to print the log, crash the server. Set the err_msg variables to Null when declaring it, seems safer. And the overcommit_memory error log will print two "WARNING", like `WARNING WARNING overcommit_memory is set to 0!`, this PR also removes the second WARNING in `checkOvercommit`. Reported in redis#10846. Fixes redis#10846. Introduced in redis#10636 (7.0.1)
A regression caused by redis#10636 (released in 7.0.1) causes Redis startup warning about overcommit to be missing when needed and printed when not. Also, using atoi() to convert the string's value to an integer cannot discern between an actual 0 (zero) having been read, or a conversion error.
This PR does 2 main things:
--check-systemargument to redis which runs all system checks and prints a report.System checks
Add a command line option
--check-systemwhich runs all known system checks and provides a report to stdout of which systems checks have failed with details on how to reconfigure the system for optimized redis performance.The
--check-systemmode exists with an appropriate error code after running all the checks.Slow clocksource details
We check the system's clocksource performance by running
clock_gettime()in a loop and then checking how much time was spent in a system call (viagetrusage()). If we spend more than 10% of the time in the kernel then we print a warning. I verified that using the slow clock sources:acpi_pm(~90% in the kernel on my laptop) andxen(~30% in the kernel on an ec2m4.large) we get this warning.The check runs 5 system ticks so we can detect time spent in kernel at 20% jumps (0%,20%,40%...). Anything more accurate will require the test to run longer. Typically 5 ticks are 50ms. This means running the test on startup will delay startup by 50ms. To avoid this we make sure the test is only executed in the
--check-systemmode.For a quick startup check, we specifically warn if the we see the system is using the
xenclocksource which we know has bad performance and isn't recommended (at least on ec2). In such a case the user should manually rung redis with--check-systemto force the slower clocksource test described above.Other changes in the PR
All the system checks are now implemented as functions in syscheck.c. They are implemented using a standard interface (see details in syscheck.c). To do this I moved the checking functions
linuxOvercommitMemoryValue(),THPIsEnabled(),linuxMadvFreeForkBugCheck()out of server.c and latency.c and into the new syscheck.c. When moving these functions I made sure they don't depend on other functionality provided in server.c and made them use a standard "check functions" interface. Specifically:linuxMadvFreeForkBugCheck(). In case there's some unexpected error during the check aborts as before, but without any logging. It returns an error code 0 meaning the check didn't not complete.linuxMadvFreeForkBugCheck()function now internally callsexit()and notexitFromChild()because the latter is only available in server.c and I wanted to remove that dependency. This isn't an because we don't need to worry about the child process created by the test doing anything related to the rdb/aof files which is whyexitFromChild()was created.This also fixes parsing of other /proc/<pid>/stat fields to correctly handle spaces in the process name and be more robust in general. Not that before this fix the rss info in
INFO memorywas corrupt in case of spaces in the process name. To recreate just renameredis-servertoredis server, start it, and runINFO memory.