fix(wal_replay): limit the number of wal files preloaded to num_cpus#26716
fix(wal_replay): limit the number of wal files preloaded to num_cpus#26716
Conversation
Wal replay currently loads all the wal files into memory and decodes them by default. If that's 10s of GB or 100s of GB, it'll try to do it potentially causing OOMs if it exceeds system memory. We likely keep most of the speed from preloading but decrease chance of OOM by preloading a more limited number of wal files. In the absence of an option to directly limit the memory used in preload, we can use the number of cpu cores available as a proxy. This will be the number of wal files loaded to replay, which has to happen in order still. The current recommendation is to use 10 if you encounter an OOM so let's use that as the minimum if a specific value isn't set. The new logic is num_files_preloaded = (user's choice) of if not set (max(10, num_cpus) This should improve the experience restarting the server when there is a lot of wal data. * closes #26715
hiltontj
left a comment
There was a problem hiding this comment.
Looks good! I have a suggestion that you can take or leave.
The only thing I would do differently is have the default set in the clap block here:
influxdb/influxdb3/src/commands/serve.rs
Lines 521 to 525 in ec343b9
That way, replay could take a usize instead of Option<usize>, but more importantly, the default setting is visible at the serve command level, instead of nested down in the influxdb3_wal crate there.
influxdb3_wal/Cargo.toml
Outdated
| bytes.workspace = true | ||
| byteorder.workspace = true | ||
| bytes.workspace = true | ||
| clap = "4.5.23" |
There was a problem hiding this comment.
| clap = "4.5.23" | |
| clap.workspace = true |
There was a problem hiding this comment.
yes of course. done.
influxdb3/src/help/serve_all.txt
Outdated
| --wal-replay-concurrency-limit <LIMIT> | ||
| Concurrency limit during WAL replay [default: no_limit] | ||
| Concurrency limit during WAL replay [default: max(num_cpus, 10)] | ||
| If replay runs into OOM, set this to a lower number eg. 10 |
There was a problem hiding this comment.
| If replay runs into OOM, set this to a lower number eg. 10 | |
| Setting this number too high can lead to OOM. |
Or something to that effect. I think the previous statement no longer applies if we are setting a reasonable default.
…oncurrency limit - Add wal_replay_concurrency_limit_default() helper fn for max(num_cpus, 10) - Change field type from Option<usize> to usize - Update help text to clarify dynamic nature and OOM warning
philjb
left a comment
There was a problem hiding this comment.
suggestions implemented!
influxdb3_wal/Cargo.toml
Outdated
| bytes.workspace = true | ||
| byteorder.workspace = true | ||
| bytes.workspace = true | ||
| clap = "4.5.23" |
There was a problem hiding this comment.
yes of course. done.
influxdb3/src/help/serve_all.txt
Outdated
| --wal-replay-concurrency-limit <LIMIT> | ||
| Concurrency limit during WAL replay [default: no_limit] | ||
| Concurrency limit during WAL replay [default: max(num_cpus, 10)] | ||
| If replay runs into OOM, set this to a lower number eg. 10 |
| )]; | ||
|
|
||
| const MIN_REPLAY_PRELOAD_CONCURRENCY: usize = 10; // the min number of files that will be held in memory | ||
| fn wal_replay_concurrency_limit_default() -> String { |
There was a problem hiding this comment.
I didn't know the clap default value macro would take a function call but it does so this works nicely to dynamically determine the default value. thanks for the suggestion.
But of course we only print the curated help file, we can't actually get clap to print in cli help the calculated value in the current step up, which is kinda shame.
but i tested it by commenting out the maybe_print_help() method and it works.
Perhaps we can bring back the help subcommand which will still work with our curated help but will display all the clap generated help too.
Line 51 in 03dc8c6
Wal replay currently loads all the wal files into memory and decodes them by default. If that's 10s of GB or 100s of GB, it'll try to do it potentially causing OOMs if it exceeds system memory. We likely keep most of the speed from preloading but decrease chance of OOM by preloading a more limited number of wal files. In the absence of an option to directly limit the memory used in preload, we can use the number of cpu cores available as a proxy. This will be the number of wal files loaded to replay, which has to happen in order still. The current recommendation is to use 10 if you encounter an OOM so let's use that as the minimum if a specific value isn't set. The new logic is
num_files_preloaded = (user's choice) or if not set (max(10, num_cpus))
This should improve the experience restarting the server when there are a lot of wal data.