Description
Problem
When the server is restarted with a significant number of WAL files (significant relative to available memory), replay runs into OOM. In a constrained environment (e.g. 1 GB), even if the snapshot process itself never hits OOM, it is still possible to shut down the server, restart it, and watch replay struggle and eventually OOM. The process does not even reach the point of snapshotting (snapshotting is the comparison here because that is where we usually see OOM today); the issue appears to be loading all WAL files concurrently (relevant code below).
influxdb/influxdb3_wal/src/object_store.rs, lines 151 to 158 in be25c6f:

```rust
let mut replay_tasks = Vec::new();
for path in paths {
    let object_store = Arc::clone(&self.object_store);
    replay_tasks.push(tokio::spawn(get_contents(object_store, path)));
}

for wal_contents in replay_tasks {
    let wal_contents = wal_contents.await??;
```
Once the process reaches this state, the only recourse today is allocating more memory to it (e.g. moving from 1 GB to 2 GB). It would be better to expose a CLI parameter so that the user can control how many files are loaded concurrently.
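As a rough sketch, such a knob could look like the following. This is illustrative only: the flag name and the use of a clap-derived args struct are assumptions, not the actual influxdb3 CLI surface.

```rust
// Hypothetical sketch: flag name and struct are illustrative assumptions.
use clap::Parser;

#[derive(Debug, Parser)]
struct ServeArgs {
    /// Maximum number of WAL files to load concurrently during replay.
    /// When unset, replay is unbounded (matching current behaviour).
    #[clap(long = "wal-replay-concurrency-limit")]
    wal_replay_concurrency_limit: Option<usize>,
}
```

The value would then be threaded down to the WAL replay path shown below.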
Solution
Stream the files and load them with a concurrency limit (unbounded by default). If the user runs into OOM, the CLI parameter can be tweaked until the files load again. The default could be more conservative (say 20), but since most users will never hit this, unbounded is probably the better default.
```rust
// assumes `futures::StreamExt` is in scope for `map`/`buffered`/`next`
let stream = futures::stream::iter(paths);
let mut replay_tasks = stream
    .map(|path| {
        let object_store = Arc::clone(&self.object_store);
        async move { get_contents(object_store, path).await }
    })
    .buffered(wal_replay_concurrency_limit.unwrap_or(usize::MAX));

while let Some(wal_contents) = replay_tasks.next().await {
    let wal_contents = wal_contents?;
```

Alternatives considered
I cannot think of any other mechanism that could work around this issue without having a cap on concurrency.
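To illustrate why capping concurrency bounds peak memory, here is a minimal self-contained sketch of the same idea using plain std threads in place of tokio tasks and `buffered` (all names here are stand-ins, not the actual replay code): a pool of `limit` workers drains a queue of "WAL file" jobs, so at most `limit` loads are ever in flight no matter how many files exist.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Drains `paths` with at most `limit` jobs in flight at once and
/// returns the peak observed concurrency.
fn replay_with_limit(paths: Vec<String>, limit: usize) -> usize {
    let queue = Arc::new(Mutex::new(paths));
    let in_flight = Arc::new(AtomicUsize::new(0));
    let max_in_flight = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..limit)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let in_flight = Arc::clone(&in_flight);
            let max_in_flight = Arc::clone(&max_in_flight);
            thread::spawn(move || loop {
                // Pop the next "WAL file" to load, or stop when done.
                let path = match queue.lock().unwrap().pop() {
                    Some(p) => p,
                    None => break,
                };
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                max_in_flight.fetch_max(now, Ordering::SeqCst);
                // Stand-in for loading/deserializing the WAL file;
                // the real code would call get_contents(...) here.
                let _ = path;
                thread::sleep(Duration::from_millis(10));
                in_flight.fetch_sub(1, Ordering::SeqCst);
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    max_in_flight.load(Ordering::SeqCst)
}

fn main() {
    let paths: Vec<String> = (0..50).map(|i| format!("wal/{i:05}.wal")).collect();
    let peak = replay_with_limit(paths, 4);
    assert!(peak <= 4);
    println!("peak concurrency: {peak}");
}
```

The `buffered(n)` stream in the proposed fix achieves the same bound cooperatively on the tokio runtime instead of with OS threads, which is why a single CLI knob is enough to cap replay memory.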