Stop WAL replay from running into OOM #26481

@praveen-influx

Problem

When the server is restarted with a large number of WAL files (large relative to available memory), replay runs into OOM. In a constrained environment (e.g. 1G), even if the snapshot process never runs into OOM, it is still possible to shut the server down, restart it, and see replay struggle and eventually run into OOM. Replay does not even reach the point of snapshotting (the comparison with snapshotting is because that is where we usually see OOM today). Loading all WAL files concurrently looks like the issue (relevant code below):

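// One task is spawned per WAL file up front, so every file can be
// fetched and held in memory at the same time.
let mut replay_tasks = Vec::new();
for path in paths {
    let object_store = Arc::clone(&self.object_store);
    replay_tasks.push(tokio::spawn(get_contents(object_store, path)));
}
// The contents are then awaited and replayed in order.
for wal_contents in replay_tasks {
    let wal_contents = wal_contents.await??;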

Once the process reaches this state, it is highly likely that the only way forward is to allocate more memory to the process (for example, moving from 1G to 2G). It would be better to expose a CLI parameter so that the user can control how many files are loaded concurrently.
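As a rough illustration of what such a parameter could look like once parsed, here is a minimal sketch using only the standard library. The helper name `resolve_replay_limit` is hypothetical and not part of the existing code. It assumes the flag is optional, that leaving it out means "no cap", and that a value of 0 is rejected because it would allow no file to load at all.

```rust
/// Hypothetical helper (not from the codebase): turns the raw value of an
/// optional CLI flag into a concurrency limit for WAL replay.
/// `None` means the flag was not passed on the command line.
fn resolve_replay_limit(raw: Option<&str>) -> Result<usize, String> {
    match raw {
        // Flag absent: assume no cap on concurrent loads.
        None => Ok(usize::MAX),
        Some(text) => match text.trim().parse::<usize>() {
            // A limit of 0 would never let any load start.
            Ok(0) => Err("limit must be at least 1".to_string()),
            Ok(n) => Ok(n),
            Err(e) => Err(format!("invalid limit {text:?}: {e}")),
        },
    }
}
```

With this shape, a user who hits OOM during replay could restart with a small value such as 20, while everyone else keeps the current behaviour.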

Solution

Stream the files and load them with a concurrency limit (which could be unbounded by default). If the user runs into OOM, the CLI parameter can be lowered so that the files load successfully on the next restart. The default could be more conservative (20, for example), but since users won't run into this often, unbounded is probably the better default.

        let stream = futures::stream::iter(paths);
        let mut replay_tasks = stream
            .map(|path| {
                let object_store = Arc::clone(&self.object_store);
                async move { get_contents(object_store, path).await }
            })
            .buffered(wal_replay_concurrency_limit.unwrap_or(usize::MAX));

        while let Some(wal_contents) = replay_tasks.next().await {
            let wal_contents = wal_contents?;
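To make the effect of the cap concrete, here is a small standard-library sketch of the same idea: results are consumed in order, and at most `limit` loads are in flight at once. It uses OS threads and a sliding window instead of the `buffered` stream combinator, so it is an analogue rather than the proposed implementation. `load_file` and `replay_bounded` are hypothetical names, and the 1 KiB payload is fake data standing in for WAL contents.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for `get_contents`: pretends to load one WAL file
/// and records how many loads are running at the same time.
fn load_file(id: usize, in_flight: &AtomicUsize, peak: &AtomicUsize) -> Vec<u8> {
    let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
    peak.fetch_max(now, Ordering::SeqCst);
    let contents = vec![id as u8; 1024]; // fake 1 KiB payload
    in_flight.fetch_sub(1, Ordering::SeqCst);
    contents
}

/// Loads `count` files with at most `limit` loads in flight, consuming the
/// results in file order. Returns the first byte of each file (in replay
/// order) and the peak number of concurrent loads observed.
fn replay_bounded(count: usize, limit: usize) -> (Vec<u8>, usize) {
    assert!(limit > 0, "a limit of 0 would never start a load");
    let in_flight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let mut window: VecDeque<JoinHandle<Vec<u8>>> = VecDeque::new();
    let mut replayed = Vec::with_capacity(count);

    for id in 0..count {
        if window.len() == limit {
            // Window is full: finish the oldest load before starting another.
            let oldest = window.pop_front().unwrap();
            replayed.push(oldest.join().unwrap()[0]);
        }
        let (in_flight, peak) = (Arc::clone(&in_flight), Arc::clone(&peak));
        window.push_back(thread::spawn(move || load_file(id, &in_flight, &peak)));
    }
    // Drain the loads that are still outstanding, oldest first.
    for handle in window {
        replayed.push(handle.join().unwrap()[0]);
    }
    (replayed, peak.load(Ordering::SeqCst))
}
```

Calling `replay_bounded(10, 3)` replays ten fake files in order while never holding more than three loads open, which is the property that bounds memory use during replay. Passing `usize::MAX` as the limit gives the unbounded behaviour.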

Alternatives considered

I cannot think of any other mechanism that could work around this issue without having a cap on concurrency.
