
storage: avoid creating excessively large snapshots #7581

@petermattis

We sometimes see out of memory errors when creating excessively large snapshots. For example, from #6991:

I160701 14:45:05.143248 storage/replica_raftstorage.go:524  generated snapshot for range 403 at index 3533112 in 31.747833551s. encoded size=1072891308, 6966 KV pairs, 1677671 log entries
fatal error: runtime: out of memory

Notice the huge raft log: 1677671 log entries against only 6966 KV pairs, for an encoded size of about 1 GiB. Because the raft log is part of the replicated state, we must send all of it with the snapshot to keep the new replica from diverging. (@bdarnell: perhaps the applied portion of the raft log should not be considered during consistency checks.)

It should be possible to add a failsafe to Replica.snapshot() so that, if we see that a very large snapshot is being created, we return raft.ErrSnapshotTemporarilyUnavailable and possibly add the replica to the raft-log-gc queue.

Metadata

Labels

S-1-stability: Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting
