storage: avoid creating excessively large snapshots #7581
Closed
Labels
S-1-stability: Severe stability issues that can be fixed by upgrading, but usually don't resolve by restarting
Description
We sometimes see out-of-memory errors when creating excessively large snapshots. For example, from #6991:
I160701 14:45:05.143248 storage/replica_raftstorage.go:524 generated snapshot for range 403 at index
3533112 in 31.747833551s. encoded size=1072891308, 6966 KV pairs, 1677671 log entries
fatal error: runtime: out of memory
Notice the huge raft log: 1677671 log entries against only 6966 KV pairs. Because the raft log is part of the replicated state, we must send all of it with the snapshot to keep the new replica from diverging. (@bdarnell Perhaps the applied portion of the raft log should not be considered during consistency checks.)
It should be possible to add a failsafe to Replica.snapshot(): if the snapshot being created is very large, return raft.ErrSnapshotTemporarilyUnavailable and possibly add the replica to the raft-log-gc queue.