We sometimes see out-of-memory errors when creating excessively large snapshots. For example, from #6991:
```
I160701 14:45:05.143248 storage/replica_raftstorage.go:524 generated snapshot for range 403 at index 3533112 in 31.747833551s. encoded size=1072891308, 6966 KV pairs, 1677671 log entries
fatal error: runtime: out of memory
```
Notice the huge raft log: 1,677,671 entries against only 6,966 KV pairs, for an encoded size of roughly 1 GB. Because the raft log is part of the replicated state, we must send all of it with the snapshot to keep the new replica from diverging. (@bdarnell Perhaps the applied portion of the raft log should not be considered during consistency checks.)
It should be possible to add a failsafe to `Replica.snapshot()` so that, if it detects that a very large snapshot is being created, it returns `raft.ErrSnapshotTemporarilyUnavailable` and possibly adds the replica to the raft-log-gc queue.