storage: kv-level memory accounting/bounding #19721
#8691 introduces a mechanism to track and limit memory use at the SQL layer, but no effort has been put into tracking and limiting memory usage at any layers beneath this. Without this protection, scans over large tables can still result in trivial out-of-memory crashes, as we've recently seen. I think we'll want to add some memory accounting down in the kv-layer so that we can track memory across all memory-intensive operations in a node (MVCC scans, BatchResponse serialization/deserialization, etc.).
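To make the idea concrete, here is a minimal sketch of what byte-level accounting beneath SQL could look like: KV operations reserve bytes against a shared budget before allocating and release them when done, so an over-budget scan fails cleanly instead of OOMing the node. All names here (`memLimiter`, `Reserve`, `Release`) are illustrative, not existing CockroachDB APIs.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// memLimiter is a hypothetical node-wide byte budget shared by
// memory-intensive KV operations (MVCC scans, response serialization, ...).
type memLimiter struct {
	used  int64
	limit int64
}

var errMemBudgetExceeded = errors.New("kv memory budget exceeded")

// Reserve atomically charges n bytes against the budget. When the limit
// would be exceeded, the operation fails instead of risking an OOM.
func (m *memLimiter) Reserve(n int64) error {
	for {
		cur := atomic.LoadInt64(&m.used)
		if cur+n > m.limit {
			return errMemBudgetExceeded
		}
		if atomic.CompareAndSwapInt64(&m.used, cur, cur+n) {
			return nil
		}
	}
}

// Release returns n bytes to the budget once the operation completes.
func (m *memLimiter) Release(n int64) {
	atomic.AddInt64(&m.used, -n)
}

func main() {
	lim := &memLimiter{limit: 1 << 20} // 1 MiB budget
	fmt.Println(lim.Reserve(512 << 10)) // fits
	fmt.Println(lim.Reserve(768 << 10)) // would exceed the budget
	lim.Release(512 << 10)
	fmt.Println(atomic.LoadInt64(&lim.used))
}
```

The compare-and-swap loop keeps the account safe for concurrent requests without holding a lock across the allocation itself.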
I also think we should add some kind of memory limit to BatchRequests so that responses will not exceed available memory for a client node. Even if one node has enough memory to field a KV request, another may not have enough to receive its response. This could also be accomplished by adding some kind of interceptor for BatchResponses that stops reading the response when it gets too large.
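The interceptor idea could be sketched as follows: the client declares a maximum response size, and the reader stops consuming results once the running byte total would exceed it, handing back a resume key for a follow-up request rather than buffering an arbitrarily large response. This is a hypothetical illustration; `kvPair` and `readCapped` are made-up names.

```go
package main

import "fmt"

// kvPair stands in for a single key/value result in a scan response.
type kvPair struct {
	key, value []byte
}

// readCapped consumes pairs until maxBytes would be exceeded. It returns
// the pairs that fit plus the key at which a follow-up request can resume,
// so the client never materializes a response larger than its budget.
func readCapped(pairs []kvPair, maxBytes int) (out []kvPair, resumeKey []byte) {
	var total int
	for i, p := range pairs {
		sz := len(p.key) + len(p.value)
		if total+sz > maxBytes {
			return out, pairs[i].key
		}
		total += sz
		out = append(out, p)
	}
	return out, nil
}

func main() {
	pairs := []kvPair{
		{[]byte("a"), make([]byte, 100)},
		{[]byte("b"), make([]byte, 100)},
		{[]byte("c"), make([]byte, 100)},
	}
	got, resume := readCapped(pairs, 250)
	fmt.Println(len(got), string(resume))
}
```

The same cap could equally be enforced server-side by attaching a byte limit to the BatchRequest header, so the server stops scanning instead of the client stopping reading.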
There have been some attempts to limit the size of single kv scans to reasonable numbers of rows. However, these attempts use a constant maximum row count. If rows are unexpectedly large, then even the current max (10,000 rows) may still be big enough to cause an OOM. We'll probably want to dynamically adjust this maximum limit based on the size of each row. Unfortunately, this problem is exacerbated by the fact that SQL scans can't push column filters down through ScanRequests. This means that large keys may be returned in a ScanResponse even if they are not necessary for the query, leading to unexpected OOMs. This is, for the most part, an orthogonal issue, but it demonstrates why we might want memory accounting beneath SQL.
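A size-aware limit could work roughly like this: derive the per-batch row limit from a byte budget and the average row size observed so far, clamped to sane bounds, instead of a fixed 10,000-row cap. The function and parameter names below are hypothetical.

```go
package main

import "fmt"

// dynamicRowLimit turns a byte budget into a row limit using the average
// row size seen so far, clamped to [minRows, maxRows]. Small rows hit the
// static cap as before; huge rows shrink the limit so the batch still
// fits in memory.
func dynamicRowLimit(byteBudget, avgRowBytes, minRows, maxRows int) int {
	if avgRowBytes <= 0 {
		return maxRows // no size information yet; fall back to the static cap
	}
	n := byteBudget / avgRowBytes
	if n < minRows {
		return minRows
	}
	if n > maxRows {
		return maxRows
	}
	return n
}

func main() {
	// 100-byte rows: the static 10,000-row cap still applies.
	fmt.Println(dynamicRowLimit(16<<20, 100, 1, 10000))
	// 1 MiB rows: only 16 rows fit in a 16 MiB budget.
	fmt.Println(dynamicRowLimit(16<<20, 1<<20, 1, 10000))
}
```

Note that without column-filter pushdown this is still only an approximation, since the bytes actually returned can exceed what the query needs.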
cc. @andreimatei @tschottdorf @knz
Jira issue: CRDB-5958