colexec: poor limiting of batches in size in several operators #76464
Description
In the cFetcher, the hash aggregator, and the ordered synchronizer we are performing careful memory accounting in order to limit the size of the output batches.
However, we make an incorrect assumption that can lead to batches larger than desired. The first time a batch exceeds the target size, we memorize its "capacity" (i.e. the number of rows that fit into that batch), and from then on we always put up to that many rows into each batch. This approach works well when each row is of roughly the same size throughout the lifetime of an operator; however, that is not always the case. Imagine we have 1024 small rows followed by 1024 large rows: first, all 1024 small rows are added into a single batch and we "fix" the capacity at 1024, so all 1024 large rows will be put into a single batch as well, significantly exceeding the target size.
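To make the failure mode concrete, here is a minimal standalone sketch of the capacity-memoizing strategy described above (hypothetical function and variable names, not the actual operator code): once the row count that first exceeded the target size is memorized, every later batch is cut at that row count regardless of its actual byte size.

```go
package main

import "fmt"

// simulateBatching mimics the flawed strategy: rows are accumulated until
// the batch size first reaches targetSize, at which point the row count
// ("capacity") is fixed and reused for every subsequent batch. It returns
// the byte size of each emitted batch.
func simulateBatching(rowSizes []int, targetSize int) []int {
	var batches []int
	capacity := 0 // 0 means "not yet memorized"
	curRows, curSize := 0, 0
	for _, sz := range rowSizes {
		curRows++
		curSize += sz
		if capacity == 0 {
			// Still filling the first batch: stop once we reach the target.
			if curSize >= targetSize {
				capacity = curRows // memorize the capacity
				batches = append(batches, curSize)
				curRows, curSize = 0, 0
			}
		} else if curRows == capacity {
			// Capacity is fixed: emit a batch every `capacity` rows,
			// regardless of its actual byte size.
			batches = append(batches, curSize)
			curRows, curSize = 0, 0
		}
	}
	if curRows > 0 {
		batches = append(batches, curSize)
	}
	return batches
}

func main() {
	// 1024 small rows (1 byte each) followed by 1024 large rows (1 KiB each),
	// with a 1 KiB target batch size.
	rowSizes := make([]int, 0, 2048)
	for i := 0; i < 1024; i++ {
		rowSizes = append(rowSizes, 1)
	}
	for i := 0; i < 1024; i++ {
		rowSizes = append(rowSizes, 1024)
	}
	fmt.Println(simulateBatching(rowSizes, 1024))
}
```

Running this emits two batches: the first is exactly 1 KiB, but the second is 1 MiB, roughly 1024x the target, because the memorized capacity of 1024 rows never accounts for the larger row size.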
I started working on fixing this in #76421, but it's not a trivial problem.
Jira issue: CRDB-13137