colexec: poor limiting of batches in size in several operators #76464
Description
In the cFetcher, the hash aggregator, and the ordered synchronizer we are performing careful memory accounting in order to limit the size of the output batches.
However, we make an incorrect assumption that can lead to batches larger than desired. The first time a batch exceeds the target size, we memorize its "capacity" (i.e. the number of rows that fit into that batch), and from then on we always put up to that many rows into each batch. This approach works well when each row is of roughly the same size throughout the lifetime of an operator; however, that is not always the case. Imagine we have 1024 small rows followed by 1024 large rows: first, all 1024 small rows are added into a single batch and we "fix" the capacity at 1024, so all 1024 large rows will be put into a single batch as well, significantly exceeding the target size.
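To make the failure mode concrete, here is a minimal standalone sketch of the capacity-memoizing strategy described above (hypothetical function and variable names, not the actual operator code): once the row count that first exceeded the target size is memorized, every later batch is cut at that row count regardless of its actual byte size.

```go
package main

import "fmt"

// simulateBatching mimics the flawed strategy: rows are accumulated until
// the batch size first reaches targetSize, at which point the row count
// ("capacity") is fixed and reused for every subsequent batch. It returns
// the byte size of each emitted batch.
func simulateBatching(rowSizes []int, targetSize int) []int {
	var batches []int
	capacity := 0 // 0 means "not yet memorized"
	curRows, curSize := 0, 0
	for _, sz := range rowSizes {
		curRows++
		curSize += sz
		if capacity == 0 {
			// Still filling the first batch: stop once we reach the target.
			if curSize >= targetSize {
				capacity = curRows // memorize the capacity
				batches = append(batches, curSize)
				curRows, curSize = 0, 0
			}
		} else if curRows == capacity {
			// Capacity is fixed: emit a batch every `capacity` rows,
			// regardless of its actual byte size.
			batches = append(batches, curSize)
			curRows, curSize = 0, 0
		}
	}
	if curRows > 0 {
		batches = append(batches, curSize)
	}
	return batches
}

func main() {
	// 1024 small rows (1 byte each) followed by 1024 large rows (1 KiB each),
	// with a 1 KiB target batch size.
	rowSizes := make([]int, 0, 2048)
	for i := 0; i < 1024; i++ {
		rowSizes = append(rowSizes, 1)
	}
	for i := 0; i < 1024; i++ {
		rowSizes = append(rowSizes, 1024)
	}
	fmt.Println(simulateBatching(rowSizes, 1024))
}
```

Running this emits two batches: the first is exactly 1 KiB, but the second is 1 MiB, roughly 1024x the target, because the memorized capacity of 1024 rows never accounts for the larger row size.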
I started working on fixing this in #76421, but it's not a trivial problem.
Jira issue: CRDB-13137