Skip to content

Ensure GroupByHash does not error when trying to spill (calling try_resize where error is not acceptible) #14851

@EmilyMatt

Description

@EmilyMatt

Describe the bug

In the GroupedHashAggregateStream, update_memory_reservation() has a try_resize() call, an error returned from this function is usually handled gracefully, as we are usually in the middle of emitting and can just emit again, or are accumulating and can spill(unless not even a single batch fits in memory, in which case this can be a justified panic).

However, when creating a merge stream using the spill data, there is no viable error fallback, this early exits on error if the memory pool denied the reservation.
Using a custom memory pool, it should be possible to implement a "burst" allocation or something similar, and avoid aborting a query, but this requires the aggregate a "declaration of intentions" of sorts, shouldn't resize() be used in such cases, instead of try_resize()?

This is the problematic line:

self.update_memory_reservation()?;

I believe in line 910 as well,

match self.update_memory_reservation() {
            // Here we can ignore `insufficient_capacity_err` because we will spill later,
            // but at least one batch should fit in the memory
            Err(DataFusionError::ResourcesExhausted(_))
                if self.group_values.len() >= self.batch_size =>
            {
                Ok(())
            }
            other => other,
        }

This should check preemptively if we have a possibility of spilling here, and if not, it should be a resize(), not a try_resize() as update_memory_reservation() does.
But this one actually has a possibility of not aborting so seems more agreeable.

It's possible that there are other design ideas taken into consideration here, but this is my belief at least.

To Reproduce

No response

Expected behavior

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions