
Add MemoryReservation to batch splitting in joins  #13003

@alamb

Description


Is your feature request related to a problem or challenge?

Follow-on to #12969 and #12633

In #12633 @mhilton noted that joins sometimes generate giant record batches, which cause issues. @alihan-synnada fixed this in #12969 by splitting the output, but internally the joins can still generate giant batches before they are split.

As @mhilton says in #12969 (comment):

Unfortunately this doesn't address the actual problem with creating giant batches, which is they require a lot of memory and that memory isn't accounted for in any MemoryPool. Wiring a MemoryReservation into BatchSplitter would probably be enough to address this though.

Describe the solution you'd like

I would like the memory accounting to take the large output batches into account, so that the memory they use is tracked by the MemoryPool.
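A minimal, self-contained sketch of the idea, under a simplified model. `ToyPool`, `ToyReservation`, `ToyBatch`, and `ToyBatchSplitter` are hypothetical stand-ins written for illustration; they are not DataFusion's actual `MemoryPool`, `MemoryReservation`, or `BatchSplitter` APIs. The splitter reserves the full size of the large batch before accepting it and releases the reservation once the last chunk has been emitted, so the large batch counts against the pool's limit while it is alive, and a batch that does not fit is rejected with an error.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a memory pool: a fixed byte budget.
struct ToyPool {
    limit: usize,
    used: Cell<usize>,
}

/// Hypothetical stand-in for a reservation against a `ToyPool`.
struct ToyReservation {
    pool: Rc<ToyPool>,
    size: usize,
}

impl ToyReservation {
    fn new(pool: Rc<ToyPool>) -> Self {
        Self { pool, size: 0 }
    }

    /// Grow the reservation, failing if the pool budget would be exceeded.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        let used = self.pool.used.get();
        if used + bytes > self.pool.limit {
            return Err(format!("cannot reserve {bytes} bytes: {used} of {} in use", self.pool.limit));
        }
        self.pool.used.set(used + bytes);
        self.size += bytes;
        Ok(())
    }

    /// Release everything this reservation currently holds.
    fn free(&mut self) {
        self.pool.used.set(self.pool.used.get() - self.size);
        self.size = 0;
    }
}

impl Drop for ToyReservation {
    fn drop(&mut self) {
        self.free();
    }
}

/// Hypothetical batch: a column of fixed-width rows.
struct ToyBatch {
    rows: Vec<u64>,
}

impl ToyBatch {
    fn size_bytes(&self) -> usize {
        self.rows.len() * std::mem::size_of::<u64>()
    }
}

/// Hypothetical splitter: emits chunks of at most `batch_size` rows and
/// holds a reservation for the large batch until its last chunk is emitted.
struct ToyBatchSplitter {
    batch_size: usize,
    current: Option<(ToyBatch, usize)>, // (batch, next row offset)
    reservation: ToyReservation,
}

impl ToyBatchSplitter {
    fn new(batch_size: usize, reservation: ToyReservation) -> Self {
        Self { batch_size, current: None, reservation }
    }

    /// Account for the large batch in the pool before accepting it.
    fn set_batch(&mut self, batch: ToyBatch) -> Result<(), String> {
        self.current = None;
        self.reservation.free(); // drop any previous, partially split batch
        if batch.rows.is_empty() {
            return Ok(());
        }
        self.reservation.try_grow(batch.size_bytes())?;
        self.current = Some((batch, 0));
        Ok(())
    }

    /// Return the next chunk, freeing the reservation after the final one.
    fn next(&mut self) -> Option<Vec<u64>> {
        let (batch, offset) = self.current.as_mut()?;
        let end = (*offset + self.batch_size).min(batch.rows.len());
        let chunk = batch.rows[*offset..end].to_vec();
        *offset = end;
        if *offset >= batch.rows.len() {
            self.current = None;
            self.reservation.free();
        }
        Some(chunk)
    }
}

fn main() {
    let pool = Rc::new(ToyPool { limit: 1024, used: Cell::new(0) });
    let mut splitter = ToyBatchSplitter::new(4, ToyReservation::new(Rc::clone(&pool)));
    splitter.set_batch(ToyBatch { rows: (0..10).collect() }).unwrap();
    while let Some(chunk) = splitter.next() {
        println!("chunk {:?}, pool bytes in use: {}", chunk, pool.used.get());
    }
}
```

Holding the reservation until the last chunk is emitted, rather than shrinking it per chunk, reflects that the large batch stays in memory while it is being split (with Arrow, `RecordBatch::slice` is zero-copy, so slices share the original buffers). The sketch copies each chunk only to stay dependency-free.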

Describe alternatives you've considered

Wiring a MemoryReservation into BatchSplitter would probably be enough to address this.

Additional context

No response

Metadata


Assignees

No one assigned

Labels

enhancement (New feature or request)

