[VL] Spill related issues tracker #3030

@zhztheplayer

Description

Mirror issue in facebookincubator/velox: facebookincubator/velox#6414

This issue lists the large memory occupations in the Velox backend's query execution that are not spillable so far, that is, cannot be spilled to disk.

All of the listed items should eventually be fixed ("fix" here means making them spillable) to ensure Gluten's memory stability. Otherwise, an OOM error may be raised during execution and fail the user's query.
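To make "spillable" concrete, here is a minimal Python sketch of a buffer that can move its in-memory rows to disk on request. It is a toy illustration only, not Velox's or Gluten's spill implementation; `SpillableBuffer`, `run_with_budget`, and the row-count budget are hypothetical names and simplifications made up for this example.

```python
import os
import pickle
import tempfile


class SpillableBuffer:
    """Toy row buffer that can write its in-memory rows to disk on request."""

    def __init__(self):
        self.rows = []          # rows currently held in memory
        self.spill_files = []   # paths of spilled runs on disk

    def add(self, row):
        self.rows.append(row)

    def spill(self):
        """Move all in-memory rows to a temp file; return the number of rows freed."""
        if not self.rows:
            return 0
        fd, path = tempfile.mkstemp(suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.rows, f)
        freed = len(self.rows)
        self.rows = []
        self.spill_files.append(path)
        return freed

    def read_all(self):
        """Yield spilled rows first (in spill order), then rows still in memory."""
        for path in self.spill_files:
            with open(path, "rb") as f:
                yield from pickle.load(f)
        yield from self.rows

    def close(self):
        """Delete the spill files."""
        for path in self.spill_files:
            os.remove(path)
        self.spill_files = []


def run_with_budget(rows, budget):
    """Buffer rows, spilling whenever the in-memory row count reaches the budget."""
    buf = SpillableBuffer()
    for row in rows:
        if len(buf.rows) >= budget:
            buf.spill()
        buf.add(row)
    return buf
```

A non-spillable occupation is the same buffer without a usable `spill()`: once the budget is hit, the only remaining outcome is an OOM error.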

The list of non-spillable large occupations (attach the PR after each item once it is fixed):

  • Buffered inputs from Velox's window operator
    • Streaming window
      • Streaming window build
      • Streaming window functions without build, not planned in Velox yet
    • Spillable sort window
  • Buffered inputs from Velox's hash-aggregate operator, when the aggregate is a distinct aggregate
  • Buffered inputs from Velox's hash-aggregate operator, when the aggregate is a partial aggregate (needs confirmation)
  • Buffered input in Velox's hash-aggregate/hash-join(build)/sort operator, after all input is added
    • Hash-aggregate
    • Hash-join(build): the Velox community is working on this now
    • Sort
  • Pre-allocated split buffers from Gluten's Velox shuffle writer
  • A task can use the executor's memory if no other task is running in the executor (TPCDS Q67). Vanilla Spark does this
  • External sort in fallen-back partition write (can't be triggered by Gluten)
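
For the item about a single task using the executor's memory: as far as I understand vanilla Spark's execution memory pool, with N active tasks each task may hold up to 1/N of the pool and is guaranteed at least 1/(2N) before it has to spill. The sketch below only shows that arithmetic; `per_task_bounds` is a hypothetical helper, not a Spark or Gluten API.

```python
def per_task_bounds(pool_size, active_tasks):
    """Return (min_share, max_share) in bytes for one task.

    Assumes the 1/(2N) .. 1/N rule described above, with N = active_tasks.
    """
    if active_tasks < 1:
        raise ValueError("active_tasks must be at least 1")
    max_share = pool_size // active_tasks
    min_share = pool_size // (2 * active_tasks)
    return min_share, max_share


# A lone task may grow to the whole pool; with 4 tasks the cap drops to a quarter.
alone = per_task_bounds(8 * 1024, 1)
shared = per_task_bounds(8 * 1024, 4)
```

With one active task the upper bound equals the whole pool, which is the situation described for TPCDS Q67.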

Labels: enhancement (New feature or request), tracker (Tracker of issues in the same category)
