Consolidate MergeJoin with HashJoin to adaptive join relations according to runtime resources and table sizes #2316

@yjshen

Description

A possible solution I can think of at the moment:

  1. Always choose HashJoin unless statistics indicate that both tables are large.
  2. Track memory usage while building the hash table for the build side.
  3. When the hash builder fails to grow its memory:
    3.1. Sort the in-memory hash table and spill it to spill0, then free the memory.
    3.2. Buffer and sort the remaining incoming records for the build table until it is exhausted, then do a final sort.
    3.3. Buffer and sort the records for the streaming side until it is finished, then do a final sort.
    3.4. MergeJoin the two sides.
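
To make the fallback path concrete, here is a minimal Python sketch of steps 2 to 4 for an inner equi-join. It is an illustration under several assumptions, not a proposed implementation: `adaptive_join`, `sorted_runs`, `merge_join`, and `max_build_rows` are hypothetical names, a row count stands in for real memory tracking, in-memory lists stand in for spill files such as spill0, and rows are `(key, value)` pairs with comparable, non-None keys. Step 1 (the statistics-based choice) is not modeled.

```python
import heapq
from collections import defaultdict
from itertools import chain, groupby
from operator import itemgetter

KEY = itemgetter(0)


def sorted_runs(rows, run_size):
    """Cut rows into key-sorted runs; each list is a stand-in for a spill file."""
    run = []
    for row in rows:
        run.append(row)
        if len(run) >= run_size:
            yield sorted(run, key=KEY)
            run = []
    if run:
        yield sorted(run, key=KEY)


def merge_join(left_sorted, right_sorted):
    """Inner-join two key-sorted streams of (key, value) rows."""
    left_groups = groupby(left_sorted, key=KEY)
    right_groups = groupby(right_sorted, key=KEY)
    lk, lg = next(left_groups, (None, None))
    rk, rg = next(right_groups, (None, None))
    while lg is not None and rg is not None:
        if lk < rk:
            lk, lg = next(left_groups, (None, None))
        elif lk > rk:
            rk, rg = next(right_groups, (None, None))
        else:
            right_rows = list(rg)  # materialize one key group for the cross product
            for _, lv in lg:
                for _, rv in right_rows:
                    yield (lk, lv, rv)
            lk, lg = next(left_groups, (None, None))
            rk, rg = next(right_groups, (None, None))


def adaptive_join(build_rows, probe_rows, max_build_rows):
    """Start as a hash join; fall back to sort-merge when the build side outgrows the budget."""
    build_iter = iter(build_rows)
    table = defaultdict(list)
    held = 0
    overflow = None
    # Step 2: build the hash table while tracking "memory" (row count is a proxy).
    for row in build_iter:
        if held == max_build_rows:
            overflow = row  # step 3: the builder failed to grow
            break
        table[row[0]].append(row[1])
        held += 1

    if overflow is None:
        # The build side fit: probe the hash table as a plain hash join.
        joined = [(k, bv, pv) for k, pv in probe_rows for bv in table.get(k, ())]
        return "hash", joined

    # Step 3.1: sort the in-memory hash table into spill0, then free the memory.
    spill0 = sorted(((k, v) for k, vs in table.items() for v in vs), key=KEY)
    table.clear()
    # Step 3.2: buffer and sort the rest of the build side, then merge the runs.
    build_runs = [spill0, *sorted_runs(chain([overflow], build_iter), max_build_rows)]
    build_sorted = heapq.merge(*build_runs, key=KEY)
    # Step 3.3: the same for the streaming (probe) side.
    probe_sorted = heapq.merge(*sorted_runs(probe_rows, max_build_rows), key=KEY)
    # Step 3.4: MergeJoin the two sorted sides.
    return "merge", list(merge_join(build_sorted, probe_sorted))
```

Both paths return the same set of joined rows, only in a different order: the hash path follows probe order, while the fallback path is ordered by key. A real operator would account for bytes rather than rows and would write each run to disk.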
