Skip to content

Optimize sort preserving merge #416

@alamb

Description

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The new sort preserving merge operator, introduced in #379 likely has room for performance improvement.

Describe the solution you'd like

  1. Create a benchmark for the merging operator
  2. Optimize / improve benchmark as appropriate

Here is a suggestion from @jhorstmann https://github.com/apache/arrow-datafusion/pull/379/files#r637948151 as a separate ticket so it doesn't get lost:

For bigger number of partitions, storing the cursors in a BinaryHeap, sorted by their current item, would be beneficial.

A rust implementation of that approach can be seen in this blog post and the first comment under it. I have implemented the same approach in java before. I agree with @alamb though to make it work first, and then optimize later.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions