@alamb alamb commented May 3, 2022

Which issue does this PR close?

Part of #2427

Rationale for this change

Add benchmarks for the cases I intended to optimize for in #2427

What changes are included in this PR?

new merge benchmark

run:

cargo bench --bench merge

Here is an example flamegraph produced via:

TODO

Are there any user-facing changes?

No

@alamb alamb force-pushed the alamb/merge_benchmark branch from b198895 to 4e6d284 Compare May 9, 2022 21:07
@alamb alamb force-pushed the alamb/merge_benchmark branch from aaacfe9 to fc5f039 Compare May 18, 2022 13:54
@alamb alamb force-pushed the alamb/merge_benchmark branch from fc5f039 to 5b6e9c5 Compare May 18, 2022 16:41
@alamb alamb changed the title from "WIP Add benchmark for sort preserving merge" to "Benchmark for sort preserving merge" May 18, 2022
@alamb alamb marked this pull request as ready for review May 18, 2022 18:15
@alamb alamb requested a review from tustvold May 18, 2022 18:57
alamb commented May 18, 2022

cc @tustvold @yjshen @richox

Comment on lines +32 to +67
//! Rows are randomly
//! divided into separate
//! RecordBatch "streams",
//! ┌────┐ ┌────┐ ┌────┐ preserving the order ┌────┐ ┌────┐ ┌────┐
//! │ │ │ │ │ │ │ │ │ │ │ │
//! │ │ │ │ │ │ ──────────────┐ │ │ │ │ │ │
//! │ │ │ │ │ │ └─────────────▶ │ C1 │ │... │ │ CN │
//! │ │ │ │ │ │ ───────────────┐ │ │ │ │ │ │
//! │ │ │ │ │ │ ┌┼─────────────▶ │ │ │ │ │ │
//! │ │ │ │ │ │ ││ │ │ │ │ │ │
//! │ │ │ │ │ │ ││ └────┘ └────┘ └────┘
//! │ │ │ │ │ │ ││ ┌────┐ ┌────┐ ┌────┐
//! │ │ │ │ │ │ │└───────────────▶│ │ │ │ │ │
//! │ │ │ │ │ │ │ │ │ │ │ │ │
//! │ │ │ │ │ │ ... │ │ C1 │ │... │ │ CN │
//! │ │ │ │ │ │ ──────────────┘ │ │ │ │ │ │
//! │ │ │ │ │ │ ┌──────────────▶ │ │ │ │ │ │
//! │ C1 │ │... │ │ CN │ │ │ │ │ │ │ │
//! │ │ │ │ │ │───────────────┐│ └────┘ └────┘ └────┘
//! │ │ │ │ │ │ ││
//! │ │ │ │ │ │ ││
//! │ │ │ │ │ │ ││ ...
//! │ │ │ │ │ │ ────────────┼┼┐
//! │ │ │ │ │ │ │││
//! │ │ │ │ │ │ │││ ┌────┐ ┌────┐ ┌────┐
//! │ │ │ │ │ │ ──────────────┼┘│ │ │ │ │ │ │
//! │ │ │ │ │ │ │ │ │ │ │ │ │ │
//! │ │ │ │ │ │ │ │ │ C1 │ │... │ │ CN │
//! │ │ │ │ │ │ └─┼────────────▶ │ │ │ │ │ │
//! │ │ │ │ │ │ │ │ │ │ │ │ │
//! │ │ │ │ │ │ └─────────────▶ │ │ │ │ │ │
//! └────┘ └────┘ └────┘ └────┘ └────┘ └────┘
//! Input RecordBatch            NUM_STREAMS input
//! Columns 1..N                 RecordBatches
//! INPUT_SIZE sorted rows       (still INPUT_SIZE total
//! ~10% duplicates              rows)
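
For readers who want to see the setup in the diagram as code, here is a minimal sketch of the idea: take one sorted input with roughly 10% duplicate values, assign each row to a randomly chosen stream while visiting rows in order (so every stream stays sorted), then merge the streams back. This is not the benchmark's actual code. Plain `Vec<i64>` stands in for a `RecordBatch`, the helpers `Lcg`, `split_into_streams` and `merge_streams` are hypothetical names, the sizes in `main` are made up, and the merge is a naive k-way merge rather than DataFusion's sort preserving merge operator.

```rust
/// Tiny deterministic pseudo-random generator (a linear congruential
/// generator), used only so this sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        // Multiplier and increment from Knuth's MMIX generator.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Drop the low bits, which are the least random in an LCG.
        self.0 >> 33
    }
}

/// Assign each row of an already sorted input to a randomly chosen stream.
/// Rows are visited in order, so every stream remains sorted.
fn split_into_streams(sorted: &[i64], num_streams: usize, seed: u64) -> Vec<Vec<i64>> {
    let mut rng = Lcg(seed);
    let mut streams = vec![Vec::new(); num_streams];
    for &row in sorted {
        let idx = (rng.next_u64() % num_streams as u64) as usize;
        streams[idx].push(row);
    }
    streams
}

/// Naive k-way merge: repeatedly take the smallest head among all streams.
fn merge_streams(streams: &[Vec<i64>]) -> Vec<i64> {
    let mut cursors = vec![0usize; streams.len()];
    let total: usize = streams.iter().map(Vec::len).sum();
    let mut out = Vec::with_capacity(total);
    for _ in 0..total {
        let best = (0..streams.len())
            .filter(|&s| cursors[s] < streams[s].len())
            .min_by_key(|&s| streams[s][cursors[s]])
            .expect("at least one stream still has rows");
        out.push(streams[best][cursors[best]]);
        cursors[best] += 1;
    }
    out
}

fn main() {
    // Hypothetical sizes; the real INPUT_SIZE and NUM_STREAMS values are
    // not shown in this conversation.
    let input_size: i64 = 1_000;
    let num_streams = 8;

    // Sorted input where every tenth row repeats the previous value,
    // giving roughly 10% duplicates.
    let sorted: Vec<i64> = (0..input_size)
        .map(|i| if i % 10 == 9 { i - 1 } else { i })
        .collect();

    let streams = split_into_streams(&sorted, num_streams, 42);
    let merged = merge_streams(&streams);

    // Merging the order-preserving streams must reproduce the input.
    assert_eq!(merged, sorted);
    println!("merged {} rows from {} streams", merged.len(), streams.len());
}
```

Because the split never reorders rows within a stream, the merged output can be compared directly against the original input, which makes for a cheap correctness check alongside timing.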
Member

❤️ Love the diagram!

@andygrove andygrove merged commit c327983 into apache:master May 20, 2022
@alamb alamb deleted the alamb/merge_benchmark branch May 20, 2022 17:26
alamb commented May 20, 2022

Thanks @andygrove

@alamb alamb mentioned this pull request May 29, 2022