-
Notifications
You must be signed in to change notification settings - Fork 3.1k
A confusion about Top-N operator #14896
Copy link
Copy link
Closed
Labels
Description
What happens?
In TopNHeap::Reduce, the reduce threshold is determined by payload_chunk.size(). However, the payload_chunk is only used as a temporary data holder in TopNHeap::Combine. It's heap_data right?
To Reproduce
void TopNHeap::Combine(TopNHeap &other) {
other.Finalize();
// FIXME: heaps can be merged directly instead of doing it like this
// that only really speeds things up if heaps are very large, however
TopNScanState state;
other.InitializeScan(state, false);
while (true) {
payload_chunk.Reset();
other.Scan(state, payload_chunk);
if (payload_chunk.size() == 0) {
break;
}
Sink(payload_chunk);
}
Reduce();
}
void TopNHeap::Reduce() {
if (payload_chunk.size() < ReduceThreshold()) {
// only reduce when we pass the reduce threshold
return;
}
...
}OS:
x86_64
DuckDB Version:
newest commit id: b470dea
DuckDB Client:
none
Hardware:
No response
Full Name:
Yiliang Qiu
Affiliation:
Baidu Inc.
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have not tested with any build
Did you include all relevant data sets for reproducing the issue?
No - Other reason (please specify in the issue body)
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- Yes, I have
Reactions are currently unavailable