Skip to content

A confusion about Top-N operator #14896

@yiliangyl

Description

@yiliangyl

What happens?

In TopNHeap::Reduce, the reduce threshold is determined by payload_chunk.size(). However, the payload_chunk is only used as a temporary data holder in TopNHeap::Combine. It's heap_data right?

To Reproduce

void TopNHeap::Combine(TopNHeap &other) {
	other.Finalize();

	// FIXME: heaps can be merged directly instead of doing it like this
	// that only really speeds things up if heaps are very large, however
	TopNScanState state;
	other.InitializeScan(state, false);
	while (true) {
		payload_chunk.Reset();
		other.Scan(state, payload_chunk);
		if (payload_chunk.size() == 0) {
			break;
		}
		Sink(payload_chunk);
	}
	Reduce();
}

void TopNHeap::Reduce() {
	if (payload_chunk.size() < ReduceThreshold()) {
		// only reduce when we pass the reduce threshold
		return;
	}
	...
}

OS:

x86_64

DuckDB Version:

newest commit id: b470dea

DuckDB Client:

none

Hardware:

No response

Full Name:

Yiliang Qiu

Affiliation:

Baidu Inc.

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have not tested with any build

Did you include all relevant data sets for reproducing the issue?

No - Other reason (please specify in the issue body)

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions