
FunctionTopKFilter only implements numeric, non-nullable threshold semantics #99024

@murphy-4o

Description

Company or project name

No response

Describe the unexpected behaviour

FunctionTopKFilter currently only supports the subset of ORDER BY semantics that can be represented by plain direction-only threshold comparisons. In practice, this means the optimization paths that depend on it are restricted to numeric and non-nullable types.

Problem

FunctionTopKFilter is the core building block behind dynamic top-k filtering, but it currently uses simplified comparison semantics:

  • ASC uses lessOrEquals
  • DESC uses greaterOrEquals
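The direction-only semantics above can be modelled in a few lines. This is a hypothetical, standalone sketch, not the ClickHouse implementation. `SimpleTopKThreshold` is an illustrative name, values are non-nullable `int64_t`, and `direction` is assumed to use `1` for ASC and `-1` for DESC, consistent with `getDirection() == -1` selecting `greaterOrEquals` in the code quoted under "Current behavior".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical sketch, not ClickHouse code: direction-only top-k threshold for
// ORDER BY x {ASC|DESC} LIMIT k over non-nullable int64 values.
// direction: 1 = ASC, -1 = DESC (assumed convention).
class SimpleTopKThreshold
{
public:
    using Cmp = std::function<bool(int64_t, int64_t)>;

    SimpleTopKThreshold(size_t k_, int direction_)
        : k(k_), direction(direction_),
          // ASC keeps the k smallest values, so the heap top must be the largest of them
          // (max-heap); DESC keeps the k largest, so the top is the smallest (min-heap).
          heap(direction_ == 1 ? Cmp(std::less<int64_t>()) : Cmp(std::greater<int64_t>()))
    {
        assert(k > 0 && (direction == 1 || direction == -1));
    }

    // Record a value that is a candidate for the final top-k.
    void add(int64_t value)
    {
        heap.push(value);
        if (heap.size() > k)
            heap.pop();  // evict the value that sorts last
    }

    bool hasThreshold() const { return heap.size() == k; }

    // The k-th best value seen so far; only meaningful when hasThreshold() is true.
    int64_t threshold() const { return heap.top(); }

    // Direction-only filter: ASC keeps v <= threshold (lessOrEquals),
    // DESC keeps v >= threshold (greaterOrEquals). Until k values are seen, keep all rows.
    bool keep(int64_t v) const
    {
        if (!hasThreshold())
            return true;
        return direction == 1 ? v <= threshold() : v >= threshold();
    }

private:
    size_t k;
    int direction;
    std::priority_queue<int64_t, std::vector<int64_t>, Cmp> heap;
};
```

For `ORDER BY x ASC LIMIT 2` after seeing 5, 1 and 9, the threshold is 5, so a later 6 is rejected while 3 is kept. Nothing in this model says where a NULL would sort or how two strings should collate.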

It does not model:

  • NULL ordering
  • collation-aware string ordering
  • more general type-specific ordering semantics required to match the final ORDER BY

Because of that, callers such as the TopN aggregation optimization conservatively restrict its use to numeric, non-nullable columns.
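To see why the gate is needed for nullable columns, consider `ORDER BY x ASC NULLS FIRST LIMIT 1` over rows 3, NULL, 7, where the tracker has already settled on threshold 3 when the NULL row arrives. The sketch below is hypothetical: it models `Nullable(Int64)` as `std::optional<int64_t>` and assumes that a comparison involving NULL yields NULL and that the filter then drops the row, as a SQL `WHERE` would. All function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using NullableInt64 = std::optional<int64_t>;  // stand-in for Nullable(Int64)

// Direction-only ASC filter with SQL-like NULL handling (assumed):
// NULL <= threshold evaluates to NULL, and the filter drops the row.
bool directionOnlyKeepAsc(const NullableInt64 & v, int64_t threshold)
{
    if (!v)
        return false;
    return *v <= threshold;
}

// Full ASC ordering with explicit NULL placement, as the final sort would apply it.
bool sortsBeforeAsc(const NullableInt64 & a, const NullableInt64 & b, bool nulls_first)
{
    if (!a && !b)
        return false;
    if (!a)
        return nulls_first;
    if (!b)
        return !nulls_first;
    return *a < *b;
}

// Reference result: fully sort the rows and take the first k.
std::vector<NullableInt64> referenceTopK(std::vector<NullableInt64> rows, size_t k, bool nulls_first)
{
    std::stable_sort(rows.begin(), rows.end(),
        [nulls_first](const NullableInt64 & a, const NullableInt64 & b) { return sortsBeforeAsc(a, b, nulls_first); });
    rows.resize(std::min(k, rows.size()));
    return rows;
}
```

With NULLS FIRST the correct top-1 row is the NULL, yet a direction-only filter holding threshold 3 rejects it. With NULLS LAST the same filter happens to be safe, which is why NULL placement has to be part of the comparison rather than an afterthought.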

Why this matters

This is not just a limitation of use_top_k_dynamic_filtering as a setting. The deeper limitation is in the underlying threshold/filter implementation:

  • FunctionTopKFilter
  • TopKThresholdTracker
  • any storage-side skipping that relies on the same threshold semantics

As long as FunctionTopKFilter only knows direction-based comparisons, it cannot safely filter rows for nullable or collation-sensitive sort keys without risking incorrect results.
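The same mismatch carries over to storage-side skipping. A minimal sketch, assuming a hypothetical per-granule summary `GranuleStats` (min/max over non-NULL values plus NULL-presence flags; not a real ClickHouse structure), shows how a direction-only skip test can discard a granule whose NULL rows belong in the result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-granule summary; min/max are computed over non-NULL values only.
struct GranuleStats
{
    int64_t min = 0;
    int64_t max = 0;
    bool has_null = false;
    bool has_non_null = true;
};

// Direction-only skip test (direction: 1 = ASC, -1 = DESC, assumed convention):
// skip if every non-NULL value sorts strictly after the threshold.
bool canSkipDirectionOnly(const GranuleStats & g, int64_t threshold, int direction)
{
    assert(direction == 1 || direction == -1);
    return direction == 1 ? g.min > threshold : g.max < threshold;
}

// NULL-aware variant. nulls_first means NULLs appear first in the final output,
// so they can beat any non-NULL threshold and the granule must be read.
bool canSkipNullAware(const GranuleStats & g, int64_t threshold, int direction, bool nulls_first)
{
    if (g.has_null && nulls_first)
        return false;
    if (!g.has_non_null)
        return true;  // only NULLs (or empty), which sort last and therefore after the threshold
    return canSkipDirectionOnly(g, threshold, direction);
}
```

For an ASC key with threshold 5 and a granule spanning 10..20 that also contains NULLs, the direction-only test skips it; under NULLS FIRST that would lose rows from the final result.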

Current behavior

FunctionTopKFilter currently hardcodes direction-only comparators:

String comparator = "lessOrEquals";

if (threshold_tracker->getDirection() == -1)
    comparator = "greaterOrEquals";

There is already a TODO noting the missing semantics:

/// TODO: Add NULL-ordering and collation-aware threshold comparisons so
/// this filter can be safely used for nullable / non-numeric ORDER BY args.

Correspondingly, the TopN optimization only enables the path for numeric and non-nullable arguments.

Root cause

Final sorting uses full sort semantics, but FunctionTopKFilter and the threshold tracker only use simplified value comparisons.

As a result, the filter can disagree with the final ORDER BY and drop rows that belong in the result for:

  • Nullable(T)
  • String with collation
  • other non-numeric orderable types
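For collated strings the mismatch lies in the comparison itself. The sketch below is hypothetical: it uses a case-insensitive ASCII comparison as a stand-in for a real collator (actual collations, such as ICU-based ones, are far more involved) and a bytewise `<=` as the direction-only filter. With rows "apple", "Banana", "cherry" and `ORDER BY s ASC LIMIT 2` under the collation, the top 2 is "apple", "Banana", so the threshold is "Banana". The bytewise filter nevertheless rejects "apple", because 'a' (0x61) is greater than 'B' (0x42).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Stand-in collator (hypothetical): case-insensitive ASCII ordering.
// Real collations are far richer; this only demonstrates the mismatch.
bool collatedLess(const std::string & a, const std::string & b)
{
    return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) { return std::tolower(x) < std::tolower(y); });
}

// Direction-only ASC filter on String: plain bytewise comparison.
bool bytewiseKeepAsc(const std::string & v, const std::string & threshold)
{
    return v <= threshold;
}

// Collation-aware ASC filter: keep v unless it sorts strictly after the threshold.
bool collatedKeepAsc(const std::string & v, const std::string & threshold)
{
    return !collatedLess(threshold, v);
}
```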

How to reproduce

omitted

Expected behavior

Proposed direction

Generalize FunctionTopKFilter and related threshold logic to use the same comparison semantics as the real sort path.

That likely means introducing a shared comparator abstraction that includes:

  • sort direction
  • nulls_direction
  • optional collator
  • type-correct comparison rules

and using it consistently in:

  • FunctionTopKFilter
  • TopKThresholdTracker
  • storage-level top-k skipping logic

After that, optimizer gates can be relaxed for supported types.
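The shared comparator proposed above could look roughly like the following sketch. All names (`SortKeyComparator`, `TopKThreshold`) are hypothetical, Nullable values are modelled as `std::optional`, and the collator is a plain callback. The key point is that the threshold tracker and the filter both call the same `compare`, which folds in direction, `nulls_direction` and the optional collator. Here `nulls_direction` is interpreted as where NULL falls before direction is applied (1 = above every value, -1 = below); whether that matches ClickHouse's own `nulls_direction` convention should be checked against the real sort code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical shared comparator: one place that knows direction, NULL placement
// and collation, so sorting, threshold tracking and filtering cannot disagree.
template <typename T>
struct SortKeyComparator
{
    int direction = 1;        // 1 = ASC, -1 = DESC
    int nulls_direction = 1;  // before direction is applied: 1 = NULL sorts above all values, -1 = below
    std::function<int(const T &, const T &)> collator;  // optional; empty -> natural order of T

    // Returns <0, 0 or >0 according to the final sort order of a and b.
    int compare(const std::optional<T> & a, const std::optional<T> & b) const
    {
        int raw = 0;
        if (!a || !b)
            raw = (!a && !b) ? 0 : (!a ? nulls_direction : -nulls_direction);
        else if (collator)
        {
            int r = collator(*a, *b);
            raw = (r > 0) - (r < 0);  // normalize to -1/0/1 before applying direction
        }
        else
            raw = (*a < *b) ? -1 : ((*b < *a) ? 1 : 0);
        return raw * direction;
    }
};

// Hypothetical threshold tracker whose filter reuses the same comparator.
template <typename T>
class TopKThreshold
{
public:
    using Value = std::optional<T>;

    TopKThreshold(size_t k_, SortKeyComparator<T> cmp_) : k(k_), cmp(std::move(cmp_)) { assert(k > 0); }

    // Insert a candidate, keeping only the k best values in final sort order.
    void add(const Value & v)
    {
        auto less = [this](const Value & a, const Value & b) { return cmp.compare(a, b) < 0; };
        best.insert(std::upper_bound(best.begin(), best.end(), v, less), v);
        if (best.size() > k)
            best.pop_back();
    }

    bool hasThreshold() const { return best.size() == k; }
    const Value & threshold() const { return best.back(); }  // requires hasThreshold()

    // Keep a row unless it sorts strictly after the current k-th best value.
    bool keep(const Value & v) const { return !hasThreshold() || cmp.compare(v, threshold()) <= 0; }

private:
    size_t k;
    SortKeyComparator<T> cmp;
    std::vector<Value> best;  // sorted in final order, at most k entries
};
```

With `direction = 1, nulls_direction = -1` (ASC NULLS FIRST under this sketch's convention) and `k = 2`, adding 3, 7 and NULL leaves NULL and 3 as the best two, and a later NULL row passes `keep` instead of being dropped. Storage-level skipping could call the same `compare` against granule bounds.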

Impact

This currently blocks top-k dynamic filtering for many valid ORDER BY ... LIMIT cases, even when the optimization would otherwise be beneficial.

Error message and/or stacktrace

No response

Additional context

No response
