FunctionTopKFilter only implements numeric, non-nullable threshold semantics #99024
Description
Company or project name
No response
Describe the unexpected behaviour
FunctionTopKFilter currently only supports the subset of ORDER BY semantics that can be represented by plain direction-only threshold comparisons. In practice, this means the optimization paths that depend on it are restricted to numeric and non-nullable types.
Problem
FunctionTopKFilter is the core building block behind dynamic top-k filtering, but it currently uses simplified comparison semantics:
- ASC uses lessOrEquals
- DESC uses greaterOrEquals
It does not model:
- NULL ordering
- collation-aware string ordering
- more general type-specific ordering semantics required to match final ORDER BY
Because of that, callers such as the TopN aggregation optimization conservatively gate usage to numeric and non-nullable columns.
Why this matters
This is not just a limitation of use_top_k_dynamic_filtering as a setting. The deeper limitation is in the underlying threshold/filter implementation:
- FunctionTopKFilter
- TopKThresholdTracker
- any storage-side skipping that relies on the same threshold semantics
As long as FunctionTopKFilter only knows direction-based comparisons, it cannot safely filter rows for nullable or collation-sensitive sort keys without risking incorrect results.
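The risk for nullable keys can be illustrated with a small standalone model. This is not ClickHouse code: `Value`, `directionOnlyPass`, `topK`, and `topKWithFilter` are hypothetical names, `std::optional<int>` stands in for Nullable(Int32), and the sketch assumes that a comparison against NULL does not pass the filter (SQL three-valued logic). Under those assumptions, a direction-only filter drops rows that `ORDER BY x ASC NULLS FIRST LIMIT 2` should return.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Stand-in for Nullable(Int32); std::nullopt models NULL.
using Value = std::optional<int>;

// Direction-only predicate modelled on the lessOrEquals / greaterOrEquals choice.
// Assumption: comparing NULL with the threshold yields NULL, so the row is dropped.
bool directionOnlyPass(const Value & v, int threshold, int direction)
{
    assert(direction == 1 || direction == -1);
    if (!v.has_value())
        return false;
    return direction == 1 ? *v <= threshold : *v >= threshold;
}

// Reference ordering for ORDER BY x ASC NULLS FIRST.
bool lessAscNullsFirst(const Value & a, const Value & b)
{
    if (!a.has_value() || !b.has_value())
        return !a.has_value() && b.has_value();
    return *a < *b;
}

// Full sort followed by LIMIT k.
std::vector<Value> topK(std::vector<Value> rows, std::size_t k)
{
    std::stable_sort(rows.begin(), rows.end(), lessAscNullsFirst);
    rows.resize(std::min(k, rows.size()));
    return rows;
}

// Same query, but rows first go through the direction-only filter.
std::vector<Value> topKWithFilter(const std::vector<Value> & rows, std::size_t k, int threshold)
{
    std::vector<Value> kept;
    for (const auto & v : rows)
        if (directionOnlyPass(v, threshold, 1))
            kept.push_back(v);
    return topK(kept, k);
}
```

For the rows {5, NULL, 3, 7} with k = 2, `topK` returns {NULL, 3}. With a threshold of 5, `topKWithFilter` returns {3, 5}, because the NULL row never passes the filter.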
Current behavior
FunctionTopKFilter currently hardcodes direction-only comparators:
String comparator = "lessOrEquals";
if (threshold_tracker->getDirection() == -1)
    comparator = "greaterOrEquals";
There is already a TODO noting the missing semantics:
/// TODO: Add NULL-ordering and collation-aware threshold comparisons so
/// this filter can be safely used for nullable / non-numeric ORDER BY args.
Correspondingly, the TopN optimization only enables the path for numeric and non-nullable arguments.
Root cause
Final sorting uses full sort semantics, but FunctionTopKFilter and the threshold tracker only use simplified value comparisons.
So the filter can disagree with the actual final ORDER BY result for:
- Nullable(T)
- String with collation
- other non-numeric orderable types
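The collation case can be modelled the same way. In the sketch below, `collatedLess` is a hypothetical ASCII case-insensitive comparison that merely stands in for a real collator, and `bytewisePass` models a plain lessOrEquals on the raw bytes. None of these names come from ClickHouse.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a collator: ASCII case-insensitive "less".
bool collatedLess(const std::string & a, const std::string & b)
{
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) { return std::tolower(x) < std::tolower(y); });
}

// Models a direction-only ASC filter: byte-wise lessOrEquals against the threshold.
bool bytewisePass(const std::string & s, const std::string & threshold)
{
    return s <= threshold;
}

// Reference result: sort with the collated ordering, then LIMIT k.
std::vector<std::string> collatedTopK(std::vector<std::string> rows, std::size_t k)
{
    assert(k > 0);
    std::stable_sort(rows.begin(), rows.end(), collatedLess);
    rows.resize(std::min(k, rows.size()));
    return rows;
}

// Same query, but rows are pre-filtered with the byte-wise predicate.
std::vector<std::string> collatedTopKAfterBytewiseFilter(
    const std::vector<std::string> & rows, std::size_t k, const std::string & threshold)
{
    std::vector<std::string> kept;
    for (const auto & s : rows)
        if (bytewisePass(s, threshold))
            kept.push_back(s);
    return collatedTopK(kept, k);
}
```

For the rows {"Apple", "Banana", "apple"} with k = 2, the collated sort returns {"Apple", "apple"}. With the threshold "Banana", the byte-wise filter rejects "apple" (since 'a' sorts after 'B' in ASCII), and the filtered query returns {"Apple", "Banana"}.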
How to reproduce
omitted
Expected behavior
Proposed direction
Generalize FunctionTopKFilter and related threshold logic to use the same comparison semantics as the real sort path.
That likely means introducing a shared comparator abstraction that includes:
- sort direction
- nulls_direction
- optional collator
- type-correct comparison rules
and using it consistently in:
- FunctionTopKFilter
- TopKThresholdTracker
- storage-level top-k skipping logic
After that, optimizer gates can be relaxed for supported types.
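One possible shape for such a shared comparator is sketched below. It is illustrative only: `SortKeySemantics`, `compareForSort`, `passesThreshold`, and `asciiCaseInsensitiveCompare` are hypothetical names, the key type is fixed to an optional string for brevity, and the sketch assumes that `nulls_direction` is applied before `direction` (so NULLS LAST corresponds to `nulls_direction == direction`).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

using Key = std::optional<std::string>; // std::nullopt models NULL

// Hypothetical description of one ORDER BY key.
struct SortKeySemantics
{
    int direction = 1;       // 1 = ASC, -1 = DESC
    int nulls_direction = 1; // assumption: 1 = NULL compares as greatest, -1 = as least
    std::function<int(const std::string &, const std::string &)> collator; // empty = byte-wise
};

// Hypothetical stand-in for a real collator: three-way ASCII case-insensitive compare.
int asciiCaseInsensitiveCompare(const std::string & a, const std::string & b)
{
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i)
    {
        const int x = std::tolower(static_cast<unsigned char>(a[i]));
        const int y = std::tolower(static_cast<unsigned char>(b[i]));
        if (x != y)
            return x < y ? -1 : 1;
    }
    return a.size() == b.size() ? 0 : (a.size() < b.size() ? -1 : 1);
}

// Negative: a sorts before b; zero: equivalent; positive: a sorts after b.
int compareForSort(const SortKeySemantics & s, const Key & a, const Key & b)
{
    assert(s.direction == 1 || s.direction == -1);
    assert(s.nulls_direction == 1 || s.nulls_direction == -1);
    int res = 0;
    if (!a.has_value() || !b.has_value())
        res = (!a.has_value() && !b.has_value()) ? 0
            : (!a.has_value() ? s.nulls_direction : -s.nulls_direction);
    else if (s.collator)
        res = s.collator(*a, *b);
    else
        res = (*a < *b) ? -1 : ((*b < *a) ? 1 : 0);
    return res * s.direction;
}

// A row survives the filter if it sorts no later than the current threshold.
bool passesThreshold(const SortKeySemantics & s, const Key & row, const Key & threshold)
{
    return compareForSort(s, row, threshold) <= 0;
}
```

With this model, a NULL row passes against the threshold "b" for ASC NULLS FIRST (`{1, -1}`) and is rejected for ASC NULLS LAST (`{1, 1}`). Because the filter and the final sort would call the same `compareForSort`, they cannot disagree about which rows belong in the top k.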
Impact
This currently blocks top-k dynamic filtering for many valid ORDER BY ... LIMIT cases, even when the optimization would otherwise be beneficial.
Error message and/or stacktrace
No response
Additional context
No response