Conversation

@alihandroid (Contributor)

Which issue does this PR close?

Closes #9792.

Rationale for this change

Physical plans can be optimized further by pushing GlobalLimitExec and LocalLimitExec down through certain nodes, or by using versions of their child nodes with fetch limits, without changing the result. This reduces unnecessary data transfer and processing, resulting in more efficient plan execution.

CoalesceBatchesExec can also benefit from this improvement, so fetch limit functionality is implemented for it as well.

For example,

GlobalLimitExec: skip=0, fetch=5
  StreamingTableExec: partition_sizes=1, projection=[c1, c2, c3], infinite_source=true

can be turned into

StreamingTableExec: partition_sizes=1, projection=[c1, c2, c3], infinite_source=true, fetch=5

and

GlobalLimitExec: skip=0, fetch=5
  CoalescePartitionsExec
    FilterExec: c3@2 > 0
      RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1
        StreamingTableExec: partition_sizes=1, projection=[c1, c2, c3], infinite_source=true

can be turned into

GlobalLimitExec: skip=0, fetch=5
  CoalescePartitionsExec
    LocalLimitExec: fetch=5
      FilterExec: c3@2 > 0
        RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1
          StreamingTableExec: partition_sizes=1, projection=[c1, c2, c3], infinite_source=true

without changing the result, while using fewer resources and finishing faster.

The physical plan in the following excerpt

query TT
EXPLAIN SELECT c1, c2, c3 FROM sink_table WHERE c3 > 0 LIMIT 5;
----
logical_plan
01)Limit: skip=0, fetch=5
02)--Filter: sink_table.c3 > Int16(0)
03)----TableScan: sink_table projection=[c1, c2, c3]
physical_plan
01)GlobalLimitExec: skip=0, fetch=5
02)--CoalescePartitionsExec
03)----CoalesceBatchesExec: target_batch_size=8192
04)------FilterExec: c3@2 > 0
05)--------RepartitionExec: partitioning=RoundRobinBatch(3), input_partitions=1
06)----------StreamingTableExec: partition_sizes=1, projection=[c1, c2, c3], infinite_source=true

will turn into

01)GlobalLimitExec: skip=0, fetch=5
02)--CoalescePartitionsExec
03)----CoalesceBatchesExec: target_batch_size=8192, fetch=5
04)------FilterExec: c3@2 > 0
05)--------RepartitionExec: partitioning=RoundRobinBatch(3), input_partitions=1
06)----------StreamingTableExec: partition_sizes=1, projection=[c1, c2, c3], infinite_source=true

Other examples can be found in the tests provided in limit_pushdown.rs and in other .slt tests.

What changes are included in this PR?

Implement LimitPushdown Rule:

  • Introduced new APIs in the ExecutionPlan trait (see the sketch below):
    • with_fetch(&self, fetch: Option<usize>) -> Option<Arc<dyn ExecutionPlan>>: Returns a fetching version of the node if supported, None otherwise. The default implementation returns None.
    • supports_limit_pushdown(&self) -> bool: Returns true if a limit can be safely pushed down through the node. The default implementation returns false.
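
A minimal sketch of what these two trait additions might look like, using a simplified stand-in for DataFusion's ExecutionPlan trait (the real trait has many more items; only the new methods and their default bodies are shown):

use std::sync::Arc;

/// Simplified stand-in for DataFusion's `ExecutionPlan` trait, showing only
/// the two methods added in this PR together with their default bodies.
pub trait ExecutionPlan {
    /// Returns a variant of this node that stops producing rows after `fetch`
    /// rows, if the node supports that; `None` means no fetching variant exists.
    fn with_fetch(&self, _fetch: Option<usize>) -> Option<Arc<dyn ExecutionPlan>> {
        // Default: the node has no fetch-limited variant.
        None
    }

    /// Returns `true` if a limit can be safely pushed down through this node,
    /// i.e. limiting the child's output does not change this node's result.
    fn supports_limit_pushdown(&self) -> bool {
        // Default: be conservative and keep the limit above this node.
        false
    }
}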

Add fetch support to CoalesceBatchesExec:

  • Added a fetch field and a with_fetch implementation
  • Implemented the fetch limit functionality (a sketch of the idea follows below)
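
The fetch limit itself boils down to tracking how many rows have been emitted and truncating the batch that crosses the limit. A minimal sketch of that idea (the helper name and signature are hypothetical; the actual logic lives inside CoalesceBatchesExec's stream implementation):

use arrow::record_batch::RecordBatch;

/// Hypothetical helper illustrating fetch-limit truncation: emit batches until
/// `fetch` rows have been produced, slicing the batch that crosses the boundary.
fn apply_fetch(
    batch: RecordBatch,
    fetched_so_far: &mut usize,
    fetch: usize,
) -> Option<RecordBatch> {
    if *fetched_so_far >= fetch {
        return None; // the limit was already reached; drop the batch
    }
    let remaining = fetch - *fetched_so_far;
    if batch.num_rows() <= remaining {
        *fetched_so_far += batch.num_rows();
        Some(batch) // the whole batch fits under the limit
    } else {
        *fetched_so_far = fetch;
        Some(batch.slice(0, remaining)) // zero-copy truncation to the limit
    }
}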

Are these changes tested?

Unit tests are provided for LimitPushdown and the new fetching support for CoalesceBatchesExec.

Are there any user-facing changes?

No. The changes only affect performance.

@alamb (Contributor) commented Jul 25, 2024

I believe if you merge up from main, the CI will pass on this PR.

github-actions bot added the optimizer (Optimizer rules) label on Jul 26, 2024

@ozankabak (Contributor) left a comment

I think this PR looks good and can be merged to unblock other work. The rule itself still has a few unnecessary object creations that can be refactored out, but we can do it as a follow-on PR.

@alamb (Contributor) left a comment

Thank you @alihandroid and @ozankabak -- I think this is a really neat optimization

I spent time reviewing the changes to coalesce batches and the plans, and they all made sense to me. I have some comment / documentation suggestions, but we could do them as follow-on PRs as well.

One thing I find interesting is that this is another example of an optimizer pass on ExecutionPlans that mirrors one DataFusion already has for LogicalPlans.

DataFusion has several examples of this already (like projection pushdown) -- and we have several of them in InfluxDB: https://github.com/influxdata/influxdb3_core/tree/main/iox_query/src/physical_optimizer (we also have predicate pushdown and projection pushdown)

I think @crepererum implemented those passes because we make a bunch of ExecutionPlan nodes directly so the LogicalPlan passes can't be used. Maybe we should consider upstreaming them 🤔

physical_plan
01)GlobalLimitExec: skip=0, fetch=10, statistics=[Rows=Exact(8), Bytes=Absent, [(Col[0]:),(Col[1]:),(Col[2]:),(Col[3]:),(Col[4]:),(Col[5]:),(Col[6]:),(Col[7]:),(Col[8]:),(Col[9]:),(Col[10]:)]]
02)--ParquetExec: file_groups={1 group: [[WORKSPACE_ROOT/parquet-testing/data/alltypes_plain.parquet]]}, projection=[id, bool_col, tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col, date_string_col, string_col, timestamp_col], limit=10, statistics=[Rows=Exact(8), Bytes=Absent, [(Col[0]:),(Col[1]:),(Col[2]:),(Col[3]:),(Col[4]:),(Col[5]:),(Col[6]:),(Col[7]:),(Col[8]:),(Col[9]:),(Col[10]:)]]
physical_plan ParquetExec: file_groups={1 group: [[WORKSPACE_ROOT/parquet-testing/data/alltypes_plain.parquet]]}, projection=[id, bool_col, tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col, date_string_col, string_col, timestamp_col], limit=10, statistics=[Rows=Exact(8), Bytes=Absent, [(Col[0]:),(Col[1]:),(Col[2]:),(Col[3]:),(Col[4]:),(Col[5]:),(Col[6]:),(Col[7]:),(Col[8]:),(Col[9]:),(Col[10]:)]]

I agree this is a better plan 👍 Since there is only a single partition, the global limit is unnecessary.

false
}

/// Returns a fetching variant of this `ExecutionPlan` node, if it supports

💯


/// Returns `true` if a limit can be safely pushed down through this
/// `ExecutionPlan` node.
fn supports_limit_pushdown(&self) -> bool {

I found this name somewhat confusing, as it implied to me that it reports whether with_fetch is implemented, whereas I now see it refers to whether it is OK to push a limit through the node.

Perhaps we could rename it to something different. Perhaps can_push_limit or preserves_limit?

I don't feel strongly about this

I am also not sure of the best name. If we find something that's clearly better, maybe we can change it in a follow-on before a public release.

01)GlobalLimitExec: skip=0, fetch=10
02)--CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/aggregate_test_100_order_by_c1_asc.csv]]}, projection=[c1], output_ordering=[c1@0 ASC NULLS LAST], has_header=true

physical_plan CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/core/tests/data/aggregate_test_100_order_by_c1_asc.csv]]}, projection=[c1], limit=10, output_ordering=[c1@0 ASC NULLS LAST], has_header=true

The GlobalLimitExec is not needed here because the CsvExec already has a limit and there is a single partition ✅

04)------RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1
05)--------StreamingTableExec: partition_sizes=1, projection=[name, ts], infinite_source=true, output_ordering=[name@0 DESC, ts@1 DESC]
04)------LocalLimitExec: fetch=5
05)--------RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1

I wonder why the limit wasn't pushed into the RepartitionExec here 🤔

I think we haven't covered RepartitionExec yet; it would be a good idea to do it in a follow-on PR.

If I am able to pull the CoalesceBatch into Repartition, I think adding support for limit in repartition will become quite easy (as the actual limit code will be reused).

@ozankabak (Contributor)

I will wait a little bit longer to merge this in case anyone has more feedback.

/// Merge the limits of the parent and the child. If at least one of them is a
/// [`GlobalLimitExec`], the result is also a [`GlobalLimitExec`]. Otherwise,
/// the result is a [`LocalLimitExec`].
fn merge_limits(

Structurally, this could be a method on LimitExec, though this is totally fine too.
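
For intuition, here is a simplified sketch of how the fetch part of such a merge could work, ignoring skip handling entirely (the function name and signature are illustrative, not the PR's actual merge_limits):

/// Illustrative only: the combined fetch of two stacked limits is the smaller
/// of the two, and a missing fetch on one side defers to the other. The real
/// merge_limits additionally handles skip and chooses between GlobalLimitExec
/// and LocalLimitExec as described in the doc comment above.
fn merged_fetch(parent_fetch: Option<usize>, child_fetch: Option<usize>) -> Option<usize> {
    match (parent_fetch, child_fetch) {
        (Some(parent), Some(child)) => Some(parent.min(child)),
        (Some(parent), None) => Some(parent),
        (None, child) => child,
    }
}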

Labels

core (Core DataFusion crate), optimizer (Optimizer rules), sqllogictest (SQL Logic Tests (.slt))
