Skip to content

Add new "optimization.tune.active" configuration option to disable partition fusion#12194

Merged
jacobtomlinson merged 8 commits intodask:mainfrom
rjzamora:freeze-partitions
Dec 12, 2025
Merged

Add new "optimization.tune.active" configuration option to disable partition fusion#12194
jacobtomlinson merged 8 commits intodask:mainfrom
rjzamora:freeze-partitions

Conversation

@rjzamora
Copy link
Copy Markdown
Member

@rjzamora rjzamora commented Dec 11, 2025

Motivated by many offline discussions, and issues like #12193

Dask-Dataframe will currently apply a "tune" optimization that automatically fuses partitions generated by read_parquet or from_map whenever a column-projection optimization is applied. This optimization is not always desired.

This PR proposes that we add a simple "optimization.tune.active" configuration option (following the same pattern as "optimization.fuse.active") where users can disable this behavior entirely. E.g.:

import dask
import dask.dataframe as dd
import pandas as pd
import numpy as np

# Generate test parquet data
n_partitions = 10
data = {
    'x': np.random.randn(1000),
    'y': np.random.randn(1000),
}

df = pd.DataFrame(data)
ddf = dd.from_pandas(df, npartitions=n_partitions)
ddf.to_parquet('test.parquet')

# Read back the data
test = dd.read_parquet("test.parquet")

# Partition structure without column projection
print(test.map_partitions(len).compute())

# Partition structure WITH column projection
print(test['x'].map_partitions(len).compute())

# Partition structure WITH column projection, "tune" optimization disabled
dask.config.set({"optimization.tune.active": False})
print(test['x'].map_partitions(len).compute())

Output:

0    100
0    100
0    100
0    100
0    100
0    100
0    100
0    100
0    100
0    100
dtype: int64
0    200
0    200
0    200
0    200
0    200
dtype: int64
0    100
0    100
0    100
0    100
0    100
0    100
0    100
0    100
0    100
0    100
dtype: int64

@rjzamora rjzamora self-assigned this Dec 11, 2025
@rjzamora rjzamora added enhancement Improve existing functionality or make things work better dask-expr labels Dec 11, 2025
@rjzamora
Copy link
Copy Markdown
Member Author

@TomAugspurger - Interested in your thoughts on this when you have time :)

@github-actions
Copy link
Copy Markdown
Contributor

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

      9 files  +     2        9 suites  +2   3h 13m 12s ⏱️ + 41m 49s
 18 161 tests +     4   16 946 ✅ +     7   1 215 💤  -     3  0 ❌ ±0 
162 586 runs  +36 159  150 578 ✅ +33 518  12 008 💤 +2 641  0 ❌ ±0 

Results for commit 22792ec. ± Comparison against base commit 2497ebe.

Copy link
Copy Markdown
Member

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rjzamora. And thanks @TomAugspurger for the review

@jacobtomlinson jacobtomlinson merged commit ab7a185 into dask:main Dec 12, 2025
22 of 24 checks passed
@rjzamora rjzamora deleted the freeze-partitions branch December 12, 2025 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dask-expr enhancement Improve existing functionality or make things work better

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants