Multiple independent map_overlap operations on same input with different window sizes produce incorrect prev_part sizes #11963
Description
Describe the issue:
Whenever I run multiple map_overlap-type operations (including rolling, shift, etc.) on the same input with different window sizes (i.e., different values of the before parameter), I get the error "Partition size is less than overlapping window size", regardless of the actual partitioning.
The error is raised in the _combined_parts function, yet in some runs prev_part.shape[0] is greater than the before parameter, so the partitioning itself is clearly not the cause.
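For reference, here is a minimal pandas-only sketch of the overlap semantics I would expect: each partition gets the last before rows of the previous partition prepended, the function is applied, and the prepended rows are trimmed off again. This is not Dask's implementation. manual_map_overlap is a hypothetical helper written for illustration, and the size check only mirrors the wording of the error message.

```python
import numpy as np
import pandas as pd


def manual_map_overlap(parts, func, before):
    """Hypothetical helper (not Dask code): apply func with `before` rows of look-back."""
    outputs = []
    prev_part = None
    for part in parts:
        if prev_part is None:
            # The first partition has no predecessor, so nothing is prepended.
            combined = part
            n_prepended = 0
        else:
            # Assumed check: the previous partition must hold at least `before` rows.
            if len(prev_part) < before:
                raise ValueError("Partition size is less than overlapping window size")
            tail = prev_part.iloc[len(prev_part) - before:]
            combined = pd.concat([tail, part])
            n_prepended = len(tail)
        # Apply the function, then drop the rows that were only there as context.
        outputs.append(func(combined).iloc[n_prepended:])
        prev_part = part
    return pd.concat(outputs)


# Small stand-in for the real data: 200 rows split into two partitions of 100.
s = pd.Series(np.random.default_rng(0).random(200))
parts = [s.iloc[:100], s.iloc[100:]]

# Two independent window sizes on the same input, as in the report.
results_by_window = {}
for before in (14, 1):
    results_by_window[before] = manual_map_overlap(
        parts, lambda x, b=before: x.shift(b), before=before
    )
    # Each result should match a shift over the unpartitioned series.
    pd.testing.assert_series_equal(results_by_window[before], s.shift(before))
```

In this sketch each operation keeps its own before value, so a window of 14 and a window of 1 on the same input never interfere with each other. That is the behavior I would expect from the real operations as well.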
Minimal Complete Verifiable Example:
The following should give you the error. You should even see a case where prev_part.shape[0] > before. If you don't see that, try varying the number of parallel operations you run. Sometimes I reproduce a case where prev_part.shape[0] < before, sometimes the opposite.
import dask
import numpy as np
import dask.dataframe as dd
import pandas as pd
from dask.diagnostics import ProgressBar
df = dd.from_pandas(pd.DataFrame({
    'close_price': np.random.sample((2_000_000,))
}), chunksize=1_000_000)
# out1 = df['close_price'] - df['close_price'].shift(14)
# out2 = df['close_price'] - df['close_price'].shift(100)
# out3 = df['close_price'] - df['close_price'].shift(1)
out1 = df['close_price'] - df['close_price'].map_overlap(lambda x: x, before=14, after=0)
out3 = df['close_price'] - df['close_price'].map_overlap(lambda x: x, before=1, after=0)
# dask.visualize(out1, out3, filename='dask.svg', engine='ipycytoscape')
with ProgressBar():
    df1, df3 = dask.compute(out1, out3, scheduler='single-threaded')
Anything else we need to know?:
Environment:
- Dask version: 2025.5.1
- Python version: 3.11.11
- Operating System: Ubuntu 24.04
- Install method (conda, pip, source): poetry