Multiple independent map_overlap operations on same input with different window sizes produce incorrect prev_part sizes #11963

@neumannjan

Describe the issue:

Whenever I run multiple map_overlap-type operations (including rolling, shift, etc.) on the same input with different window sizes (i.e., different values of the before parameter), I get the error "Partition size is less than overlapping window size", regardless of the actual partitioning.

The error is raised in the _combined_parts function, but in some cases prev_part.shape[0] is actually greater than the before parameter, so this is clearly not a partitioning issue.
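To make the expected behaviour concrete, here is a minimal pure-Python sketch of what I understand a before-only overlap to mean. This is not Dask's implementation: map_overlap_reference and identity are hypothetical names, and partitions are modelled as plain lists. The point is that each call takes exactly `before` rows from the previous partition, so two calls with different `before` values on the same input should not affect each other, and the only legitimate failure is a previous partition shorter than `before`.

```python
from typing import Callable, List, Sequence


def map_overlap_reference(
    partitions: Sequence[List[float]],
    func: Callable[[List[float]], List[float]],
    before: int,
) -> List[List[float]]:
    """Hypothetical reference: apply a length-preserving func to each
    partition, with `before` rows of look-back context from its predecessor."""
    results = []
    for i, part in enumerate(partitions):
        if i == 0:
            # The first partition has no predecessor, so nothing is prepended.
            prev_tail: List[float] = []
        else:
            prev = partitions[i - 1]
            # The only legitimate failure: the previous partition is too short.
            if len(prev) < before:
                raise ValueError(
                    f"partition {i - 1} has {len(prev)} rows, "
                    f"fewer than before={before}"
                )
            # Take exactly `before` rows from the end of the previous partition.
            prev_tail = prev[len(prev) - before:]
        combined = prev_tail + list(part)
        out = func(combined)
        # Drop the borrowed rows so the output lines up with the partition.
        results.append(out[len(prev_tail):])
    return results


def identity(rows: List[float]) -> List[float]:
    return rows


# Two partitions of 50 rows each, far larger than either window.
parts = [[float(i) for i in range(0, 50)], [float(i) for i in range(50, 100)]]

# Two independent operations on the same input with different window sizes.
out_14 = map_overlap_reference(parts, identity, before=14)
out_1 = map_overlap_reference(parts, identity, before=1)
```

Under this model both calls return the input unchanged, which is what I would expect from the identity lambdas in the example below.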

Minimal Complete Verifiable Example:

The following should reproduce the error. You may even see a case where prev_part.shape[0] > before. If you don't, try varying the number of parallel operations you run. Sometimes I reproduce a case where prev_part.shape[0] < before, sometimes the opposite.

import dask
import numpy as np
import dask.dataframe as dd
import pandas as pd
from dask.diagnostics import ProgressBar

df = dd.from_pandas(pd.DataFrame({
    'close_price': np.random.sample((2_000_000,))
}), chunksize=1_000_000)

# out1 = df['close_price'] - df['close_price'].shift(14)
# out2 = df['close_price'] - df['close_price'].shift(100)
# out3 = df['close_price'] - df['close_price'].shift(1)

out1 = df['close_price'] - df['close_price'].map_overlap(lambda x: x, before=14, after=0)
out3 = df['close_price'] - df['close_price'].map_overlap(lambda x: x, before=1, after=0)

# dask.visualize(out1, out3, filename='dask.svg', engine='ipycytoscape')

with ProgressBar():
    df1, df3 = dask.compute(out1, out3, scheduler='single-threaded')

Anything else we need to know?:

Environment:

  • Dask version: 2025.5.1
  • Python version: 3.11.11
  • Operating System: Ubuntu 24.04
  • Install method (conda, pip, source): poetry

Labels: needs triage (needs a response from a contributor)