Pin RAPIDS 24.04 to dask-2024.1.1 #26

@rjzamora

Description

The “legacy” Dask DataFrame API is deprecated as of dask>=2024.2.0. Although dask-cudf can (and should) comfortably support the new API in time for the RAPIDS 24.04 release, other downstream libraries will not have enough time to adjust to this migration.

Suggestion: To balance upstream and downstream maintenance risks, we should pin RAPIDS 24.04 to dask-2024.1.1.
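For illustration, here is a minimal sketch of how a downstream library might check a Dask version string against the proposed pin. The helper names (`parse_calver`, `legacy_api_deprecated`, `matches_pin`, `installed_dask_version`) are hypothetical and not part of Dask or RAPIDS. The sketch assumes plain `YYYY.M.P` version strings with no local or pre-release suffix.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the issue text above.
PINNED = (2024, 1, 1)            # proposed pin for RAPIDS 24.04
FIRST_DEPRECATED = (2024, 2, 0)  # first release that deprecates the legacy API


def parse_calver(text):
    """Parse a plain 'YYYY.M.P' string into a tuple of ints (assumption: no suffixes)."""
    return tuple(int(part) for part in text.split(".")[:3])


def legacy_api_deprecated(dask_version):
    """Return True if this version is at or past the deprecation release."""
    return parse_calver(dask_version) >= FIRST_DEPRECATED


def matches_pin(dask_version):
    """Return True if this version is exactly the proposed pin."""
    return parse_calver(dask_version) == PINNED


def installed_dask_version():
    """Return the installed dask version string, or None if dask is not installed."""
    try:
        return version("dask")
    except PackageNotFoundError:
        return None
```

A caller could pass the result of `installed_dask_version()` to `matches_pin` at import time and emit a warning on mismatch. Whether such a runtime check is worthwhile in addition to the packaging-level pin is a design choice this sketch does not settle.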

Offline discussion

Link to internal document discussing the plan

Notes on related Dask releases

Discussion on dask-expr integration timeline: dask/dask#10934

Dask 2024.1.1

  • Date: (Completed) January 26th 2024
  • Link: commit b1909c7
  • No blocking issues known
  • Supports optional integration with dask-expr, with partial API coverage

Dask 2024.2.0

Dask 2024.2.1

Dask 2024.3.0

  • Date: (Planned) March 8th 2024
  • The “dataframe.query-planning” config default will change to “True” if certain requirements are met (see dask/dask#10934 for more info)
  • RAPIDS libraries (e.g. dask-cudf) will need to reload dask.dataframe to leverage the “legacy” dask.dataframe API
    • This approach may be fragile. The more reliable solution is to support dask-expr
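As a rough sketch of the opt-out described above, a library could request the legacy API by setting the config value before `dask.dataframe` is first imported, which avoids the reload path. The helper `import_legacy_dataframe` and the constant `LEGACY_CONFIG` are hypothetical names. The sketch assumes a Dask release that still honors “dataframe.query-planning”: “False”, and it does not attempt the reload itself.

```python
import sys

# Config key and value quoted from the issue text.
LEGACY_CONFIG = {"dataframe.query-planning": False}


def import_legacy_dataframe():
    """Hypothetical helper: set the config, then import dask.dataframe for the first time."""
    if "dask.dataframe" in sys.modules:
        # The module was already imported, so only a reload could change the API.
        # This sketch refuses rather than attempting the fragile reload.
        raise RuntimeError("dask.dataframe was imported before the config was set")

    import dask  # imported lazily so this module loads even without dask installed

    dask.config.set(LEGACY_CONFIG)

    import dask.dataframe as dd

    return dd
```

Raising when the module is already loaded makes the import-order dependency explicit, which is the source of the fragility noted above. Supporting dask-expr removes that dependency altogether.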

Dask X.X.X

  • Date: (Planned) Unknown
  • The “dataframe.query-planning”: “False” config option will be disabled
  • There is not a strong motivation to make this change soon, because Dask Array and Dask Bag still use much of the same HighLevelGraph infrastructure as the legacy DataFrame API. However, I strongly suspect that this motivation will grow quickly as soon as there is a clear maintenance burden.
