Ensure HLGs are using dependencies properly in optimization #11859

fjetter · 2025-04-02T10:20:31Z

There was a bug in the earlier code that would not actually optimize HLGs jointly under some circumstances.

The fix I am proposing here is to perform the HLG merging during the Expression optimization step. From all I can tell this still does not materialize the layers sooner than they did before (there are still cases, e.g. if materialization is triggered to compute __dask_keys__ but we've done this before already)

Further, the FinalizeComputeHLG class no longer assembles the low level graph but instead the HighlevelGraph assembly is a separate step and this step is overriden in the finalizer. The way, the sequence can properly bubble down the finalize without compromising joint optimization

Actually closes #8380

cc @djhoese

Copilot

Pull Request Overview

This pull request addresses a bug in the joint optimization of HLGs by modifying how dependencies are handled during the expression optimization step and refactoring the finalization of compute graphs. Key changes include:

Adding and updating tests in dask/tests/test_expr.py and dask/array/tests/test_array_core.py to validate joint optimization and materialization behaviors.
Revising HLGExpr in dask/_expr.py to use a new property (hlg) and updating dask_graph and finalize_compute methods.
Refactoring the HLGFinalizeCompute class to subclass HLGExpr and adjust the final graph assembly process.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
dask/tests/test_expr.py	Added tests for HLGExpr finalization and joint optimization of HLGs.
dask/array/tests/test_array_core.py	Introduced a new test to verify shared task optimization with stored sources.
dask/_expr.py	Updated methods in HLGExpr and refactored HLGFinalizeCompute for improved dependency handling.

Comments suppressed due to low confidence (1)

dask/_expr.py:1232

[nitpick] In HLGFinalizeCompute, the property 'hlg' retrieves the operand using 'expr = self.operand("dsk")' and then accesses 'expr.dsk'. For consistency with the rest of HLGExpr, consider using 'expr.hlg' instead of 'expr.dsk' throughout this property.

def hlg(self):

dask/_expr.py

github-actions · 2025-04-02T11:05:24Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

9 files ± 0 9 suites ±0 3h 34m 6s ⏱️ + 5m 26s
18 004 tests + 3 16 790 ✅ + 3 1 214 💤 ±0 0 ❌ ±0
161 110 runs +27 149 003 ✅ +27 12 107 💤 ±0 0 ❌ ±0

Results for commit 80fb427. ± Comparison against base commit d9e07f5.

♻️ This comment has been updated with latest results.

Copilot

Pull Request Overview

This PR addresses a bug in the HLG joint optimization by merging HLGs during the Expression optimization step and separating out the HighLevelGraph assembly in the finalizer. It also adds tests to ensure correct handling of expression sequences and optimizations.

Added new functions and tests in test_expr.py to validate HLGExpr sequence finalization and nested keys.
Modified optimize_double in test_base.py to avoid altering keys starting with "finalize".
Introduced a new store test in test_array_core.py to verify shared task optimization.
Refactored HLGExpr, _HLGExprSequence, and HLGFinalizeCompute in _expr.py to improve and separate finalization and optimization logic.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File	Description
dask/tests/test_expr.py	New tests and helper functions for HLGExpr sequence optimization and key finalization.
dask/tests/test_base.py	Adjusted optimize_double to skip optimization for keys starting with "finalize".
dask/array/tests/test_array_core.py	Added a test to confirm correct task sharing post store and optimization.
dask/_expr.py	Changes in HLGExpr, _HLGExprSequence, and HLGFinalizeCompute to adjust optimization flow.

Comments suppressed due to low confidence (3)

dask/tests/test_base.py:822

Ensure key_split is imported or defined within this module; otherwise, optimize_double may fail at runtime.

return { k: (mul, 2, v) if not key_split(k).startswith("finalize") else v for k, v in dsk.items() }

dask/_expr.py:1247

Returning self._layer() in dask_graph of HLGFinalizeCompute may be unclear if _layer is not explicitly defined; consider defining a dedicated _layer method to clearly separate finalized graph data.

return self._layer()

dask/_expr.py:1116

The assertion 'assert all_keys is not None' is redundant since all_keys is initialized as a list; consider removing it for cleaner code.

assert all_keys is not None

dask/_expr.py

hendrikmakait · 2025-04-04T08:16:27Z

dask/_expr.py

+                all_keys.extend(op.__dask_keys__())
+            else:
+                all_keys.append(op.__dask_keys__())
+        assert all_keys is not None


all_keys is a list. Did you want to test non-emptiness?

that was old code. The bot already told me this is stupid. I just didn't push the commit yet

hendrikmakait

Overall LGTM. No need to block on the assertion.

pfackeldey · 2025-04-24T19:55:06Z

Our CI tests hang indefinitely over in https://github.com/dask-contrib/dask-awkward and https://github.com/scikit-hep/uproot5 since this PR (found this out using git bisect). This issue seems to be only present for running tests with pytest, there's no problem (as far as we can tell) with just running dask code otherwise. (If I understand it correctly, it seems to be stuck forever in queue.get() for the get_sync scheduler, but I'm not very sure about this.)

Do you have an idea what could cause this @fjetter & @hendrikmakait?
Thanks for your help in advance 🙏

cc @martindurant & @lgray

Ensure HLGs are using dependencies properly in optimization

5c48c38

fjetter requested a review from Copilot April 2, 2025 10:20

Copilot AI reviewed Apr 2, 2025

View reviewed changes

dask/_expr.py Show resolved Hide resolved

use hlg in annotations as well

35f1bb4

fjetter added 3 commits April 2, 2025 13:30

fixes

238d641

Fix __dask_keys__ for nested keys

a275cdd

Ensure finalize HLG layers are not duplicating work

f51776f

fjetter mentioned this pull request Apr 2, 2025

Ensure default dask scheduler only compute what's needed #11861

Merged

use a subclass to differentiate between extend and append

80fb427

fjetter requested a review from Copilot April 2, 2025 14:50

Copilot AI reviewed Apr 2, 2025

View reviewed changes

remove obsolete assert

df99701

fjetter mentioned this pull request Apr 3, 2025

Ensure annotations for HLG objects are properly generated #11866

Merged

hendrikmakait self-requested a review April 4, 2025 08:02

hendrikmakait reviewed Apr 4, 2025

View reviewed changes

hendrikmakait approved these changes Apr 4, 2025

View reviewed changes

typo in comment

5e2290e

fjetter merged commit 01202ab into dask:main Apr 4, 2025
4 of 5 checks passed

fjetter deleted the hlg_expr_refactor branch April 4, 2025 09:23

djhoese mentioned this pull request Apr 8, 2025

Remove source and target optimization in array.store #9732

Closed

3 tasks

pfackeldey mentioned this pull request Apr 24, 2025

fix: avoid dask v2025.4.0 for CI tests dask-contrib/dask-awkward#582

Merged

henryiii mentioned this pull request Apr 25, 2025

chore: move to using dependency-groups scikit-hep/uproot5#1427

Merged

GaetanLepage mentioned this pull request Apr 25, 2025

python3Packages.dask: 2025.7.0 -> 2025.9.0, python3Packages.distributed: 2025.7.0 -> 2025.9.0 NixOS/nixpkgs#401045

Closed

13 tasks

pfackeldey mentioned this pull request Apr 30, 2025

CI tests using dask v2025.4.0 (or newer) hang indefinitely #11927

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Ensure HLGs are using dependencies properly in optimization #11859

Ensure HLGs are using dependencies properly in optimization #11859

Uh oh!

fjetter commented Apr 2, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

github-actions bot commented Apr 2, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

hendrikmakait Apr 4, 2025

Uh oh!

fjetter Apr 4, 2025

Uh oh!

hendrikmakait left a comment

Uh oh!

Uh oh!

pfackeldey commented Apr 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Ensure HLGs are using dependencies properly in optimization #11859

Ensure HLGs are using dependencies properly in optimization #11859

Uh oh!

Conversation

fjetter commented Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

github-actions bot commented Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Unit Test Results

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

hendrikmakait Apr 4, 2025

Choose a reason for hiding this comment

Uh oh!

fjetter Apr 4, 2025

Choose a reason for hiding this comment

Uh oh!

hendrikmakait left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

pfackeldey commented Apr 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fjetter commented Apr 2, 2025 •

edited

Loading

github-actions bot commented Apr 2, 2025 •

edited

Loading