Tokenize SubgraphCallable by crusaderky · Pull Request #10898 · dask/dask

crusaderky · 2024-02-06T11:27:28Z

No description provided.

github-actions · 2024-02-06T11:59:22Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

15 files ±0 · 15 suites ±0 · 3h 20m 23s ⏱️ +1m 7s
12 987 tests ±0 · 12 058 ✅ ±0 · 929 💤 ±0 · 0 ❌ ±0
160 492 runs ±0 · 143 987 ✅ +1 · 16 505 💤 -1 · 0 ❌ ±0

Results for commit 400a492. ± Comparison against base commit f51fa77.

♻️ This comment has been updated with latest results.

crusaderky · 2024-02-06T19:16:37Z

dask/tests/test_optimization.py

+    assert tokenize(f4) != tokenize(f1)

-    # Reordering the inputs must not prevent equality
+    # Reordering the inputs must not prevent equality...


I think this is a bad idea, but revisiting it is out of scope for this PR.
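The test comment states that reordering the inputs must not prevent equality, while a content-based token is naturally computed over ordered data. A minimal sketch of how two objects can then compare equal yet tokenize differently, assuming (hypothetically) that equality compares the input keys as a set while the token hashes them in order; `ToyCallable` and `toy_token` are illustrative stand-ins, not dask's `SubgraphCallable` or `tokenize`.

```python
import hashlib


class ToyCallable:
    """Hypothetical stand-in for a subgraph-style callable (not dask's class)."""

    def __init__(self, dsk, outkey, inkeys, name):
        self.dsk = dsk
        self.outkey = outkey
        self.inkeys = tuple(inkeys)
        self.name = name

    def __eq__(self, other):
        # Equality deliberately ignores the order of the input keys.
        return (
            type(self) is type(other)
            and self.name == other.name
            and self.outkey == other.outkey
            and set(self.inkeys) == set(other.inkeys)
        )

    def __hash__(self):
        # Kept consistent with __eq__: order-insensitive over inkeys.
        return hash((self.name, self.outkey, frozenset(self.inkeys)))


def toy_token(obj):
    """Hypothetical content-based token over the *ordered* fields."""
    payload = repr((obj.name, obj.outkey, obj.inkeys, sorted(obj.dsk.items())))
    return hashlib.sha256(payload.encode()).hexdigest()


dsk = {"out": ("add", "x", "y")}
f1 = ToyCallable(dsk, "out", ["x", "y"], name="sub")
f2 = ToyCallable(dsk, "out", ["y", "x"], name="sub")

print(f1 == f2)                        # True: reordered inputs still compare equal
print(toy_token(f1) == toy_token(f2))  # False: the token sees the order
```

If call arguments are bound to `inkeys` positionally, `f1(1, 2)` and `f2(1, 2)` would assign the same values to different keys, which is one way order-insensitive equality can be surprising.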

phofl · 2024-02-08T14:28:38Z

thx

martindurant · 2024-02-11T18:45:10Z

This change has dramatically degraded performance, enough to make analyses with coffea (via dask-awkward) untenable; see dask-contrib/dask-awkward#468.

Since there is no PR description or related issue, can I ask what the intent is and what's going on here? Is it possible to revert?

(to dask-awkward group: we should have nightly dask-dev CI runs)

cc @lgray

lgray · 2024-02-11T19:21:55Z

@martindurant FWIW this PR is mentioned as part of the overall plan in: #10905

agoose77 · 2024-02-12T11:31:51Z

Just to add some context to this: the change is costly because we treat `str` as an expensive operation (for reasons).

martindurant · 2024-02-12T14:18:56Z

> because we treat str as an expensive operation

which, if it isn't clear, is the reason for proposing #10918; `str()` is the wrong thing to tokenise on for a dask-aware object.
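Deterministic tokenization of an unordered container such as a set needs a canonical element order first. The sketch below contrasts sorting by `str()`, which, as the title of #10918 indicates, is what was happening, with sorting by a cheap per-element token. All names (`Heavy`, `cheap_token`, `token_by_str`, `token_by_normalized`) are hypothetical; this illustrates the trade-off, not dask's actual code on either side.

```python
import hashlib


class Heavy:
    """Hypothetical object whose str() is costly (e.g. it renders lots of data)."""

    str_calls = 0  # counts __str__ invocations across all instances

    def __init__(self, ident):
        self.ident = ident

    def __str__(self):
        Heavy.str_calls += 1
        # Stand-in for an expensive rendering step.
        body = ",".join(str(i) for i in range(10_000))
        return f"Heavy({body};{self.ident})"

    def cheap_token(self):
        # Hypothetical cheap, stable identity for this object.
        return f"Heavy-{self.ident}"


def _digest(parts):
    return hashlib.sha256("|".join(parts).encode()).hexdigest()


def token_by_str(items):
    """Canonical order via str(): renders every element."""
    return _digest(sorted(str(x) for x in items))


def token_by_normalized(items):
    """Canonical order via each element's cheap token: no str() calls."""
    return _digest(sorted(x.cheap_token() for x in items))


items = [Heavy("a"), Heavy("b"), Heavy("c")]
# Both strategies give the same token regardless of set iteration order...
assert token_by_str(set(items)) == token_by_str(set(reversed(items)))
assert token_by_normalized(set(items)) == token_by_normalized(set(reversed(items)))
# ...but only the str()-based one paid for rendering each element.
print(Heavy.str_calls)  # 6: three elements, tokenized twice
```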

crusaderky · 2024-02-12T16:11:36Z

> This change has dramatically degraded performance, enough to make analyses with coffea (via dask-awkward) untenable; see dask-contrib/dask-awkward#468.
>
> Since there is no PR description or related issue, can I ask what the intent is and what's going on here? Is it possible to revert?
>
> (to dask-awkward group: we should have nightly dask-dev CI runs)

The intent is to make the `run_spec` (the values of the dask graph) deterministically tokenizable in 99% of cases. As `SubgraphCallable` is used ubiquitously during optimization, it is essential for this object to be deterministically hashable too.

The PR itself can't be reverted fully without throwing #10905 out of the window.

This can be fixed either by #10919 or by adding `__dask_tokenize__` to your classes (depending on which route you're taking).
See the discussion on dask-contrib/dask-awkward#468.
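Of the two routes named above, defining `__dask_tokenize__` is the per-class one: when the method is present, dask's tokenizer uses its return value in place of the object. The sketch below models that protocol with a tiny stdlib-only `toy_tokenize` so it runs without dask; `toy_tokenize`, `_normalize`, and the `Record` class are hypothetical. In real code you would keep the `__dask_tokenize__` method and call `dask.base.tokenize` instead.

```python
import hashlib

_SCALARS = (str, bytes, int, float, bool, type(None))


def _normalize(obj):
    """Hypothetical normalizer: map obj to a deterministic, repr-able structure."""
    hook = getattr(obj, "__dask_tokenize__", None)
    if hook is not None and not isinstance(obj, type):
        # The object describes itself; recurse in case the hook returns containers.
        return ("hook", type(obj).__qualname__, _normalize(hook()))
    if isinstance(obj, _SCALARS):
        return obj
    if isinstance(obj, (list, tuple)):
        return (type(obj).__name__, tuple(_normalize(x) for x in obj))
    if isinstance(obj, dict):
        # Sort entries by the repr of the normalized key: insertion order is irrelevant.
        entries = [(repr(_normalize(k)), _normalize(v)) for k, v in obj.items()]
        return ("dict", tuple(sorted(entries, key=lambda kv: kv[0])))
    raise TypeError(f"toy_tokenize cannot normalize {type(obj).__name__!r}")


def toy_tokenize(*args):
    """Hypothetical token: SHA-256 of the normalized arguments' repr."""
    return hashlib.sha256(repr(_normalize(args)).encode()).hexdigest()


class Record:
    """Hypothetical user class: costly __str__, cheap __dask_tokenize__."""

    def __init__(self, name, fields):
        self.name = name
        self.fields = fields

    def __str__(self):
        raise RuntimeError("too expensive to call during tokenization")

    def __dask_tokenize__(self):
        # Identify the object by the data that defines it, not by str().
        return (self.name, self.fields)


a = Record("events", {"pt": "float64", "eta": "float64"})
b = Record("events", {"eta": "float64", "pt": "float64"})
print(toy_tokenize(a) == toy_tokenize(b))  # True: same content, dict order ignored
```

The token is built with `hashlib` rather than the built-in `hash()`, whose result for `str` and `bytes` is randomized per interpreter process (see `PYTHONHASHSEED`), so the same content yields the same token in every process.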

martindurant · 2024-02-12T16:14:11Z

Thanks @crusaderky . We will report back when everything is in order. For the moment, coffea has pinned dask<2024.2, so this is important but not urgent.

crusaderky added a commit to crusaderky/dask-expr that referenced this pull request Feb 6, 2024

Include dask/dask#10898

29ee94b

crusaderky added a commit to fjetter/distributed that referenced this pull request Feb 6, 2024

Include #dask/dask#10898

9dacff1

crusaderky mentioned this pull request Feb 6, 2024

Warn if tasks are submitted with identical keys but different run_spec dask/distributed#8185

Merged

crusaderky marked this pull request as draft February 6, 2024 15:22

crusaderky changed the title ~~Tokenize SubgraphCallable~~ [DNM] Tokenize SubgraphCallable Feb 6, 2024

Tokenize SubgraphCallable

400a492

crusaderky force-pushed the tokenize_subgraphcallable branch from bb21148 to 400a492 February 6, 2024 18:59

crusaderky changed the title ~~[DNM] Tokenize SubgraphCallable~~ Tokenize SubgraphCallable Feb 6, 2024

crusaderky commented Feb 6, 2024


crusaderky self-assigned this Feb 6, 2024

crusaderky marked this pull request as ready for review February 6, 2024 19:38

crusaderky mentioned this pull request Feb 6, 2024

Tokenization meta-issue #10905

Closed

phofl approved these changes Feb 8, 2024


phofl merged commit 8e10a14 into dask:main Feb 8, 2024

crusaderky deleted the tokenize_subgraphcallable branch February 8, 2024 14:50

lgray mentioned this pull request Feb 11, 2024

massive touching performance regression when using dask 2024.2.0 dask-contrib/dask-awkward#468

Open

lgray mentioned this pull request Feb 11, 2024

fix: sort objects in dicts/sets by normalize_token rather than str #10918

Closed


phofl merged 1 commit into dask:main from crusaderky:tokenize_subgraphcallable
