Skip to content

Tokenize SubgraphCallable#10898

Merged
phofl merged 1 commit intodask:mainfrom
crusaderky:tokenize_subgraphcallable
Feb 8, 2024
Merged

Tokenize SubgraphCallable#10898
phofl merged 1 commit intodask:mainfrom
crusaderky:tokenize_subgraphcallable

Conversation

@crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Feb 6, 2024

No description provided.

crusaderky added a commit to crusaderky/dask-expr that referenced this pull request Feb 6, 2024
crusaderky added a commit to fjetter/distributed that referenced this pull request Feb 6, 2024
@github-actions
Copy link
Contributor

github-actions bot commented Feb 6, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

     15 files  ±0       15 suites  ±0   3h 20m 23s ⏱️ + 1m 7s
 12 987 tests ±0   12 058 ✅ ±0     929 💤 ±0  0 ❌ ±0 
160 492 runs  ±0  143 987 ✅ +1  16 505 💤  - 1  0 ❌ ±0 

Results for commit 400a492. ± Comparison against base commit f51fa77.

♻️ This comment has been updated with latest results.

@crusaderky crusaderky marked this pull request as draft February 6, 2024 15:22
@crusaderky crusaderky changed the title Tokenize SubgraphCallable [DNM] Tokenize SubgraphCallable Feb 6, 2024
@crusaderky crusaderky force-pushed the tokenize_subgraphcallable branch from bb21148 to 400a492 Compare February 6, 2024 18:59
@crusaderky crusaderky changed the title [DNM] Tokenize SubgraphCallable Tokenize SubgraphCallable Feb 6, 2024
assert tokenize(f4) != tokenize(f1)

# Reordering the inputs must not prevent equality
# Reordering the inputs must not prevent equality...
Copy link
Collaborator Author

@crusaderky crusaderky Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bad idea, but I'll leave revisiting it as out of scope

@crusaderky crusaderky self-assigned this Feb 6, 2024
@crusaderky crusaderky marked this pull request as ready for review February 6, 2024 19:38
@phofl phofl merged commit 8e10a14 into dask:main Feb 8, 2024
@phofl
Copy link
Collaborator

phofl commented Feb 8, 2024

thx

@martindurant
Copy link
Member

This change has dramatically degraded performance, to make analyses with coffea (via dask-awkward) untenable, see dask-contrib/dask-awkward#468 .

Since there is no PR description or related issue, can I ask what the intent is, what's going on here? Is it possible to revert?

(to dask-awkward group: we should have nightly dask-dev CI runs)

cc @lgray

@lgray
Copy link
Contributor

lgray commented Feb 11, 2024

@martindurant FWIW this PR is mentioned as part of the overall plan in: #10905

@agoose77
Copy link
Contributor

agoose77 commented Feb 12, 2024

Just to add some context to this --- the change is costly because we treat str as an expensive operation (for reasons).

@martindurant
Copy link
Member

because we treat str as an expensive operation

which, if it isn't clear, is the reason for proposing #10918; str() is the wrong thing to tokenise on for a dask-aware object.

@crusaderky
Copy link
Collaborator Author

This change has dramatically degraded performance, to make analyses with coffea (via dask-awkward) untenable, see dask-contrib/dask-awkward#468 .

Since there is no PR description or related issue, can I ask what the intent is, what's going on here? Is it possible to revert?

(to dask-awkward group: we should have nightly dask-dev CI runs)

The intent is to make the run_spec (the values of the dask graph) deterministically tokenizable in 99% of the cases. As SubGraphCallable is ubiquitously used during optimization, it becomes essential for this object to be deterministically hashable too.

The PR itself can't be reverted fully without throwing #10905 out of the window.

Fixed by either #10919 or adding __dask_tokenize__ on your classes (depending on which route you're taking).
Discussion on dask-contrib/dask-awkward#468

@martindurant
Copy link
Member

Thanks @crusaderky . We will report back when everything is in order. For the moment, coffea has pinned dask<2024.2, so this is important but not urgent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants