Skip to content

Keep old dependencies on run_spec collision#8512

Merged
crusaderky merged 2 commits intodask:mainfrom
crusaderky:keep_old_runspec
Feb 19, 2024
Merged

Keep old dependencies on run_spec collision#8512
crusaderky merged 2 commits intodask:mainfrom
crusaderky:keep_old_runspec

Conversation

@crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Feb 16, 2024

While developing #8185, @fjetter and I discussed the option to simply retain the old run_spec in case of key collision. During that conversation, the prevailing opinion was that a hard crash was preferable.
Since then, we realised that a hard crash cannot be implemented as it would break many other use cases impacted by dask/dask#9888, where the run_spec changes and the dependencies are either the same or become a subset of the previous ones. The use cases where dask/dask#9888 causes the cluster crash is when there are added dependencies. So we fell back to just a warning in the scheduler log.

This one-liner PR ensures that, in case of run_spec collision, both run_spec and dependencies remain the same as in the old version (while still printing a warning). This should effectively sweep dask/dask#9888 under the carpet, i.e. whenever two task with different run_spec and possibly different dependencies produce the same output, and just leave the problem of manually crafted keys.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 16, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    27 files  ± 0      27 suites  ±0   10h 10m 0s ⏱️ - 10m 1s
 3 994 tests + 7   3 881 ✅ + 6    109 💤 ±0  4 ❌ +1 
50 230 runs  +91  47 936 ✅ +91  2 290 💤 ±0  4 ❌ ±0 

For more details on these failures, see this check.

Results for commit cd84237. ± Comparison against base commit 207bbad.

This pull request removes 3 and adds 10 tests. Note that renamed tests count towards both.
distributed.tests.test_scheduler ‑ test_resubmit_different_task_same_key[False]
distributed.tests.test_scheduler ‑ test_resubmit_different_task_same_key[True]
distributed.tests.test_scheduler ‑ test_resubmit_different_task_same_key_warns_only_once[10-3]
distributed.tests.test_scheduler ‑ test_resubmit_different_task_same_key_after_previous_is_done[False-less]
distributed.tests.test_scheduler ‑ test_resubmit_different_task_same_key_after_previous_is_done[False-more]
distributed.tests.test_scheduler ‑ test_resubmit_different_task_same_key_after_previous_is_done[False-same]
distributed.tests.test_scheduler ‑ test_resubmit_different_task_same_key_after_previous_is_done[True-less]
distributed.tests.test_scheduler ‑ test_resubmit_different_task_same_key_after_previous_is_done[True-more]
distributed.tests.test_scheduler ‑ test_resubmit_different_task_same_key_after_previous_is_done[True-same]
distributed.tests.test_scheduler ‑ test_resubmit_different_task_same_key_before_previous_is_done[less]
distributed.tests.test_scheduler ‑ test_resubmit_different_task_same_key_before_previous_is_done[more]
distributed.tests.test_scheduler ‑ test_resubmit_different_task_same_key_before_previous_is_done[same]
distributed.tests.test_scheduler ‑ test_resubmit_different_task_same_key_warns_only_once[10-2]

♻️ This comment has been updated with latest results.

@crusaderky crusaderky self-assigned this Feb 19, 2024
@crusaderky crusaderky changed the title [DNM] Keep old dependencies on run_spec collision Keep old dependencies on run_spec collision Feb 19, 2024
@crusaderky crusaderky marked this pull request as ready for review February 19, 2024 09:57
@crusaderky crusaderky requested a review from fjetter as a code owner February 19, 2024 09:57
@fjetter
Copy link
Member

fjetter commented Feb 19, 2024

I'm a little concerned about the case where the tasks are not producing the same output. With this change there is a (very unlikely) chance to produce wrong results, isn't there?

@crusaderky
Copy link
Collaborator Author

crusaderky commented Feb 19, 2024

I'm a little concerned about the case where the tasks are not producing the same output. With this change there is a (very unlikely) chance to produce wrong results, isn't there?

More or less like it was before.

Legend

✔️ produces correct output
📛 cluster crashes or hangs on AssertionError
😕 task completes successfully, but output is wrong
⚠️ emits warning in the scheduler log

Task output Dependencies Old status 2024.2.0 #8185 #8512 Use case of dask/dask#9888?
same same pending ✔️ ⚠️ ✔️ ⚠️ ✔️ maybe?
same new has fewer pending ✔️ ⚠️ ✔️ ⚠️ ✔️ yes
same new has more pending 📛 ⚠️ 📛 ⚠️ ✔️ yes
same same memory ✔️ ✔️ ✔️ maybe?
same new has fewer memory ✔️ ⚠️ ✔️ ⚠️ ✔️ yes
same new has more memory ✔️ ✔️ ✔️ yes
same * released ✔️ ⚠️ ✔️ ⚠️ ✔️ yes
differs same pending 😕 ⚠️ 😕 ⚠️ 😕 no
differs new has fewer pending 😕 ⚠️ 😕 ⚠️ 😕 no
differs new has more pending 📛 ⚠️ 📛 ⚠️ 😕 no
differs same memory 😕 😕 😕 no
differs new has fewer memory 😕 ⚠️ 😕 ⚠️ 😕 no
differs new has more memory 😕 😕 😕 no
differs * released 😕 ⚠️ 😕 ⚠️ 😕 no

@crusaderky
Copy link
Collaborator Author

crusaderky commented Feb 19, 2024

Out of scope for this PR: I'm unsure why a task that's already in memory only emits a warning when there are less dependencies than before.

Copy link
Member

@fjetter fjetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is the lesser evil 🙈

@crusaderky crusaderky merged commit b03efee into dask:main Feb 19, 2024
@crusaderky crusaderky deleted the keep_old_runspec branch February 19, 2024 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tokenization meta-issue

2 participants