Skip to content

Override tokenize.ensure-deterministic config flag#10913

Merged
crusaderky merged 3 commits intodask:mainfrom
crusaderky:override_ensure_deterministic
Feb 15, 2024
Merged

Override tokenize.ensure-deterministic config flag#10913
crusaderky merged 3 commits intodask:mainfrom
crusaderky:override_ensure_deterministic

Conversation

@crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Feb 9, 2024

  • Add optional parameter to tokenize(ensure_deterministic=True) to force raising exception
  • Can now run test_tokenize.py with either value for the config variable
  • Subclass RuntimeError to restrict exception catching (we're calling pickle internally, which is very complex)
  • Better tests for the _seen variable

@crusaderky crusaderky force-pushed the override_ensure_deterministic branch from 7ce25e9 to aadcd40 Compare February 9, 2024 11:47
@crusaderky crusaderky force-pushed the override_ensure_deterministic branch from aadcd40 to a176d83 Compare February 9, 2024 12:11
crusaderky added a commit to crusaderky/dask that referenced this pull request Feb 9, 2024
@github-actions
Copy link
Contributor

github-actions bot commented Feb 9, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

     15 files  ± 0       15 suites  ±0   3h 23m 44s ⏱️ + 6m 3s
 13 032 tests + 1   12 101 ✅ + 1     931 💤 ±0  0 ❌ ±0 
161 167 runs  +15  144 634 ✅ +20  16 533 💤  - 5  0 ❌ ±0 

Results for commit 10cdfce. ± Comparison against base commit 328d074.

♻️ This comment has been updated with latest results.

Comment on lines +1043 to +1055
# tokenize.ensure-deterministic flag, potentially overridden by tokenize()
_ensure_deterministic: ContextVar[bool] = ContextVar("_ensure_deterministic")

# Circular reference breaker used by _normalize_seq_func.
# This variable is recreated anew every time you call tokenize(). Note that this means
# that you could call tokenize() from inside tokenize() and they would be fully
# independent.
#
# It is a map of {id(obj): (<first seen incremental int>, obj)} which causes an object
# to be tokenized as ("__seen", <incremental>) the second time it's encountered while
# traversing collections. A strong reference to the object is stored in the context to
# prevent ids from being reused by different objects.
_seen: ContextVar[dict[int, tuple[int, object]]] = ContextVar("_seen")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems overly complex. Why isn't a single ContextVar sufficient?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only noticed now that this was already introduced in #10876

Comment on lines +291 to +293
# Test ctx variable cleanup
with pytest.raises(LookupError):
_ensure_deterministic.get()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is about ufuncs. Asserting on this ctxvar feels odd.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test may fail if I were to tamper with the ctxvar without using the helper function _maybe_raise_nondeterministic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it would be useful as a comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked it.

@crusaderky crusaderky force-pushed the override_ensure_deterministic branch from a176d83 to 2bef3a3 Compare February 13, 2024 16:48
@hendrikmakait hendrikmakait self-requested a review February 14, 2024 16:34
Comment on lines +291 to +293
# Test ctx variable cleanup
with pytest.raises(LookupError):
_ensure_deterministic.get()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it would be useful as a comment.

Comment on lines +285 to +287
assert tokenize(inc, ensure_deterministic=False) != tokenize(
inc, ensure_deterministic=False
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be nested within the block above?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. These 3 tests use the same pattern:

with dask.config.set({"tokenize.ensure-deterministic": False}):
    tokenize(...)
with dask.config.set({"tokenize.ensure-deterministic": True}):
    tokenize(...)
tokenize(..., ensure_determinsitic=False)
tokenize(..., ensure_deterministic=True)

This makes them impervious to whatever the config is set to globally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I attributed the inline comment only to this test case, not the one below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wasn't sure if this should test overriding "whatever" the config is set to globally or overriding any possible config value.

@crusaderky
Copy link
Collaborator Author

@hendrikmakait @fjetter all comments have been addressed

Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @crusaderky! LGTM assuming CI is green

@crusaderky crusaderky merged commit 76b255e into dask:main Feb 15, 2024
@crusaderky crusaderky deleted the override_ensure_deterministic branch February 15, 2024 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants