Skip to content

Remove redundant normalize_token variants#10884

Merged
phofl merged 1 commit intodask:mainfrom
crusaderky:simpler_tokenize
Feb 14, 2024
Merged

Remove redundant normalize_token variants#10884
phofl merged 1 commit intodask:mainfrom
crusaderky:simpler_tokenize

Conversation

@crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Feb 1, 2024

Now that we use pickle to tokenize unknown objects, we can remove a lot of special cases.

Note 1: performance for numpy tokenization is ensured by using pickle5 buffers in _normalize_pickle.
Note 2: I tried removing all special-case handling for pandas, but it broke gpuci. I did not spend time to investigate. Probably a worthy exercise to do at some point later.

@crusaderky crusaderky force-pushed the simpler_tokenize branch 2 times, most recently from 9febbaf to 4dff916 Compare February 1, 2024 18:30
@github-actions
Copy link
Contributor

github-actions bot commented Feb 1, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

     15 files  ±0       15 suites  ±0   3h 24m 5s ⏱️ + 7m 47s
 13 021 tests ±0   12 090 ✅  -  1     931 💤 + 1  0 ❌ ±0 
161 002 runs  ±0  144 495 ✅  - 17  16 507 💤 +17  0 ❌ ±0 

Results for commit d2208b5. ± Comparison against base commit 07099e5.

This pull request removes 2 and adds 2 tests. Note that renamed tests count towards both.
dask.tests.test_tokenize ‑ test_tokenize_local_classes_from_different_contexts
dask.tests.test_tokenize ‑ test_tokenize_local_instances_from_different_contexts
dask.tests.test_tokenize ‑ test_tokenize_local_classes_from_different_contexts[False]
dask.tests.test_tokenize ‑ test_tokenize_local_classes_from_different_contexts[True]
This pull request removes 1 skipped test and adds 2 skipped tests. Note that renamed tests count towards both.
dask.tests.test_tokenize ‑ test_tokenize_local_instances_from_different_contexts
dask.tests.test_tokenize ‑ test_tokenize_local_classes_from_different_contexts[False]
dask.tests.test_tokenize ‑ test_tokenize_local_classes_from_different_contexts[True]

♻️ This comment has been updated with latest results.

@crusaderky crusaderky force-pushed the simpler_tokenize branch 6 times, most recently from b0a9ef5 to 9df700d Compare February 5, 2024 15:31
@crusaderky crusaderky changed the title [DNM] Yank out most special handlers for tokenize [DNM] Remove redundant normalize_token variants Feb 5, 2024
@crusaderky crusaderky force-pushed the simpler_tokenize branch 13 times, most recently from c70308c to 0cad4d1 Compare February 6, 2024 10:36
@crusaderky crusaderky changed the title [DNM] Remove redundant normalize_token variants Remove redundant normalize_token variants Feb 6, 2024
@crusaderky crusaderky force-pushed the simpler_tokenize branch 2 times, most recently from 0cd8fa7 to 04fca10 Compare February 9, 2024 12:11
crusaderky added a commit to crusaderky/dask that referenced this pull request Feb 9, 2024
@crusaderky crusaderky self-assigned this Feb 9, 2024
@crusaderky crusaderky marked this pull request as ready for review February 9, 2024 15:36
@phofl
Copy link
Collaborator

phofl commented Feb 14, 2024

Just double checking: Should we run performance tests for this?

@crusaderky
Copy link
Collaborator Author

Just double checking: Should we run performance tests for this?

Already did. No noticeable regression in the end-to-end coiled/benchmarks, and 50~150ms slowdown overall in the TCPH optimizer runtime (most of it caused by #10883, I expect).

@phofl phofl merged commit 9a1d4f1 into dask:main Feb 14, 2024
@phofl
Copy link
Collaborator

phofl commented Feb 14, 2024

thx

@crusaderky crusaderky deleted the simpler_tokenize branch February 14, 2024 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants