
[tune](deps): Bump tokenizers from 0.8.1.rc2 to 0.10.1 in /python/requirements #8

Closed
dependabot[bot] wants to merge 1 commit into master from dependabot/pip/python/requirements/tokenizers-0.10.1

Conversation


dependabot[bot] commented on behalf of GitHub on Feb 13, 2021

Bumps tokenizers from 0.8.1.rc2 to 0.10.1.

Release notes

Sourced from tokenizers's releases.

Rust v0.10.1

Fixed

  • #226: Fix the word indexes when there are special tokens

Python v0.10.1

Fixed

  • #616: Fix SentencePiece tokenizers conversion
  • #617: Fix offsets produced by Precompiled Normalizer (used by tokenizers converted from SPM)
  • #618: Fix Normalizer.normalize with PyNormalizedStringRefMut
  • #620: Fix serialization/deserialization for overlapping models
  • #621: Fix ByteLevel instantiation from a previously saved state (using __getstate__())

Rust v0.10.0

Changed

  • #222: All Tokenizer's subparts must now be Send + Sync

Added

  • #208: Ability to retrieve the vocabulary from the Tokenizer & Model

Fixed

  • #205: Trim the decoded string in BPEDecoder
  • [b770f36]: Fix a bug with added tokens generated IDs

Python v0.10.0

Added

  • #508: Add a Visualizer for notebooks to help understand how the tokenizers work
  • #519: Add a WordLevelTrainer used to train a WordLevel model
  • #533: Add support for conda builds
  • #542: Add Split pre-tokenizer to easily split using a pattern
  • #544: Ability to train from memory. This also improves the integration with datasets
  • #590: Add getters/setters for components on BaseTokenizer
  • #574: Add fuse_unk option to SentencePieceBPETokenizer

Changed

  • #509: Automatically stubbing the .pyi files
  • #519: Each Model can return its associated Trainer with get_trainer()
  • #530: The various attributes on each component can be read and set (e.g. tokenizer.model.dropout = 0.1)
  • #538: The API Reference has been improved and is now up-to-date.

Fixed

  • #519: During training, the Model is now trained in-place. This fixes several bugs that forced reloading the Model after training.
  • #539: Fix BaseTokenizer enable_truncation docstring

Python v0.10.0rc1

Added

  • #508: Add a Visualizer for notebooks to help understand how the tokenizers work
  • #519: Add a WordLevelTrainer used to train a WordLevel model
  • #533: Add support for conda builds

... (truncated)

Commits
  • af66d6f Rust - Bump to 0.10.1 for release
  • f9c76b6 Python - Use PyO3 0.9.2 (#227)
  • a6c33f5 Python - update some dependencies
  • d6326a6 Python - Use PyO3 0.9.2
  • bd18df0 Word indexes are None for special tokens (#226)
  • 3ad1360 Word indices are None for special tokens
  • e7949fc Python - Fix build for windows 32-bit (#224)
  • 1b9ead7 Python - Try PyO3 master to fix build
  • 33681fa Python - Check it builds for windows 32
  • b8daeae Python - Force PyO3 to 0.9.0 for now
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

Bumps [tokenizers](https://github.com/huggingface/tokenizers) from 0.8.1.rc2 to 0.10.1.
- [Release notes](https://github.com/huggingface/tokenizers/releases)
- [Commits](https://github.com/huggingface/tokenizers/compare/python-v0.8.1.rc2...rust-v0.10.1)

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies label (Pull requests that update a dependency file) on Feb 13, 2021

dependabot[bot] commented on behalf of GitHub on Apr 10, 2021

Superseded by #11.

dependabot[bot] closed this on Apr 10, 2021
dependabot[bot] deleted the dependabot/pip/python/requirements/tokenizers-0.10.1 branch on Apr 10, 2021 at 07:06
architkulkarni pushed a commit that referenced this pull request Jul 27, 2022
We encountered SIGSEGV when running Python test `python/ray/tests/test_failure_2.py::test_list_named_actors_timeout`. The stack is:

```
#0  0x00007fffed30f393 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) ()
   from /lib64/libstdc++.so.6
#1  0x00007fffee707649 in ray::RayLog::GetLoggerName() () from /home/admin/dev/Arc/merge/ray/python/ray/_raylet.so
#2  0x00007fffee70aa90 in ray::SpdLogMessage::Flush() () from /home/admin/dev/Arc/merge/ray/python/ray/_raylet.so
#3  0x00007fffee70af28 in ray::RayLog::~RayLog() () from /home/admin/dev/Arc/merge/ray/python/ray/_raylet.so
#4  0x00007fffee2b570d in ray::asio::testing::(anonymous namespace)::DelayManager::Init() [clone .constprop.0] ()
   from /home/admin/dev/Arc/merge/ray/python/ray/_raylet.so
#5  0x00007fffedd0d95a in _GLOBAL__sub_I_asio_chaos.cc () from /home/admin/dev/Arc/merge/ray/python/ray/_raylet.so
#6  0x00007ffff7fe282a in call_init.part () from /lib64/ld-linux-x86-64.so.2
#7  0x00007ffff7fe2931 in _dl_init () from /lib64/ld-linux-x86-64.so.2
#8  0x00007ffff7fe674c in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#9  0x00007ffff7b82e79 in _dl_catch_exception () from /lib64/libc.so.6
#10 0x00007ffff7fe5ffe in _dl_open () from /lib64/ld-linux-x86-64.so.2
#11 0x00007ffff7d5f39c in dlopen_doit () from /lib64/libdl.so.2
#12 0x00007ffff7b82e79 in _dl_catch_exception () from /lib64/libc.so.6
#13 0x00007ffff7b82f13 in _dl_catch_error () from /lib64/libc.so.6
#14 0x00007ffff7d5fb09 in _dlerror_run () from /lib64/libdl.so.2
#15 0x00007ffff7d5f42a in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#16 0x00007fffef04d330 in py_dl_open (self=<optimized out>, args=<optimized out>)
    at /tmp/python-build.20220507135524.257789/Python-3.7.11/Modules/_ctypes/callproc.c:1369
```

The root cause is that when loading `_raylet.so`, `static DelayManager _delay_manager` is initialized and `RAY_LOG(ERROR) << "RAY_testing_asio_delay_us is set to " << delay_env;` is executed. However, the static variables declared in `logging.cc` are not initialized yet (in this case, `std::string RayLog::logger_name_ = "ray_log_sink"`).

It's better not to rely on the initialization order of static variables in different compilation units, because that order is not guaranteed. I propose changing all `RAY_LOG` calls in `DelayManager::Init()` to `std::cerr`.

The crash happens in Ant's internal codebase. Not sure why this test case passes in the community version though.

BTW, I've tried different approaches:

1. Use a static local variable in `get_delay_us` and remove the global variable. This doesn't work because `init()` needs to access the variable as well.
2. Define the global variable as a `std::unique_ptr<DelayManager>` and initialize it lazily in `get_delay_us`. This works, but it requires a lock to be thread-safe.