Description
Current Limitation
While working on huggingface/tokenizers#1784, I noticed that hf-xet is built from source on Python 3.13t. If wheels for 3.13t were available, that step would be a lot faster and wouldn't require a Rust toolchain.
Feature Description
I see one unsafe impl Send and one unsafe impl Sync (listed below alongside two unsafe bytemuck impls). I have no idea whether the affected types might be exposed to multiple Python threads:
merklehash/src/data_hash.rs
127:unsafe impl heed::bytemuck::Zeroable for DataHash {
134:unsafe impl heed::bytemuck::Pod for DataHash {}
merkledb/src/async_chunk_iterator.rs
206:unsafe impl Send for HasherPointerBox<'_> {}
207:unsafe impl Sync for HasherPointerBox<'_> {}
Other than that, you should be able to rely on the guarantees provided by Rust and PyO3 for thread safety. It looks like hf_xet uses a thread pool internally, but there aren't any tests that exercise multithreaded use via the Python API. I'm not sure to what extent it makes sense to use objects exposed by the library from multiple threads. It's totally valid to say that sharing those objects across threads is not supported and still ship free-threaded binaries for the use cases that are supported.
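In case it helps, here is a rough sketch of the kind of check I have in mind. It only uses the standard library: it reports whether the interpreter is a free-threaded build and then calls a function from several threads at once. The helper names (is_free_threaded_build, gil_is_enabled, hammer) are made up for illustration, and the lambda is a placeholder workload, not an hf_xet call; I haven't assumed anything about the hf_xet Python API here.

```python
import sys
import sysconfig
import threading
from concurrent.futures import ThreadPoolExecutor


def is_free_threaded_build() -> bool:
    """True if this interpreter was compiled with the GIL made optional (e.g. 3.13t)."""
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def gil_is_enabled() -> bool:
    """True if the GIL is active right now; Pythons before 3.13 always have one."""
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()


def hammer(func, n_threads=8, n_calls=200):
    """Call func(thread_index, call_index) from n_threads threads at once.

    Returns one list of results per thread. Hypothetical helper, not part
    of any library.
    """
    barrier = threading.Barrier(n_threads)

    def worker(tid):
        barrier.wait()  # release all threads together to maximize contention
        return [func(tid, i) for i in range(n_calls)]

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map re-raises any exception thrown inside a worker
        return list(pool.map(worker, range(n_threads)))


if __name__ == "__main__":
    print("free-threaded build:", is_free_threaded_build())
    print("GIL enabled:", gil_is_enabled())
    # Placeholder workload; swap in a call into the extension under test.
    results = hammer(lambda tid, i: tid * 1000 + i)
    assert all(len(r) == 200 for r in results)
```

A harness like this won't prove thread safety, but running it on 3.13t against whichever entry points are meant to be shared across threads is a cheap way to surface crashes or corrupted results.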
Additional Context
See https://py-free-threading.github.io for more detail about supporting and using free-threaded Python.