Code reorganization towards release of xet cargo package #693
rajatarya
left a comment
Review: Code reorganization towards release of xet cargo package
This is a well-structured consolidation of ~20 workspace crates into 5 publishable packages with a clean layered dependency graph (runtime → core_structures → client → data → xet). The api_changes/update_260309_package_restructure.md migration guide is thorough. The error type hierarchy is well-designed, with XetError providing meaningful user-facing categories that map cleanly to Python exception types. Keeping backward-compatibility aliases at the old module paths is the right approach for a staged migration.
CI workflows were updated correctly in ci.yml, git-xet-release.yml, and hf-xet-tests.yml.
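As an illustration of the alias approach praised above, here is a minimal sketch of how an old module path can stay usable while pointing at the new location. It does not use the real crates: the `xet_core_structures` module below is a local stand-in, `MerkleHash` is a placeholder type, and the `#[deprecated]` marker is an assumption about how such an alias might be flagged, not something this PR is known to do.

```rust
// Local stand-in for the new package layout (assumption: not the real crate).
mod xet_core_structures {
    pub mod merklehash {
        /// Placeholder type for illustration; the real definition differs.
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub struct MerkleHash(pub [u64; 4]);
    }
}

// The old top-level path is kept alive as an alias that re-exports the
// new module, and is marked deprecated so callers get a warning.
#[deprecated(note = "use xet_core_structures::merklehash instead")]
pub mod merklehash {
    pub use super::xet_core_structures::merklehash::*;
}

/// Builds the same value through the old and the new path and compares them.
/// This only type-checks because both paths name the same type.
#[allow(deprecated)]
fn old_and_new_paths_agree() -> bool {
    let via_old = merklehash::MerkleHash([1, 2, 3, 4]);
    let via_new = xet_core_structures::merklehash::MerkleHash([1, 2, 3, 4]);
    via_old == via_new
}

fn main() {
    assert!(old_and_new_paths_agree());
    println!("old and new paths resolve to the same type");
}
```

Because the alias is a plain re-export, downstream code compiled against the old path keeps working unchanged and can migrate at its own pace.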
Critical
1. release.yml: All 5 maturin working-directory references still point to hf_xet instead of bindings/hf_xet
hf_xet/ no longer exists at the repo root — it was moved to bindings/hf_xet/. This will break every release build (linux, musllinux, windows, macos, sdist). The sed/cp/pushd paths in the same file were updated correctly — only working-directory was missed. See inline comment for locations.
Non-blocking observations (for follow-up)
2. Wrong repository URL in libxet/Cargo.toml — set to https://github.com/xetdata/xet-core but actual repo is https://github.com/huggingface/xet-core.
3. Missing crates.io publish metadata — Only libxet/Cargo.toml has description, license, repository. The other 4 packages are missing these (cargo publish requires license and description). Fine if publishing is a later step.
4. mockall as a non-dev dependency in client/Cargo.toml — pulls a proc-macro framework into every consumer's build. Should likely be [dev-dependencies] unless mock types are part of the public API for downstream test code.
5. tempfile as a non-dev dependency in core_structures/Cargo.toml — worth auditing whether it's used in library code or only tests/benchmarks.
6. Inconsistent type paths in client/src/error.rs:129-130 — From<SingleflightError> impl uses utils::errors::SingleflightError in the trait but utils::singleflight::SingleflightError in the parameter. Compiles today via re-export but fragile and confusing to readers.
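To make observation 6 concrete, the following is a simplified sketch of a `From` impl that names the error type through a single path. The module layout mirrors the paths quoted above, but everything else is assumed for illustration: `SingleflightError` is reduced to one variant and `ClientError` is a hypothetical stand-in for the client error type.

```rust
mod utils {
    pub mod singleflight {
        /// Simplified placeholder; the real type is defined elsewhere.
        #[derive(Debug)]
        pub enum SingleflightError {
            Internal(String),
        }
    }
    pub mod errors {
        // Re-export: utils::errors::SingleflightError and
        // utils::singleflight::SingleflightError are the same type.
        pub use super::singleflight::SingleflightError;
    }
}

// Import the type once, from its defining module, and use only this name.
use utils::singleflight::SingleflightError;

/// Hypothetical stand-in for the client error enum.
#[derive(Debug, PartialEq)]
enum ClientError {
    Other(String),
}

// The trait parameter and the function parameter use the same path, so a
// reader does not need to know about the re-export to see that they match.
impl From<SingleflightError> for ClientError {
    fn from(err: SingleflightError) -> Self {
        ClientError::Other(format!("singleflight error: {err:?}"))
    }
}

/// Shows that a value typed through the re-exported path still converts.
fn convert_via_reexport() -> ClientError {
    let err: utils::errors::SingleflightError =
        SingleflightError::Internal("boom".to_string());
    err.into()
}

fn main() {
    println!("{:?}", convert_via_reexport());
}
```

If the re-export in `utils::errors` were later removed or replaced by a distinct type, an impl that mixed both paths would stop compiling or silently change meaning, which is the fragility the observation points at.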
Not sure there is any way to avoid this, but by moving Maybe we can use git symlinks to keep them alive for one release and mark them deprecated? Not sure it is worth it since they will likely not notice until it breaks.
We can move them back for now... That's less critical.
IIUC to publish
If possible let's also keep git_xet under the repo root for now. There are a lot of hub docs linking to here and a moon-landing proxy rule leading to this location.
The packages published are hf-xet, xet-runtime, xet-core-structures, xet-client, and xet-data.
This PR is a massive rearrangement of the code base into 5 packages intended for release on cargo. The directories and corresponding packages are:
In addition, the other tools are:
The full description — and information for an AI agent to use to update downstream dependencies — is at api_changes/update_260309_package_restructure.md.
Summary of moves:
xet_runtime: became xet_runtime::core inside xet_runtime/.
utils: became xet_runtime::utils inside xet_runtime/.
xet_config: became xet_runtime::config inside xet_runtime/.
xet_logging: became xet_runtime::logging inside xet_runtime/.
error_printer: became xet_runtime::error_printer inside xet_runtime/.
file_utils: became xet_runtime::file_utils inside xet_runtime/.
merklehash: became xet_core_structures::merklehash inside xet_core_structures/.
mdb_shard: became xet_core_structures::metadata_shard inside xet_core_structures/.
xorb_object: became xet_core_structures::xorb_object inside xet_core_structures/.
cas_client: became xet_client::cas_client inside xet_client/.
hub_client: became xet_client::hub_client inside xet_client/.
cas_types: became xet_client::cas_types inside xet_client/.
chunk_cache: became xet_client::chunk_cache inside xet_client/.
data: became xet_data::processing inside xet_data/.
deduplication: became xet_data::deduplication inside xet_data/.
file_reconstruction: became xet_data::file_reconstruction inside xet_data/.
progress_tracking: became xet_data::progress_tracking inside xet_data/.
xet_session: became xet::xet_session inside xet_pkg/.
Wasm packages (hf_xet_wasm, hf_xet_thin_wasm): moved from top-level into wasm/; internal imports updated, public APIs unchanged.
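For downstream code, the moves listed above amount to a prefix rewrite on import paths. The sketch below encodes that table and rewrites simple `use` lines. It is an illustration under stated assumptions, not the tooling described in the migration guide: `migrate_use_line` is a hypothetical helper, it only handles lines that begin with `use`, and the item names in the usage example are illustrative.

```rust
/// (old crate name, new module path) pairs copied from the summary of moves.
/// The `xet_runtime` -> `xet_runtime::core` move is left out on purpose:
/// the old and new paths share a first segment, so a first-segment rewrite
/// could not tell migrated code from unmigrated code.
const MOVES: &[(&str, &str)] = &[
    ("utils", "xet_runtime::utils"),
    ("xet_config", "xet_runtime::config"),
    ("xet_logging", "xet_runtime::logging"),
    ("error_printer", "xet_runtime::error_printer"),
    ("file_utils", "xet_runtime::file_utils"),
    ("merklehash", "xet_core_structures::merklehash"),
    ("mdb_shard", "xet_core_structures::metadata_shard"),
    ("xorb_object", "xet_core_structures::xorb_object"),
    ("cas_client", "xet_client::cas_client"),
    ("hub_client", "xet_client::hub_client"),
    ("cas_types", "xet_client::cas_types"),
    ("chunk_cache", "xet_client::chunk_cache"),
    ("data", "xet_data::processing"),
    ("deduplication", "xet_data::deduplication"),
    ("file_reconstruction", "xet_data::file_reconstruction"),
    ("progress_tracking", "xet_data::progress_tracking"),
    ("xet_session", "xet::xet_session"),
];

/// Rewrites the first path segment of a `use` line according to MOVES.
/// Lines that are not `use` statements, or whose first segment is not in
/// the table, are returned unchanged.
fn migrate_use_line(line: &str) -> String {
    let trimmed = line.trim_start();
    let indent = &line[..line.len() - trimmed.len()];
    let Some(rest) = trimmed.strip_prefix("use ") else {
        return line.to_string();
    };
    // The first segment ends at the first non-identifier character.
    let end = rest
        .find(|c: char| !(c.is_alphanumeric() || c == '_'))
        .unwrap_or(rest.len());
    let (head, tail) = rest.split_at(end);
    match MOVES.iter().find(|(old, _)| *old == head) {
        Some((_, new)) => format!("{indent}use {new}{tail}"),
        None => line.to_string(),
    }
}

fn main() {
    // Item names below are illustrative.
    for line in ["use merklehash::MerkleHash;", "use mdb_shard;", "use std::fmt;"] {
        println!("{line}  =>  {}", migrate_use_line(line));
    }
}
```

A real migration also has to handle `pub use`, fully qualified paths in expressions, and `Cargo.toml` dependency names, which is why the migration guide remains the authoritative reference.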