Skip to content

Remove our dependency on nanoid#19201

Merged
woodruffw merged 10 commits intomainfrom
ww/rm-nanoid
Apr 29, 2026
Merged

Remove our dependency on nanoid#19201
woodruffw merged 10 commits intomainfrom
ww/rm-nanoid

Conversation

@woodruffw
Copy link
Copy Markdown
Member

Summary

This introduces a new workspace member, uv-fastid. uv-fastid provides IDs using the same alphabet as nanoid, but with a slightly smaller token size (16 instead of 21) and fewer knobs for configuration.

In practice it should also be faster, since our IDs are now trivially copyable + don't require any heap allocations to construct internally. This is probably not a huge deal in our case anyways (ID generation isn't exactly dominating our runtime), but it doesn't hurt to have a fast primitive here.

The two main APIs I've added are insecure() and secure(), whose names refer to the kind of PRNG they use internally. Right now we only use insecure(), since we don't actually need a CSPRNG for the uses of nanoid we're replacing (revision and archive IDs).

Some other notes:

  • Arguably we could remove the secure/insecure distinction entirely, and just keep the CSPRNG path -- with rand we have a purely userspace CSPRNG (seeded from the OS's CSPRNG), so we don't really need fastrand for performance here.
  • I'm guilty of using unsafe in this, although I think the usage is demonstrably safe (as long as crate-private invariants are maintained) 🙂
  • I think there's no breakage risk with this -- FWICT our only usage of nanoid doesn't make any format assumptions, so switching to a slightly shorter ID form here shouldn't break anything. Our previous IDs also aren't part of any public interface AFAICT.

Closes #19176.

Test Plan

Added unit tests to the new uv-fastid crate. Our existing tests should cover this transitively as well.

Signed-off-by: William Woodruff <william@astral.sh>
@woodruffw woodruffw requested review from charliermarsh and zsol April 28, 2026 18:12
@woodruffw woodruffw self-assigned this Apr 28, 2026
@woodruffw woodruffw added the internal A refactor or improvement that is not user-facing label Apr 28, 2026
Comment thread Cargo.toml
Comment thread crates/uv-cache/src/archive.rs
Signed-off-by: William Woodruff <william@astral.sh>
Signed-off-by: William Woodruff <william@astral.sh>
@woodruffw
Copy link
Copy Markdown
Member Author

Some other notes:

  • I picked a token size of 16 because log2(64) * 16 = 96 bits of entropy, which I think is more than enough for our use cases (in the non-security-sensitive setting we could go even lower, but even in the security setting that's a comfortable guessing/collision margin).

Signed-off-by: William Woodruff <william@astral.sh>
Signed-off-by: William Woodruff <william@astral.sh>
Signed-off-by: William Woodruff <william@astral.sh>
Signed-off-by: William Woodruff <william@astral.sh>
@woodruffw
Copy link
Copy Markdown
Member Author

woodruffw commented Apr 28, 2026

Ah, looks like we do have a behavioral assumption around ArchiveId's format:

 thread 'tests::test_link_deserialize' (133698) panicked at crates/uv-cache/src/lib.rs:1455:9:
    assertion failed: Link::from_str("archive-v0/foo").is_ok()

(The assumption before was that any String can be an ID, but now we're strict about it being a uv_fastid::Id-formatted string.)

I think we have a few options here:

  1. I could bump the CacheBucket for archives to archive-v1, which (IIUC) will prevent us from reading the old entries and failing.
  2. I could change the Link::from_str tests to make them pass, under the assumption that we'll just silently ignore the now-invalid archive entries.
  3. We could revert to ArchiveId(String) instead of ArchiveId(uv_fastid::Id), which loses us a few allocations but preserves exact compatibility with the current IDs.

Edit: This also might be minor in any case, since FWICT Link is basically dead code except one Windows path?

@charliermarsh
Copy link
Copy Markdown
Member

Do we need to be able to read existing IDs from disk? Like, do we need to be able to read these into our ID format?

❯ ls archive-v0
_3OAfZctF8UydglecrX0x aDcgovMVjxtaHXNCqt5MB gDOrNLlXb_61jOj8QHVNd mptS-Rhq4XPOyS7GFLAo7 TPYObcfx552jcjLbC-Ze_
_3oBciWusDv3wJMSA2k1j aDesp9jd9o_OfiZmqQt1Q ge4B5Av6_I1AmYM_SE2n4 MQKGCSwouTUcv-18fv2ic tQkdqY0-v4mSiexMnhgIx
_4Xo0kTSe9pOtzJmIwQe1 ADEwFc5XvSpW4F_n6w3cc gemQALx3mCH46z5DdJG_1 MQxWuQXNpwxkEP_WGVqOP TR-qz4iFLjPmqB2DAzRal
_5fR2nwdv0mukbpiURbo5 AdPGqIgQ6dIcqtVY6Ukzj GfC2kpPVxWjalEumO-FTc MrHj-Y-c1IwwVts8Q-hqj TRypPoYDSpSP9hE9nikNZ
_6uAgliFRctrSMvHkcgtw AdSANrr6dclYwxb5sUoI8 gfMBbzvHc6a0nY6rVDCI3 mRS75UNUPLwHMLfuHiYRL TS167myrBkRdHhfM2poMG
_7KPaIBs7gGkMZpCdkfUW AdwlJTA4HJmFauEKavD8S gfr7BTVGbqfWntoX9UXRM MRuSt3YdrkmMC8OkDWKGy tsiuXo_IEK4p57anWCKPE
_AWeEmMq3mrf8W_aPCcje AfyEl63gGYBZyravasAFW gGhXEwIaf4gbprwV0DtJr MS7FFdYzPtIw747AkAQO9 TSKa2juut_KuN-MDZLI_u
_CH8IEno0SZOFqwzSLX4d Ag_LijIN9aG30sSg2vQKS GgiV8fiNcZYdgMckklgnr MsfFoacczQ5Sycu6B6QvH tsnnqoFF-VkINtiDfdHnx
_cy8IlLQTNmkpj3XqHHjy ahMwCMSnw_FImlGvNWwIw GGqsuRRigeQW-Fj7uaAH4 MSZG2bqP04ghGHMaBUqIr TTBwEyhLXjmUHwIR_-eaj

@woodruffw
Copy link
Copy Markdown
Member Author

I think if we bump the archive version then no, the old ones will just become orphaned (like other bucket bumps). But I'll test that tomorrow, since it looks like the archive bucket hasn't been bumped before.

@zsol
Copy link
Copy Markdown
Member

zsol commented Apr 29, 2026

Bumping the archive version is a breaking change though. Is that warranted here? Maybe we could do that as a separate step later?

Edit: the case I mainly have in mind is a read-only mounted uv cache, where it's undesirable/impossible for the uv process to refresh it (because e.g. it's network-isolated)

@woodruffw
Copy link
Copy Markdown
Member Author

Bumping the archive version is a breaking change though. Is that warranted here? Maybe we could do that as a separate step later?

I might be missing it, but what makes it a breaking change? IIUC in the read-only cache case like that it'll result in cache misses rather than hard failures, but I don't think we typically consider that a breaking change (since we've done bumps with e.g. the metadata cache on patch releases).

With that said I agree it's not warranted here 😅 -- this is all a bit of a micro-optimization/attempt to save a few heap allocations on archive cache keys. I think the easiest thing for me to do here is to just revert back to using String for these keys and make a note that in the future we can re-roll the type + bucket version here.

Signed-off-by: William Woodruff <william@astral.sh>
@woodruffw woodruffw marked this pull request as ready for review April 29, 2026 14:32
@zanieb
Copy link
Copy Markdown
Member

zanieb commented Apr 29, 2026

Bumping the bucket isn't considered a breaking change.

It does seem reasonable to avoid bumping for now and to do it when we have a stronger motiviation.

Signed-off-by: William Woodruff <william@astral.sh>
Signed-off-by: William Woodruff <william@astral.sh>
@woodruffw woodruffw enabled auto-merge (squash) April 29, 2026 15:47
@woodruffw woodruffw merged commit 0e69f0c into main Apr 29, 2026
102 of 103 checks passed
@woodruffw woodruffw deleted the ww/rm-nanoid branch April 29, 2026 15:56
woodruffw added a commit that referenced this pull request May 6, 2026
…9298) (#19301)

## Summary

Closes #19298.

#19201 (0.11.9) switched revision IDs from 21-character `nanoid` to
`uv_fastid`'s 16-character IDs, but didn't bump the cache bucket
version. As a result, when 0.11.9+ opens a cache populated by an older
uv, deserialization of an existing revision pointer hard-fails with:

```
Failed to deserialize cache entry
invalid ID: "HM0NxJml5hc7UjbfTWT1r" (must be 16 ID characters in the alphabet)
```

This effectively poisons long-lived caches on shared CI workers after
upgrade.

## Fix

Mirror the existing convention already documented on `ArchiveId` in
`crates/uv-cache/src/archive.rs`:

```rust
/// Note: for compatibility with the existing `archive-v0` bucket, this is a newtype
/// around a `String` instead of a newtype around `uv_fastid::Id`. In the future,
/// we may want to bump to `archive-v1` and switch to using `uv_fastid::Id` directly.
```

`RevisionId` had the same shape but was migrated to `uv_fastid::Id`
directly in #19201, which is what made the legacy 21-character pointers
undecodable. Switching `RevisionId` back to `String` (with the same
`uv_fastid::insecure()` source for new IDs) restores
forward-compatibility with both formats. The ID is only used as an
opaque path/string component, so no other call site needs to change.

## Test plan

- [x] New unit test in `crates/uv-distribution/src/source/revision.rs`:
- `deserialize_legacy_nanoid_revision` round-trips a `Revision` with a
21-character legacy ID through `rmp_serde` (the exact failing path from
the bug report)
- `round_trip_current_revision` confirms freshly-generated IDs still
serialize as the new 16-character form
- [x] `cargo build -p uv` clean
- [x] `cargo test -p uv-distribution --lib` -> 21/21 passing (incl. the
2 new tests)
- [x] `cargo test -p uv-fastid` -> 4/4 passing
- [x] `cargo fmt --check` clean
- [x] `cargo clippy -p uv-distribution --lib --tests` clean

---------

Signed-off-by: William Woodruff <william@yossarian.net>
Co-authored-by: William Woodruff <william@yossarian.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

internal A refactor or improvement that is not user-facing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants