Conversation
Signed-off-by: William Woodruff <william@astral.sh>
Signed-off-by: William Woodruff <william@astral.sh>
Signed-off-by: William Woodruff <william@astral.sh>
|
Some other notes:
|
Signed-off-by: William Woodruff <william@astral.sh>
Signed-off-by: William Woodruff <william@astral.sh>
Signed-off-by: William Woodruff <william@astral.sh>
|
Ah, looks like we do have a behavioral assumption around (The assumption before was that any I think we have a few options here:
Edit: This also might be minor in any case, since FWICT |
|
Do we need to be able to read existing IDs from disk? Like, do we need to be able to read these into our ID format? |
|
I think if we bump the archive version then no, the old ones will just become orphaned (like other bucket bumps). But I'll test that tomorrow, since it looks like the archive bucket hasn't been bumped before. |
|
Bumping the archive version is a breaking change though. Is that warranted here? Maybe we could do that as a separate step later? Edit: the case I mainly have in mind is a read-only mounted uv cache, where it's undesirable/impossible for the uv process to refresh it (because e.g. it's network-isolated) |
I might be missing it, but what makes it a breaking change? IIUC in the read-only cache case like that it'll result in cache misses rather than hard failures, but I don't think we typically consider that a breaking change (since we've done bumps with e.g. the metadata cache on patch releases). With that said I agree it's not warranted here 😅 -- this is all a bit of a micro-optimization/attempt to save a few heap allocations on archive cache keys. I think the easiest thing for me to do here is to just revert back to using |
Signed-off-by: William Woodruff <william@astral.sh>
|
Bumping the bucket isn't considered a breaking change. It does seem reasonable to avoid bumping for now and to do it when we have a stronger motiviation. |
Signed-off-by: William Woodruff <william@astral.sh>
Signed-off-by: William Woodruff <william@astral.sh>
…9298) (#19301) ## Summary Closes #19298. #19201 (0.11.9) switched revision IDs from 21-character `nanoid` to `uv_fastid`'s 16-character IDs, but didn't bump the cache bucket version. As a result, when 0.11.9+ opens a cache populated by an older uv, deserialization of an existing revision pointer hard-fails with: ``` Failed to deserialize cache entry invalid ID: "HM0NxJml5hc7UjbfTWT1r" (must be 16 ID characters in the alphabet) ``` This effectively poisons long-lived caches on shared CI workers after upgrade. ## Fix Mirror the existing convention already documented on `ArchiveId` in `crates/uv-cache/src/archive.rs`: ```rust /// Note: for compatibility with the existing `archive-v0` bucket, this is a newtype /// around a `String` instead of a newtype around `uv_fastid::Id`. In the future, /// we may want to bump to `archive-v1` and switch to using `uv_fastid::Id` directly. ``` `RevisionId` had the same shape but was migrated to `uv_fastid::Id` directly in #19201, which is what made the legacy 21-character pointers undecodable. Switching `RevisionId` back to `String` (with the same `uv_fastid::insecure()` source for new IDs) restores forward-compatibility with both formats. The ID is only used as an opaque path/string component, so no other call site needs to change. ## Test plan - [x] New unit test in `crates/uv-distribution/src/source/revision.rs`: - `deserialize_legacy_nanoid_revision` round-trips a `Revision` with a 21-character legacy ID through `rmp_serde` (the exact failing path from the bug report) - `round_trip_current_revision` confirms freshly-generated IDs still serialize as the new 16-character form - [x] `cargo build -p uv` clean - [x] `cargo test -p uv-distribution --lib` -> 21/21 passing (incl. the 2 new tests) - [x] `cargo test -p uv-fastid` -> 4/4 passing - [x] `cargo fmt --check` clean - [x] `cargo clippy -p uv-distribution --lib --tests` clean --------- Signed-off-by: William Woodruff <william@yossarian.net> Co-authored-by: William Woodruff <william@yossarian.net>
Summary
This introduces a new workspace member,
uv-fastid.uv-fastidprovides IDs using the same alphabet as nanoid, but with a slightly smaller token size (16 instead of 21) and fewer knobs for configuration.In practice it should also be faster, since our IDs are now trivially copyable + don't require any heap allocations to construct internally. This is probably not a huge deal in our case anyways (ID generation isn't exactly dominating our runtime), but it doesn't hurt to have a fast primitive here.
The two main APIs I've added are
insecure()andsecure(), whose names refer to the kind of PRNG they use internally. Right now we only useinsecure(), since we don't actually need a CSPRNG for the uses of nanoid we're replacing (revision and archive IDs).Some other notes:
randwe have a purely userspace CSPRNG (seeded from the OS's CSPRNG), so we don't really needfastrandfor performance here.unsafein this, although I think the usage is demonstrably safe (as long as crate-private invariants are maintained) 🙂nanoiddoesn't make any format assumptions, so switching to a slightly shorter ID form here shouldn't break anything. Our previous IDs also aren't part of any public interface AFAICT.Closes #19176.
Test Plan
Added unit tests to the new
uv-fastidcrate. Our existing tests should cover this transitively as well.