[ruff] use bitcode instead of bincode#23544

Merged
MichaReiser merged 3 commits into astral-sh:main from chirizxc:bitcode
Mar 9, 2026

Conversation

@chirizxc
Contributor

@chirizxc chirizxc commented Feb 24, 2026

Summary

See: #22284

The bitcode repository contains benchmarks, and bitcode shows good results in them.

Test Plan

@astral-sh-bot

astral-sh-bot bot commented Feb 24, 2026

Typing conformance results

No changes detected ✅

Current numbers
The percentage of diagnostics emitted that were expected errors held steady at 85.05%.
The percentage of expected errors that received a diagnostic held steady at 78.05%.
The number of fully passing files held steady at 63/132.

@astral-sh-bot

astral-sh-bot bot commented Feb 24, 2026

Memory usage report

Memory usage unchanged ✅

@astral-sh-bot

astral-sh-bot bot commented Feb 24, 2026

mypy_primer results

Changes were detected when running on open source projects
pydantic (https://github.com/pydantic/pydantic)
- pydantic/_internal/_core_metadata.py:87:54: error[invalid-assignment] Invalid assignment to key "pydantic_js_extra" with declared type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | ((dict[str, int | float | str | ... omitted 3 union elements], type[Any], /) -> None)` on TypedDict `CoreMetadata`: value of type `dict[object, object]`
+ pydantic/_internal/_core_metadata.py:87:54: error[invalid-assignment] Invalid assignment to key "pydantic_js_extra" with declared type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | ((dict[str, Divergent], type[Any], /) -> None)` on TypedDict `CoreMetadata`: value of type `dict[object, object]`
- pydantic/fields.py:949:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
+ pydantic/fields.py:949:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
- pydantic/fields.py:989:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
+ pydantic/fields.py:989:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
- pydantic/fields.py:1032:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
+ pydantic/fields.py:1032:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
- pydantic/fields.py:1072:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
+ pydantic/fields.py:1072:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
- pydantic/fields.py:1115:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
+ pydantic/fields.py:1115:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
- pydantic/fields.py:1154:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
+ pydantic/fields.py:1154:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
- pydantic/fields.py:1194:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
+ pydantic/fields.py:1194:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
- pydantic/fields.py:1573:13: error[invalid-argument-type] Argument is incorrect: Expected `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`, found `dict[str, int | float | str | ... omitted 3 union elements] | dict[Never, Never] | (((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) & ~Top[dict[Unknown, Unknown]]) | None`
+ pydantic/fields.py:1573:13: error[invalid-argument-type] Argument is incorrect: Expected `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`, found `dict[str, Divergent] | dict[Never, Never] | (((dict[str, Divergent], /) -> None) & ~Top[dict[Unknown, Unknown]]) | None`

@chirizxc chirizxc changed the title from [ruff] use bitcode to [ruff] use bitcode instead of bincode Feb 24, 2026
@astral-sh-bot

astral-sh-bot bot commented Feb 24, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@chirizxc
Contributor Author

chirizxc commented Feb 24, 2026

❯ git diff -- . ':(exclude)Cargo.lock'                                                                                                         
diff --git a/crates/ruff/Cargo.toml b/crates/ruff/Cargo.toml                                                                                   
index f279b40756..6c9e5b7cc7 100644
--- a/crates/ruff/Cargo.toml
+++ b/crates/ruff/Cargo.toml
@@ -39,6 +39,7 @@ ruff_workspace = { workspace = true }
 
 anyhow = { workspace = true }
 argfile = { workspace = true }
+bincode = { version = "2.0.1", optional = true }
 bitcode = { workspace = true }
 bitflags = { workspace = true }
 cachedir = { workspace = true }
❯ git diff --staged                                                                                                                            
diff --git a/crates/ruff/src/bin/cache_codec_bench.rs b/crates/ruff/src/bin/cache_codec_bench.rs                                               
new file mode 100644
index 0000000000..9f225b58ca
--- /dev/null
+++ b/crates/ruff/src/bin/cache_codec_bench.rs
@@ -0,0 +1,174 @@
+#[cfg(not(feature = "bincode"))]
+fn main() {
+    eprintln!("need bincode");
+}
+
+#[cfg(feature = "bincode")]
+mod bench {
+    use std::path::PathBuf;
+    use std::sync::atomic::{AtomicU64, Ordering};
+    use std::time::{Duration, Instant};
+
+    use bitcode::{Decode, Encode};
+    use rustc_hash::FxHashMap;
+
+    #[derive(Debug, Default, Clone, Encode, Decode, bincode::Encode, bincode::Decode)]
+    struct FileCacheData {
+        linted: bool,
+        formatted: bool,
+    }
+
+    #[derive(Debug, bincode::Encode, bincode::Decode)]
+    struct BincodeFileCache {
+        key: u64,
+        last_seen: AtomicU64,
+        data: FileCacheData,
+    }
+
+    #[derive(Debug, bincode::Encode, bincode::Decode)]
+    struct BincodePackageCache {
+        package_root: PathBuf,
+        files: FxHashMap<PathBuf, BincodeFileCache>,
+    }
+
+    #[derive(Debug, Clone, Encode, Decode)]
+    struct BitcodeFileCache {
+        key: u64,
+        last_seen: u64,
+        data: FileCacheData,
+    }
+
+    #[derive(Debug, Clone, Encode, Decode)]
+    struct BitcodePackageCache {
+        package_root: String,
+        files: Vec<(String, BitcodeFileCache)>,
+    }
+
+    fn build_data(file_count: usize) -> (BincodePackageCache, BitcodePackageCache) {
+        let mut bincode_files = FxHashMap::default();
+        let mut bitcode_files = Vec::with_capacity(file_count);
+
+        for i in 0..file_count {
+            let path = format!("pkg/src/file_{i}.py");
+            let key = (i as u64).wrapping_mul(1_000_003);
+            let last_seen = 1_730_000_000_000 + i as u64;
+            let data = FileCacheData {
+                linted: i % 3 != 0,
+                formatted: i % 5 == 0,
+            };
+
+            bincode_files.insert(
+                PathBuf::from(&path),
+                BincodeFileCache {
+                    key,
+                    last_seen: AtomicU64::new(last_seen),
+                    data: data.clone(),
+                },
+            );
+            bitcode_files.push((
+                path,
+                BitcodeFileCache {
+                    key,
+                    last_seen,
+                    data,
+                },
+            ));
+        }
+
+        (
+            BincodePackageCache {
+                package_root: PathBuf::from("C:/bench/pkg"),
+                files: bincode_files,
+            },
+            BitcodePackageCache {
+                package_root: "C:/bench/pkg".to_string(),
+                files: bitcode_files,
+            },
+        )
+    }
+
+    fn bench_bincode(data: &BincodePackageCache, iters: usize) -> (Duration, Duration, usize) {
+        let mut enc_total = Duration::ZERO;
+        let mut dec_total = Duration::ZERO;
+        let mut bytes = 0usize;
+        let config = bincode::config::standard();
+
+        for _ in 0..iters {
+            let start = Instant::now();
+            let encoded = bincode::encode_to_vec(data, config).expect("bincode encode");
+            enc_total += start.elapsed();
+            bytes = encoded.len();
+
+            let start = Instant::now();
+            let (decoded, _): (BincodePackageCache, usize) =
+                bincode::decode_from_slice(&encoded, config).expect("bincode decode");
+            dec_total += start.elapsed();
+
+            std::hint::black_box(decoded.files.len());
+        }
+
+        (enc_total, dec_total, bytes)
+    }
+
+    fn bench_bitcode(data: &BitcodePackageCache, iters: usize) -> (Duration, Duration, usize) {
+        let mut enc_total = Duration::ZERO;
+        let mut dec_total = Duration::ZERO;
+        let mut bytes = 0usize;
+
+        for _ in 0..iters {
+            let start = Instant::now();
+            let encoded = bitcode::encode(data);
+            enc_total += start.elapsed();
+            bytes = encoded.len();
+
+            let start = Instant::now();
+            let decoded: BitcodePackageCache = bitcode::decode(&encoded).expect("bitcode decode");
+            dec_total += start.elapsed();
+
+            std::hint::black_box(decoded.files.len());
+        }
+
+        (enc_total, dec_total, bytes)
+    }
+
+    fn print_result(name: &str, enc: Duration, dec: Duration, bytes: usize, iters: usize) {
+        println!(
+            "{name}: encode_avg={:.3}ms decode_avg={:.3}ms bytes={bytes}",
+            enc.as_secs_f64() * 1_000.0 / iters as f64,
+            dec.as_secs_f64() * 1_000.0 / iters as f64
+        );
+    }
+
+    pub(crate) fn run() {
+        let file_count = std::env::args()
+            .nth(1)
+            .and_then(|v| v.parse::<usize>().ok())
+            .unwrap_or(20_000);
+        let iters = std::env::args()
+            .nth(2)
+            .and_then(|v| v.parse::<usize>().ok())
+            .unwrap_or(200);
+
+        println!("dataset: files={file_count}, iterations={iters}");
+        let (bincode_data, bitcode_data) = build_data(file_count);
+
+        let (enc, dec, bytes) = bench_bincode(&bincode_data, iters);
+        print_result("bincode(old)", enc, dec, bytes, iters);
+
+        let _ = std::hint::black_box(
+            bincode_data
+                .files
+                .values()
+                .map(|f| f.last_seen.load(Ordering::Relaxed))
+                .sum::<u64>(),
+        );
+
+        let (enc, dec, bytes) = bench_bitcode(&bitcode_data, iters);
+        print_result("bitcode(new)", enc, dec, bytes, iters);
+    }
+}
+
+#[cfg(feature = "bincode")]
+fn main() {
+    bench::run();
+}

On my system (Windows 10), this gives the following results (the benchmark is quite synthetic):

[image: benchmark results]

~30% faster encoding
~56% faster decoding
~23% smaller size

Contributor

@ntBre ntBre left a comment


Thanks for looking into this! I'll be curious to get Micha's thoughts next week.

I just had a couple of questions on the code itself, probably from my unfamiliarity with bitcode.

}

#[derive(Debug, bitcode::Encode, bitcode::Decode)]
struct SerializedFileCache {
Contributor


Encode and Decode seem to be implemented for all of these fields. Do we need the to and from_runtime methods for these?

Contributor Author


from_runtime reads the atomic (a load) and puts a plain u64 into the serializable struct;
into_runtime, when reading from disk, wraps the u64 back into AtomicU64::new(...)
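The conversion described here can be sketched with std-only types (the struct and field names mirror the PR, but this is an illustrative simplification, not the actual cache code):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Runtime struct: `last_seen` is updated concurrently while linting.
struct FileCache {
    key: u64,
    last_seen: AtomicU64,
}

// On-disk struct: a plain u64 that a derive-based codec can handle directly.
struct SerializedFileCache {
    key: u64,
    last_seen: u64,
}

impl SerializedFileCache {
    // from_runtime: load the atomic into a plain u64 for serialization.
    fn from_runtime(cache: &FileCache) -> Self {
        Self {
            key: cache.key,
            last_seen: cache.last_seen.load(Ordering::Relaxed),
        }
    }

    // into_runtime: wrap the plain u64 back into an AtomicU64 after decoding.
    fn into_runtime(self) -> FileCache {
        FileCache {
            key: self.key,
            last_seen: AtomicU64::new(self.last_seen),
        }
    }
}

fn main() {
    let runtime = FileCache {
        key: 42,
        last_seen: AtomicU64::new(1_730_000_000),
    };
    // Round-trip through the serializable shape preserves the value.
    let roundtripped = SerializedFileCache::from_runtime(&runtime).into_runtime();
    assert_eq!(roundtripped.key, 42);
    assert_eq!(roundtripped.last_seen.load(Ordering::Relaxed), 1_730_000_000);
}
```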

Contributor


Yes sorry, but I saw in the bitcode docs that Decode and Encode are implemented for AtomicU64. I see now that that requires the target_has_atomic=64 feature, though.

https://docs.rs/bitcode/latest/bitcode/trait.Decode.html#impl-Decode%3C'a%3E-for-AtomicU64

#[derive(Debug, bitcode::Encode, bitcode::Decode)]
struct SerializedPackageCache {
package_root: String,
files: Vec<(String, SerializedFileCache)>,
Contributor


Would #[bitcode(with_serde)] help to avoid this intermediate struct here? It seems a bit unfortunate to have to go through to_string_lossy for these paths. I think if the conversion ends up being lossy, we might as well not even cache those files.

Contributor Author


diff --git a/crates/ruff/Cargo.toml b/crates/ruff/Cargo.toml
index f279b40756..0d85f4c0ed 100644
--- a/crates/ruff/Cargo.toml
+++ b/crates/ruff/Cargo.toml
@@ -39,7 +39,7 @@ ruff_workspace = { workspace = true }
 
 anyhow = { workspace = true }
 argfile = { workspace = true }
-bitcode = { workspace = true }
+bitcode = { workspace = true, features = ["serde"] }
 bitflags = { workspace = true }
 cachedir = { workspace = true }
 clap = { workspace = true, features = ["derive", "env", "wrap_help"] }
diff --git a/crates/ruff/src/cache.rs b/crates/ruff/src/cache.rs
index 1a29789732..3830db4e10 100644
--- a/crates/ruff/src/cache.rs
+++ b/crates/ruff/src/cache.rs
@@ -109,7 +109,7 @@ impl Cache {
             }
         };
 
-        let package: SerializedPackageCache = match bitcode::decode(&serialized) {
+        let mut package: PackageCache = match bitcode::deserialize(&serialized) {
             Ok(package) => package,
             Err(err) => {
                 warn_user!("Failed parse cache file `{}`: {err}", path.display());
@@ -117,8 +117,6 @@ impl Cache {
             }
         };
 
-        let mut package = package.into_runtime();
-
         // Sanity check.
         if package.package_root != package_root {
             warn_user!(
@@ -169,7 +167,7 @@ impl Cache {
 
         // Serialize to in-memory buffer because hyperfine benchmark showed that it's faster than
         // using a `BufWriter` and our cache files are small enough that streaming isn't necessary.
-        let serialized = bitcode::encode(&SerializedPackageCache::from_runtime(&self.package));
+        let serialized = bitcode::serialize(&self.package).context("Failed to serialize cache")?;
         temp_file
             .write_all(&serialized)
             .context("Failed to write serialized cache to temporary file.")?;
@@ -298,69 +296,8 @@ impl Cache {
     }
 }
 
-/// On disk representation of a cache of a package.
-#[derive(Debug, bitcode::Encode, bitcode::Decode)]
-struct SerializedPackageCache {
-    package_root: String,
-    files: Vec<(String, SerializedFileCache)>,
-}
-
-impl SerializedPackageCache {
-    fn from_runtime(cache: &PackageCache) -> Self {
-        Self {
-            package_root: cache.package_root.to_string_lossy().into_owned(),
-            files: cache
-                .files
-                .iter()
-                .map(|(path, file)| {
-                    (
-                        path.to_string_lossy().into_owned(),
-                        SerializedFileCache::from_runtime(file),
-                    )
-                })
-                .collect(),
-        }
-    }
-
-    fn into_runtime(self) -> PackageCache {
-        PackageCache {
-            package_root: PathBuf::from(self.package_root),
-            files: self
-                .files
-                .into_iter()
-                .map(|(path, file)| (PathBuf::from(path), file.into_runtime()))
-                .collect(),
-        }
-    }
-}
-
-#[derive(Debug, bitcode::Encode, bitcode::Decode)]
-struct SerializedFileCache {
-    key: u64,
-    last_seen: u64,
-    data: FileCacheData,
-}
-
-impl SerializedFileCache {
-    fn from_runtime(cache: &FileCache) -> Self {
-        Self {
-            key: cache.key,
-            last_seen: cache.last_seen.load(Ordering::Relaxed),
-            data: cache.data.clone(),
-        }
-    }
-
-    fn into_runtime(self) -> FileCache {
-        FileCache {
-            key: self.key,
-            last_seen: AtomicU64::new(self.last_seen),
-            data: self.data,
-        }
-    }
-}
-
 /// Runtime representation of a cache of a package.
-#[derive(Debug)]
+#[derive(Debug, serde::Serialize, serde::Deserialize)]
 struct PackageCache {
     /// Path to the root of the package.
     ///
@@ -372,7 +309,7 @@ struct PackageCache {
 }
 
 /// Runtime representation of the cache per source file.
-#[derive(Debug)]
+#[derive(Debug, serde::Serialize, serde::Deserialize)]
 pub(crate) struct FileCache {
     /// Key that determines if the cached item is still valid.
     key: u64,
@@ -392,7 +329,11 @@ impl FileCache {
     }
 }
 
-#[derive(Debug, Default, Clone, bitcode::Encode, bitcode::Decode)]
+#[derive(
+    Debug, Default, Clone,
+    bitcode::Encode, bitcode::Decode,
+    serde::Serialize, serde::Deserialize,
+)]
 struct FileCacheData {
     linted: bool,
     formatted: bool,

I think we could do something like this

Member


Do we have a benchmark? The bitcode docs say the serde integration "is slower, produces slightly larger output", but don't quantify how much, so I'm curious how much that would affect caching performance.

Contributor Author


Would #[bitcode(with_serde)] help to avoid this intermediate struct here? It seems a bit unfortunate to have to go through to_string_lossy for these paths. I think if the conversion ends up being lossy, we might as well not even cache those files.

diff --git a/crates/ruff/src/cache.rs b/crates/ruff/src/cache.rs
index 1a29789732..0fa283d4bc 100644
--- a/crates/ruff/src/cache.rs
+++ b/crates/ruff/src/cache.rs
@@ -1,4 +1,5 @@
 use std::fmt::Debug;
+use std::ffi::OsString;
 use std::fs;
 use std::hash::Hasher;
 use std::io::{self, Write};
@@ -109,7 +110,9 @@ impl Cache {
             }
         };
 
-        let package: SerializedPackageCache = match bitcode::decode(&serialized) {
+        let mut package: PackageCache = match bitcode::decode::<SerializedPackageCache>(&serialized)
+            .map(SerializedPackageCache::into_runtime)
+        {
             Ok(package) => package,
             Err(err) => {
                 warn_user!("Failed parse cache file `{}`: {err}", path.display());
@@ -117,8 +120,6 @@ impl Cache {
             }
         };
 
-        let mut package = package.into_runtime();
-
         // Sanity check.
         if package.package_root != package_root {
             warn_user!(
@@ -298,37 +299,77 @@ impl Cache {
     }
 }
 
+/// On-disk representation of a path, preserving non-UTF-8 data.
+#[derive(Debug, Clone, bitcode::Encode, bitcode::Decode)]
+struct SerializedPath {
+    #[cfg(unix)]
+    components: Vec<u8>,
+    #[cfg(windows)]
+    components: Vec<u16>,
+}
+
+impl SerializedPath {
+    fn from_path(path: &Path) -> Self {
+        #[cfg(unix)]
+        {
+            use std::os::unix::ffi::OsStrExt;
+
+            Self {
+                components: path.as_os_str().as_bytes().to_vec(),
+            }
+        }
+
+        #[cfg(windows)]
+        {
+            use std::os::windows::ffi::OsStrExt;
+
+            Self {
+                components: path.as_os_str().encode_wide().collect(),
+            }
+        }
+    }
+
+    fn into_path_buf(self) -> PathBuf {
+        #[cfg(unix)]
+        {
+            use std::os::unix::ffi::OsStringExt;
+            PathBuf::from(OsString::from_vec(self.components))
+        }
+
+        #[cfg(windows)]
+        {
+            use std::os::windows::ffi::OsStringExt;
+            PathBuf::from(OsString::from_wide(&self.components))
+        }
+    }
+}
+
 /// On disk representation of a cache of a package.
 #[derive(Debug, bitcode::Encode, bitcode::Decode)]
 struct SerializedPackageCache {
-    package_root: String,
-    files: Vec<(String, SerializedFileCache)>,
+    package_root: SerializedPath,
+    files: Vec<(SerializedPath, SerializedFileCache)>,
 }
 
 impl SerializedPackageCache {
     fn from_runtime(cache: &PackageCache) -> Self {
         Self {
-            package_root: cache.package_root.to_string_lossy().into_owned(),
+            package_root: SerializedPath::from_path(&cache.package_root),
             files: cache
                 .files
                 .iter()
-                .map(|(path, file)| {
-                    (
-                        path.to_string_lossy().into_owned(),
-                        SerializedFileCache::from_runtime(file),
-                    )
-                })
+                .map(|(path, file)| (SerializedPath::from_path(path), SerializedFileCache::from_runtime(file)))
                 .collect(),
         }
     }
 
     fn into_runtime(self) -> PackageCache {
         PackageCache {
-            package_root: PathBuf::from(self.package_root),
+            package_root: self.package_root.into_path_buf(),
             files: self
                 .files
                 .into_iter()
-                .map(|(path, file)| (PathBuf::from(path), file.into_runtime()))
+                .map(|(path, file)| (path.into_path_buf(), file.into_runtime()))
                 .collect(),
         }
     }

we could use something like this instead of .to_string_lossy()


Member


Is bench-current the bitcode version using serde?

Contributor Author


yes

Contributor Author


A little later, I'll try the option with rkyv.

Member


The performance of bitcode with serde sounds promising. Do you want to update this PR to that version?

rkyv might be a bit more involved and I suggest we open a separate PR for it. We can then decide which of the two PRs we want to merge. We can also decide to leave it at this and merge the bitcode PR now.

Contributor Author


The performance of bitcode with serde sounds promising. Do you want to update this PR to that version?

rkyv might be a bit more involved and I suggest we open a separate PR for it. We can then decide which of the two PRs we want to merge. We can also decide to leave it at this and merge the bitcode PR now.

Just in case, I'll try to do a few more benchmarks locally and get back to you with the results later.

@ntBre ntBre requested a review from MichaReiser February 25, 2026 14:29
@amyreese
Member

Does this affect compatibility with existing caches generated by bincode, or do they use the same on-disk format?

@ntBre
Contributor

ntBre commented Feb 25, 2026

We don't try to reuse caches across ruff versions from what I remember, so there's no problem with breaking the cache format between releases, even patch ones. If you have a long-lived .ruff_cache directory, it should have versioned sub-directories:

❯ tree -L 1 .ruff_cache
.ruff_cache
├── 0.1.15
├── 0.1.4
├── 0.1.5
├── 0.12.12
├── 0.12.2
├── 0.12.4
├── 0.12.5
├── 0.12.7
├── 0.12.8
├── 0.12.9
├── 0.13.0
├── 0.13.1
├── 0.13.2
├── 0.13.3
├── 0.14.0
├── 0.14.1
├── 0.14.10
├── 0.14.11
├── 0.14.13
├── 0.14.14
├── 0.14.2
├── 0.14.3
├── 0.14.4
├── 0.14.5
├── 0.14.6
├── 0.14.7
├── 0.14.8
├── 0.15.0
├── 0.15.1
├── 0.15.2
├── 0.3.0
├── 0.4.0
├── 0.5.0
├── 0.9.0
├── CACHEDIR.TAG
└── content

@ntBre ntBre added the dependencies Pull requests that update a dependency file label Feb 26, 2026
@chirizxc
Contributor Author

chirizxc commented Mar 6, 2026

@MichaReiser

The benchmark was performed on the repository https://github.com/apache/airflow.

[image: benchmark results]

Cache size:

[image: cache size comparison]

https://gist.github.com/chirizxc/aa7eefd17b507575c1bff97c94e2bebf

@MichaReiser
Member

Thanks for the detailed benchmarks. Does the rkyv benchmark use its zero copy deserialization?

@chirizxc
Contributor Author

chirizxc commented Mar 6, 2026

[image: benchmark results]

This is a benchmark across the entire folder of Python projects that I have created or contributed to.

@chirizxc
Contributor Author

chirizxc commented Mar 6, 2026

Thanks for the detailed benchmarks. Does the rkyv benchmark use its zero copy deserialization?

In this benchmark, rkyv was not used in zero-copy mode.

I did rkyv::from_bytes(...), and then converted the data to regular runtime structures (PackageCache/FileCache) via into_runtime(), i.e., with copying.

If I understand correctly, to use zero-copy, we will have to slightly change the cache design itself. Currently, the cache is designed for data ownership and mutation (PathBuf, HashMap, AtomicU64, last_seen updates), while rkyv's zero-copy works better with archived (essentially read-only) structures and a different access pattern. For true zero-copy, we would have to significantly rework the cache model (separate archived view + separate mutable state).
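The split this comment alludes to can be sketched with std-only types (all names here are hypothetical, not from the PR): a read-only "archived" view borrowed from the decoded buffer, with mutable state such as last_seen updates kept in a separate structure.

```rust
use std::collections::HashMap;

// Hypothetical archived view: read-only, borrowing from the decoded buffer.
// With rkyv this role would be played by the generated Archived* types.
struct ArchivedFileCache<'a> {
    path: &'a str,
    key: u64,
}

// Mutable runtime state lives apart from the archived data.
#[derive(Default)]
struct MutableState {
    last_seen: HashMap<String, u64>,
}

impl MutableState {
    // Record an access without ever mutating the archived view.
    fn touch(&mut self, file: &ArchivedFileCache<'_>, now: u64) {
        self.last_seen.insert(file.path.to_owned(), now);
    }
}

fn main() {
    // Stand-in for the decoded cache buffer a real implementation borrows from.
    let buffer = String::from("pkg/src/file_0.py");
    let archived = ArchivedFileCache { path: &buffer, key: 42 };

    let mut state = MutableState::default();
    state.touch(&archived, 1_730_000_000);

    assert_eq!(state.last_seen["pkg/src/file_0.py"], 1_730_000_000);
    assert_eq!(archived.key, 42);
}
```

This illustrates why adopting zero-copy is a design change rather than a drop-in codec swap: the current owned, mutable PackageCache/FileCache model would need to be split along these lines.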

Member

@MichaReiser MichaReiser left a comment


Thanks for the detailed analysis. This looks good to me. I only have a small comment and this should then be ready to merge.

@MichaReiser MichaReiser enabled auto-merge (squash) March 9, 2026 14:02
@MichaReiser MichaReiser merged commit 90f8843 into astral-sh:main Mar 9, 2026
50 checks passed
@chirizxc chirizxc deleted the bitcode branch March 9, 2026 14:06
@musicinmybrain
Contributor

Did you happen to notice that bitcode is intentionally unsound, relying on known undefined behavior for performance reasons? Personally, I find that more concerning than the unmaintained status of bincode.

@MichaReiser
Member

No, I was not aware of this and I agree with this sentiment.

MichaReiser added a commit that referenced this pull request Mar 13, 2026