storage: pack read optimisations and storage-wide LRU FD pool by hiddeco · Pull Request #2134 · go-git/go-git

hiddeco · 2026-05-16T20:33:54Z

This PR primarily addresses resource hygiene and concurrent correctness: bounded FDs across a Storage (and across multiple Storages via Options.Pool), *fdpool.Pool.Stats() for observability, and four s.index race fixes. Read-path performance changes are minimal — measured against main with benchstat (-count=6, -benchtime=1s):

Allocations reduced 14% geomean, up to 47% on hot read paths (ParallelReads, AlternatesObjectLookup/EncodedObject). Lower GC pressure.
Alternates lookup latency -59% (with variance caveats; replication recommended).
Flat throughput on the primary read path (ParallelReads).
Single-pack ReindexThenRead is +14% slower (cost of singleflight ceremony when there's no parallelisation benefit to capture).

`EntriesWithPrefix(prefix)` returns an iterator over index entries whose hashes start with `prefix`. Implementations use the fanout table to bound the search to a single bucket when `len(prefix) >= 1`; an empty prefix is equivalent to `Entries()`. For multi-byte prefixes the matching entries form a contiguous run somewhere in the middle of the fanout bucket (the bucket is sorted by full hash, so entries with second byte < `prefix[1]` precede the run and entries with second byte > `prefix[1]` follow it). The iterator binary-searches within the bucket for the leftmost entry whose hash is >= `prefix` padded with zeros, and then walks forward with a stop-on-first-mismatch sentinel. This mirrors upstream Git's `for_each_prefixed_object_in_pack`[1], which calls `bsearch_pack` to position before walking forward — without the bsearch step, a stop-on-first-mismatch walk from the bucket start would silently terminate before reaching the match-run whenever some entry with a smaller second byte precedes it. Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix` to call `EntriesWithPrefix(prefix)` and drop the now- redundant per-entry prefix check. The new version also snapshots `s.index` under `muI.RLock` and iterates the snapshot without holding the lock, avoiding any RLock retention across per-iterator I/O (`LazyIndex` iterators take their own sharedFile locks on `acquire`). This is a breaking change to the public `Index` interface for external implementers, which v6 permits. `idxfilePrefixIter` holds only the cursor state plus the per-bucket Names / Offset32 / CRC32 slices and the shared Offset64 table — not a reference to the parent `MemoryIndex`. Offset and CRC32 lookups inline the computation that `MemoryIndex.getOffset` / `MemoryIndex.getCRC32` would perform, so the iterator's lifetime no longer pins the entire `MemoryIndex` heap footprint (addresses go-git#2134 review comment). `BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack synthetic fixture with 32 objects per pack, two-byte prefix lookup that walks every pack, count=6 -benchtime=1s, benchstat vs pack-hotpath-fd-pool baseline): sec/op: 1093.5 µs → 39.3 µs -96.4% (p=0.002, n=6) B/op: 111.5 KiB → 6.88 KiB -93.8% (p=0.002) allocs/op: 4219 → 114 -97.3% (p=0.002) The bench also reshapes `BenchmarkFindObjectInPackfile_*` and `BenchmarkObjectStorage_*` to run against the `makeMultiPackFixture` helper (introduced by pack-hotpath-fd-pool's parallelise-requireIndex commit), since the single-pack basic fixture cannot demonstrate the scaling-with-N wins that the read-path optimisations deliver. [1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

`EntriesWithPrefix(prefix)` returns an iterator over index entries whose hashes start with `prefix`. Implementations use the fanout table to bound the search to a single bucket when `len(prefix) >= 1`; an empty prefix is equivalent to `Entries()`. For multi-byte prefixes the matching entries form a contiguous run somewhere in the middle of the fanout bucket (the bucket is sorted by full hash, so entries with second byte < `prefix[1]` precede the run and entries with second byte > `prefix[1]` follow it). The iterator binary-searches within the bucket for the leftmost entry whose hash is >= `prefix` padded with zeros, and then walks forward with a stop-on-first-mismatch sentinel. This mirrors upstream Git's `for_each_prefixed_object_in_pack`[1], which calls `bsearch_pack` to position before walking forward — without the bsearch step, a stop-on-first-mismatch walk from the bucket start would silently terminate before reaching the match-run whenever some entry with a smaller second byte precedes it. Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix` to call `EntriesWithPrefix(prefix)` and drop the now- redundant per-entry prefix check. The new version also snapshots `s.index` under `muI.RLock` and iterates the snapshot without holding the lock, avoiding any RLock retention across per-iterator I/O (`LazyIndex` iterators take their own sharedFile locks on `acquire`). This is a breaking change to the public `Index` interface for external implementers, which v6 permits. `idxfilePrefixIter` holds only the cursor state plus the per-bucket Names / Offset32 / CRC32 slices and the shared Offset64 table — not a reference to the parent `MemoryIndex`. Offset and CRC32 lookups inline the computation that `MemoryIndex.getOffset` / `MemoryIndex.getCRC32` would perform, so the iterator's lifetime no longer pins the entire `MemoryIndex` heap footprint (addresses go-git#2134 review comment). `BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack synthetic fixture with 32 objects per pack, two-byte prefix lookup that walks every pack, count=6 -benchtime=1s, benchstat vs pack-hotpath-fd-pool baseline): sec/op: 1093.5 µs → 32.05 µs -97.07% (p=0.002, n=6) B/op: 111.5 KiB → 6.44 KiB -94.22% (p=0.002) allocs/op: 4219 → 95 -97.75% (p=0.002) The bench also reshapes `BenchmarkFindObjectInPackfile_*` and `BenchmarkObjectStorage_*` to run against the `makeMultiPackFixture` helper (introduced by pack-hotpath-fd-pool's parallelise-requireIndex commit), since the single-pack basic fixture cannot demonstrate the scaling-with-N wins that the read-path optimisations deliver. [1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

PackfileWriter.Notify wrote to s.index without taking muI, even though every other s.index writer in this file (the requireIndex publish, Reindex reset, ResetPackHandles, the pruning paths) takes muI.Lock. The asymmetry is a latent race against concurrent readers that take muI.RLock; under -race a parallel EncodedObject + PackfileWriter workload shows the unlocked write. Wrap the write under muI.Lock so Notify obeys the same locking contract as the other writers. The fix has no effect on the normal write rate: Notify fires once per packfile close, far outside the read hot path. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

`muI` guards reads and writes of `s.index`. The earlier `PackfileWriter.Notify` wrap closed the writer side; two read sites still touched `s.index` without taking the lock: `encodedObjectSizeFromPackfile` and the `lazyPackfilesIter` open callback in `buildPackfileIters`. Both are reachable from concurrent `EncodedObjectSize` / `IterEncodedObjects` callers and a racing reader can observe a partially-built map under the lazy-publish path. Wrap each read in a short `RLock`/`RUnlock` window matching the pattern already used in `getFromPackfile`. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

requireIndex serialised cold-load through s.muI.Lock: the first reader took the write lock, walked every pack sequentially under loadIdxFile, and only released the lock once all idx files were materialised. On a repo with N packs and any concurrent reader herd, every joiner blocked on the same Lock and paid serial latency. Coalesce concurrent first-readers via golang.org/x/sync singleflight. The winning goroutine builds a local map in populateIndex with errgroup.SetLimit(GOMAXPROCS), so N packs load in O(N/GOMAXPROCS) instead of O(N), and joiners block in singleflight.Do until the in-flight call returns. The winner double-checks s.index inside the singleflight window for idempotency across back-to-back herds, then publishes the local map under muI.Lock. A late arrival that built its own map on the race-loss path closes the local indexes (LazyIndex implements io.Closer and holds sharedFile refcounts) before dropping them, so we do not leak descriptors. Reindex calls populateIndex synchronously and atomically swaps in the fresh map, leaving the next read as a hot-cache hit instead of a deferred cold-load. Previously-cached LazyIndex entries are not closed: in-flight readers that borrowed an entry before the swap keep using it, and the underlying SharedFile FDs close via the grace timer (no-pool mode) or fdpool LRU eviction (pool mode) once they fall out of use. This mirrors canonical Git's reprepare model[1] where packfile_store_reprepare leaves existing packs in place and only the FD-level LRU evicts. The legacy `loadIdxFile` path collapses into `loadIdx` + `loadLazyIndex` / `loadMemoryIndexValue`; these split helpers make the parallel loop's per-pack work side-effect-free (callers install into s.index, the helpers do not). Tests cover the first-read hot map, the Reindex pre-warm, and a 32-goroutine herd against a cold ObjectStorage (clean under -race). Bench surface (Apple M4 Pro, -benchtime=1s): BenchmarkObjectStorage_ColdEncodedObject ~3.2 ms/op BenchmarkObjectStorage_ReindexThenRead ~33 us/op ColdEncodedObject builds a fresh Storage per iteration against a 32-pack synthetic fixture (helper added in `multipack_fixture_test.go`); the cost is dominated by populateIndex parallel-loading every pack's idx, plus a single packed-object decode. Single-pack fixtures cannot reproduce the parallelisation win — quantifying it against serial main requires benchstat against a baseline recording. The change is internal: no exported API surface moves apart from Reindex gaining an error return, which the v6 cycle permits. [1]: https://github.com/git/git/blob/94f057755b/packfile.c#L1085-L1089 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

`x/fdpool` implements a fixed-capacity LRU cache of resources that own a file descriptor. The pool is used by `storage/filesystem` to bound the number of open `.pack` / `.idx` / `.rev` descriptors across a Storage. Members register on first FD acquire and report subsequent acquires via `Touch`; when the LRU length exceeds the configured capacity, the least-recently-touched member is evicted via `Member.ReleaseNow`. A zero or negative capacity yields a no-op pool: `Touch` and `Forget` do nothing, no member is ever evicted, and `Stats` reports zero capacity. Call sites can therefore wire the pool unconditionally and disable pooling by passing `New(0)`, without special-casing the construction path. The eviction path deliberately drops `Pool.mu` around the call to `Member.ReleaseNow`. The intended wiring takes the member's own mutex during `ReleaseNow`, and the member's normal acquire path takes that mutex and then calls `Pool.Touch`, which would take `Pool.mu`. Holding `Pool.mu` across `ReleaseNow` would establish a Pool→Member ordering that conflicts with the Member→Pool ordering of the acquire path and risks deadlock. Releasing `Pool.mu` for the eviction call keeps Member→Pool as the only direction of lock acquisition. The package lives under `x/` so its API is reachable from external callers — `Options.Pool` consumers such as GitOps controllers that share an FD budget across many short-lived Storages — while still signalling, per the conventions of `x/`, that the surface may change. The package depends only on the standard library; the import dependency runs one way (consumers → fdpool) for any code that needs an LRU FD cache with a `ReleaseNow` member contract. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

`SharedFile` now takes an optional `*fdpool.Pool` at construction via the new `NewWithPool` factory; `New` keeps the pool-less behaviour. When a pool is wired, `Acquire` notifies it via `Pool.Touch(s)` after the FD is in hand, which both registers the SharedFile on first open and refreshes its LRU position on every subsequent acquire. The Touch runs outside `s.mu` so the established Member→Pool lock-acquire direction holds, matching the deadlock-avoidance contract documented on [fdpool.Pool]. `Release` skips the grace-period timer when a pool is wired: the FD stays open and registered, and the pool drives the eventual close via [SharedFile.ReleaseNow] on eviction. Without a pool, the existing grace-timer behaviour is unchanged. `Close` calls `Pool.Forget(s)` so a racing eviction cannot observe a freed Member. `SharedFile` already exposes `ReleaseNow`, so the type satisfies [fdpool.Member] without further surface change; the interface itself lives in `x/fdpool`. Pool-integration tests cover `Touch` on every Acquire, FD-quiescence under a wired pool, terminal `Close` -> `Forget` (with and without active refs), and a one-slot pool driving real eviction across two SharedFiles. A `NoPool_ImmediateClose` test locks in the pool-less grace-timer policy alongside, so a future regression that always-pools cannot pass silently. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

`NewLazyIndexWithPool` is the pool-aware variant of [NewLazyIndex]: it forwards a `*fdpool.Pool` to both the idx and rev [sharedfile.SharedFile]s so a storage-wide LRU bounds the FD budget held by the lazy index. `NewLazyIndex` remains the pool-less convenience over a `nil` pool. The change is purely additive at the surface; the per-source FD lifecycle lives inside the [sharedfile.SharedFile] primitive. Both shared files receive the same pool. Eviction acts on each SharedFile independently — the pool has no idx-versus-rev affinity — so an active .idx is not forced to keep its .rev FD warm, and vice versa. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

`NewWithPool` is the pool-aware variant of [New]: it forwards a `*fdpool.Pool` to the .pack [sharedfile.SharedFile] so a storage-wide LRU bounds the open pack descriptors across many PackHandles. `New` remains the pool-less convenience over a `nil` pool. The change is purely additive at the surface; the per-source FD lifecycle lives inside the [sharedfile.SharedFile] primitive. Only the .pack FD is threaded here. The .idx and .rev FDs are managed by the [idxfile.LazyIndex] that [PackHandle.Index] lazily constructs; storages that want a pool-wired index ride a separate code path (see `storage/filesystem`'s `ObjectStorage.loadLazyIndex`). Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

Wire the storage-wide `*fdpool.Pool` from `Storage` through `DotGit` to both pack-FD (`internal/packhandle`) and idx-FD (`plumbing/format/idxfile`) sharedFiles. `Options.MaxOpenDescriptors` now governs the pool capacity: zero selects `defaultMaxOpenDescriptors` (256, ~85 concurrently-hot packs at 3 FDs each), negative values yield a no-op pool so callers can disable pooling without special-casing the construction path. Replaces the previous no-op stub. `Options.Pool` accepts a caller-constructed `*fdpool.Pool` so multiple `Storage` instances in the same process can share one FD budget. Useful for callers that spawn many short-lived storages concurrently (GitOps controllers reconciling N repositories, daemons handling parallel requests). When nil the storage allocates its own pool from `MaxOpenDescriptors`. `dotgit.Options` gains a `Pool` field that the packhandle constructor consumes via `packhandle.NewWithPool`. `ObjectStorage.loadLazyIndex` switches to `idxfile.NewLazyIndexWithPool` and forwards the same pool. `DotGit.Alternates` propagates the parent's pool to alternate dotgits so the storage-wide budget covers alternates too. Tests inject a `*fdpool.Pool` via `Options.Pool` and inspect `pool.Stats()` directly to verify the wiring, mirroring the shape an external caller uses to observe the pool. Coverage spans the alternate inheritance path, pool sharing across multiple storages, and the disabled-pool path. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

Three benches characterise the pool wiring at increasing scope. All numbers Apple M4 Pro, single -benchtime=1s run; relative shapes are stable but absolute ns/op drift across runs and benchstat against a baseline is the correct methodology for quantitative comparisons. `BenchmarkFDPool_WorkingSetFitsInPool` and `BenchmarkFDPool_Eviction_WorkingSetExceedsPool` target the pool directly. The fit-in-pool case (wset=16, cap=64, all hits) runs at ~12 ns/op with zero allocations; the eviction case (wset=32, cap=16, every Touch evicts) costs ~46 ns/op with a single `list.PushFront` allocation per round. These numbers are the floor — they set a ceiling for the storage- level cost we can attribute to the pool. `BenchmarkSharedFileAcquireReleaseWithPool` in `internal/sharedfile` measures the pool-notify overhead on the SharedFile hot path. Touch on a warm SharedFile is allocation-free and adds roughly 7 ns/op over the pool-less baseline (`BenchmarkSharedFileAcquireRelease` ~7 ns/op vs the pool-wired variant ~14 ns/op). `BenchmarkStorage_PoolPressure` exercises the full stack against a 32-pack synthetic fixture with `cache.NewObjectLRU(0)` so every read traverses the FD layer. Two sub-benchmarks compare regimes: NoChurn (cap=256, no eviction) ~12.5 us/op 5.1 KB/op 88 allocs/op Churn (cap=8, eviction) ~17.9 us/op 5.3 KB/op 91 allocs/op The ~43% delta between Churn and NoChurn is the cost of evicting and reopening pack/idx FDs when the working set exceeds capacity. Each sub-benchmark injects its own `*fdpool.Pool` via `Options.Pool` and asserts against `pool.Stats().Evictions` at the end — NoChurn fails if any eviction fires, Churn fails if no eviction fires — so the bench refuses to silently stop exercising the pool. `BenchmarkObjectStorage_ParallelReads` complements the storage benches by measuring concurrent EncodedObject throughput on the RLock'd read path. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

`BenchmarkAlternatesObjectLookup` constructed its storage via the low-level `NewObjectStorage(dotgit.NewWithOptions(...))` path, which bypasses the FD pool that `NewStorageWithOptions` installs. The preamble noted the goal was to mirror `PlainClone(Shared: true)` behaviour, but the chosen construction shape missed the public API's wiring. Switching to `NewStorageWithOptions` threads the pool through to the alternate's PackHandles and gives the benchmark the same FD lifecycle real consumers see. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

Touch's eviction path discards ReleaseNow's error by design (the Member contract states this explicitly). A future reader comparing this to errors.Join in packhandle.doClose may flag the asymmetry as an oversight. Add a sentence to the existing lock-ordering comment block recording the intentional contrast so the pattern reads as a documented choice rather than missing error handling. No behaviour change; documentation only. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

The Options.MaxOpenDescriptors godoc claimed the pool bounds ".pack/.idx/.rev descriptors across the entire Storage", which overstates coverage — pack-write FDs (PackWriter) are short-lived and not pooled. Reword to name the read side explicitly and add a note that the 256 default assumes go-git holds the dominant share of the process's FD budget; callers running alongside other FD-heavy subsystems should size against their full budget. Options.Pool gains a paragraph naming git.Open, git.Clone, and git.Init as the layered entry points for callers who want to share an *fdpool.Pool across Storage instances. The path-based wrappers (git.PlainOpen, git.PlainClone) construct their own Storage internally and so do not accept an injected pool; that is by design (they exist to hide storage internals). Both Options.Pool fields (filesystem and dotgit) gain a stability disclaimer: the field's API stability tracks fdpool.Pool's, not this package's, because fdpool lives under x/ and may change without semver. Consumers reading these fields should treat them as experimental on the same timeline. No behaviour change; documentation only. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

Stats already exposes Hits and Evictions; operators monitoring the pool could not distinguish clean evictions from those where Member.ReleaseNow returned a non-nil error. Add EvictionFailures so the failure rate is visible without callers building their own wrapping. The eviction itself still completes (the Member is removed from the LRU regardless of ReleaseNow's return) so Evictions still counts the attempt; EvictionFailures is the subset that failed to close cleanly. This matches how operators expect cumulative counters to compose for rate calculations. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

Pool.Touch's eviction previously took the LRU tail unconditionally. Under sustained concurrent load on a small set of hot packs, this caused churn: the LRU tail could be a pinned SharedFile whose FD lingered until refs drained (via the immediateClose latch), then needed reopening on the next Acquire — which itself evicted whoever was at the new tail. The result was extra close+reopen cycles on hot packs without bounding total open FDs more tightly. Add an optional Pinnable interface and a two-pass victim selection in Touch: walk the LRU back-to-front, prefer the LRU-most Member whose Pinned() returns false; fall back to the LRU tail if every Member is pinned. Matches canonical Git's find_lru_pack (packfile.c:482-530). The walk stops before reaching the just-inserted front element so eviction never targets the caller's own Member. SharedFile implements Pinned() as refs > 0. The check takes SharedFile.mu while pool.mu is held, which reverses the "Member→Pool only" lock-acquire rule the existing comment documented. The actual deadlock-avoidance invariant is "Member calls into Pool only after releasing Member.mu" (see Acquire, Close); the new Pool→Member call through Pinned is deadlock-free because no Member call holds s.mu while waiting for p.mu. Update the comment to reflect this. Stats gains PinnedSkips, a cumulative count of pinned Members the walk passed over. Non-zero values surface churn under load. Members that do not implement Pinnable are treated as unpinned, preserving the unconditional-LRU behaviour for any future Member implementation. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

The existing eviction benches assert evictions occurred (or did not) post-loop, but say nothing about Stats.Active relative to capacity. A regression in Touch where the LRU grows past the cap would slip silently past those assertions: evictions still fire, just not enough to keep the working set bounded. Add a post-loop check that Active <= capacity in both benches. Post-loop only — an in-b.Loop() check would distort the timings the bench is trying to measure. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

NewLazyIndexWithPool was exercised only transitively through the storage/filesystem integration tests. A regression in the wiring between LazyIndex's two SharedFiles (idx and rev) and the pool would surface two layers up rather than at the idxfile layer where the change lives, making the failure mode harder to attribute. Add a focused test that constructs two pool-wired LazyIndexes against a real fixture, forces eviction by sizing the pool to exactly one LazyIndex's worth of SharedFiles, and verifies that a read against the evicted LazyIndex reopens its FDs through the stored openers without surfacing an error. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

The existing concurrent ReleaseNow test verifies that Acquire, Release, and ReleaseNow compose without panic or deadlock, but the in-flight body never actually reads from the file. The refcount + immediateClose latch contract guarantees an in-flight read sees a valid FD even if ReleaseNow fires mid-read; a ReleaseNow that closes the FD out from under a reader would slip past tests that only check call composition. Add a stress test where N reader goroutines do Acquire → ReadAt → verify bytes → Release in a loop while M evictor goroutines fire ReleaseNow. A wrong byte slice or a ReadAt error would surface a regression in the latch — the read-back assertion is the load-bearing check, not the absence of panic. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

BenchmarkPackHandleAcquireGrace measures the warm path where the SharedFile's grace timer keeps the .pack FD open across iterations. The cold counterpart — reopen after the grace timer naturally fires — is the workload pattern the grace period is meant to amortize against, but no bench covers it directly. BenchmarkPackHandleAcquireAfterGraceExpiry fills the gap with a wall-clock-accurate measurement: each iteration sleeps past the 1s grace window with the timer paused, so only the subsequent OpenPackReader → ReadAt → Close is counted. The delta against the warm bench is one full pack-SharedFile reopen including the time.AfterFunc-driven close callback. The bench is naturally slow at default -benchtime=1s (each iteration's sleep dominates); the docstring directs users to override via -benchtime=Nx. BenchmarkPackHandleCloseIdleReopen remains the fast proxy for routine runs, fast-forwarding the grace timer via soft-close to keep the cycle synchronous. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

Options.MaxOpenDescriptors and Options.Pool overlapped: both shaped the storage-wide FD budget, but only Pool was the actual primitive — the integer knob just picked a capacity for the per-Storage Pool that NewStorageWithOptions constructed when Pool was nil. Carrying both forced the doc to spell out their interaction (zero vs negative, ignored when Pool is non-nil) without giving callers any expressiveness Pool did not already provide. Drop the field. Nil Pool now selects defaultPoolCapacity (256). Disabling pooling becomes Pool: fdpool.New(0) — same no-op-pool branch a negative MaxOpenDescriptors used to take. The disabled- pool test injects the no-op pool explicitly. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

Pool kept an O(1) Member-to-list.Element map for Touch lookups, making Member an implicit map key. The doc said as much but the constraint is invisible at the type system level: a Member whose concrete type is non-comparable panics on first Touch with runtime "hash of unhashable type" rather than a contract error. Internal callers happened to satisfy it (*SharedFile, *PackHandle are both pointer types) but external consumers — fdpool is under x/ — have no compile-time safety net. Introduce a Handle type that owns the per-Member back-reference into the LRU list. Touch and Forget take a *Handle alongside the Member; the Pool reads and writes Handle fields under p.mu and stores both Member and Handle in the list element so eviction can clear the victim's h.elem before invoking ReleaseNow. Clearing under p.mu is load-bearing: without it, a re-Touch on a just-evicted Member would call MoveToFront on the removed element (a no-op in container/list) and leave the Member silently unregistered. Member loses its comparable requirement. failingMember now embeds a slice so the concrete type is non-comparable on purpose; under the old design that test would have panicked at registration. SharedFile carries a poolHandle field and passes its address through to Touch/Forget; the SharedFile→Pool lock-ordering invariant is unchanged. Bench on Apple M4 Pro (-benchtime=10000x): WorkingSetFitsInPool 10.47 ns/op, 0 allocs; Eviction_WorkingSetExceedsPool 59.28 ns/op, 2 allocs (entry + list.Element per insert) — the hot-path allocation profile is preserved. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

HashesWithPrefix iterated s.index without holding muI: a concurrent Reindex swap or PackfileWriter.Notify insert could race with the iteration, producing a Go-level data race on the map's internal structure. Snapshot the LazyIndex pointers under a brief muI.RLock and iterate the snapshot afterwards. The borrowed values stay alive for the duration of the loop via the local slice; the .idx and .rev FDs are reference-counted by sharedfile and governed by the fdpool LRU or grace timer, not by removal from s.index. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

pjbgf

@hiddeco thanks for working on this. 🙇

`EntriesWithPrefix(prefix)` returns an iterator over index entries whose hashes start with `prefix`. Implementations use the fanout table to bound the search to a single bucket when `len(prefix) >= 1`; an empty prefix is equivalent to `Entries()`. For multi-byte prefixes the matching entries form a contiguous run somewhere in the middle of the fanout bucket (the bucket is sorted by full hash, so entries with second byte < `prefix[1]` precede the run and entries with second byte > `prefix[1]` follow it). The iterator binary-searches within the bucket for the leftmost entry whose hash is >= `prefix` padded with zeros, and then walks forward with a stop-on-first-mismatch sentinel. This mirrors upstream Git's `for_each_prefixed_object_in_pack`[1], which calls `bsearch_pack` to position before walking forward — without the bsearch step, a stop-on-first-mismatch walk from the bucket start would silently terminate before reaching the match-run whenever some entry with a smaller second byte precedes it. Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix` to call `EntriesWithPrefix(prefix)` and drop the now- redundant per-entry prefix check. The new version also snapshots `s.index` under `muI.RLock` and iterates the snapshot without holding the lock, avoiding any RLock retention across per-iterator I/O (`LazyIndex` iterators take their own sharedFile locks on `acquire`). This is a breaking change to the public `Index` interface for external implementers, which v6 permits. `idxfilePrefixIter` holds only the cursor state plus the per-bucket Names / Offset32 / CRC32 slices and the shared Offset64 table — not a reference to the parent `MemoryIndex`. Offset and CRC32 lookups inline the computation that `MemoryIndex.getOffset` / `MemoryIndex.getCRC32` would perform, so the iterator's lifetime no longer pins the entire `MemoryIndex` heap footprint (addresses go-git#2134 review comment). `BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack synthetic fixture with 32 objects per pack, two-byte prefix lookup that walks every pack, count=6 -benchtime=1s, benchstat vs pack-hotpath-fd-pool baseline): sec/op: 1093.5 µs → 32.05 µs -97.07% (p=0.002, n=6) B/op: 111.5 KiB → 6.44 KiB -94.22% (p=0.002) allocs/op: 4219 → 95 -97.75% (p=0.002) The bench also reshapes `BenchmarkFindObjectInPackfile_*` and `BenchmarkObjectStorage_*` to run against the `makeMultiPackFixture` helper (introduced by pack-hotpath-fd-pool's parallelise-requireIndex commit), since the single-pack basic fixture cannot demonstrate the scaling-with-N wins that the read-path optimisations deliver. [1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

`EntriesWithPrefix(prefix)` returns an iterator over index entries whose hashes start with `prefix`. Implementations use the fanout table to bound the search to a single bucket when `len(prefix) >= 1`; an empty prefix is equivalent to `Entries()`. For multi-byte prefixes the matching entries form a contiguous run somewhere in the middle of the fanout bucket (the bucket is sorted by full hash, so entries with second byte < `prefix[1]` precede the run and entries with second byte > `prefix[1]` follow it). The iterator binary-searches within the bucket for the leftmost entry whose hash is >= `prefix` padded with zeros, and then walks forward with a stop-on-first-mismatch sentinel. This mirrors upstream Git's `for_each_prefixed_object_in_pack`[1], which calls `bsearch_pack` to position before walking forward — without the bsearch step, a stop-on-first-mismatch walk from the bucket start would silently terminate before reaching the match-run whenever some entry with a smaller second byte precedes it. Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix` to call `EntriesWithPrefix(prefix)` and drop the now- redundant per-entry prefix check. The new version also snapshots `s.index` under `muI.RLock` and iterates the snapshot without holding the lock, avoiding any RLock retention across per-iterator I/O (`LazyIndex` iterators take their own sharedFile locks on `acquire`). `idxfilePrefixIter` holds only the cursor state plus the per-bucket Names / Offset32 / CRC32 slices and the shared Offset64 table — not a reference to the parent `MemoryIndex`. Offset and CRC32 lookups inline the computation that `MemoryIndex.getOffset` / `MemoryIndex.getCRC32` would perform, so the iterator's lifetime no longer pins the entire `MemoryIndex` heap footprint (addresses go-git#2134 review comment). `BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack synthetic fixture with 32 objects per pack, two-byte prefix lookup that walks every pack, count=6 -benchtime=1s, benchstat vs main): sec/op: 1093.5 µs → 32.05 µs -97.07% (p=0.002, n=6) B/op: 111.5 KiB → 6.44 KiB -94.22% (p=0.002) allocs/op: 4219 → 95 -97.75% (p=0.002) The bench also reshapes `BenchmarkFindObjectInPackfile_*` and `BenchmarkObjectStorage_*` to run against the `makeMultiPackFixture` helper (added in c828cde), since the single-pack basic fixture cannot demonstrate the scaling- with-N wins that the read-path optimisations deliver. [1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>

pjbgf reviewed May 19, 2026

View reviewed changes

Comment thread plumbing/format/idxfile/idxfile.go Outdated

pjbgf reviewed May 19, 2026

View reviewed changes

Comment thread plumbing/format/idxfile/shared_file.go Outdated

pjbgf reviewed May 19, 2026

View reviewed changes

Comment thread plumbing/format/packfile/fsobject.go Outdated

pjbgf reviewed May 19, 2026

View reviewed changes

Comment thread plumbing/format/idxfile/idxfile.go Outdated

hiddeco mentioned this pull request May 20, 2026

plumbing, storage: add CloseIdleDescriptors soft-close primitive #2133

Merged

hiddeco force-pushed the pack-hotpath-fd-pool branch 2 times, most recently from 449013e to b339fec Compare May 21, 2026 00:49

hiddeco marked this pull request as ready for review May 21, 2026 00:49

hiddeco marked this pull request as draft May 21, 2026 00:52

hiddeco force-pushed the pack-hotpath-fd-pool branch 4 times, most recently from f08adb3 to 206c978 Compare May 22, 2026 10:52

hiddeco marked this pull request as ready for review May 22, 2026 11:57

hiddeco mentioned this pull request May 22, 2026

*: speed up multi-pack reads via Index fanout probes #2147

Open

pjbgf reviewed May 26, 2026

View reviewed changes

Comment thread storage/filesystem/dotgit/dotgit.go Outdated

pjbgf reviewed May 26, 2026

View reviewed changes

Comment thread storage/filesystem/object.go Outdated

hiddeco force-pushed the pack-hotpath-fd-pool branch from 206c978 to 365b9d6 Compare May 27, 2026 09:36

hiddeco added 9 commits May 28, 2026 17:11

hiddeco added 12 commits May 28, 2026 17:12

hiddeco force-pushed the pack-hotpath-fd-pool branch from 365b9d6 to cc5e524 Compare May 28, 2026 15:14

pjbgf approved these changes May 29, 2026

View reviewed changes

pjbgf merged commit 1f84f6c into go-git:main May 29, 2026
17 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage: pack read optimisations and storage-wide LRU FD pool#2134

storage: pack read optimisations and storage-wide LRU FD pool#2134
pjbgf merged 21 commits into
go-git:mainfrom
hiddeco:pack-hotpath-fd-pool

hiddeco commented May 16, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pjbgf left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hiddeco commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pjbgf left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hiddeco commented May 16, 2026 •

edited

Loading