storage: pack read optimisations and storage-wide LRU FD pool#2134
Merged
Conversation
pjbgf
reviewed
May 19, 2026
pjbgf
reviewed
May 19, 2026
pjbgf
reviewed
May 19, 2026
pjbgf
reviewed
May 19, 2026
449013e to
b339fec
Compare
f08adb3 to
206c978
Compare
hiddeco
added a commit
to hiddeco/go-git
that referenced
this pull request
May 22, 2026
`EntriesWithPrefix(prefix)` returns an iterator over index entries whose hashes start with `prefix`. Implementations use the fanout table to bound the search to a single bucket when `len(prefix) >= 1`; an empty prefix is equivalent to `Entries()`. For multi-byte prefixes the matching entries form a contiguous run somewhere in the middle of the fanout bucket (the bucket is sorted by full hash, so entries with second byte < `prefix[1]` precede the run and entries with second byte > `prefix[1]` follow it). The iterator binary-searches within the bucket for the leftmost entry whose hash is >= `prefix` padded with zeros, and then walks forward with a stop-on-first-mismatch sentinel. This mirrors upstream Git's `for_each_prefixed_object_in_pack`[1], which calls `bsearch_pack` to position before walking forward — without the bsearch step, a stop-on-first-mismatch walk from the bucket start would silently terminate before reaching the match-run whenever some entry with a smaller second byte precedes it. Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix` to call `EntriesWithPrefix(prefix)` and drop the now- redundant per-entry prefix check. The new version also snapshots `s.index` under `muI.RLock` and iterates the snapshot without holding the lock, avoiding any RLock retention across per-iterator I/O (`LazyIndex` iterators take their own sharedFile locks on `acquire`). This is a breaking change to the public `Index` interface for external implementers, which v6 permits. `idxfilePrefixIter` holds only the cursor state plus the per-bucket Names / Offset32 / CRC32 slices and the shared Offset64 table — not a reference to the parent `MemoryIndex`. Offset and CRC32 lookups inline the computation that `MemoryIndex.getOffset` / `MemoryIndex.getCRC32` would perform, so the iterator's lifetime no longer pins the entire `MemoryIndex` heap footprint (addresses go-git#2134 review comment). `BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack synthetic fixture with 32 objects per pack, two-byte prefix lookup that walks every pack, count=6 -benchtime=1s, benchstat vs pack-hotpath-fd-pool baseline): sec/op: 1093.5 µs → 39.3 µs -96.4% (p=0.002, n=6) B/op: 111.5 KiB → 6.88 KiB -93.8% (p=0.002) allocs/op: 4219 → 114 -97.3% (p=0.002) The bench also reshapes `BenchmarkFindObjectInPackfile_*` and `BenchmarkObjectStorage_*` to run against the `makeMultiPackFixture` helper (introduced by pack-hotpath-fd-pool's parallelise-requireIndex commit), since the single-pack basic fixture cannot demonstrate the scaling-with-N wins that the read-path optimisations deliver. [1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
hiddeco
added a commit
to hiddeco/go-git
that referenced
this pull request
May 22, 2026
`EntriesWithPrefix(prefix)` returns an iterator over index entries whose hashes start with `prefix`. Implementations use the fanout table to bound the search to a single bucket when `len(prefix) >= 1`; an empty prefix is equivalent to `Entries()`. For multi-byte prefixes the matching entries form a contiguous run somewhere in the middle of the fanout bucket (the bucket is sorted by full hash, so entries with second byte < `prefix[1]` precede the run and entries with second byte > `prefix[1]` follow it). The iterator binary-searches within the bucket for the leftmost entry whose hash is >= `prefix` padded with zeros, and then walks forward with a stop-on-first-mismatch sentinel. This mirrors upstream Git's `for_each_prefixed_object_in_pack`[1], which calls `bsearch_pack` to position before walking forward — without the bsearch step, a stop-on-first-mismatch walk from the bucket start would silently terminate before reaching the match-run whenever some entry with a smaller second byte precedes it. Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix` to call `EntriesWithPrefix(prefix)` and drop the now- redundant per-entry prefix check. The new version also snapshots `s.index` under `muI.RLock` and iterates the snapshot without holding the lock, avoiding any RLock retention across per-iterator I/O (`LazyIndex` iterators take their own sharedFile locks on `acquire`). This is a breaking change to the public `Index` interface for external implementers, which v6 permits. `idxfilePrefixIter` holds only the cursor state plus the per-bucket Names / Offset32 / CRC32 slices and the shared Offset64 table — not a reference to the parent `MemoryIndex`. Offset and CRC32 lookups inline the computation that `MemoryIndex.getOffset` / `MemoryIndex.getCRC32` would perform, so the iterator's lifetime no longer pins the entire `MemoryIndex` heap footprint (addresses go-git#2134 review comment). `BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack synthetic fixture with 32 objects per pack, two-byte prefix lookup that walks every pack, count=6 -benchtime=1s, benchstat vs pack-hotpath-fd-pool baseline): sec/op: 1093.5 µs → 32.05 µs -97.07% (p=0.002, n=6) B/op: 111.5 KiB → 6.44 KiB -94.22% (p=0.002) allocs/op: 4219 → 95 -97.75% (p=0.002) The bench also reshapes `BenchmarkFindObjectInPackfile_*` and `BenchmarkObjectStorage_*` to run against the `makeMultiPackFixture` helper (introduced by pack-hotpath-fd-pool's parallelise-requireIndex commit), since the single-pack basic fixture cannot demonstrate the scaling-with-N wins that the read-path optimisations deliver. [1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
hiddeco
added a commit
to hiddeco/go-git
that referenced
this pull request
May 22, 2026
`EntriesWithPrefix(prefix)` returns an iterator over index entries whose hashes start with `prefix`. Implementations use the fanout table to bound the search to a single bucket when `len(prefix) >= 1`; an empty prefix is equivalent to `Entries()`. For multi-byte prefixes the matching entries form a contiguous run somewhere in the middle of the fanout bucket (the bucket is sorted by full hash, so entries with second byte < `prefix[1]` precede the run and entries with second byte > `prefix[1]` follow it). The iterator binary-searches within the bucket for the leftmost entry whose hash is >= `prefix` padded with zeros, and then walks forward with a stop-on-first-mismatch sentinel. This mirrors upstream Git's `for_each_prefixed_object_in_pack`[1], which calls `bsearch_pack` to position before walking forward — without the bsearch step, a stop-on-first-mismatch walk from the bucket start would silently terminate before reaching the match-run whenever some entry with a smaller second byte precedes it. Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix` to call `EntriesWithPrefix(prefix)` and drop the now- redundant per-entry prefix check. The new version also snapshots `s.index` under `muI.RLock` and iterates the snapshot without holding the lock, avoiding any RLock retention across per-iterator I/O (`LazyIndex` iterators take their own sharedFile locks on `acquire`). This is a breaking change to the public `Index` interface for external implementers, which v6 permits. `idxfilePrefixIter` holds only the cursor state plus the per-bucket Names / Offset32 / CRC32 slices and the shared Offset64 table — not a reference to the parent `MemoryIndex`. Offset and CRC32 lookups inline the computation that `MemoryIndex.getOffset` / `MemoryIndex.getCRC32` would perform, so the iterator's lifetime no longer pins the entire `MemoryIndex` heap footprint (addresses go-git#2134 review comment). `BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack synthetic fixture with 32 objects per pack, two-byte prefix lookup that walks every pack, count=6 -benchtime=1s, benchstat vs pack-hotpath-fd-pool baseline): sec/op: 1093.5 µs → 32.05 µs -97.07% (p=0.002, n=6) B/op: 111.5 KiB → 6.44 KiB -94.22% (p=0.002) allocs/op: 4219 → 95 -97.75% (p=0.002) The bench also reshapes `BenchmarkFindObjectInPackfile_*` and `BenchmarkObjectStorage_*` to run against the `makeMultiPackFixture` helper (introduced by pack-hotpath-fd-pool's parallelise-requireIndex commit), since the single-pack basic fixture cannot demonstrate the scaling-with-N wins that the read-path optimisations deliver. [1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
pjbgf
reviewed
May 26, 2026
pjbgf
reviewed
May 26, 2026
206c978 to
365b9d6
Compare
PackfileWriter.Notify wrote to s.index without taking muI, even though every other s.index writer in this file (the requireIndex publish, Reindex reset, ResetPackHandles, the pruning paths) takes muI.Lock. The asymmetry is a latent race against concurrent readers that take muI.RLock; under -race a parallel EncodedObject + PackfileWriter workload shows the unlocked write. Wrap the write under muI.Lock so Notify obeys the same locking contract as the other writers. The fix has no effect on the normal write rate: Notify fires once per packfile close, far outside the read hot path. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
`muI` guards reads and writes of `s.index`. The earlier `PackfileWriter.Notify` wrap closed the writer side; two read sites still touched `s.index` without taking the lock: `encodedObjectSizeFromPackfile` and the `lazyPackfilesIter` open callback in `buildPackfileIters`. Both are reachable from concurrent `EncodedObjectSize` / `IterEncodedObjects` callers and a racing reader can observe a partially-built map under the lazy-publish path. Wrap each read in a short `RLock`/`RUnlock` window matching the pattern already used in `getFromPackfile`. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
requireIndex serialised cold-load through s.muI.Lock: the
first reader took the write lock, walked every pack
sequentially under loadIdxFile, and only released the lock
once all idx files were materialised. On a repo with N packs
and any concurrent reader herd, every joiner blocked on
the same Lock and paid serial latency.
Coalesce concurrent first-readers via golang.org/x/sync
singleflight. The winning goroutine builds a local map in
populateIndex with errgroup.SetLimit(GOMAXPROCS), so N packs
load in O(N/GOMAXPROCS) instead of O(N), and joiners block
in singleflight.Do until the in-flight call returns. The
winner double-checks s.index inside the singleflight window
for idempotency across back-to-back herds, then publishes
the local map under muI.Lock. A late arrival that built its
own map on the race-loss path closes the local indexes
(LazyIndex implements io.Closer and holds sharedFile
refcounts) before dropping them, so we do not leak
descriptors.
Reindex calls populateIndex synchronously and atomically
swaps in the fresh map, leaving the next read as a hot-cache
hit instead of a deferred cold-load. Previously-cached
LazyIndex entries are not closed: in-flight readers that
borrowed an entry before the swap keep using it, and the
underlying SharedFile FDs close via the grace timer (no-pool
mode) or fdpool LRU eviction (pool mode) once they fall out
of use. This mirrors canonical Git's reprepare model[1]
where packfile_store_reprepare leaves existing packs in
place and only the FD-level LRU evicts. The legacy
`loadIdxFile` path collapses into `loadIdx` + `loadLazyIndex`
/ `loadMemoryIndexValue`; these split helpers make the
parallel loop's per-pack work side-effect-free (callers
install into s.index, the helpers do not).
Tests cover the first-read hot map, the Reindex pre-warm,
and a 32-goroutine herd against a cold ObjectStorage (clean
under -race). Bench surface (Apple M4 Pro, -benchtime=1s):
BenchmarkObjectStorage_ColdEncodedObject ~3.2 ms/op
BenchmarkObjectStorage_ReindexThenRead ~33 us/op
ColdEncodedObject builds a fresh Storage per iteration
against a 32-pack synthetic fixture (helper added in
`multipack_fixture_test.go`); the cost is dominated by
populateIndex parallel-loading every pack's idx, plus a
single packed-object decode. Single-pack fixtures cannot
reproduce the parallelisation win — quantifying it against
serial main requires benchstat against a baseline recording.
The change is internal: no exported API surface moves apart
from Reindex gaining an error return, which the v6 cycle
permits.
[1]: https://github.com/git/git/blob/94f057755b/packfile.c#L1085-L1089
Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
`x/fdpool` implements a fixed-capacity LRU cache of resources that own a file descriptor. The pool is used by `storage/filesystem` to bound the number of open `.pack` / `.idx` / `.rev` descriptors across a Storage. Members register on first FD acquire and report subsequent acquires via `Touch`; when the LRU length exceeds the configured capacity, the least-recently-touched member is evicted via `Member.ReleaseNow`. A zero or negative capacity yields a no-op pool: `Touch` and `Forget` do nothing, no member is ever evicted, and `Stats` reports zero capacity. Call sites can therefore wire the pool unconditionally and disable pooling by passing `New(0)`, without special-casing the construction path. The eviction path deliberately drops `Pool.mu` around the call to `Member.ReleaseNow`. The intended wiring takes the member's own mutex during `ReleaseNow`, and the member's normal acquire path takes that mutex and then calls `Pool.Touch`, which would take `Pool.mu`. Holding `Pool.mu` across `ReleaseNow` would establish a Pool→Member ordering that conflicts with the Member→Pool ordering of the acquire path and risks deadlock. Releasing `Pool.mu` for the eviction call keeps Member→Pool as the only direction of lock acquisition. The package lives under `x/` so its API is reachable from external callers — `Options.Pool` consumers such as GitOps controllers that share an FD budget across many short-lived Storages — while still signalling, per the conventions of `x/`, that the surface may change. The package depends only on the standard library; the import dependency runs one way (consumers → fdpool) for any code that needs an LRU FD cache with a `ReleaseNow` member contract. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
`SharedFile` now takes an optional `*fdpool.Pool` at construction via the new `NewWithPool` factory; `New` keeps the pool-less behaviour. When a pool is wired, `Acquire` notifies it via `Pool.Touch(s)` after the FD is in hand, which both registers the SharedFile on first open and refreshes its LRU position on every subsequent acquire. The Touch runs outside `s.mu` so the established Member→Pool lock-acquire direction holds, matching the deadlock-avoidance contract documented on [fdpool.Pool]. `Release` skips the grace-period timer when a pool is wired: the FD stays open and registered, and the pool drives the eventual close via [SharedFile.ReleaseNow] on eviction. Without a pool, the existing grace-timer behaviour is unchanged. `Close` calls `Pool.Forget(s)` so a racing eviction cannot observe a freed Member. `SharedFile` already exposes `ReleaseNow`, so the type satisfies [fdpool.Member] without further surface change; the interface itself lives in `x/fdpool`. Pool-integration tests cover `Touch` on every Acquire, FD-quiescence under a wired pool, terminal `Close` -> `Forget` (with and without active refs), and a one-slot pool driving real eviction across two SharedFiles. A `NoPool_ImmediateClose` test locks in the pool-less grace-timer policy alongside, so a future regression that always-pools cannot pass silently. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
`NewLazyIndexWithPool` is the pool-aware variant of [NewLazyIndex]: it forwards a `*fdpool.Pool` to both the idx and rev [sharedfile.SharedFile]s so a storage-wide LRU bounds the FD budget held by the lazy index. `NewLazyIndex` remains the pool-less convenience over a `nil` pool. The change is purely additive at the surface; the per-source FD lifecycle lives inside the [sharedfile.SharedFile] primitive. Both shared files receive the same pool. Eviction acts on each SharedFile independently — the pool has no idx-versus-rev affinity — so an active .idx is not forced to keep its .rev FD warm, and vice versa. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
`NewWithPool` is the pool-aware variant of [New]: it forwards a `*fdpool.Pool` to the .pack [sharedfile.SharedFile] so a storage-wide LRU bounds the open pack descriptors across many PackHandles. `New` remains the pool-less convenience over a `nil` pool. The change is purely additive at the surface; the per-source FD lifecycle lives inside the [sharedfile.SharedFile] primitive. Only the .pack FD is threaded here. The .idx and .rev FDs are managed by the [idxfile.LazyIndex] that [PackHandle.Index] lazily constructs; storages that want a pool-wired index ride a separate code path (see `storage/filesystem`'s `ObjectStorage.loadLazyIndex`). Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Wire the storage-wide `*fdpool.Pool` from `Storage` through `DotGit` to both pack-FD (`internal/packhandle`) and idx-FD (`plumbing/format/idxfile`) sharedFiles. `Options.MaxOpenDescriptors` now governs the pool capacity: zero selects `defaultMaxOpenDescriptors` (256, ~85 concurrently-hot packs at 3 FDs each), negative values yield a no-op pool so callers can disable pooling without special-casing the construction path. Replaces the previous no-op stub. `Options.Pool` accepts a caller-constructed `*fdpool.Pool` so multiple `Storage` instances in the same process can share one FD budget. Useful for callers that spawn many short-lived storages concurrently (GitOps controllers reconciling N repositories, daemons handling parallel requests). When nil the storage allocates its own pool from `MaxOpenDescriptors`. `dotgit.Options` gains a `Pool` field that the packhandle constructor consumes via `packhandle.NewWithPool`. `ObjectStorage.loadLazyIndex` switches to `idxfile.NewLazyIndexWithPool` and forwards the same pool. `DotGit.Alternates` propagates the parent's pool to alternate dotgits so the storage-wide budget covers alternates too. Tests inject a `*fdpool.Pool` via `Options.Pool` and inspect `pool.Stats()` directly to verify the wiring, mirroring the shape an external caller uses to observe the pool. Coverage spans the alternate inheritance path, pool sharing across multiple storages, and the disabled-pool path. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Three benches characterise the pool wiring at increasing
scope. All numbers Apple M4 Pro, single -benchtime=1s run;
relative shapes are stable but absolute ns/op drift across
runs and benchstat against a baseline is the correct
methodology for quantitative comparisons.
`BenchmarkFDPool_WorkingSetFitsInPool` and
`BenchmarkFDPool_Eviction_WorkingSetExceedsPool` target the
pool directly. The fit-in-pool case (wset=16, cap=64, all
hits) runs at ~12 ns/op with zero allocations; the eviction
case (wset=32, cap=16, every Touch evicts) costs ~46 ns/op
with a single `list.PushFront` allocation per round. These
numbers are the floor — they set a ceiling for the storage-
level cost we can attribute to the pool.
`BenchmarkSharedFileAcquireReleaseWithPool` in
`internal/sharedfile` measures the pool-notify overhead on
the SharedFile hot path. Touch on a warm SharedFile is
allocation-free and adds roughly 7 ns/op over the pool-less
baseline (`BenchmarkSharedFileAcquireRelease` ~7 ns/op vs
the pool-wired variant ~14 ns/op).
`BenchmarkStorage_PoolPressure` exercises the full stack
against a 32-pack synthetic fixture with
`cache.NewObjectLRU(0)` so every read traverses the FD
layer. Two sub-benchmarks compare regimes:
NoChurn (cap=256, no eviction) ~12.5 us/op 5.1 KB/op 88 allocs/op
Churn (cap=8, eviction) ~17.9 us/op 5.3 KB/op 91 allocs/op
The ~43% delta between Churn and NoChurn is the cost of
evicting and reopening pack/idx FDs when the working set
exceeds capacity. Each sub-benchmark injects its own
`*fdpool.Pool` via `Options.Pool` and asserts against
`pool.Stats().Evictions` at the end — NoChurn fails if any
eviction fires, Churn fails if no eviction fires — so the
bench refuses to silently stop exercising the pool.
`BenchmarkObjectStorage_ParallelReads` complements the
storage benches by measuring concurrent EncodedObject
throughput on the RLock'd read path.
Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
`BenchmarkAlternatesObjectLookup` constructed its storage via the low-level `NewObjectStorage(dotgit.NewWithOptions(...))` path, which bypasses the FD pool that `NewStorageWithOptions` installs. The preamble noted the goal was to mirror `PlainClone(Shared: true)` behaviour, but the chosen construction shape missed the public API's wiring. Switching to `NewStorageWithOptions` threads the pool through to the alternate's PackHandles and gives the benchmark the same FD lifecycle real consumers see. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Touch's eviction path discards ReleaseNow's error by design (the Member contract states this explicitly). A future reader comparing this to errors.Join in packhandle.doClose may flag the asymmetry as an oversight. Add a sentence to the existing lock-ordering comment block recording the intentional contrast so the pattern reads as a documented choice rather than missing error handling. No behaviour change; documentation only. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
The Options.MaxOpenDescriptors godoc claimed the pool bounds ".pack/.idx/.rev descriptors across the entire Storage", which overstates coverage — pack-write FDs (PackWriter) are short-lived and not pooled. Reword to name the read side explicitly and add a note that the 256 default assumes go-git holds the dominant share of the process's FD budget; callers running alongside other FD-heavy subsystems should size against their full budget. Options.Pool gains a paragraph naming git.Open, git.Clone, and git.Init as the layered entry points for callers who want to share an *fdpool.Pool across Storage instances. The path-based wrappers (git.PlainOpen, git.PlainClone) construct their own Storage internally and so do not accept an injected pool; that is by design (they exist to hide storage internals). Both Options.Pool fields (filesystem and dotgit) gain a stability disclaimer: the field's API stability tracks fdpool.Pool's, not this package's, because fdpool lives under x/ and may change without semver. Consumers reading these fields should treat them as experimental on the same timeline. No behaviour change; documentation only. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Stats already exposes Hits and Evictions; operators monitoring the pool could not distinguish clean evictions from those where Member.ReleaseNow returned a non-nil error. Add EvictionFailures so the failure rate is visible without callers building their own wrapping. The eviction itself still completes (the Member is removed from the LRU regardless of ReleaseNow's return) so Evictions still counts the attempt; EvictionFailures is the subset that failed to close cleanly. This matches how operators expect cumulative counters to compose for rate calculations. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Pool.Touch's eviction previously took the LRU tail unconditionally. Under sustained concurrent load on a small set of hot packs, this caused churn: the LRU tail could be a pinned SharedFile whose FD lingered until refs drained (via the immediateClose latch), then needed reopening on the next Acquire — which itself evicted whoever was at the new tail. The result was extra close+reopen cycles on hot packs without bounding total open FDs more tightly. Add an optional Pinnable interface and a two-pass victim selection in Touch: walk the LRU back-to-front, prefer the LRU-most Member whose Pinned() returns false; fall back to the LRU tail if every Member is pinned. Matches canonical Git's find_lru_pack (packfile.c:482-530). The walk stops before reaching the just-inserted front element so eviction never targets the caller's own Member. SharedFile implements Pinned() as refs > 0. The check takes SharedFile.mu while pool.mu is held, which reverses the "Member→Pool only" lock-acquire rule the existing comment documented. The actual deadlock-avoidance invariant is "Member calls into Pool only after releasing Member.mu" (see Acquire, Close); the new Pool→Member call through Pinned is deadlock-free because no Member call holds s.mu while waiting for p.mu. Update the comment to reflect this. Stats gains PinnedSkips, a cumulative count of pinned Members the walk passed over. Non-zero values surface churn under load. Members that do not implement Pinnable are treated as unpinned, preserving the unconditional-LRU behaviour for any future Member implementation. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
The existing eviction benches assert evictions occurred (or did not) post-loop, but say nothing about Stats.Active relative to capacity. A regression in Touch where the LRU grows past the cap would slip silently past those assertions: evictions still fire, just not enough to keep the working set bounded. Add a post-loop check that Active <= capacity in both benches. Post-loop only — an in-b.Loop() check would distort the timings the bench is trying to measure. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
NewLazyIndexWithPool was exercised only transitively through the storage/filesystem integration tests. A regression in the wiring between LazyIndex's two SharedFiles (idx and rev) and the pool would surface two layers up rather than at the idxfile layer where the change lives, making the failure mode harder to attribute. Add a focused test that constructs two pool-wired LazyIndexes against a real fixture, forces eviction by sizing the pool to exactly one LazyIndex's worth of SharedFiles, and verifies that a read against the evicted LazyIndex reopens its FDs through the stored openers without surfacing an error. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
The existing concurrent ReleaseNow test verifies that Acquire, Release, and ReleaseNow compose without panic or deadlock, but the in-flight body never actually reads from the file. The refcount + immediateClose latch contract guarantees an in-flight read sees a valid FD even if ReleaseNow fires mid-read; a ReleaseNow that closes the FD out from under a reader would slip past tests that only check call composition. Add a stress test where N reader goroutines do Acquire → ReadAt → verify bytes → Release in a loop while M evictor goroutines fire ReleaseNow. A wrong byte slice or a ReadAt error would surface a regression in the latch — the read-back assertion is the load-bearing check, not the absence of panic. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
BenchmarkPackHandleAcquireGrace measures the warm path where the SharedFile's grace timer keeps the .pack FD open across iterations. The cold counterpart — reopen after the grace timer naturally fires — is the workload pattern the grace period is meant to amortize against, but no bench covers it directly. BenchmarkPackHandleAcquireAfterGraceExpiry fills the gap with a wall-clock-accurate measurement: each iteration sleeps past the 1s grace window with the timer paused, so only the subsequent OpenPackReader → ReadAt → Close is counted. The delta against the warm bench is one full pack-SharedFile reopen including the time.AfterFunc-driven close callback. The bench is naturally slow at default -benchtime=1s (each iteration's sleep dominates); the docstring directs users to override via -benchtime=Nx. BenchmarkPackHandleCloseIdleReopen remains the fast proxy for routine runs, fast-forwarding the grace timer via soft-close to keep the cycle synchronous. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Options.MaxOpenDescriptors and Options.Pool overlapped: both shaped the storage-wide FD budget, but only Pool was the actual primitive — the integer knob just picked a capacity for the per-Storage Pool that NewStorageWithOptions constructed when Pool was nil. Carrying both forced the doc to spell out their interaction (zero vs negative, ignored when Pool is non-nil) without giving callers any expressiveness Pool did not already provide. Drop the field. Nil Pool now selects defaultPoolCapacity (256). Disabling pooling becomes Pool: fdpool.New(0) — same no-op-pool branch a negative MaxOpenDescriptors used to take. The disabled- pool test injects the no-op pool explicitly. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Pool kept an O(1) Member-to-list.Element map for Touch lookups, making Member an implicit map key. The doc said as much but the constraint is invisible at the type system level: a Member whose concrete type is non-comparable panics on first Touch with runtime "hash of unhashable type" rather than a contract error. Internal callers happened to satisfy it (*SharedFile, *PackHandle are both pointer types) but external consumers — fdpool is under x/ — have no compile-time safety net. Introduce a Handle type that owns the per-Member back-reference into the LRU list. Touch and Forget take a *Handle alongside the Member; the Pool reads and writes Handle fields under p.mu and stores both Member and Handle in the list element so eviction can clear the victim's h.elem before invoking ReleaseNow. Clearing under p.mu is load-bearing: without it, a re-Touch on a just-evicted Member would call MoveToFront on the removed element (a no-op in container/list) and leave the Member silently unregistered. Member loses its comparable requirement. failingMember now embeds a slice so the concrete type is non-comparable on purpose; under the old design that test would have panicked at registration. SharedFile carries a poolHandle field and passes its address through to Touch/Forget; the SharedFile→Pool lock-ordering invariant is unchanged. Bench on Apple M4 Pro (-benchtime=10000x): WorkingSetFitsInPool 10.47 ns/op, 0 allocs; Eviction_WorkingSetExceedsPool 59.28 ns/op, 2 allocs (entry + list.Element per insert) — the hot-path allocation profile is preserved. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
HashesWithPrefix iterated s.index without holding muI: a concurrent Reindex swap or PackfileWriter.Notify insert could race with the iteration, producing a Go-level data race on the map's internal structure. Snapshot the LazyIndex pointers under a brief muI.RLock and iterate the snapshot afterwards. The borrowed values stay alive for the duration of the loop via the local slice; the .idx and .rev FDs are reference-counted by sharedfile and governed by the fdpool LRU or grace timer, not by removal from s.index. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
365b9d6 to
cc5e524
Compare
hiddeco
added a commit
to hiddeco/go-git
that referenced
this pull request
Jun 5, 2026
`EntriesWithPrefix(prefix)` returns an iterator over index entries whose hashes start with `prefix`. Implementations use the fanout table to bound the search to a single bucket when `len(prefix) >= 1`; an empty prefix is equivalent to `Entries()`. For multi-byte prefixes the matching entries form a contiguous run somewhere in the middle of the fanout bucket (the bucket is sorted by full hash, so entries with second byte < `prefix[1]` precede the run and entries with second byte > `prefix[1]` follow it). The iterator binary-searches within the bucket for the leftmost entry whose hash is >= `prefix` padded with zeros, and then walks forward with a stop-on-first-mismatch sentinel. This mirrors upstream Git's `for_each_prefixed_object_in_pack`[1], which calls `bsearch_pack` to position before walking forward — without the bsearch step, a stop-on-first-mismatch walk from the bucket start would silently terminate before reaching the match-run whenever some entry with a smaller second byte precedes it. Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix` to call `EntriesWithPrefix(prefix)` and drop the now- redundant per-entry prefix check. The new version also snapshots `s.index` under `muI.RLock` and iterates the snapshot without holding the lock, avoiding any RLock retention across per-iterator I/O (`LazyIndex` iterators take their own sharedFile locks on `acquire`). This is a breaking change to the public `Index` interface for external implementers, which v6 permits. `idxfilePrefixIter` holds only the cursor state plus the per-bucket Names / Offset32 / CRC32 slices and the shared Offset64 table — not a reference to the parent `MemoryIndex`. Offset and CRC32 lookups inline the computation that `MemoryIndex.getOffset` / `MemoryIndex.getCRC32` would perform, so the iterator's lifetime no longer pins the entire `MemoryIndex` heap footprint (addresses go-git#2134 review comment). `BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack synthetic fixture with 32 objects per pack, two-byte prefix lookup that walks every pack, count=6 -benchtime=1s, benchstat vs pack-hotpath-fd-pool baseline): sec/op: 1093.5 µs → 32.05 µs -97.07% (p=0.002, n=6) B/op: 111.5 KiB → 6.44 KiB -94.22% (p=0.002) allocs/op: 4219 → 95 -97.75% (p=0.002) The bench also reshapes `BenchmarkFindObjectInPackfile_*` and `BenchmarkObjectStorage_*` to run against the `makeMultiPackFixture` helper (introduced by pack-hotpath-fd-pool's parallelise-requireIndex commit), since the single-pack basic fixture cannot demonstrate the scaling-with-N wins that the read-path optimisations deliver. [1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
hiddeco
added a commit
to hiddeco/go-git
that referenced
this pull request
Jun 5, 2026
`EntriesWithPrefix(prefix)` returns an iterator over index entries whose hashes start with `prefix`. Implementations use the fanout table to bound the search to a single bucket when `len(prefix) >= 1`; an empty prefix is equivalent to `Entries()`. For multi-byte prefixes the matching entries form a contiguous run somewhere in the middle of the fanout bucket (the bucket is sorted by full hash, so entries with second byte < `prefix[1]` precede the run and entries with second byte > `prefix[1]` follow it). The iterator binary-searches within the bucket for the leftmost entry whose hash is >= `prefix` padded with zeros, and then walks forward with a stop-on-first-mismatch sentinel. This mirrors upstream Git's `for_each_prefixed_object_in_pack`[1], which calls `bsearch_pack` to position before walking forward — without the bsearch step, a stop-on-first-mismatch walk from the bucket start would silently terminate before reaching the match-run whenever some entry with a smaller second byte precedes it. Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix` to call `EntriesWithPrefix(prefix)` and drop the now- redundant per-entry prefix check. The new version also snapshots `s.index` under `muI.RLock` and iterates the snapshot without holding the lock, avoiding any RLock retention across per-iterator I/O (`LazyIndex` iterators take their own sharedFile locks on `acquire`). `idxfilePrefixIter` holds only the cursor state plus the per-bucket Names / Offset32 / CRC32 slices and the shared Offset64 table — not a reference to the parent `MemoryIndex`. Offset and CRC32 lookups inline the computation that `MemoryIndex.getOffset` / `MemoryIndex.getCRC32` would perform, so the iterator's lifetime no longer pins the entire `MemoryIndex` heap footprint (addresses go-git#2134 review comment). `BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack synthetic fixture with 32 objects per pack, two-byte prefix lookup that walks every pack, count=6 -benchtime=1s, benchstat vs main): sec/op: 1093.5 µs → 32.05 µs -97.07% (p=0.002, n=6) B/op: 111.5 KiB → 6.44 KiB -94.22% (p=0.002) allocs/op: 4219 → 95 -97.75% (p=0.002) The bench also reshapes `BenchmarkFindObjectInPackfile_*` and `BenchmarkObjectStorage_*` to run against the `makeMultiPackFixture` helper (added in c828cde), since the single-pack basic fixture cannot demonstrate the scaling- with-N wins that the read-path optimisations deliver. [1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
hiddeco
added a commit
to hiddeco/go-git
that referenced
this pull request
Jun 9, 2026
`EntriesWithPrefix(prefix)` returns an iterator over index entries whose hashes start with `prefix`. Implementations use the fanout table to bound the search to a single bucket when `len(prefix) >= 1`; an empty prefix is equivalent to `Entries()`. For multi-byte prefixes the matching entries form a contiguous run somewhere in the middle of the fanout bucket (the bucket is sorted by full hash, so entries with second byte < `prefix[1]` precede the run and entries with second byte > `prefix[1]` follow it). The iterator binary-searches within the bucket for the leftmost entry whose hash is >= `prefix` padded with zeros, and then walks forward with a stop-on-first-mismatch sentinel. This mirrors upstream Git's `for_each_prefixed_object_in_pack`[1], which calls `bsearch_pack` to position before walking forward — without the bsearch step, a stop-on-first-mismatch walk from the bucket start would silently terminate before reaching the match-run whenever some entry with a smaller second byte precedes it. Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix` to call `EntriesWithPrefix(prefix)` and drop the now- redundant per-entry prefix check. The new version also snapshots `s.index` under `muI.RLock` and iterates the snapshot without holding the lock, avoiding any RLock retention across per-iterator I/O (`LazyIndex` iterators take their own sharedFile locks on `acquire`). `idxfilePrefixIter` holds only the cursor state plus the per-bucket Names / Offset32 / CRC32 slices and the shared Offset64 table — not a reference to the parent `MemoryIndex`. Offset and CRC32 lookups inline the computation that `MemoryIndex.getOffset` / `MemoryIndex.getCRC32` would perform, so the iterator's lifetime no longer pins the entire `MemoryIndex` heap footprint (addresses go-git#2134 review comment). `BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack synthetic fixture with 32 objects per pack, two-byte prefix lookup that walks every pack, count=6 -benchtime=1s, benchstat vs main): sec/op: 1093.5 µs → 32.05 µs -97.07% (p=0.002, n=6) B/op: 111.5 KiB → 6.44 KiB -94.22% (p=0.002) allocs/op: 4219 → 95 -97.75% (p=0.002) The bench also reshapes `BenchmarkFindObjectInPackfile_*` and `BenchmarkObjectStorage_*` to run against the `makeMultiPackFixture` helper (added in c828cde), since the single-pack basic fixture cannot demonstrate the scaling- with-N wins that the read-path optimisations deliver. [1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
hiddeco
added a commit
to hiddeco/go-git
that referenced
this pull request
Jun 9, 2026
`EntriesWithPrefix(prefix)` returns an iterator over index entries whose hashes start with `prefix`. Implementations use the fanout table to bound the search to a single bucket when `len(prefix) >= 1`; an empty prefix is equivalent to `Entries()`. For multi-byte prefixes the matching entries form a contiguous run somewhere in the middle of the fanout bucket (the bucket is sorted by full hash, so entries with second byte < `prefix[1]` precede the run and entries with second byte > `prefix[1]` follow it). The iterator binary-searches within the bucket for the leftmost entry whose hash is >= `prefix` padded with zeros, and then walks forward with a stop-on-first-mismatch sentinel. This mirrors upstream Git's `for_each_prefixed_object_in_pack`[1], which calls `bsearch_pack` to position before walking forward — without the bsearch step, a stop-on-first-mismatch walk from the bucket start would silently terminate before reaching the match-run whenever some entry with a smaller second byte precedes it. Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix` to call `EntriesWithPrefix(prefix)` and drop the now- redundant per-entry prefix check. The new version also snapshots `s.index` under `muI.RLock` and iterates the snapshot without holding the lock, avoiding any RLock retention across per-iterator I/O (`LazyIndex` iterators take their own sharedFile locks on `acquire`). `idxfilePrefixIter` holds only the cursor state plus the per-bucket Names / Offset32 / CRC32 slices and the shared Offset64 table — not a reference to the parent `MemoryIndex`. Offset and CRC32 lookups inline the computation that `MemoryIndex.getOffset` / `MemoryIndex.getCRC32` would perform, so the iterator's lifetime no longer pins the entire `MemoryIndex` heap footprint (addresses go-git#2134 review comment). `BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack synthetic fixture with 32 objects per pack, two-byte prefix lookup that walks every pack, count=6 -benchtime=1s, benchstat vs main): sec/op: 1093.5 µs → 32.05 µs -97.07% (p=0.002, n=6) B/op: 111.5 KiB → 6.44 KiB -94.22% (p=0.002) allocs/op: 4219 → 95 -97.75% (p=0.002) The bench also reshapes `BenchmarkFindObjectInPackfile_*` and `BenchmarkObjectStorage_*` to run against the `makeMultiPackFixture` helper (added in c828cde), since the single-pack basic fixture cannot demonstrate the scaling- with-N wins that the read-path optimisations deliver. [1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <hidde@hhh.computer>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR primarily addresses resource hygiene and concurrent correctness: bounded FDs across a Storage (and across multiple Storages via
Options.Pool),*fdpool.Pool.Stats()for observability, and fours.indexrace fixes. Read-path performance changes are minimal — measured againstmainwithbenchstat (-count=6, -benchtime=1s):