Skip to content

storage: pack read optimisations and storage-wide LRU FD pool#2134

Merged
pjbgf merged 21 commits into
go-git:mainfrom
hiddeco:pack-hotpath-fd-pool
May 29, 2026
Merged

storage: pack read optimisations and storage-wide LRU FD pool#2134
pjbgf merged 21 commits into
go-git:mainfrom
hiddeco:pack-hotpath-fd-pool

Conversation

@hiddeco

@hiddeco hiddeco commented May 16, 2026

Copy link
Copy Markdown
Member

This PR primarily addresses resource hygiene and concurrent correctness: bounded FDs across a Storage (and across multiple Storages via Options.Pool), *fdpool.Pool.Stats() for observability, and four s.index race fixes. Read-path performance changes are minimal — measured against main with benchstat (-count=6, -benchtime=1s):

  • Allocations reduced 14% geomean, up to 47% on hot read paths (ParallelReads, AlternatesObjectLookup/EncodedObject). Lower GC pressure.
  • Alternates lookup latency -59% (with variance caveats; replication recommended).
  • Flat throughput on the primary read path (ParallelReads).
  • Single-pack ReindexThenRead is +14% slower (cost of singleflight ceremony when there's no parallelisation benefit to capture).

Comment thread plumbing/format/idxfile/idxfile.go Outdated
Comment thread plumbing/format/idxfile/shared_file.go Outdated
Comment thread plumbing/format/packfile/fsobject.go Outdated
Comment thread plumbing/format/idxfile/idxfile.go Outdated
@hiddeco hiddeco force-pushed the pack-hotpath-fd-pool branch 2 times, most recently from 449013e to b339fec Compare May 21, 2026 00:49
@hiddeco hiddeco marked this pull request as ready for review May 21, 2026 00:49
@hiddeco hiddeco marked this pull request as draft May 21, 2026 00:52
@hiddeco hiddeco force-pushed the pack-hotpath-fd-pool branch 4 times, most recently from f08adb3 to 206c978 Compare May 22, 2026 10:52
@hiddeco hiddeco marked this pull request as ready for review May 22, 2026 11:57
hiddeco added a commit to hiddeco/go-git that referenced this pull request May 22, 2026
`EntriesWithPrefix(prefix)` returns an iterator over index
entries whose hashes start with `prefix`. Implementations
use the fanout table to bound the search to a single bucket
when `len(prefix) >= 1`; an empty prefix is equivalent to
`Entries()`.

For multi-byte prefixes the matching entries form a
contiguous run somewhere in the middle of the fanout bucket
(the bucket is sorted by full hash, so entries with second
byte < `prefix[1]` precede the run and entries with second
byte > `prefix[1]` follow it). The iterator binary-searches
within the bucket for the leftmost entry whose hash is >=
`prefix` padded with zeros, and then walks forward with a
stop-on-first-mismatch sentinel. This mirrors upstream Git's
`for_each_prefixed_object_in_pack`[1], which calls
`bsearch_pack` to position before walking forward — without
the bsearch step, a stop-on-first-mismatch walk from the
bucket start would silently terminate before reaching the
match-run whenever some entry with a smaller second byte
precedes it.

Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix`
to call `EntriesWithPrefix(prefix)` and drop the now-
redundant per-entry prefix check. The new version also
snapshots `s.index` under `muI.RLock` and iterates the
snapshot without holding the lock, avoiding any RLock
retention across per-iterator I/O (`LazyIndex` iterators
take their own sharedFile locks on `acquire`).

This is a breaking change to the public `Index` interface
for external implementers, which v6 permits.

`idxfilePrefixIter` holds only the cursor state plus the
per-bucket Names / Offset32 / CRC32 slices and the shared
Offset64 table — not a reference to the parent
`MemoryIndex`. Offset and CRC32 lookups inline the
computation that `MemoryIndex.getOffset` /
`MemoryIndex.getCRC32` would perform, so the iterator's
lifetime no longer pins the entire `MemoryIndex` heap
footprint (addresses go-git#2134 review comment).

`BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack
synthetic fixture with 32 objects per pack, two-byte prefix
lookup that walks every pack, count=6 -benchtime=1s,
benchstat vs pack-hotpath-fd-pool baseline):

    sec/op:    1093.5 µs →  39.3 µs   -96.4%  (p=0.002, n=6)
    B/op:      111.5 KiB →  6.88 KiB  -93.8%  (p=0.002)
    allocs/op: 4219      → 114         -97.3%  (p=0.002)

The bench also reshapes `BenchmarkFindObjectInPackfile_*`
and `BenchmarkObjectStorage_*` to run against the
`makeMultiPackFixture` helper (introduced by
pack-hotpath-fd-pool's parallelise-requireIndex commit),
since the single-pack basic fixture cannot demonstrate the
scaling-with-N wins that the read-path optimisations
deliver.

[1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
hiddeco added a commit to hiddeco/go-git that referenced this pull request May 22, 2026
`EntriesWithPrefix(prefix)` returns an iterator over index
entries whose hashes start with `prefix`. Implementations
use the fanout table to bound the search to a single bucket
when `len(prefix) >= 1`; an empty prefix is equivalent to
`Entries()`.

For multi-byte prefixes the matching entries form a
contiguous run somewhere in the middle of the fanout bucket
(the bucket is sorted by full hash, so entries with second
byte < `prefix[1]` precede the run and entries with second
byte > `prefix[1]` follow it). The iterator binary-searches
within the bucket for the leftmost entry whose hash is >=
`prefix` padded with zeros, and then walks forward with a
stop-on-first-mismatch sentinel. This mirrors upstream Git's
`for_each_prefixed_object_in_pack`[1], which calls
`bsearch_pack` to position before walking forward — without
the bsearch step, a stop-on-first-mismatch walk from the
bucket start would silently terminate before reaching the
match-run whenever some entry with a smaller second byte
precedes it.

Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix`
to call `EntriesWithPrefix(prefix)` and drop the now-
redundant per-entry prefix check. The new version also
snapshots `s.index` under `muI.RLock` and iterates the
snapshot without holding the lock, avoiding any RLock
retention across per-iterator I/O (`LazyIndex` iterators
take their own sharedFile locks on `acquire`).

This is a breaking change to the public `Index` interface
for external implementers, which v6 permits.

`idxfilePrefixIter` holds only the cursor state plus the
per-bucket Names / Offset32 / CRC32 slices and the shared
Offset64 table — not a reference to the parent
`MemoryIndex`. Offset and CRC32 lookups inline the
computation that `MemoryIndex.getOffset` /
`MemoryIndex.getCRC32` would perform, so the iterator's
lifetime no longer pins the entire `MemoryIndex` heap
footprint (addresses go-git#2134 review comment).

`BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack
synthetic fixture with 32 objects per pack, two-byte prefix
lookup that walks every pack, count=6 -benchtime=1s,
benchstat vs pack-hotpath-fd-pool baseline):

    sec/op:    1093.5 µs → 32.05 µs  -97.07% (p=0.002, n=6)
    B/op:      111.5 KiB →  6.44 KiB -94.22% (p=0.002)
    allocs/op: 4219      →    95     -97.75% (p=0.002)

The bench also reshapes `BenchmarkFindObjectInPackfile_*`
and `BenchmarkObjectStorage_*` to run against the
`makeMultiPackFixture` helper (introduced by
pack-hotpath-fd-pool's parallelise-requireIndex commit),
since the single-pack basic fixture cannot demonstrate the
scaling-with-N wins that the read-path optimisations
deliver.

[1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
hiddeco added a commit to hiddeco/go-git that referenced this pull request May 22, 2026
`EntriesWithPrefix(prefix)` returns an iterator over index
entries whose hashes start with `prefix`. Implementations
use the fanout table to bound the search to a single bucket
when `len(prefix) >= 1`; an empty prefix is equivalent to
`Entries()`.

For multi-byte prefixes the matching entries form a
contiguous run somewhere in the middle of the fanout bucket
(the bucket is sorted by full hash, so entries with second
byte < `prefix[1]` precede the run and entries with second
byte > `prefix[1]` follow it). The iterator binary-searches
within the bucket for the leftmost entry whose hash is >=
`prefix` padded with zeros, and then walks forward with a
stop-on-first-mismatch sentinel. This mirrors upstream Git's
`for_each_prefixed_object_in_pack`[1], which calls
`bsearch_pack` to position before walking forward — without
the bsearch step, a stop-on-first-mismatch walk from the
bucket start would silently terminate before reaching the
match-run whenever some entry with a smaller second byte
precedes it.

Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix`
to call `EntriesWithPrefix(prefix)` and drop the now-
redundant per-entry prefix check. The new version also
snapshots `s.index` under `muI.RLock` and iterates the
snapshot without holding the lock, avoiding any RLock
retention across per-iterator I/O (`LazyIndex` iterators
take their own sharedFile locks on `acquire`).

This is a breaking change to the public `Index` interface
for external implementers, which v6 permits.

`idxfilePrefixIter` holds only the cursor state plus the
per-bucket Names / Offset32 / CRC32 slices and the shared
Offset64 table — not a reference to the parent
`MemoryIndex`. Offset and CRC32 lookups inline the
computation that `MemoryIndex.getOffset` /
`MemoryIndex.getCRC32` would perform, so the iterator's
lifetime no longer pins the entire `MemoryIndex` heap
footprint (addresses go-git#2134 review comment).

`BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack
synthetic fixture with 32 objects per pack, two-byte prefix
lookup that walks every pack, count=6 -benchtime=1s,
benchstat vs pack-hotpath-fd-pool baseline):

    sec/op:    1093.5 µs → 32.05 µs  -97.07% (p=0.002, n=6)
    B/op:      111.5 KiB →  6.44 KiB -94.22% (p=0.002)
    allocs/op: 4219      →    95     -97.75% (p=0.002)

The bench also reshapes `BenchmarkFindObjectInPackfile_*`
and `BenchmarkObjectStorage_*` to run against the
`makeMultiPackFixture` helper (introduced by
pack-hotpath-fd-pool's parallelise-requireIndex commit),
since the single-pack basic fixture cannot demonstrate the
scaling-with-N wins that the read-path optimisations
deliver.

[1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Comment thread storage/filesystem/dotgit/dotgit.go Outdated
Comment thread storage/filesystem/object.go Outdated
@hiddeco hiddeco force-pushed the pack-hotpath-fd-pool branch from 206c978 to 365b9d6 Compare May 27, 2026 09:36
hiddeco added 9 commits May 28, 2026 17:11
PackfileWriter.Notify wrote to s.index without taking muI,
even though every other s.index writer in this file (the
requireIndex publish, Reindex reset, ResetPackHandles, the
pruning paths) takes muI.Lock. The asymmetry is a latent
race against concurrent readers that take muI.RLock; under
-race a parallel EncodedObject + PackfileWriter workload
shows the unlocked write.

Wrap the write under muI.Lock so Notify obeys the same
locking contract as the other writers. The fix has no effect
on the normal write rate: Notify fires once per packfile
close, far outside the read hot path.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
`muI` guards reads and writes of `s.index`. The earlier
`PackfileWriter.Notify` wrap closed the writer side; two read
sites still touched `s.index` without taking the lock:
`encodedObjectSizeFromPackfile` and the `lazyPackfilesIter`
open callback in `buildPackfileIters`. Both are reachable
from concurrent `EncodedObjectSize` / `IterEncodedObjects`
callers and a racing reader can observe a partially-built
map under the lazy-publish path.

Wrap each read in a short `RLock`/`RUnlock` window matching
the pattern already used in `getFromPackfile`.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
requireIndex serialised cold-load through s.muI.Lock: the
first reader took the write lock, walked every pack
sequentially under loadIdxFile, and only released the lock
once all idx files were materialised. On a repo with N packs
and any concurrent reader herd, every joiner blocked on
the same Lock and paid serial latency.

Coalesce concurrent first-readers via golang.org/x/sync
singleflight. The winning goroutine builds a local map in
populateIndex with errgroup.SetLimit(GOMAXPROCS), so N packs
load in O(N/GOMAXPROCS) instead of O(N), and joiners block
in singleflight.Do until the in-flight call returns. The
winner double-checks s.index inside the singleflight window
for idempotency across back-to-back herds, then publishes
the local map under muI.Lock. A late arrival that built its
own map on the race-loss path closes the local indexes
(LazyIndex implements io.Closer and holds sharedFile
refcounts) before dropping them, so we do not leak
descriptors.

Reindex calls populateIndex synchronously and atomically
swaps in the fresh map, leaving the next read as a hot-cache
hit instead of a deferred cold-load. Previously-cached
LazyIndex entries are not closed: in-flight readers that
borrowed an entry before the swap keep using it, and the
underlying SharedFile FDs close via the grace timer (no-pool
mode) or fdpool LRU eviction (pool mode) once they fall out
of use. This mirrors canonical Git's reprepare model[1]
where packfile_store_reprepare leaves existing packs in
place and only the FD-level LRU evicts. The legacy
`loadIdxFile` path collapses into `loadIdx` + `loadLazyIndex`
/ `loadMemoryIndexValue`; these split helpers make the
parallel loop's per-pack work side-effect-free (callers
install into s.index, the helpers do not).

Tests cover the first-read hot map, the Reindex pre-warm,
and a 32-goroutine herd against a cold ObjectStorage (clean
under -race). Bench surface (Apple M4 Pro, -benchtime=1s):

    BenchmarkObjectStorage_ColdEncodedObject  ~3.2 ms/op
    BenchmarkObjectStorage_ReindexThenRead     ~33 us/op

ColdEncodedObject builds a fresh Storage per iteration
against a 32-pack synthetic fixture (helper added in
`multipack_fixture_test.go`); the cost is dominated by
populateIndex parallel-loading every pack's idx, plus a
single packed-object decode. Single-pack fixtures cannot
reproduce the parallelisation win — quantifying it against
serial main requires benchstat against a baseline recording.

The change is internal: no exported API surface moves apart
from Reindex gaining an error return, which the v6 cycle
permits.

[1]: https://github.com/git/git/blob/94f057755b/packfile.c#L1085-L1089

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
`x/fdpool` implements a fixed-capacity LRU cache of resources
that own a file descriptor. The pool is used by
`storage/filesystem` to bound the number of open
`.pack` / `.idx` / `.rev` descriptors across a Storage.
Members register on first FD acquire and report subsequent
acquires via `Touch`; when the LRU length exceeds the
configured capacity, the least-recently-touched member is
evicted via `Member.ReleaseNow`.

A zero or negative capacity yields a no-op pool: `Touch` and
`Forget` do nothing, no member is ever evicted, and `Stats`
reports zero capacity. Call sites can therefore wire the pool
unconditionally and disable pooling by passing `New(0)`,
without special-casing the construction path.

The eviction path deliberately drops `Pool.mu` around the call
to `Member.ReleaseNow`. The intended wiring takes the member's
own mutex during `ReleaseNow`, and the member's normal acquire
path takes that mutex and then calls `Pool.Touch`, which would
take `Pool.mu`. Holding `Pool.mu` across `ReleaseNow` would
establish a Pool→Member ordering that conflicts with the
Member→Pool ordering of the acquire path and risks deadlock.
Releasing `Pool.mu` for the eviction call keeps Member→Pool as
the only direction of lock acquisition.

The package lives under `x/` so its API is reachable from
external callers — `Options.Pool` consumers such as GitOps
controllers that share an FD budget across many short-lived
Storages — while still signalling, per the conventions of
`x/`, that the surface may change. The package depends only
on the standard library; the import dependency runs one way
(consumers → fdpool) for any code that needs an LRU FD cache
with a `ReleaseNow` member contract.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
`SharedFile` now takes an optional `*fdpool.Pool` at construction
via the new `NewWithPool` factory; `New` keeps the pool-less
behaviour. When a pool is wired, `Acquire` notifies it via
`Pool.Touch(s)` after the FD is in hand, which both registers the
SharedFile on first open and refreshes its LRU position on every
subsequent acquire. The Touch runs outside `s.mu` so the
established Member→Pool lock-acquire direction holds, matching
the deadlock-avoidance contract documented on [fdpool.Pool].

`Release` skips the grace-period timer when a pool is wired: the
FD stays open and registered, and the pool drives the eventual
close via [SharedFile.ReleaseNow] on eviction. Without a pool,
the existing grace-timer behaviour is unchanged.

`Close` calls `Pool.Forget(s)` so a racing eviction cannot
observe a freed Member. `SharedFile` already exposes
`ReleaseNow`, so the type satisfies [fdpool.Member] without
further surface change; the interface itself lives in `x/fdpool`.

Pool-integration tests cover `Touch` on every Acquire,
FD-quiescence under a wired pool, terminal `Close` -> `Forget`
(with and without active refs), and a one-slot pool driving real
eviction across two SharedFiles. A `NoPool_ImmediateClose` test
locks in the pool-less grace-timer policy alongside, so a future
regression that always-pools cannot pass silently.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
`NewLazyIndexWithPool` is the pool-aware variant of
[NewLazyIndex]: it forwards a `*fdpool.Pool` to both the idx and
rev [sharedfile.SharedFile]s so a storage-wide LRU bounds the
FD budget held by the lazy index. `NewLazyIndex` remains the
pool-less convenience over a `nil` pool. The change is purely
additive at the surface; the per-source FD lifecycle lives
inside the [sharedfile.SharedFile] primitive.

Both shared files receive the same pool. Eviction acts on each
SharedFile independently — the pool has no idx-versus-rev
affinity — so an active .idx is not forced to keep its .rev FD
warm, and vice versa.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
`NewWithPool` is the pool-aware variant of [New]: it forwards a
`*fdpool.Pool` to the .pack [sharedfile.SharedFile] so a
storage-wide LRU bounds the open pack descriptors across many
PackHandles. `New` remains the pool-less convenience over a
`nil` pool. The change is purely additive at the surface; the
per-source FD lifecycle lives inside the
[sharedfile.SharedFile] primitive.

Only the .pack FD is threaded here. The .idx and .rev FDs are
managed by the [idxfile.LazyIndex] that
[PackHandle.Index] lazily constructs; storages that want a
pool-wired index ride a separate code path (see
`storage/filesystem`'s `ObjectStorage.loadLazyIndex`).

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Wire the storage-wide `*fdpool.Pool` from `Storage` through
`DotGit` to both pack-FD (`internal/packhandle`) and idx-FD
(`plumbing/format/idxfile`) sharedFiles.

`Options.MaxOpenDescriptors` now governs the pool capacity:
zero selects `defaultMaxOpenDescriptors` (256, ~85
concurrently-hot packs at 3 FDs each), negative values yield
a no-op pool so callers can disable pooling without
special-casing the construction path. Replaces the previous
no-op stub.

`Options.Pool` accepts a caller-constructed `*fdpool.Pool`
so multiple `Storage` instances in the same process can
share one FD budget. Useful for callers that spawn many
short-lived storages concurrently (GitOps controllers
reconciling N repositories, daemons handling parallel
requests). When nil the storage allocates its own pool from
`MaxOpenDescriptors`.

`dotgit.Options` gains a `Pool` field that the packhandle
constructor consumes via `packhandle.NewWithPool`.
`ObjectStorage.loadLazyIndex` switches to
`idxfile.NewLazyIndexWithPool` and forwards the same pool.
`DotGit.Alternates` propagates the parent's pool to alternate
dotgits so the storage-wide budget covers alternates too.

Tests inject a `*fdpool.Pool` via `Options.Pool` and inspect
`pool.Stats()` directly to verify the wiring, mirroring the
shape an external caller uses to observe the pool. Coverage
spans the alternate inheritance path, pool sharing across
multiple storages, and the disabled-pool path.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Three benches characterise the pool wiring at increasing
scope. All numbers Apple M4 Pro, single -benchtime=1s run;
relative shapes are stable but absolute ns/op drift across
runs and benchstat against a baseline is the correct
methodology for quantitative comparisons.

`BenchmarkFDPool_WorkingSetFitsInPool` and
`BenchmarkFDPool_Eviction_WorkingSetExceedsPool` target the
pool directly. The fit-in-pool case (wset=16, cap=64, all
hits) runs at ~12 ns/op with zero allocations; the eviction
case (wset=32, cap=16, every Touch evicts) costs ~46 ns/op
with a single `list.PushFront` allocation per round. These
numbers are the floor — they set a ceiling for the storage-
level cost we can attribute to the pool.

`BenchmarkSharedFileAcquireReleaseWithPool` in
`internal/sharedfile` measures the pool-notify overhead on
the SharedFile hot path. Touch on a warm SharedFile is
allocation-free and adds roughly 7 ns/op over the pool-less
baseline (`BenchmarkSharedFileAcquireRelease` ~7 ns/op vs
the pool-wired variant ~14 ns/op).

`BenchmarkStorage_PoolPressure` exercises the full stack
against a 32-pack synthetic fixture with
`cache.NewObjectLRU(0)` so every read traverses the FD
layer. Two sub-benchmarks compare regimes:

    NoChurn (cap=256, no eviction)  ~12.5 us/op  5.1 KB/op  88 allocs/op
    Churn   (cap=8,  eviction)      ~17.9 us/op  5.3 KB/op  91 allocs/op

The ~43% delta between Churn and NoChurn is the cost of
evicting and reopening pack/idx FDs when the working set
exceeds capacity. Each sub-benchmark injects its own
`*fdpool.Pool` via `Options.Pool` and asserts against
`pool.Stats().Evictions` at the end — NoChurn fails if any
eviction fires, Churn fails if no eviction fires — so the
bench refuses to silently stop exercising the pool.

`BenchmarkObjectStorage_ParallelReads` complements the
storage benches by measuring concurrent EncodedObject
throughput on the RLock'd read path.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
hiddeco added 12 commits May 28, 2026 17:12
`BenchmarkAlternatesObjectLookup` constructed its storage via
the low-level `NewObjectStorage(dotgit.NewWithOptions(...))`
path, which bypasses the FD pool that `NewStorageWithOptions`
installs. The preamble noted the goal was to mirror
`PlainClone(Shared: true)` behaviour, but the chosen
construction shape missed the public API's wiring.

Switching to `NewStorageWithOptions` threads the pool through
to the alternate's PackHandles and gives the benchmark the
same FD lifecycle real consumers see.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Touch's eviction path discards ReleaseNow's error by design (the
Member contract states this explicitly). A future reader comparing
this to errors.Join in packhandle.doClose may flag the asymmetry
as an oversight. Add a sentence to the existing lock-ordering
comment block recording the intentional contrast so the pattern
reads as a documented choice rather than missing error handling.

No behaviour change; documentation only.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
The Options.MaxOpenDescriptors godoc claimed the pool bounds
".pack/.idx/.rev descriptors across the entire Storage", which
overstates coverage — pack-write FDs (PackWriter) are short-lived
and not pooled. Reword to name the read side explicitly and add a
note that the 256 default assumes go-git holds the dominant share
of the process's FD budget; callers running alongside other
FD-heavy subsystems should size against their full budget.

Options.Pool gains a paragraph naming git.Open, git.Clone, and
git.Init as the layered entry points for callers who want to
share an *fdpool.Pool across Storage instances. The path-based
wrappers (git.PlainOpen, git.PlainClone) construct their own
Storage internally and so do not accept an injected pool; that
is by design (they exist to hide storage internals).

Both Options.Pool fields (filesystem and dotgit) gain a stability
disclaimer: the field's API stability tracks fdpool.Pool's, not
this package's, because fdpool lives under x/ and may change
without semver. Consumers reading these fields should treat them
as experimental on the same timeline.

No behaviour change; documentation only.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Stats already exposes Hits and Evictions; operators monitoring
the pool could not distinguish clean evictions from those where
Member.ReleaseNow returned a non-nil error. Add EvictionFailures
so the failure rate is visible without callers building their
own wrapping.

The eviction itself still completes (the Member is removed from
the LRU regardless of ReleaseNow's return) so Evictions still
counts the attempt; EvictionFailures is the subset that failed
to close cleanly. This matches how operators expect cumulative
counters to compose for rate calculations.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Pool.Touch's eviction previously took the LRU tail unconditionally.
Under sustained concurrent load on a small set of hot packs, this
caused churn: the LRU tail could be a pinned SharedFile whose FD
lingered until refs drained (via the immediateClose latch), then
needed reopening on the next Acquire — which itself evicted
whoever was at the new tail. The result was extra close+reopen
cycles on hot packs without bounding total open FDs more tightly.

Add an optional Pinnable interface and a two-pass victim
selection in Touch: walk the LRU back-to-front, prefer the
LRU-most Member whose Pinned() returns false; fall back to the
LRU tail if every Member is pinned. Matches canonical Git's
find_lru_pack (packfile.c:482-530). The walk stops before
reaching the just-inserted front element so eviction never
targets the caller's own Member.

SharedFile implements Pinned() as refs > 0. The check takes
SharedFile.mu while pool.mu is held, which reverses the
"Member→Pool only" lock-acquire rule the existing comment
documented. The actual deadlock-avoidance invariant is "Member
calls into Pool only after releasing Member.mu" (see Acquire,
Close); the new Pool→Member call through Pinned is deadlock-free
because no Member call holds s.mu while waiting for p.mu.
Update the comment to reflect this.

Stats gains PinnedSkips, a cumulative count of pinned Members
the walk passed over. Non-zero values surface churn under load.
Members that do not implement Pinnable are treated as unpinned,
preserving the unconditional-LRU behaviour for any future
Member implementation.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
The existing eviction benches assert evictions occurred (or did
not) post-loop, but say nothing about Stats.Active relative to
capacity. A regression in Touch where the LRU grows past the cap
would slip silently past those assertions: evictions still fire,
just not enough to keep the working set bounded.

Add a post-loop check that Active <= capacity in both benches.
Post-loop only — an in-b.Loop() check would distort the
timings the bench is trying to measure.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
NewLazyIndexWithPool was exercised only transitively through the
storage/filesystem integration tests. A regression in the wiring
between LazyIndex's two SharedFiles (idx and rev) and the pool
would surface two layers up rather than at the idxfile layer
where the change lives, making the failure mode harder to
attribute.

Add a focused test that constructs two pool-wired LazyIndexes
against a real fixture, forces eviction by sizing the pool to
exactly one LazyIndex's worth of SharedFiles, and verifies that
a read against the evicted LazyIndex reopens its FDs through the
stored openers without surfacing an error.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
The existing concurrent ReleaseNow test verifies that Acquire,
Release, and ReleaseNow compose without panic or deadlock, but
the in-flight body never actually reads from the file. The
refcount + immediateClose latch contract guarantees an in-flight
read sees a valid FD even if ReleaseNow fires mid-read; a
ReleaseNow that closes the FD out from under a reader would slip
past tests that only check call composition.

Add a stress test where N reader goroutines do Acquire → ReadAt
→ verify bytes → Release in a loop while M evictor goroutines
fire ReleaseNow. A wrong byte slice or a ReadAt error would
surface a regression in the latch — the read-back assertion is
the load-bearing check, not the absence of panic.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
BenchmarkPackHandleAcquireGrace measures the warm path where
the SharedFile's grace timer keeps the .pack FD open across
iterations. The cold counterpart — reopen after the grace timer
naturally fires — is the workload pattern the grace period is
meant to amortize against, but no bench covers it directly.

BenchmarkPackHandleAcquireAfterGraceExpiry fills the gap with a
wall-clock-accurate measurement: each iteration sleeps past the
1s grace window with the timer paused, so only the subsequent
OpenPackReader → ReadAt → Close is counted. The delta against
the warm bench is one full pack-SharedFile reopen including the
time.AfterFunc-driven close callback.

The bench is naturally slow at default -benchtime=1s (each
iteration's sleep dominates); the docstring directs users to
override via -benchtime=Nx. BenchmarkPackHandleCloseIdleReopen
remains the fast proxy for routine runs, fast-forwarding the
grace timer via soft-close to keep the cycle synchronous.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Options.MaxOpenDescriptors and Options.Pool overlapped: both
shaped the storage-wide FD budget, but only Pool was the actual
primitive — the integer knob just picked a capacity for the
per-Storage Pool that NewStorageWithOptions constructed when Pool
was nil. Carrying both forced the doc to spell out their
interaction (zero vs negative, ignored when Pool is non-nil)
without giving callers any expressiveness Pool did not already
provide.

Drop the field. Nil Pool now selects defaultPoolCapacity (256).
Disabling pooling becomes Pool: fdpool.New(0) — same no-op-pool
branch a negative MaxOpenDescriptors used to take. The disabled-
pool test injects the no-op pool explicitly.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Pool kept an O(1) Member-to-list.Element map for Touch lookups,
making Member an implicit map key. The doc said as much but the
constraint is invisible at the type system level: a Member whose
concrete type is non-comparable panics on first Touch with
runtime "hash of unhashable type" rather than a contract error.
Internal callers happened to satisfy it (*SharedFile, *PackHandle
are both pointer types) but external consumers — fdpool is under
x/ — have no compile-time safety net.

Introduce a Handle type that owns the per-Member back-reference
into the LRU list. Touch and Forget take a *Handle alongside the
Member; the Pool reads and writes Handle fields under p.mu and
stores both Member and Handle in the list element so eviction
can clear the victim's h.elem before invoking ReleaseNow.
Clearing under p.mu is load-bearing: without it, a re-Touch on a
just-evicted Member would call MoveToFront on the removed
element (a no-op in container/list) and leave the Member
silently unregistered.

Member loses its comparable requirement. failingMember now
embeds a slice so the concrete type is non-comparable on
purpose; under the old design that test would have panicked at
registration. SharedFile carries a poolHandle field and passes
its address through to Touch/Forget; the SharedFile→Pool
lock-ordering invariant is unchanged.

Bench on Apple M4 Pro (-benchtime=10000x): WorkingSetFitsInPool
10.47 ns/op, 0 allocs; Eviction_WorkingSetExceedsPool 59.28
ns/op, 2 allocs (entry + list.Element per insert) — the hot-path
allocation profile is preserved.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
HashesWithPrefix iterated s.index without holding muI: a
concurrent Reindex swap or PackfileWriter.Notify insert
could race with the iteration, producing a Go-level data
race on the map's internal structure.

Snapshot the LazyIndex pointers under a brief muI.RLock and
iterate the snapshot afterwards. The borrowed values stay
alive for the duration of the loop via the local slice; the
.idx and .rev FDs are reference-counted by sharedfile and
governed by the fdpool LRU or grace timer, not by removal
from s.index.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
@hiddeco hiddeco force-pushed the pack-hotpath-fd-pool branch from 365b9d6 to cc5e524 Compare May 28, 2026 15:14

@pjbgf pjbgf left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hiddeco thanks for working on this. 🙇

@pjbgf pjbgf merged commit 1f84f6c into go-git:main May 29, 2026
17 checks passed
hiddeco added a commit to hiddeco/go-git that referenced this pull request Jun 5, 2026
`EntriesWithPrefix(prefix)` returns an iterator over index
entries whose hashes start with `prefix`. Implementations
use the fanout table to bound the search to a single bucket
when `len(prefix) >= 1`; an empty prefix is equivalent to
`Entries()`.

For multi-byte prefixes the matching entries form a
contiguous run somewhere in the middle of the fanout bucket
(the bucket is sorted by full hash, so entries with second
byte < `prefix[1]` precede the run and entries with second
byte > `prefix[1]` follow it). The iterator binary-searches
within the bucket for the leftmost entry whose hash is >=
`prefix` padded with zeros, and then walks forward with a
stop-on-first-mismatch sentinel. This mirrors upstream Git's
`for_each_prefixed_object_in_pack`[1], which calls
`bsearch_pack` to position before walking forward — without
the bsearch step, a stop-on-first-mismatch walk from the
bucket start would silently terminate before reaching the
match-run whenever some entry with a smaller second byte
precedes it.

Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix`
to call `EntriesWithPrefix(prefix)` and drop the now-
redundant per-entry prefix check. The new version also
snapshots `s.index` under `muI.RLock` and iterates the
snapshot without holding the lock, avoiding any RLock
retention across per-iterator I/O (`LazyIndex` iterators
take their own sharedFile locks on `acquire`).

This is a breaking change to the public `Index` interface
for external implementers, which v6 permits.

`idxfilePrefixIter` holds only the cursor state plus the
per-bucket Names / Offset32 / CRC32 slices and the shared
Offset64 table — not a reference to the parent
`MemoryIndex`. Offset and CRC32 lookups inline the
computation that `MemoryIndex.getOffset` /
`MemoryIndex.getCRC32` would perform, so the iterator's
lifetime no longer pins the entire `MemoryIndex` heap
footprint (addresses go-git#2134 review comment).

`BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack
synthetic fixture with 32 objects per pack, two-byte prefix
lookup that walks every pack, count=6 -benchtime=1s,
benchstat vs pack-hotpath-fd-pool baseline):

    sec/op:    1093.5 µs → 32.05 µs  -97.07% (p=0.002, n=6)
    B/op:      111.5 KiB →  6.44 KiB -94.22% (p=0.002)
    allocs/op: 4219      →    95     -97.75% (p=0.002)

The bench also reshapes `BenchmarkFindObjectInPackfile_*`
and `BenchmarkObjectStorage_*` to run against the
`makeMultiPackFixture` helper (introduced by
pack-hotpath-fd-pool's parallelise-requireIndex commit),
since the single-pack basic fixture cannot demonstrate the
scaling-with-N wins that the read-path optimisations
deliver.

[1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
hiddeco added a commit to hiddeco/go-git that referenced this pull request Jun 5, 2026
`EntriesWithPrefix(prefix)` returns an iterator over index
entries whose hashes start with `prefix`. Implementations
use the fanout table to bound the search to a single bucket
when `len(prefix) >= 1`; an empty prefix is equivalent to
`Entries()`.

For multi-byte prefixes the matching entries form a
contiguous run somewhere in the middle of the fanout bucket
(the bucket is sorted by full hash, so entries with second
byte < `prefix[1]` precede the run and entries with second
byte > `prefix[1]` follow it). The iterator binary-searches
within the bucket for the leftmost entry whose hash is >=
`prefix` padded with zeros, and then walks forward with a
stop-on-first-mismatch sentinel. This mirrors upstream Git's
`for_each_prefixed_object_in_pack`[1], which calls
`bsearch_pack` to position before walking forward — without
the bsearch step, a stop-on-first-mismatch walk from the
bucket start would silently terminate before reaching the
match-run whenever some entry with a smaller second byte
precedes it.

Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix`
to call `EntriesWithPrefix(prefix)` and drop the now-
redundant per-entry prefix check. The new version also
snapshots `s.index` under `muI.RLock` and iterates the
snapshot without holding the lock, avoiding any RLock
retention across per-iterator I/O (`LazyIndex` iterators
take their own sharedFile locks on `acquire`).

`idxfilePrefixIter` holds only the cursor state plus the
per-bucket Names / Offset32 / CRC32 slices and the shared
Offset64 table — not a reference to the parent
`MemoryIndex`. Offset and CRC32 lookups inline the
computation that `MemoryIndex.getOffset` /
`MemoryIndex.getCRC32` would perform, so the iterator's
lifetime no longer pins the entire `MemoryIndex` heap
footprint (addresses go-git#2134 review comment).

`BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack
synthetic fixture with 32 objects per pack, two-byte prefix
lookup that walks every pack, count=6 -benchtime=1s,
benchstat vs main):

    sec/op:    1093.5 µs → 32.05 µs  -97.07% (p=0.002, n=6)
    B/op:      111.5 KiB →  6.44 KiB -94.22% (p=0.002)
    allocs/op: 4219      →    95     -97.75% (p=0.002)

The bench also reshapes `BenchmarkFindObjectInPackfile_*`
and `BenchmarkObjectStorage_*` to run against the
`makeMultiPackFixture` helper (added in c828cde), since the
single-pack basic fixture cannot demonstrate the scaling-
with-N wins that the read-path optimisations deliver.

[1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
hiddeco added a commit to hiddeco/go-git that referenced this pull request Jun 9, 2026
`EntriesWithPrefix(prefix)` returns an iterator over index
entries whose hashes start with `prefix`. Implementations
use the fanout table to bound the search to a single bucket
when `len(prefix) >= 1`; an empty prefix is equivalent to
`Entries()`.

For multi-byte prefixes the matching entries form a
contiguous run somewhere in the middle of the fanout bucket
(the bucket is sorted by full hash, so entries with second
byte < `prefix[1]` precede the run and entries with second
byte > `prefix[1]` follow it). The iterator binary-searches
within the bucket for the leftmost entry whose hash is >=
`prefix` padded with zeros, and then walks forward with a
stop-on-first-mismatch sentinel. This mirrors upstream Git's
`for_each_prefixed_object_in_pack`[1], which calls
`bsearch_pack` to position before walking forward — without
the bsearch step, a stop-on-first-mismatch walk from the
bucket start would silently terminate before reaching the
match-run whenever some entry with a smaller second byte
precedes it.

Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix`
to call `EntriesWithPrefix(prefix)` and drop the now-
redundant per-entry prefix check. The new version also
snapshots `s.index` under `muI.RLock` and iterates the
snapshot without holding the lock, avoiding any RLock
retention across per-iterator I/O (`LazyIndex` iterators
take their own sharedFile locks on `acquire`).

`idxfilePrefixIter` holds only the cursor state plus the
per-bucket Names / Offset32 / CRC32 slices and the shared
Offset64 table — not a reference to the parent
`MemoryIndex`. Offset and CRC32 lookups inline the
computation that `MemoryIndex.getOffset` /
`MemoryIndex.getCRC32` would perform, so the iterator's
lifetime no longer pins the entire `MemoryIndex` heap
footprint (addresses go-git#2134 review comment).

`BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack
synthetic fixture with 32 objects per pack, two-byte prefix
lookup that walks every pack, count=6 -benchtime=1s,
benchstat vs main):

    sec/op:    1093.5 µs → 32.05 µs  -97.07% (p=0.002, n=6)
    B/op:      111.5 KiB →  6.44 KiB -94.22% (p=0.002)
    allocs/op: 4219      →    95     -97.75% (p=0.002)

The bench also reshapes `BenchmarkFindObjectInPackfile_*`
and `BenchmarkObjectStorage_*` to run against the
`makeMultiPackFixture` helper (added in c828cde), since the
single-pack basic fixture cannot demonstrate the scaling-
with-N wins that the read-path optimisations deliver.

[1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
hiddeco added a commit to hiddeco/go-git that referenced this pull request Jun 9, 2026
`EntriesWithPrefix(prefix)` returns an iterator over index
entries whose hashes start with `prefix`. Implementations
use the fanout table to bound the search to a single bucket
when `len(prefix) >= 1`; an empty prefix is equivalent to
`Entries()`.

For multi-byte prefixes the matching entries form a
contiguous run somewhere in the middle of the fanout bucket
(the bucket is sorted by full hash, so entries with second
byte < `prefix[1]` precede the run and entries with second
byte > `prefix[1]` follow it). The iterator binary-searches
within the bucket for the leftmost entry whose hash is >=
`prefix` padded with zeros, and then walks forward with a
stop-on-first-mismatch sentinel. This mirrors upstream Git's
`for_each_prefixed_object_in_pack`[1], which calls
`bsearch_pack` to position before walking forward — without
the bsearch step, a stop-on-first-mismatch walk from the
bucket start would silently terminate before reaching the
match-run whenever some entry with a smaller second byte
precedes it.

Rewire `storage/filesystem.ObjectStorage.HashesWithPrefix`
to call `EntriesWithPrefix(prefix)` and drop the now-
redundant per-entry prefix check. The new version also
snapshots `s.index` under `muI.RLock` and iterates the
snapshot without holding the lock, avoiding any RLock
retention across per-iterator I/O (`LazyIndex` iterators
take their own sharedFile locks on `acquire`).

`idxfilePrefixIter` holds only the cursor state plus the
per-bucket Names / Offset32 / CRC32 slices and the shared
Offset64 table — not a reference to the parent
`MemoryIndex`. Offset and CRC32 lookups inline the
computation that `MemoryIndex.getOffset` /
`MemoryIndex.getCRC32` would perform, so the iterator's
lifetime no longer pins the entire `MemoryIndex` heap
footprint (addresses go-git#2134 review comment).

`BenchmarkHashesWithPrefix_TwoByte` (Apple M4 Pro, 32-pack
synthetic fixture with 32 objects per pack, two-byte prefix
lookup that walks every pack, count=6 -benchtime=1s,
benchstat vs main):

    sec/op:    1093.5 µs → 32.05 µs  -97.07% (p=0.002, n=6)
    B/op:      111.5 KiB →  6.44 KiB -94.22% (p=0.002)
    allocs/op: 4219      →    95     -97.75% (p=0.002)

The bench also reshapes `BenchmarkFindObjectInPackfile_*`
and `BenchmarkObjectStorage_*` to run against the
`makeMultiPackFixture` helper (added in c828cde), since the
single-pack basic fixture cannot demonstrate the scaling-
with-N wins that the read-path optimisations deliver.

[1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L2449-L2473

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants