Skip to content

storage: filesystem/mmap, Add PackScanner to handle large repos#1776

Merged
pjbgf merged 3 commits into
go-git:mainfrom
pjbgf:perf2
Dec 5, 2025
Merged

storage: filesystem/mmap, Add PackScanner to handle large repos#1776
pjbgf merged 3 commits into
go-git:mainfrom
pjbgf:perf2

Conversation

@pjbgf

@pjbgf pjbgf commented Dec 3, 2025

Copy link
Copy Markdown
Member

The new PackScanner is able to navigate through a packfile and inflate objects on demand. This avoids costly operations such as loading into memory the entire index and resolving all delta objects which may or may not be used.

Performance-wise, the impact is negligible for small repositories, with the exception of the 60% drop in number of allocations:

BenchmarkPackHandlers/packfile-cache-osfs:_basic-fixture-16   	    4020	    288678 ns/op	  294714 B/op	     743 allocs/op
BenchmarkPackHandlers/packfile-nocache-memfs:basic-fixture-16 	    4272	    280586 ns/op	  294884 B/op	     742 allocs/op
BenchmarkPackHandlers/mmap-pack-scanner:basic-fixture-16      	    5646	    212349 ns/op	  293650 B/op	     286 allocs/op

In very large repositories is where most gains can be observed. For the linux kernel, the memory churn of fetching 4 objects drops from 1.3GB to 1MB.
Both allocations and time per operation are a fraction of the existing implementation:

BenchmarkPackHandlers/packfile-cache-osfs:_linux-kernel-16    	       1	2240495709 ns/op	1331433920 B/op	   35933 allocs/op
BenchmarkPackHandlers/packfile-nocache-memfs:linux-kernel-16  	       1	2026880001 ns/op	1331437040 B/op	   35935 allocs/op
BenchmarkPackHandlers/mmap-pack-scanner:linux-kernel-16       	     477	   2513275 ns/op	   1011512 B/op	    1488 allocs/op

Additionally, this PR:

  • Refactors some internal pack functions so that it can be used by new and existing "Pack Handlers".
  • Creates a new test/ dir, to amass tests that goes across implementations. This is also the home for some high-level benchmarks, which will help monitor performance improvements over time.

Relates to: #1451 #1691 #1673 #1601.
Part of: #1774.

This is part of a openSUSE Hack Week 25 project.

Special thanks to @runxiyu for the initial nudge on mmap, which triggered the research that led to this PR. 🙇

@cedric-appdirect

Copy link
Copy Markdown
Contributor

In my attempt to improve performance, I did come across this issue and did think about using mmap. There is one drawback to be aware of with mmap, you can end up with blocking the entire system thread on page fault while the OS fetch the missing file page. It is my understanding that the go ecosystem has been avoiding this system call sadly. There is a lot of benefit to it with reduce syscall, better OS interaction for memory usage. Still having this blocking an entire system thread, means that we could see latency spike everywhere. Way to minimize the problem would be to use a goroutine pined to a specific system thread and before touching any page send a read request to that goroutine to make sure that only one system thread at most would get blocked. Maybe another approach is to use io.ReaderAt. Or make the mmap code path optional.

)

// mmapFile creates a memory-mapped region for the given file.
func mmapFile(f billy.File) ([]byte, func() error, error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are your thoughts on adding Mmap to the billy? It'll be nice if other filesystems like memfs or customized ones support mmap.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmap is a syscall that maps files or devices into memory, it is generally backed by a file descriptor. The file descriptor has been exposed in go-bilyl for the osfs abstractions.

Given that we already have an in-memory abstraction (memfs and memory storage), I'm not sure we would benefit from tying those to an OS-specific syscall.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was that the billy.Filesystem interface is conceptually based on an actual OS filesystem (especially posix). I’m aware that we probably wouldn’t benefit much from supporting mmap in filesystems like memfs, but I thought it should still be available for extensibility, similar to how we can mmap files on tmpfs.

I’m fine with either decision, though. It might be an unnecessary API addition for a very specific use case.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fair I thought about this a bit more, and I actually agree that we should aim to push this into the filesystem abstraction layer. I created an issue to track that in go-billy as a separate effort.

Comment thread storage/filesystem/mmap/fsobject.go Outdated
if o.diskType.IsDelta() {
data := o.scanner.packMmap[start : start+o.size]
br := sync.GetBufioReader(bytes.NewReader(data))
return io.NopCloser(br), nil

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return io.NopCloser(br), nil
rc := ioutil.NewReadCloser(br, ioutil.CloserFunc(func() error {
sync.PutBufioReader(br)
return nil
}))
return rc, nil

It seems like currently br isn't being returned to the pool.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice spot, thanks for that. 🙇

return io.NopCloser(br), nil
}

data := o.scanner.packMmap[start:]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
data := o.scanner.packMmap[start:]
data := o.scanner.packMmap[start:start+o.size]

If I understand correctly, Read should be limited to the size of the object.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my initial implementation, but when trying to limit the returning slice here I had some consistency issues - IIRC some objects would break the Zlib process. Note that this is already in place for the raw reader (L95) - which is slightly more important in this case.

Given that this is just a new slice, pointing to the same underlying array, and that the Zlib process stops at the end of the compressed data, I'm not sure it is worth changing it. But happy to change if you find a way around the issue I mentioned above.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is just a new slice, pointing to the same underlying array, and that the Zlib process stops at the end of the compressed data, I'm not sure it is worth changing it.

I wasn't aware that the zlib reader only consumes a single stream. I agree that it can stay as is. Thanks for letting me know.

Comment thread storage/filesystem/mmap/fsobject.go Outdated
Comment on lines +168 to +198
var baseOffset uint64
var baseHash plumbing.Hash
var err error

if o.diskType == plumbing.OFSDeltaObject {
reader := bytes.NewReader(o.scanner.packMmap[pos:])
negativeOffset, err := binary.ReadVariableWidthInt(reader)
if err != nil {
return fmt.Errorf("failed to read OFS delta offset: %w", err)
}

baseOffset = uint64(o.offset) - uint64(negativeOffset)
consumed := len(o.scanner.packMmap[pos:]) - reader.Len()
pos += int64(consumed)
} else {
hashSize := o.scanner.hashSize
if pos+int64(hashSize) > int64(len(o.scanner.packMmap)) {
return fmt.Errorf("invalid REF delta: hash out of bounds")
}

baseHash, _ = plumbing.FromBytes(o.scanner.packMmap[pos : pos+int64(hashSize)])
pos += int64(hashSize)
}

// Get the base object to inherit its type.
var base plumbing.EncodedObject
if o.diskType == plumbing.OFSDeltaObject {
base, err = o.scanner.GetByOffset(baseOffset)
} else {
base, err = o.scanner.Get(baseHash)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var baseOffset uint64
var baseHash plumbing.Hash
var err error
if o.diskType == plumbing.OFSDeltaObject {
reader := bytes.NewReader(o.scanner.packMmap[pos:])
negativeOffset, err := binary.ReadVariableWidthInt(reader)
if err != nil {
return fmt.Errorf("failed to read OFS delta offset: %w", err)
}
baseOffset = uint64(o.offset) - uint64(negativeOffset)
consumed := len(o.scanner.packMmap[pos:]) - reader.Len()
pos += int64(consumed)
} else {
hashSize := o.scanner.hashSize
if pos+int64(hashSize) > int64(len(o.scanner.packMmap)) {
return fmt.Errorf("invalid REF delta: hash out of bounds")
}
baseHash, _ = plumbing.FromBytes(o.scanner.packMmap[pos : pos+int64(hashSize)])
pos += int64(hashSize)
}
// Get the base object to inherit its type.
var base plumbing.EncodedObject
if o.diskType == plumbing.OFSDeltaObject {
base, err = o.scanner.GetByOffset(baseOffset)
} else {
base, err = o.scanner.Get(baseHash)
}
// Get the base object to inherit its type.
var base plumbing.EncodedObject
var err error
if o.diskType == plumbing.OFSDeltaObject {
reader := bytes.NewReader(o.scanner.packMmap[pos:])
negativeOffset, err := binary.ReadVariableWidthInt(reader)
if err != nil {
return fmt.Errorf("failed to read OFS delta offset: %w", err)
}
baseOffset := uint64(o.offset) - uint64(negativeOffset)
consumed := len(o.scanner.packMmap[pos:]) - reader.Len()
pos += int64(consumed)
base, err = o.scanner.GetByOffset(baseOffset)
} else {
hashSize := o.scanner.hashSize
if pos+int64(hashSize) > int64(len(o.scanner.packMmap)) {
return fmt.Errorf("invalid REF delta: hash out of bounds")
}
baseHash, _ := plumbing.FromBytes(o.scanner.packMmap[pos : pos+int64(hashSize)])
pos += int64(hashSize)
base, err = o.scanner.Get(baseHash)
}

IMO it looks cleaner to limit the scope of baseOffset and baseHash.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one, completely agree. 👍

}
if err != nil {
return nil, fmt.Errorf("failed to get base object: %w", err)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines above seems to be duplicate of the logic in resolveMetadata. Moving this to a separate function could be a good idea.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the sentiment, however a big part of the delta resolution logic is duplicated from what already exists in the packfile package - but in a slightly different shape and flow. A follow-up PR should take care of extracting that logic (potentially) to its own package that can then be used there and here.

Comment thread storage/filesystem/mmap/scan.go Outdated
return s.offset(pos)
}

// FindOffset is deprecated. Use OffsetByHash instead.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// FindOffset is deprecated. Use OffsetByHash instead.
// Deprecated: Use OffsetByHash instead.

This enables some great features. https://go.dev/wiki/Deprecated

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this a bit more, and I think we are better off completely dropping off these deprecation for the time being. Ideally that would be done across all the "Pack handlers", so it can be done as a separate PR later.

Comment thread storage/filesystem/mmap/scan.go Outdated
return id, nil
}

// FindHash is deprecated. Use HashByOffset instead.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// FindHash is deprecated. Use HashByOffset instead.
// Deprecated: Use HashByOffset instead.

Same as above.

Comment thread tests/pack/common_test.go
Comment on lines +59 to +66
// resetGlobalSyncPools resets the global sync pools. This is needed as
// the Suite execution ends up copying global vars by value, which in this
// case specifically results in the `New` becoming nil.
func resetGlobalSyncPools() {
bufioPool = sync.Pool{New: func() any {
return bufio.NewReader(nil)
}}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling to find out how this exactly happens. I would really appreciate it if you could elaborate on this.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the question, a) are unsure on why the problem occurs or b) on how the fix works?

For the former, it seems that Testify.Suite would value copy global vars to ensure the different suites instances don't impact on one another. sync.Pool objects are not meant to be copied, so once you have 2-3 Suites tests you can step debug and see New set to nil, which leads to a panic (due to the reset that follows it).

As for the latter, I'm using linkname (L57) to make the remote sync.bufioReader available via the local bufioPool var. That way I can reset it to a new sync.Pool.

@onee-only onee-only Dec 5, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for answering my ambiguous question. 😅
I was curious about the former, that testify.Suite can copy global variables.

sync.Pool objects are not meant to be copied, so once you have 2-3 Suites tests you can step debug and see New set to nil, which leads to a panic (due to the reset that follows it).

However, in my local debug this seems to be not the case. I have debugged TestPackfileWithFS and got the result below.

Image Image

bufioReader.New wasn't nil. It was x (the value from the pool) being non-nil, with nil data inside.

I believe the panic was caused by Packfile.Close() putting nil reader(p.rbuf) to the pool, specifically on the tests of FindHash() and FindOffset(). These two methods don't call p.init() to initialize p.rbuf, as they belong to idxfile.

Removing resetGlobalSyncPools() from SetupTest and calling handler.Get() in those tests to manually trigger p.init() fixed the issue for me.

I think we should make Packfile.Close() handle this case.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference, while debugging this without resetGlobalSyncPools, r is nil as the pool is empty and the func New is nil and therefore not called (as stepped into golib code). As per stack trace, init is being called:

image
--- FAIL: TestPackfileEmbedFS (0.00s)
    --- FAIL: TestPackfileEmbedFS/TestGet (0.00s)
        --- FAIL: TestPackfileEmbedFS/TestGet/valid_commit_hash (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered, repanicked]
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7b4e18]

goroutine 29 [running]:
testing.tRunner.func1.2({0x89eaa0, 0x3b8c610})
	/usr/lib64/go/1.25/src/testing/testing.go:1872 +0x237
testing.tRunner.func1()
	/usr/lib64/go/1.25/src/testing/testing.go:1875 +0x35b
panic({0x89eaa0?, 0x3b8c610?})
	/usr/lib64/go/1.25/src/runtime/panic.go:783 +0x132
bufio.(*Reader).Reset(...)
	/usr/lib64/go/1.25/src/bufio/bufio.go:81
github.com/go-git/go-git/v6/utils/sync.GetBufioReader({0x0, 0x0})
	/home/levi/git/go-git/go-git/utils/sync/bufio.go:23 +0x78
github.com/go-git/go-git/v6/plumbing/format/packfile.(*Packfile).init.func1()
	/home/levi/git/go-git/go-git/plumbing/format/packfile/packfile.go:211 +0x39
sync.(*Once).doSlow(0x3b46440?, 0x8dd400?)
	/usr/lib64/go/1.25/src/sync/once.go:78 +0xac
sync.(*Once).Do(...)
	/usr/lib64/go/1.25/src/sync/once.go:69
github.com/go-git/go-git/v6/plumbing/format/packfile.(*Packfile).init(0xc0000c6370)
	/home/levi/git/go-git/go-git/plumbing/format/packfile/packfile.go:200 +0x45
github.com/go-git/go-git/v6/plumbing/format/packfile.(*Packfile).Get(0xc0000c6370, {{0xe8, 0xd3, 0xff, 0xab, 0x55, 0x28, 0x95, 0xc1, 0x9b, ...}, ...})
	/home/levi/git/go-git/go-git/plumbing/format/packfile/packfile.go:69 +0x37
github.com/go-git/go-git/v6/tests/pack_test.(*PackHandlerSuite[...]).TestGet.func1()
	/home/levi/git/go-git/go-git/tests/pack/pack_test.go:280 +0x52
testing.tRunner(0xc0001eb180, 0xc0000e0510)
	/usr/lib64/go/1.25/src/testing/testing.go:1934 +0xea
created by testing.(*T).Run in goroutine 28
	/usr/lib64/go/1.25/src/testing/testing.go:1997 +0x465
FAIL	github.com/go-git/go-git/v6/tests/pack	0.008s
FAIL

We can leave this as is for now, and review on a follow-up PR as this is related to the testing code as opposed to the main change proposed.

@pjbgf

pjbgf commented Dec 4, 2025

Copy link
Copy Markdown
Member Author

There is one drawback to be aware of with mmap, you can end up with blocking the entire system thread on page fault while the OS fetch the missing file page.

@cedric-appdirect Thanks for pointing that out. For context, handling very large repositories with the vanilla filesystem implementation results in GB's of memory churn. That combined with the impact of the GC pressure results in a very tough time for memory-constraint machines, as that may starve the rest of the system of both CPU and memory.

Now considering the benchmark I shared, the mmap implementation has a rough cost of 2.5ms per operation. The page fault would need to make it at least 500 times slower, for the performance to match the current implementation.

Maybe another approach is to use io.ReaderAt.

It would make a lot of sense to use that to add support for Windows. Perhaps that becomes one additional back-end for the PackScanner.

Or make the mmap code path optional.

This is already optional - as per PR description and linked issue.

@cedric-appdirect

Copy link
Copy Markdown
Contributor

There is one drawback to be aware of with mmap, you can end up with blocking the entire system thread on page fault while the OS fetch the missing file page.

@cedric-appdirect Thanks for pointing that out. For context, handling very large repositories with the vanilla filesystem implementation results in GB's of memory churn. That combined with the impact of the GC pressure results in a very tough time for memory-constraint machines, as that may starve the rest of the system of both CPU and memory.

Absolutely. mmap is really good at working with the OS at evicting page and only bringing and keeping the bit of the file actively needed in memory.

Now considering the benchmark I shared, the mmap implementation has a rough cost of 2.5ms per operation. The page fault would need to make it at least 500 times slower, for the performance to match the current implementation.

In general, this is true. It is more an edge case scenario problem. Once your process is under memory pressure, things can become really bad. The closer I can describe it is when you computer start to be using the swap for data needed by active code, there is nothing that can bring back that computer to live. You are good to restart. That can get that bad. The main concern not being for the performance of go-git, but the entire go application itself as it could end up with all its system thread being blocked (the less CPU cores you have the more likely that would be to a problem). This would add latency to unrelated goroutine even if the go-git one would perform better than previous version.

Maybe another approach is to use io.ReaderAt.

It would make a lot of sense to use that to add support for Windows. Perhaps that becomes one additional back-end for the PackScanner.

Or for people that use the package inside things like web server. It would be less efficient/performant, I would expect, than the mmap variant, but without the random latency penalty.

Or make the mmap code path optional.

This is already optional - as per PR description and linked issue.

Excellent, I am not yet accustomed to the full code base and missed it. Sorry.

@cedric-appdirect

cedric-appdirect commented Dec 4, 2025

Copy link
Copy Markdown
Contributor

I may have an interest in implementing the io.ReaderAt version, as I use go-git in web service, once this PR is merged. Let me know if you do not plan to work on it.

pjbgf added 2 commits December 5, 2025 12:07
The latest version of both dependencies enable the use of OSFixture
and Fd(), both being requirements to test the mmap implementation.

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Comment thread storage/filesystem/mmap/fsobject.go Fixed
The new PackScanner is able to navigate through a packfile and inflate
objects on demand. This avoids costly operations such as loading into
memory the entire index and resolving all delta objects which may or
may not be used.

Performance-wise, the impact is negligible for small repositories,
with the exception of the 60% drop in number of allocations:

BenchmarkPackHandlers/packfile-cache-osfs:_basic-fixture-16   	    4020	    288678 ns/op	  294714 B/op	     743 allocs/op
BenchmarkPackHandlers/packfile-nocache-memfs:basic-fixture-16 	    4272	    280586 ns/op	  294884 B/op	     742 allocs/op
BenchmarkPackHandlers/mmap-pack-scanner:basic-fixture-16      	    5646	    212349 ns/op	  293650 B/op	     286 allocs/op

In very large repositories is where most gains can be observed. For the linux
kernel, the memory churn of fetching 4 objects drops from 1.3GB to 1MB.
Both allocations and time per operation are a fraction of the existing implementation:

BenchmarkPackHandlers/packfile-cache-osfs:_linux-kernel-16    	       1	2240495709 ns/op	1331433920 B/op	   35933 allocs/op
BenchmarkPackHandlers/packfile-nocache-memfs:linux-kernel-16  	       1	2026880001 ns/op	1331437040 B/op	   35935 allocs/op
BenchmarkPackHandlers/mmap-pack-scanner:linux-kernel-16       	     477	   2513275 ns/op	   1011512 B/op	    1488 allocs/op

Signed-off-by: Paulo Gomes <pjbgf@linux.com>
Comment thread storage/filesystem/mmap/fsobject.go Dismissed

@onee-only onee-only left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. For some unknown reason, I can't resolve those reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants