
Parallel checkout #1749

Draft
cedric-appdirect wants to merge 2 commits into go-git:main from cedric-appdirect:parallel-checkout


@cedric-appdirect

Contributor

This PR introduces parallel goroutines to check out each file. Performance scales, as you would expect, with the number of cores. Overhead is minimal even for very small repositories, because most of the workload is driven by decompression and syscalls, which dwarf the cost of sending "checkout commands" over a channel.
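A minimal sketch of the fan-out described above, not the PR's actual code: one goroutine per core (`runtime.NumCPU()`) drains a shared channel of "checkout commands". The `checkoutCmd` type and the `checkoutFile` placeholder (which only returns the path length instead of decompressing and writing a file) are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// checkoutCmd is a hypothetical "checkout command": one file to materialize.
type checkoutCmd struct {
	path string
}

// checkoutFile stands in for the expensive part (object decompression and
// write syscalls); here it only returns the path length as a fake byte count.
func checkoutFile(cmd checkoutCmd) int {
	return len(cmd.path)
}

// parallelCheckout starts one worker per CPU core; every worker pulls
// commands from the same channel until the channel is closed.
func parallelCheckout(paths []string) int {
	cmds := make(chan checkoutCmd)
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for cmd := range cmds {
				n := checkoutFile(cmd)
				mu.Lock()
				total += n
				mu.Unlock()
			}
		}()
	}
	// The producer only sends small structs; per-file work dominates.
	for _, p := range paths {
		cmds <- checkoutCmd{path: p}
	}
	close(cmds)
	wg.Wait()
	return total
}

func main() {
	fmt.Println(parallelCheckout([]string{"README.md", "go.mod", "worktree.go"})) // 26
}
```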

@cedric-appdirect

Contributor Author

I have left this PR in draft as I am not sure about the code logic here. One area I am not a fan of, but for which I couldn't figure out a better approach, is the creation of a workerTree for each goroutine. This will need review and advice from someone who understands and knows this code base.

Replace the lazy nil-check on t.m with sync.Once for one-time
initialization, and change the t.t path cache from a plain map to
sync.Map for concurrent access. Both Decode() resets now properly
clear initOnce and the path cache so that re-decoding the same
*Tree instance works correctly.
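The pattern in this commit can be sketched as follows. The field names `m`, `t` and `initOnce` and the idea of resetting them on decode come from the commit message; the `tree` and `entry` types and the `buildMap`, `lookup` and `decode` methods are hypothetical simplifications, not go-git's `object.Tree`.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a simplified, hypothetical stand-in for a tree entry.
type entry struct {
	name string
	hash string
}

// tree: m is a lazily built name index guarded by initOnce, and t caches
// path lookups in a sync.Map so concurrent readers need no extra locking.
type tree struct {
	entries  []entry
	initOnce sync.Once
	m        map[string]*entry
	t        sync.Map // path -> *entry
}

// buildMap runs its body exactly once, even if many goroutines race here.
func (t *tree) buildMap() {
	t.initOnce.Do(func() {
		t.m = make(map[string]*entry, len(t.entries))
		for i := range t.entries {
			t.m[t.entries[i].name] = &t.entries[i]
		}
	})
}

// lookup checks the concurrent path cache first, then the (read-only) index.
func (t *tree) lookup(path string) (*entry, bool) {
	if v, ok := t.t.Load(path); ok {
		return v.(*entry), true
	}
	t.buildMap()
	e, ok := t.m[path]
	if ok {
		t.t.Store(path, e)
	}
	return e, ok
}

// decode replaces the entries and resets both caches so a re-decoded tree
// never serves stale results. It must not run concurrently with lookup.
func (t *tree) decode(entries []entry) {
	t.entries = entries
	t.m = nil
	t.initOnce = sync.Once{}
	t.t = sync.Map{}
}

func main() {
	tr := &tree{}
	tr.decode([]entry{{"a.txt", "111"}, {"b.txt", "222"}})
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			tr.lookup("a.txt")
		}()
	}
	wg.Wait()
	tr.decode([]entry{{"a.txt", "333"}})
	e, _ := tr.lookup("a.txt")
	fmt.Println(e.hash) // 333
}
```

Without the reset in `decode`, the second lookup would hit the old `sync.Map` entry and the exhausted `sync.Once`, returning the stale hash.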

Assisted-by: OpenCode with Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Cedric BAIL <cedric.bail@appdirect.com>

Improve checkout performance for filesystem-backed repositories by
splitting resetWorktree into a producer/consumer pipeline that mirrors
Git's own parallel-checkout design:

Workers (parallel): each goroutine gets its own filesystem.Storage
instance and resolves object.File handles from packfiles concurrently.
Resolved files are streamed through a channel one at a time.

Writer (sequential): a single goroutine receives resolved files and
writes them to the billy.Filesystem. Only one file's content is in
memory at any moment, avoiding bulk memory spikes on large repos.
errgroup handles cancellation: the first worker error cancels all
siblings via context and is returned to the caller.

Non-filesystem storage backends (e.g. memory) fall back to fully
sequential checkout, as their EncodedObjectStorer implementations
are not documented as goroutine-safe.

Assisted-by: OpenCode with Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Cedric BAIL <cedric.bail@appdirect.com>