Conversation
|
IMHO it would make sense to review and merge #2844 first before changing prune. |
That's the plan. I've primarily opened the PR to avoid duplicate work. (Well, actually it was quicker to code this and run prune afterwards than wait for the four hour unparallelized repack step to complete ;-) ) |
|
I only took a cursory glance at the code, but could you explain why there are still two worker pools connected by a channel? It looks like the downloader goroutines could just call the processing code synchronously. Then numRepackWorkers would be the actual number of workers, rather than half of that number. |
|
@greatroar The idea is to split the workers in the future into CPU- and IO-bound (to some extent). That is the number of download workers would scale with number of connections the backend can handle, whereas the repacking itself is more CPU-bound and would be limited to I'm aware that Right now, the number of repack workers is just an arbitrary number (I just used the same value as in |
7fb67f2 to
1d5e9d0
Compare
fd0
left a comment
There was a problem hiding this comment.
Great work! I'd love to refactor this into smaller functions (and maybe a Repacker struct which bundles all things we need nicely), but this will do for now.
We just need to clarify what we're going to do with the map, then it's ready to be merged.
The file returned from DownloadAndHash() is already seeked to the start of the file.
What does this PR change? What problem does it solve?
The PR parallelizes the repack step of the prune command.
This change is basically a simpler reimplementation of the corresponding part of #2340. There are a few notable differences though:
keepBlobsand not tracked in an additional sync.Map. The implementation follows Prune speedup #2340 in that it is prepared to support using one of multiple duplicates of a blob in the future. (This is the reason for the duplicate keepBlobs check)Note: Most of the changes are indentation only, so the actual changes are best viewed when ignoring whitespace changes.
Was the change discussed in an issue or in the forum before?
Probably in one of the issues regarding prune performance.
Checklist
[ ] I have added tests for all changes in this PRRepack already has tests[ ] I have added documentation for the changes (in the manual)changelog/unreleased/that describes the changes for our users (template here)gofmton the code in all commits