Improve git blame performance and fix previous commit location when file was renamed#61577
Conversation
| // Read returns the next blame hunk. The returned hunk is only valid | ||
| // until the next call to Read, and needs to be copied if it needs | ||
| // to be stored. | ||
| func (br *blameHunkReader) Read() (*gitdomain.Hunk, error) { |
There was a problem hiding this comment.
This reader has been drastically simplified. git blame --incremental displays commits sequentially, so it's not necessary to keep an entire hashmap of previous commit data. We only need the current commit being processed.
The blame hunks are also deterministically structured: They start with the commit hash and end with the filename field, so I simplified the parsing and logic here. Once we read a filename field, we return the hunk. No funky logic required.
Finally, I'm using a single in-memory gitdomain.Hunk. Since hunks are immediately converted to a proto and sent, I don't think there is value in creating a new hunk object every time, especially since a vast majority of the hunks will share data and require minimal updates.
| - The "Commits" button in repository and folder pages links to commits in the current revision instead of in the default branch. [#61408](https://github.com/sourcegraph/sourcegraph/pull/61408) | ||
| - The "Commits" button in repository and folder pages uses Perforce language and links to `/-/changelists` for Perforce depots when the experimental feature `perforceChangelistMapping` is enabled. [#61408](https://github.com/sourcegraph/sourcegraph/pull/61408) | ||
| - Selecting "View blame prior to this change" on a file that was moved will now correctly navigate to the old file location at the specified commit. [#61577](https://github.com/sourcegraph/sourcegraph/pull/61577) | ||
| - Git blame performance on large files with a large number of commits has been drastically improved. [#61577](https://github.com/sourcegraph/sourcegraph/pull/61577) |
There was a problem hiding this comment.
Will be great to update with numbers when we have an overall percentile gain
Co-authored-by: Erik Seliger <erikseliger@me.com>
eseliger
left a comment
There was a problem hiding this comment.
not yet done, but waiting to resolve more commits
| oid: string | ||
| }[] | ||
| previous: { | ||
| rev: string |
There was a problem hiding this comment.
for consistency, can we call this commitID as well?
There was a problem hiding this comment.
maybe previous also needs to be nullable?
There was a problem hiding this comment.
I was going for consistency with the main object, where the commitID is renamed to rev 🤷♂️
There was a problem hiding this comment.
hmm generally renaming fields feels like a red flag to me that it will cause confusion down the line, but I see why 🤔 maybe time to change both? 😬
eseliger
left a comment
There was a problem hiding this comment.
LGTM, pending a test in gitcli that covers the rename case. Nice!
I see the comment about the non-reusability of Hunk has been removed but aren't we still reusing the pointer? 🤔
|
@eseliger we make a copy of the object before sending it to the caller |
Closes #60137
Closes #60159
This PR does several things to improve Sourcegraph's Git Blame functionality:
git blame, which removes the need to fetch every commit in a blame just for the parent commit.filenamefield that's part of the git blame output. This filename is the original filename of the change, which allows for accurate navigation to the original file at the specified commit. Previously, if a file moved during a commit, this functionality would simply not work.Overall this is a massive performance improvement, reducing the time it takes for a git blame on a large file to load from several seconds down to less than 500 milliseconds.
Test plan
Manual tests, unit tests adjusted where necessary.