Improve git blame performance and fix previous commit location when file was renamed by pjlast · Pull Request #61577 · sourcegraph/sourcegraph-public-snapshot

pjlast · 2024-04-04T07:13:38Z

This PR does several things to improve Sourcegraph's Git Blame functionality:

- The previous commit of a change is now parsed from the output of git blame, which removes the need to fetch every commit in a blame just to find its parent commit.
- The frontend now uses the filename field that's part of the git blame output. This filename is the original filename of the change, which allows for accurate navigation to the original file at the specified commit. Previously, if a file was moved in a commit, this navigation simply did not work.

Overall this is a massive performance improvement, reducing the time it takes for a git blame on a large file to load from several seconds down to less than 500 milliseconds.
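To illustrate the first point, here is a minimal sketch of reading the `previous <commit> <filename>` header line that git blame emits in its porcelain and incremental formats. The `previousCommit` type and the `parsePrevious` helper are hypothetical names for this example, not the code in this PR, and `strconv.Unquote` is used only as an approximation of Git's C-style path quoting.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// previousCommit is a hypothetical type for this sketch; it is not the
// type used in the PR.
type previousCommit struct {
	CommitID string
	Filename string
}

// parsePrevious parses a header line of the form
// "previous <commit-sha> <filename>". It returns ok=false when the line
// is not a well-formed previous line.
func parsePrevious(line string) (prev previousCommit, ok bool) {
	if !strings.HasPrefix(line, "previous ") {
		return previousCommit{}, false
	}
	parts := strings.SplitN(strings.TrimPrefix(line, "previous "), " ", 2)
	if len(parts) != 2 {
		return previousCommit{}, false
	}
	filename := parts[1]
	// Git may wrap paths containing special characters in double quotes.
	// Assumption: Go's string syntax is close enough to undo that quoting.
	if strings.HasPrefix(filename, `"`) {
		if unquoted, err := strconv.Unquote(filename); err == nil {
			filename = unquoted
		}
	}
	return previousCommit{CommitID: parts[0], Filename: filename}, true
}

func main() {
	prev, ok := parsePrevious("previous abc123 old/path/file.go")
	fmt.Println(prev.CommitID, prev.Filename, ok)
}
```

Because the parent commit and its filename come straight from the blame output, no extra commit lookup is needed per hunk, which is where the description above locates the performance gain.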

Test plan

Manual tests, unit tests adjusted where necessary.

pjlast · 2024-04-04T09:25:45Z

+// Read returns the next blame hunk. The returned hunk is only valid
+// until the next call to Read, and needs to be copied if it needs
+// to be stored.
+func (br *blameHunkReader) Read() (*gitdomain.Hunk, error) {


This reader has been drastically simplified. git blame --incremental displays commits sequentially, so it's not necessary to keep an entire hashmap of previous commit data. We only need the current commit being processed.

The blame hunks are also deterministically structured: They start with the commit hash and end with the filename field, so I simplified the parsing and logic here. Once we read a filename field, we return the hunk. No funky logic required.

Finally, I'm using a single in-memory gitdomain.Hunk. Since hunks are immediately converted to a proto and sent, I don't think there is value in creating a new hunk object every time, especially since a vast majority of the hunks will share data and require minimal updates.

sourcegraph-bot · 2024-04-04T09:27:25Z

📖 Storybook live preview

mmanela · 2024-04-04T14:09:27Z

 - The "Commits" button in repository and folder pages links to commits in the current revision instead of in the default branch. [#61408](https://github.com/sourcegraph/sourcegraph/pull/61408)
 - The "Commits" button in repository and folder pages uses Perforce language and links to `/-/changelists` for Perforce depots when the experimental feature `perforceChangelistMapping` is enabled. [#61408](https://github.com/sourcegraph/sourcegraph/pull/61408)
+- Selecting "View blame prior to this change" on a file that was moved will now correctly navigate to the old file location at the specified commit. [#61577](https://github.com/sourcegraph/sourcegraph/pull/61577)
+- Git blame performance on large files with a large number of commits has been drastically improved. [#61577](https://github.com/sourcegraph/sourcegraph/pull/61577)


It would be great to update this with numbers once we have an overall percentile gain.

eseliger

not yet done, but waiting to resolve more commits

eseliger · 2024-04-04T15:27:44Z

-            oid: string
-        }[]
+        previous: {
+            rev: string


for consistency, can we call this commitID as well?

maybe previous also needs to be nullable?

I was going for consistency with the main object, where the commitID is renamed to rev 🤷‍♂️

hmm generally renaming fields feels like a red flag to me that it will cause confusion down the line, but I see why 🤔 maybe time to change both? 😬

eseliger

LGTM, pending a test in gitcli that covers the rename case. Nice!

I see the comment about the non-reusability of Hunk has been removed but aren't we still reusing the pointer? 🤔

pjlast · 2024-04-05T06:05:12Z

@eseliger we make a copy of the object before sending it to the caller
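The concern and the answer can be illustrated with a small, self-contained sketch. The `hunk` and `source` types below are hypothetical and much simpler than the real reader; the point is only that a pointer to a reused struct must be dereferenced into a copy before it is stored or handed on.

```go
package main

import "fmt"

// hunk is a hypothetical, flattened stand-in for a blame hunk.
type hunk struct {
	CommitID  string
	StartLine int
}

// source yields hunks through a single reused struct, mimicking a reader
// that avoids allocating a new hunk per call.
type source struct {
	ids []string
	cur hunk
}

// next returns a pointer to the reused hunk, or nil when exhausted.
func (s *source) next() *hunk {
	if len(s.ids) == 0 {
		return nil
	}
	s.cur.CommitID = s.ids[0]
	s.cur.StartLine++
	s.ids = s.ids[1:]
	return &s.cur
}

// collectAliased stores the returned pointers directly. Every element ends
// up pointing at the same struct, so all of them show the last hunk.
func collectAliased(s *source) []*hunk {
	var out []*hunk
	for h := s.next(); h != nil; h = s.next() {
		out = append(out, h)
	}
	return out
}

// collectCopied dereferences each pointer, which copies the struct value
// before the next call overwrites it.
func collectCopied(s *source) []hunk {
	var out []hunk
	for h := s.next(); h != nil; h = s.next() {
		out = append(out, *h)
	}
	return out
}

func main() {
	aliased := collectAliased(&source{ids: []string{"a", "b", "c"}})
	copied := collectCopied(&source{ids: []string{"a", "b", "c"}})
	fmt.Println(aliased[0].CommitID, copied[0].CommitID) // prints "c a"
}
```

A plain value copy like this is only sufficient when the struct holds no slices, maps, or pointers that the reader goes on to mutate; nested pointer fields would need to be copied as well.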

pjlast added 5 commits April 3, 2024 16:18

Use previous commit from git blame instead of manual lookup

e33c624

Restore accidentally removed changes

400197f

Links to previous versions of file where the file was moved now works

7955829

Clean up blame hunk reader

32b6268

Update interface comment

4021bea

cla-bot Bot added the cla-signed label Apr 4, 2024

github-actions Bot added the team/product-platform and team/source labels Apr 4, 2024

pjlast added 2 commits April 4, 2024 10:54

Remove unused imports

f1d6ecb

Changelog entry

9217773

pjlast changed the title ~~Pjlast/read previous commit from git blame~~ Improve git blame performance and fix previous commit location when file was renamed Apr 4, 2024

pjlast marked this pull request as ready for review April 4, 2024 09:20

pjlast requested a review from a team April 4, 2024 09:20

Merge branch 'main' into pjlast/read-previous-commit-from-git-blame

795c8b6

pjlast commented Apr 4, 2024

eseliger reviewed Apr 4, 2024

mmanela reviewed Apr 4, 2024

pjlast and others added 2 commits April 4, 2024 17:22

Use the filename from the previous field. Also make sure to unquote s…

9861330

…trings

Update cmd/gitserver/internal/git/gitcli/blame.go

70be726

Co-authored-by: Erik Seliger <erikseliger@me.com>

eseliger reviewed Apr 4, 2024

pjlast added 4 commits April 4, 2024 17:49

Make PreviousCommit optional

27609c3

Add | null

c184f19

Update comments

f4c27d5

Add test for getURLToFileCommit

5081f91

eseliger approved these changes Apr 4, 2024

pjlast added 2 commits April 5, 2024 09:51

Add test for commits with a different previous filename

780479e

Merge branch 'main' into pjlast/read-previous-commit-from-git-blame

bf212d2

pjlast merged commit bdd400f into main Apr 5, 2024

pjlast deleted the pjlast/read-previous-commit-from-git-blame branch April 5, 2024 09:26
