Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Improve git blame performance and fix previous commit location when file was renamed#61577

Merged
pjlast merged 16 commits into
mainfrom
pjlast/read-previous-commit-from-git-blame
Apr 5, 2024
Merged

Improve git blame performance and fix previous commit location when file was renamed#61577
pjlast merged 16 commits into
mainfrom
pjlast/read-previous-commit-from-git-blame

Conversation

@pjlast

@pjlast pjlast commented Apr 4, 2024

Copy link
Copy Markdown
Contributor

Closes #60137
Closes #60159

This PR does several things to improve Sourcegraph's Git Blame functionality:

  • The previous commit of a change is now parsed from the output of git blame, which removes the need to fetch every commit in a blame just for the parent commit.
  • The frontend now uses the filename field that's part of the git blame output. This filename is the original filename of the change, which allows for accurate navigation to the original file at the specified commit. Previously, if a file moved during a commit, this functionality would simply not work.

Overall this is a massive performance improvement, reducing the time it takes for a git blame on a large file to load from several seconds down to less than 500 milliseconds.

Test plan

Manual tests, unit tests adjusted where necessary.

@cla-bot cla-bot Bot added the cla-signed label Apr 4, 2024
@github-actions github-actions Bot added team/product-platform team/source Tickets under the purview of Source - the one Source to graph it all labels Apr 4, 2024
@pjlast pjlast changed the title Pjlast/read previous commit from git blame Improve git blame performance and fix previous commit location when file was renamed Apr 4, 2024
@pjlast pjlast marked this pull request as ready for review April 4, 2024 09:20
@pjlast pjlast requested a review from a team April 4, 2024 09:20
// Read returns the next blame hunk. The returned hunk is only valid
// until the next call to Read, and needs to be copied if it needs
// to be stored.
func (br *blameHunkReader) Read() (*gitdomain.Hunk, error) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reader has been drastically simplified. git blame --incremental displays commits sequentially, so it's not necessary to keep an entire hashmap of previous commit data. We only need the current commit being processed.

The blame hunks are also deterministically structured: They start with the commit hash and end with the filename field, so I simplified the parsing and logic here. Once we read a filename field, we return the hunk. No funky logic required.

Finally, I'm using a single in-memory gitdomain.Hunk. Since hunks are immediately converted to a proto and sent, I don't think there is value in creating a new hunk object every time, especially since a vast majority of the hunks will share data and require minimal updates.

@sourcegraph-bot

sourcegraph-bot commented Apr 4, 2024

Copy link
Copy Markdown
Contributor

📖 Storybook live preview

Comment thread client/web/src/util/url.ts
Comment thread cmd/gitserver/internal/git/iface.go Outdated
Comment thread client/web/src/repo/blame/useBlameHunks.ts Outdated
Comment thread cmd/gitserver/internal/git/gitcli/blame.go Outdated
Comment thread cmd/gitserver/internal/git/gitcli/blame.go
Comment thread internal/gitserver/gitdomain/common.go Outdated
Comment thread cmd/gitserver/internal/git/gitcli/blame.go
Comment thread cmd/gitserver/internal/git/gitcli/blame.go
Comment thread cmd/gitserver/internal/git/gitcli/blame.go
Comment thread cmd/gitserver/internal/git/gitcli/blame.go Outdated
Comment thread client/web/src/util/url.ts
Comment thread cmd/frontend/internal/httpapi/stream_blame.go Outdated
Comment thread CHANGELOG.md
- The "Commits" button in repository and folder pages links to commits in the current revision instead of in the default branch. [#61408](https://github.com/sourcegraph/sourcegraph/pull/61408)
- The "Commits" button in repository and folder pages uses Perforce language and links to `/-/changelists` for Perforce depots when the experimental feature `perforceChangelistMapping` is enabled. [#61408](https://github.com/sourcegraph/sourcegraph/pull/61408)
- Selecting "View blame prior to this change" on a file that was moved will now correctly navigate to the old file location at the specified commit. [#61577](https://github.com/sourcegraph/sourcegraph/pull/61577)
- Git blame performance on large files with a large number of commits has been drastically improved. [#61577](https://github.com/sourcegraph/sourcegraph/pull/61577)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be great to update with numbers when we have an overall percentile gain

@eseliger eseliger left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet done, but waiting to resolve more commits

oid: string
}[]
previous: {
rev: string

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency, can we call this commitID as well?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe previous also needs to be nullable?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going for consistency with the main object, where the commitID is renamed to rev 🤷‍♂️

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm generally renaming fields feels like a red flag to me that it will cause confusion down the line, but I see why 🤔 maybe time to change both? 😬

Comment thread client/web/src/repo/blame/useBlameHunks.ts Outdated
Comment thread client/web/src/repo/blob/BlameDecoration.tsx Outdated
Comment thread client/web/src/util/url.ts
Comment thread cmd/frontend/internal/httpapi/stream_blame.go Outdated

@eseliger eseliger left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending a test in gitcli that covers the rename case. Nice!

I see the comment about the non-reusability of Hunk has been removed but aren't we still reusing the pointer? 🤔

@pjlast

pjlast commented Apr 5, 2024

Copy link
Copy Markdown
Contributor Author

@eseliger we make a copy of the object before sending it to the caller

@pjlast pjlast merged commit bdd400f into main Apr 5, 2024
@pjlast pjlast deleted the pjlast/read-previous-commit-from-git-blame branch April 5, 2024 09:26
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed team/product-platform team/source Tickets under the purview of Source - the one Source to graph it all

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Blame: Need to return OldPath for files that were renamed Git Blame runs GetCommit a ton of times

4 participants