fix: Prefer SCIP uploads over LSIF uploads#64217
Conversation
When determining the set of visible uploads at a given commit, we group uploads from scip-K and lsif-K for ranking purposes to allow them to shadow each other. Generally, scip-K will end up shadowing lsif-K, avoiding a mix of code intel data
| } | ||
|
|
||
| const findClosestCompletedUploadsFromGraphFragmentCommitGraphQuery = ` | ||
| const findClosestCompletedUploadsFromGraphFragmentCommitGraphQuery = |
There was a problem hiding this comment.
This language= comment is for GoLand language injections to work properly.
| makeCommit(3): {2}, | ||
| makeCommit(4): {2}, | ||
| makeCommit(5): {2}, | ||
| makeCommit(5): {4}, |
There was a problem hiding this comment.
This is an innocuous change because of the upload_id based tie-breaking.
| } | ||
|
|
||
| if diff := cmp.Diff([]int{2, 3, 5}, getProtectedUploads(t, db, 50)); diff != "" { | ||
| if diff := cmp.Diff([]int{2, 3, 4, 5}, getProtectedUploads(t, db, 50)); diff != "" { |
There was a problem hiding this comment.
This one is a bit iffy -- I will look into why this changed tomorrow.
UPDATE: This change is coming up because there is a tagged commit (5) at which upload ID 4 is now visible. So this is correct.
| api.CommitID(makeCommit(5)): {{UploadID: 2, Distance: 2}}, | ||
| api.CommitID(makeCommit(6)): {{UploadID: 2, Distance: 3}}, | ||
| api.CommitID(makeCommit(7)): {{UploadID: 3, Distance: 0}}, | ||
| api.CommitID(makeCommit(8)): {{UploadID: 1, Distance: 4}}, | ||
| api.CommitID(makeCommit(8)): {{UploadID: 2, Distance: 4}}, |
There was a problem hiding this comment.
These changes are innocuous and are happening because of the upload_id tie-breaking change.
| {commit: makeCommit(3), file: "file.ts", rootMustEnclosePath: true, graph: graph, anyOfIDs: []int{2}}, | ||
| {commit: makeCommit(4), file: "file.ts", rootMustEnclosePath: true, graph: graph, anyOfIDs: []int{2}}, | ||
| {commit: makeCommit(6), file: "file.ts", rootMustEnclosePath: true, graph: graph, anyOfIDs: []int{1}}, | ||
| {commit: makeCommit(6), file: "file.ts", rootMustEnclosePath: true, graph: graph, anyOfIDs: []int{2}}, |
There was a problem hiding this comment.
Innocuous / Happening because of the upload_id tie-breaking change.
| makeCommit(3): {2}, | ||
| makeCommit(4): {2}, | ||
| makeCommit(5): {2}, | ||
| makeCommit(5): {4}, |
There was a problem hiding this comment.
Innocuous / Happening because of the upload_id tie-breaking change.
| } | ||
|
|
||
| if diff := cmp.Diff([]int{2, 3, 5}, getProtectedUploads(t, db, 50)); diff != "" { | ||
| if diff := cmp.Diff([]int{2, 3, 4, 5}, getProtectedUploads(t, db, 50)); diff != "" { |
There was a problem hiding this comment.
This one is a bit iffy -- I will look into why this changed tomorrow (maybe it's the same as the other place where 4 is getting added).
UPDATE: This commit graph is the same as in the other test where 4 is getting added, which is why the result is the same here.
| makeCommit(5): {2}, | ||
| makeCommit(6): {2}, | ||
| makeCommit(7): {3}, | ||
| makeCommit(8): {1}, | ||
| makeCommit(8): {2}, |
There was a problem hiding this comment.
Innocuous / Happening because of the upload_id tie-breaking change.
| assertCommitsVisibleFromUploads(t, store, uploads, expectedVisibleUploads) | ||
|
|
||
| if diff := cmp.Diff([]int{1}, getUploadsVisibleAtTip(t, db, 50)); diff != "" { | ||
| if diff := cmp.Diff([]int{2}, getUploadsVisibleAtTip(t, db, 50)); diff != "" { |
There was a problem hiding this comment.
Innocuous / Happening because of the upload_id tie-breaking change.
kritzcreek
left a comment
There was a problem hiding this comment.
Luckily postgres's substr indexing isn't confusing at all /s
Nice change!
What do you mean? By it being 1-based instead of 0-based? |
When determining the set of visible uploads at a given commit,
we group uploads from scip-K and lsif-K for ranking purposes
to allow them to shadow each other. Generally, scip-K will end
up shadowing lsif-K, avoiding sending the client a mix of data
from a much older lsif-K index when newer data from a scip-K
index is available.
Fixes https://linear.app/sourcegraph/issue/GRAPH-787
The logic requires some final tie-breaking to avoid non-determinism
in case when there is a scip-K and lsif-K index at the same distance,
which I've done based on the upload_id in descending order
(upload IDs are monotonically increasing). Earlier, some code was
doing tie-breaking based on upload_id in ascending order,
so I've changed that for consistency. This caused some test failures,
so I've updated the corresponding struct literals.
(I've left some PR review comments to describe each modification
to the existing test cases. The PR is ready for review, I've just added
the comments for ease of review.)
Test plan
Added a new dedicated test checking for shadowing behavior.
Changelog
Fixes a bug where old LSIF uploads would also be used for code
navigation even when newer SCIP uploads were available for the
same language, potentially leading to duplicate results in the reference
panel. With this change, scip-go uploads shadow the uploads
for lsif-go, and similarly for other indexers following the scip-X/lsif-X naming
convention.