refactor(codeintel): Extracts a MappedIndex abstraction over uploads#63781
Conversation
aeb467d to
77ca575
Compare
cafcc25 to
af85bad
Compare
af85bad to
52ce4c6
Compare
|
Sorry for messing around with the reviewers, I think Varun has a lot more context for this change. I'll ask Anton for review when I use this new abstraction to write syntactic usages tests. |
varungandhi-src
left a comment
There was a problem hiding this comment.
Took a brief look, will take a look at the rest tomorrow
| ) | ||
|
|
||
| func (s *store) SCIPDocument(ctx context.Context, uploadID int, path core.UploadRelPath) (_ *scip.Document, err error) { | ||
| func (s *store) SCIPDocument(ctx context.Context, uploadID int, path core.UploadRelPath) (_ *scip.Document, found bool, err error) { |
There was a problem hiding this comment.
I thought you were going to change this to use core.Option, right? I think the problem you must've run into is that you shouldn't copy scip.Document since it has a mutex. Maybe we can add an API to core.Option that takes in a pointer argument?
There was a problem hiding this comment.
I wasn't sure how deep I should push the Option, but as this is still within codeintel/ you're right.
Maybe we can add an API to core.Option that takes in a pointer argument?
I don't know if that's required, looks like core.Option[*T] works just fine.
There was a problem hiding this comment.
looks like core.Option[*T] works just fine.
In that case, None and Some(nil) would both represent "not found", right? Adding pointer-based APIs to Option would collapse those two states into one.
There was a problem hiding this comment.
I think if we decide that "not found" in codeintel means option.None we can reclaim *T to mean "never nil, here for mutability/perf reasons"
| type Mapped interface { | ||
| IndexCommit() api.CommitID | ||
| TargetCommit() api.CommitID | ||
| } |
There was a problem hiding this comment.
Please add doc comments here. Is TargetCommit guaranteed to be a descendant of IndexCommit or can it be unrelated or an ancestor?
Maybe TargetCommit should be called NavCommit or NavigationCommit instead, since the point is that we support code navigation at that commit?
There was a problem hiding this comment.
Is TargetCommit guaranteed to be a descendant of IndexCommit or can it be unrelated or an ancestor?
I don't think it matters for the implementation or the caller? The important thing is that all arguments and return values are specified in "TargetCommit" coordinates.
As these functions aren't called at the moment I could also just remove them for now?
Maybe TargetCommit should be called NavCommit or NavigationCommit instead, since the point is that we support code navigation at that commit?
I'm not in love with the "target" name, but as this is an abstraction I don't want to name anything after one of its usage sites. After I merge this PR I was planning to switch the codegraphdata API over to using this as well, at which point it'll be used in a non-navigation context.
There was a problem hiding this comment.
I don't think it matters for the implementation or the caller?
git diff will work with both git diff A B and git diff B A but we only propagate upload visibility in one direction in the commit graph.
Is your point that if the commit graph mechanism somehow supported visibility in the backwards direction, then this will just handle that case out-of-the-box without changes?
If it doesn't matter, then let's document that it doesn't matter.
After I merge this PR I was planning to switch the codegraphdata API over to using this as well, at which point it'll be used in a non-navigation context
The CodeGraphData API is also present for the purpose of code navigation (specifically, it will be invoked when the user is browsing the file)
It's not a big deal though, leaving the naming up to you.
|
How about this one? :D Sorry for the back and forth, still getting to grips with concurrency in Go. |
| newOccurrences = append(newOccurrences, &scip.Occurrence{ | ||
| Range: mappedRange.ToSCIPRange().SCIPRange(), | ||
| Symbol: occ.Symbol, | ||
| SymbolRoles: occ.SymbolRoles, | ||
| OverrideDocumentation: occ.OverrideDocumentation, | ||
| SyntaxKind: occ.SyntaxKind, | ||
| Diagnostics: occ.Diagnostics, | ||
| EnclosingRange: occ.EnclosingRange, | ||
| }) |
There was a problem hiding this comment.
This code will be silently incorrect when we add a new field to Occurrence. I know it's a bit icky, but it would be more future-proof to roundtrip serialize->deserialize (through Protobuf) and overwrite the value in the copy. Another alternative is to use reflect.DeepCopy, but I'd be worried about how that might interact with the extra
state protoimpl.MessageState
sizeCache protoimpl.SizeCache
unknownFields protoimpl.UnknownFields
fields inside Occurrence created by the Protobuf codegen.
There was a problem hiding this comment.
I think the icky-ness is coming from using protobuf generated types in our domain code. Having to pass pointers everywhere because there's a "DontCopyMe" marker Mutex in there.
Maybe it's worth to consider a layer of types that we deserialize the proto messages into, with exhaustruct annotations etc.
For now I've used https://pkg.go.dev/google.golang.org/protobuf@v1.34.2/proto#Clone which is how you're apparently supposed to copy proto messages.
| ctx := context.Background() | ||
| unknownDoc, err := mappedIndex.GetDocument(ctx, core.NewRepoRelPathUnchecked("indexRoot/unknown.go")) | ||
| require.NoError(t, err) | ||
| require.True(t, unknownDoc.IsNone()) | ||
|
|
||
| mappedDocumentResult, err := mappedIndex.GetDocument(ctx, core.NewRepoRelPathUnchecked("indexRoot/a.go")) | ||
| require.NoError(t, err) | ||
| mappedDocument := mappedDocumentResult.Unwrap() | ||
|
|
||
| occurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, testRange(1)) | ||
| require.NoError(t, err) | ||
| require.Len(t, occurrences, 1) | ||
| require.Equal(t, scip.NewRangeUnchecked(occurrences[0].GetRange()).Start.Line, int32(1)) | ||
|
|
||
| noOccurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, testRange(4)) | ||
| require.NoError(t, err) | ||
| require.Len(t, noOccurrences, 0) | ||
|
|
||
| allOccurrences, err := mappedDocument.GetOccurrences(ctx) | ||
| require.NoError(t, err) | ||
| require.Len(t, allOccurrences, 3) |
There was a problem hiding this comment.
Let's convert this to a property based test instead?
| targetCommit, upload, lsifStore := setupSimpleUpload() | ||
| translator := shiftAllTranslator(targetCommit, -2) | ||
| mappedIndex := NewMappedIndexFromTranslator(lsifStore, translator, upload) | ||
|
|
||
| ctx := context.Background() | ||
| mappedDocumentOption, err := mappedIndex.GetDocument(ctx, core.NewRepoRelPathUnchecked("indexRoot/a.go")) | ||
| require.NoError(t, err) | ||
| mappedDocument := mappedDocumentOption.Unwrap() | ||
|
|
||
| noOccurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, testRange(1)) | ||
| require.NoError(t, err) | ||
| require.Len(t, noOccurrences, 0) | ||
|
|
||
| occurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, shiftSCIPRange(testRange(1), 2)) | ||
| require.NoError(t, err) | ||
| require.Len(t, occurrences, 1) | ||
| require.Equal(t, scip.NewRangeUnchecked(occurrences[0].GetRange()).Start.Line, int32(3)) | ||
|
|
||
| allOccurrences, err := mappedDocument.GetOccurrences(ctx) | ||
| require.NoError(t, err) | ||
| require.Len(t, allOccurrences, 3) |
There was a problem hiding this comment.
Let's convert this to a concurrent property based test. You may want to use sourcegraph/conc, that has nicer APIs than the stdlib. Our tests should be run with -race, so this might help catch races should they slip through when modifying the code.
Its only dependency on the service is the searchClient, which we can easily pass in
7943c11 to
8da432f
Compare
|
Thanks for the great review again! I like having the simple non-property-based unit tests here, because they nicely show how to make a faked If that's allright with you, please take another look. |
varungandhi-src
left a comment
There was a problem hiding this comment.
Since I've been busy with support work, I have not had the time to look at all the unit tests carefully, or look at the moved code.
For the code involving locks, I think we should have at least 1 test exercising that code in parallel with 10-100 goroutines.
Approving to unblock.
| upload, lsifStore := setupUpload(indexCommit, "indexRoot/", | ||
| doc("a.go", | ||
| ref("a", 1), | ||
| ref("b", 2), | ||
| ref("c", 3)), | ||
| doc("b.go", | ||
| ref("a", 2))) |
There was a problem hiding this comment.
We already have a DSL for writing unit tests for SCIP indexes called reprolang. You can see root_resolver_test.go for an example; it would've avoided a bunch of the extra code you have here for constructing the right data. More generally, I've been saying that we shouldn't need to hand-write large struct literals and do specific tests against with specific counts/structures. That makes both reading and updating test code more time-consuming.
Additionally, this test code isn't doing something complex that's not doable with reprolang (e.g. multiple occurrences at the same source range), so the use of a different mini-DSL with standalone functions isn't buying us anything IMO.
There was a problem hiding this comment.
I looked at the test involving reprolang, but they're not suitable for testing syntactic, because the tests need to make it clear they're matching up ranges from search results with ranges in SCIP indices.
reprolang doesn't give me any tools to be explicit about source ranges. The tests you reference instead use autogold to just generate whatever ranges we ended up parsing.
I've been saying that we shouldn't need to hand-write large struct literals and do specific tests against with specific counts/structures. That makes both reading and updating test code more time-consuming.
I find the use of auto-gold makes reading test-code very time consuming, as it machine-writes large struct literals in the middle of the code I'm reading.
There was a problem hiding this comment.
Let's discuss this later, it's not a blocking concern.
| return targetCommit, upload, lsifStore | ||
| } | ||
|
|
||
| func TestMappedIndex_GetDocumentNoTranslation(t *testing.T) { |
There was a problem hiding this comment.
Since these tests are inside mapped_index_test.go, you could drop the MappedIndex_ prefix.
There was a problem hiding this comment.
But they're in the same package as all the other tests. This naming conventation seems to be everywhere I look? (without the _, but eg. tests in service_hover_test.go all start with TestHover)
There was a problem hiding this comment.
Sure, it was a minor suggestion, it's fine to leave it in.
| translator := shiftAllTranslator(targetCommit, -2) | ||
| mappedIndex := NewMappedIndexFromTranslator(lsifStore, translator, upload) | ||
|
|
||
| ctx := context.Background() | ||
| mappedDocumentOption, err := mappedIndex.GetDocument(ctx, core.NewRepoRelPathUnchecked("indexRoot/a.go")) | ||
| require.NoError(t, err) | ||
| mappedDocument := mappedDocumentOption.Unwrap() | ||
|
|
||
| noOccurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, testRange(1)) | ||
| require.NoError(t, err) | ||
| require.Len(t, noOccurrences, 0) | ||
|
|
||
| occurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, shiftSCIPRange(testRange(1), 2)) |
There was a problem hiding this comment.
This code is a bit confusing due to the sign difference (-2, +2); the directionality is not obvious.
There was a problem hiding this comment.
Yeah, it was confusing to me as well initially. The issue is that the gitTreeTranslator passed into the service uses the target commit as its "TranslationBase", which means the direction index -> target sets the reverse flag to true.
Hopefully this gets more straightforward when we remove TranslationBase from GitTreeTranslator and just explicitly request from and to commits for every operation.
I can add a comment for now.
kritzcreek
left a comment
There was a problem hiding this comment.
I tried writing a test like you suggested, but can't get it to fail even after removing all the of the locking calls and adding an artificial thread.Sleep to the translator mock.
Unless I can get the test to fail somewhat regular when I remove the locks it doesn't add any confidence for me.
func TestMappedIndex_ConcurrentIndex(t *testing.T) {
targetCommit, upload, lsifStore := setupSimpleUpload()
translator := fakeTranslator(targetCommit, -2, func(_ string, _ shared.Range) bool {
return false
}, 250)
mappedIndex := NewMappedIndexFromTranslator(lsifStore, translator, upload)
ctx := context.Background()
mappedDocumentOption, err := mappedIndex.GetDocument(ctx, core.NewRepoRelPathUnchecked("indexRoot/a.go"))
require.NoError(t, err)
mappedDocument := mappedDocumentOption.Unwrap()
for i := 0; i < 100; i++ {
wg := conc.NewWaitGroup()
for i := 0; i < 50; i++ {
wg.Go(func() {
occurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, shiftSCIPRange(testRange(1), 2))
require.NoError(t, err)
require.Len(t, occurrences, 1)
require.Equal(t, scip.NewRangeUnchecked(occurrences[0].GetRange()).Start.Line, int32(3))
})
}
for i := 0; i < 10; i++ {
wg.Go(func() {
allOccurrences, err := mappedDocument.GetOccurrences(ctx)
require.NoError(t, err)
require.Len(t, allOccurrences, 3)
})
}
wg.Wait()
}
}| upload, lsifStore := setupUpload(indexCommit, "indexRoot/", | ||
| doc("a.go", | ||
| ref("a", 1), | ||
| ref("b", 2), | ||
| ref("c", 3)), | ||
| doc("b.go", | ||
| ref("a", 2))) |
There was a problem hiding this comment.
I looked at the test involving reprolang, but they're not suitable for testing syntactic, because the tests need to make it clear they're matching up ranges from search results with ranges in SCIP indices.
reprolang doesn't give me any tools to be explicit about source ranges. The tests you reference instead use autogold to just generate whatever ranges we ended up parsing.
I've been saying that we shouldn't need to hand-write large struct literals and do specific tests against with specific counts/structures. That makes both reading and updating test code more time-consuming.
I find the use of auto-gold makes reading test-code very time consuming, as it machine-writes large struct literals in the middle of the code I'm reading.
| return targetCommit, upload, lsifStore | ||
| } | ||
|
|
||
| func TestMappedIndex_GetDocumentNoTranslation(t *testing.T) { |
There was a problem hiding this comment.
But they're in the same package as all the other tests. This naming conventation seems to be everywhere I look? (without the _, but eg. tests in service_hover_test.go all start with TestHover)
| translator := shiftAllTranslator(targetCommit, -2) | ||
| mappedIndex := NewMappedIndexFromTranslator(lsifStore, translator, upload) | ||
|
|
||
| ctx := context.Background() | ||
| mappedDocumentOption, err := mappedIndex.GetDocument(ctx, core.NewRepoRelPathUnchecked("indexRoot/a.go")) | ||
| require.NoError(t, err) | ||
| mappedDocument := mappedDocumentOption.Unwrap() | ||
|
|
||
| noOccurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, testRange(1)) | ||
| require.NoError(t, err) | ||
| require.Len(t, noOccurrences, 0) | ||
|
|
||
| occurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, shiftSCIPRange(testRange(1), 2)) |
There was a problem hiding this comment.
Yeah, it was confusing to me as well initially. The issue is that the gitTreeTranslator passed into the service uses the target commit as its "TranslationBase", which means the direction index -> target sets the reverse flag to true.
Hopefully this gets more straightforward when we remove TranslationBase from GitTreeTranslator and just explicitly request from and to commits for every operation.
I can add a comment for now.
Closes https://linear.app/sourcegraph/issue/GRAPH-735/test-syntactic-usages Uses the new MappedIndex abstraction from #63781 to implement some proper unit tests for `syntacticUsagesImpl`. ## Test plan PR adds a bunch of tests
Precursor for https://linear.app/sourcegraph/issue/GRAPH-735/test-syntactic-usages
This PR introduces the
MappedIndexabstraction, which wraps up an upload with a target commit. Its APIs then take care of mapping upload relative paths and repo relative paths, and ranges across commits.My main motivation for making this change is that I can now test this logic in isolation (which this PR does), and also have an interface that is easy to fake and use to test syntactic usages.
Test plan
Added unit tests for the
MappedIndexcomponent, manual testing of the GraphQL APIs show no changes in the syntactic usages output between this PR and master.