Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

refactor(codeintel): Extracts a MappedIndex abstraction over uploads#63781

Merged
kritzcreek merged 15 commits into
mainfrom
christoph/mapped-scip-document
Jul 18, 2024
Merged

refactor(codeintel): Extracts a MappedIndex abstraction over uploads#63781
kritzcreek merged 15 commits into
mainfrom
christoph/mapped-scip-document

Conversation

@kritzcreek

@kritzcreek kritzcreek commented Jul 11, 2024

Copy link
Copy Markdown
Contributor

Precursor for https://linear.app/sourcegraph/issue/GRAPH-735/test-syntactic-usages

This PR introduces the MappedIndex abstraction, which wraps up an upload with a target commit. Its APIs then take care of mapping upload relative paths and repo relative paths, and ranges across commits.

My main motivation for making this change is that I can now test this logic in isolation (which this PR does), and also have an interface that is easy to fake and use to test syntactic usages.

Test plan

Added unit tests for the MappedIndex component, manual testing of the GraphQL APIs show no changes in the syntactic usages output between this PR and master.

@cla-bot cla-bot Bot added the cla-signed label Jul 11, 2024
@github-actions github-actions Bot added team/graph Graph Team (previously Code Intel/Language Tools/Language Platform) team/product-platform labels Jul 11, 2024
@kritzcreek kritzcreek force-pushed the christoph/mapped-scip-document branch from aeb467d to 77ca575 Compare July 15, 2024 02:59
@kritzcreek kritzcreek requested review from keynmol and removed request for keynmol July 15, 2024 12:14
@kritzcreek kritzcreek force-pushed the christoph/mapped-scip-document branch from cafcc25 to af85bad Compare July 15, 2024 12:16
@kritzcreek kritzcreek requested a review from keynmol July 15, 2024 12:17
@kritzcreek kritzcreek changed the title WIP: Extracts a MappedIndex abstraction for syntactic usages refactor(codeintel): Extracts a MappedIndex abstraction over uploads Jul 15, 2024
@kritzcreek kritzcreek marked this pull request as ready for review July 15, 2024 12:18
@kritzcreek kritzcreek force-pushed the christoph/mapped-scip-document branch from af85bad to 52ce4c6 Compare July 15, 2024 12:21
@kritzcreek kritzcreek requested review from varungandhi-src and removed request for keynmol July 15, 2024 12:22
@kritzcreek

Copy link
Copy Markdown
Contributor Author

Sorry for messing around with the reviewers, I think Varun has a lot more context for this change. I'll ask Anton for review when I use this new abstraction to write syntactic usages tests.

@varungandhi-src varungandhi-src left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a brief look, will take a look at the rest tomorrow

)

func (s *store) SCIPDocument(ctx context.Context, uploadID int, path core.UploadRelPath) (_ *scip.Document, err error) {
func (s *store) SCIPDocument(ctx context.Context, uploadID int, path core.UploadRelPath) (_ *scip.Document, found bool, err error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you were going to change this to use core.Option, right? I think the problem you must've run into is that you shouldn't copy scip.Document since it has a mutex. Maybe we can add an API to core.Option that takes in a pointer argument?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure how deep I should push the Option, but as this is still within codeintel/ you're right.

Maybe we can add an API to core.Option that takes in a pointer argument?

I don't know if that's required, looks like core.Option[*T] works just fine.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like core.Option[*T] works just fine.

In that case, None and Some(nil) would both represent "not found", right? Adding pointer-based APIs to Option would collapse those two states into one.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we decide that "not found" in codeintel means option.None we can reclaim *T to mean "never nil, here for mutability/perf reasons"

Comment on lines +18 to +21
type Mapped interface {
IndexCommit() api.CommitID
TargetCommit() api.CommitID
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add doc comments here. Is TargetCommit guaranteed to be a descendant of IndexCommit or can it be unrelated or an ancestor?

Maybe TargetCommit should be called NavCommit or NavigationCommit instead, since the point is that we support code navigation at that commit?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is TargetCommit guaranteed to be a descendant of IndexCommit or can it be unrelated or an ancestor?

I don't think it matters for the implementation or the caller? The important thing is that all arguments and return values are specified in "TargetCommit" coordinates.
As these functions aren't called at the moment I could also just remove them for now?

Maybe TargetCommit should be called NavCommit or NavigationCommit instead, since the point is that we support code navigation at that commit?

I'm not in love with the "target" name, but as this is an abstraction I don't want to name anything after one of its usage sites. After I merge this PR I was planning to switch the codegraphdata API over to using this as well, at which point it'll be used in a non-navigation context.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it matters for the implementation or the caller?

git diff will work with both git diff A B and git diff B A but we only propagate upload visibility in one direction in the commit graph.

Is your point that if the commit graph mechanism somehow supported visibility in the backwards direction, then this will just handle that case out-of-the-box without changes?

If it doesn't matter, then let's document that it doesn't matter.

After I merge this PR I was planning to switch the codegraphdata API over to using this as well, at which point it'll be used in a non-navigation context

The CodeGraphData API is also present for the purpose of code navigation (specifically, it will be invoked when the user is browsing the file)

It's not a big deal though, leaving the naming up to you.

Comment thread internal/codeintel/codenav/mapped_index.go Outdated
Comment thread internal/codeintel/codenav/mapped_index.go Outdated
Comment thread internal/codeintel/codenav/mapped_index.go Outdated
Comment thread internal/codeintel/codenav/mapped_index.go Outdated
Comment thread internal/codeintel/codenav/mapped_index.go Outdated
@kritzcreek

Copy link
Copy Markdown
Contributor Author

How about this one? :D Sorry for the back and forth, still getting to grips with concurrency in Go.

Comment thread internal/codeintel/codenav/internal/lsifstore/lsifstore_documents.go Outdated
Comment thread internal/codeintel/codenav/mapped_index.go
Comment thread internal/codeintel/codenav/mapped_index.go
Comment thread internal/codeintel/codenav/mapped_index.go Outdated
Comment on lines +147 to +155
newOccurrences = append(newOccurrences, &scip.Occurrence{
Range: mappedRange.ToSCIPRange().SCIPRange(),
Symbol: occ.Symbol,
SymbolRoles: occ.SymbolRoles,
OverrideDocumentation: occ.OverrideDocumentation,
SyntaxKind: occ.SyntaxKind,
Diagnostics: occ.Diagnostics,
EnclosingRange: occ.EnclosingRange,
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code will be silently incorrect when we add a new field to Occurrence. I know it's a bit icky, but it would be more future-proof to roundtrip serialize->deserialize (through Protobuf) and overwrite the value in the copy. Another alternative is to use reflect.DeepCopy, but I'd be worried about how that might interact with the extra

	state         protoimpl.MessageState
	sizeCache     protoimpl.SizeCache
	unknownFields protoimpl.UnknownFields

fields inside Occurrence created by the Protobuf codegen.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the icky-ness is coming from using protobuf generated types in our domain code. Having to pass pointers everywhere because there's a "DontCopyMe" marker Mutex in there.

Maybe it's worth to consider a layer of types that we deserialize the proto messages into, with exhaustruct annotations etc.

For now I've used https://pkg.go.dev/google.golang.org/protobuf@v1.34.2/proto#Clone which is how you're apparently supposed to copy proto messages.

Comment thread internal/codeintel/codenav/mapped_index.go Outdated
Comment thread internal/codeintel/codenav/mapped_index.go
Comment thread internal/codeintel/codenav/mapped_index.go Outdated
Comment on lines +173 to +193
ctx := context.Background()
unknownDoc, err := mappedIndex.GetDocument(ctx, core.NewRepoRelPathUnchecked("indexRoot/unknown.go"))
require.NoError(t, err)
require.True(t, unknownDoc.IsNone())

mappedDocumentResult, err := mappedIndex.GetDocument(ctx, core.NewRepoRelPathUnchecked("indexRoot/a.go"))
require.NoError(t, err)
mappedDocument := mappedDocumentResult.Unwrap()

occurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, testRange(1))
require.NoError(t, err)
require.Len(t, occurrences, 1)
require.Equal(t, scip.NewRangeUnchecked(occurrences[0].GetRange()).Start.Line, int32(1))

noOccurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, testRange(4))
require.NoError(t, err)
require.Len(t, noOccurrences, 0)

allOccurrences, err := mappedDocument.GetOccurrences(ctx)
require.NoError(t, err)
require.Len(t, allOccurrences, 3)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's convert this to a property based test instead?

Comment on lines +197 to +217
targetCommit, upload, lsifStore := setupSimpleUpload()
translator := shiftAllTranslator(targetCommit, -2)
mappedIndex := NewMappedIndexFromTranslator(lsifStore, translator, upload)

ctx := context.Background()
mappedDocumentOption, err := mappedIndex.GetDocument(ctx, core.NewRepoRelPathUnchecked("indexRoot/a.go"))
require.NoError(t, err)
mappedDocument := mappedDocumentOption.Unwrap()

noOccurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, testRange(1))
require.NoError(t, err)
require.Len(t, noOccurrences, 0)

occurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, shiftSCIPRange(testRange(1), 2))
require.NoError(t, err)
require.Len(t, occurrences, 1)
require.Equal(t, scip.NewRangeUnchecked(occurrences[0].GetRange()).Start.Line, int32(3))

allOccurrences, err := mappedDocument.GetOccurrences(ctx)
require.NoError(t, err)
require.Len(t, allOccurrences, 3)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's convert this to a concurrent property based test. You may want to use sourcegraph/conc, that has nicer APIs than the stdlib. Our tests should be run with -race, so this might help catch races should they slip through when modifying the code.

Christoph Hegemann added 3 commits July 17, 2024 05:22
Its only dependency on the service is the searchClient, which we can
easily pass in
@kritzcreek kritzcreek force-pushed the christoph/mapped-scip-document branch from 7943c11 to 8da432f Compare July 17, 2024 03:38
@kritzcreek

Copy link
Copy Markdown
Contributor Author

Thanks for the great review again!

I like having the simple non-property-based unit tests here, because they nicely show how to make a faked MappedIndex for the syntactic/search-based usages tests.
I don't mind adding some property based tests later on top, but I don't want to spend a bunch of time on writing generators right now.

If that's allright with you, please take another look.

@varungandhi-src varungandhi-src left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I've been busy with support work, I have not had the time to look at all the unit tests carefully, or look at the moved code.

For the code involving locks, I think we should have at least 1 test exercising that code in parallel with 10-100 goroutines.

Approving to unblock.

Comment on lines +158 to +164
upload, lsifStore := setupUpload(indexCommit, "indexRoot/",
doc("a.go",
ref("a", 1),
ref("b", 2),
ref("c", 3)),
doc("b.go",
ref("a", 2)))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a DSL for writing unit tests for SCIP indexes called reprolang. You can see root_resolver_test.go for an example; it would've avoided a bunch of the extra code you have here for constructing the right data. More generally, I've been saying that we shouldn't need to hand-write large struct literals and do specific tests against with specific counts/structures. That makes both reading and updating test code more time-consuming.

Additionally, this test code isn't doing something complex that's not doable with reprolang (e.g. multiple occurrences at the same source range), so the use of a different mini-DSL with standalone functions isn't buying us anything IMO.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the test involving reprolang, but they're not suitable for testing syntactic, because the tests need to make it clear they're matching up ranges from search results with ranges in SCIP indices.
reprolang doesn't give me any tools to be explicit about source ranges. The tests you reference instead use autogold to just generate whatever ranges we ended up parsing.

I've been saying that we shouldn't need to hand-write large struct literals and do specific tests against with specific counts/structures. That makes both reading and updating test code more time-consuming.

I find the use of auto-gold makes reading test-code very time consuming, as it machine-writes large struct literals in the middle of the code I'm reading.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss this later, it's not a blocking concern.

return targetCommit, upload, lsifStore
}

func TestMappedIndex_GetDocumentNoTranslation(t *testing.T) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these tests are inside mapped_index_test.go, you could drop the MappedIndex_ prefix.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But they're in the same package as all the other tests. This naming conventation seems to be everywhere I look? (without the _, but eg. tests in service_hover_test.go all start with TestHover)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it was a minor suggestion, it's fine to leave it in.

Comment on lines +198 to +210
translator := shiftAllTranslator(targetCommit, -2)
mappedIndex := NewMappedIndexFromTranslator(lsifStore, translator, upload)

ctx := context.Background()
mappedDocumentOption, err := mappedIndex.GetDocument(ctx, core.NewRepoRelPathUnchecked("indexRoot/a.go"))
require.NoError(t, err)
mappedDocument := mappedDocumentOption.Unwrap()

noOccurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, testRange(1))
require.NoError(t, err)
require.Len(t, noOccurrences, 0)

occurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, shiftSCIPRange(testRange(1), 2))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is a bit confusing due to the sign difference (-2, +2); the directionality is not obvious.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it was confusing to me as well initially. The issue is that the gitTreeTranslator passed into the service uses the target commit as its "TranslationBase", which means the direction index -> target sets the reverse flag to true.

Hopefully this gets more straightforward when we remove TranslationBase from GitTreeTranslator and just explicitly request from and to commits for every operation.

I can add a comment for now.

@kritzcreek kritzcreek left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried writing a test like you suggested, but can't get it to fail even after removing all the of the locking calls and adding an artificial thread.Sleep to the translator mock.

Unless I can get the test to fail somewhat regular when I remove the locks it doesn't add any confidence for me.

func TestMappedIndex_ConcurrentIndex(t *testing.T) {
	targetCommit, upload, lsifStore := setupSimpleUpload()
	translator := fakeTranslator(targetCommit, -2, func(_ string, _ shared.Range) bool {
		return false
	}, 250)
	mappedIndex := NewMappedIndexFromTranslator(lsifStore, translator, upload)
	ctx := context.Background()
	mappedDocumentOption, err := mappedIndex.GetDocument(ctx, core.NewRepoRelPathUnchecked("indexRoot/a.go"))
	require.NoError(t, err)
	mappedDocument := mappedDocumentOption.Unwrap()

	for i := 0; i < 100; i++ {
		wg := conc.NewWaitGroup()
		for i := 0; i < 50; i++ {
			wg.Go(func() {
				occurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, shiftSCIPRange(testRange(1), 2))
				require.NoError(t, err)
				require.Len(t, occurrences, 1)
				require.Equal(t, scip.NewRangeUnchecked(occurrences[0].GetRange()).Start.Line, int32(3))
			})
		}
		for i := 0; i < 10; i++ {
			wg.Go(func() {
				allOccurrences, err := mappedDocument.GetOccurrences(ctx)
				require.NoError(t, err)
				require.Len(t, allOccurrences, 3)
			})
		}
		wg.Wait()
	}
}

Comment on lines +158 to +164
upload, lsifStore := setupUpload(indexCommit, "indexRoot/",
doc("a.go",
ref("a", 1),
ref("b", 2),
ref("c", 3)),
doc("b.go",
ref("a", 2)))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the test involving reprolang, but they're not suitable for testing syntactic, because the tests need to make it clear they're matching up ranges from search results with ranges in SCIP indices.
reprolang doesn't give me any tools to be explicit about source ranges. The tests you reference instead use autogold to just generate whatever ranges we ended up parsing.

I've been saying that we shouldn't need to hand-write large struct literals and do specific tests against with specific counts/structures. That makes both reading and updating test code more time-consuming.

I find the use of auto-gold makes reading test-code very time consuming, as it machine-writes large struct literals in the middle of the code I'm reading.

return targetCommit, upload, lsifStore
}

func TestMappedIndex_GetDocumentNoTranslation(t *testing.T) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But they're in the same package as all the other tests. This naming conventation seems to be everywhere I look? (without the _, but eg. tests in service_hover_test.go all start with TestHover)

Comment on lines +198 to +210
translator := shiftAllTranslator(targetCommit, -2)
mappedIndex := NewMappedIndexFromTranslator(lsifStore, translator, upload)

ctx := context.Background()
mappedDocumentOption, err := mappedIndex.GetDocument(ctx, core.NewRepoRelPathUnchecked("indexRoot/a.go"))
require.NoError(t, err)
mappedDocument := mappedDocumentOption.Unwrap()

noOccurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, testRange(1))
require.NoError(t, err)
require.Len(t, noOccurrences, 0)

occurrences, err := mappedDocument.GetOccurrencesAtRange(ctx, shiftSCIPRange(testRange(1), 2))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it was confusing to me as well initially. The issue is that the gitTreeTranslator passed into the service uses the target commit as its "TranslationBase", which means the direction index -> target sets the reverse flag to true.

Hopefully this gets more straightforward when we remove TranslationBase from GitTreeTranslator and just explicitly request from and to commits for every operation.

I can add a comment for now.

@kritzcreek kritzcreek merged commit c38e0b1 into main Jul 18, 2024
@kritzcreek kritzcreek deleted the christoph/mapped-scip-document branch July 18, 2024 03:54
kritzcreek pushed a commit that referenced this pull request Jul 22, 2024
Closes
https://linear.app/sourcegraph/issue/GRAPH-735/test-syntactic-usages

Uses the new MappedIndex abstraction from #63781 to implement some
proper unit tests for `syntacticUsagesImpl`.

## Test plan

PR adds a bunch of tests
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed team/graph Graph Team (previously Code Intel/Language Tools/Language Platform) team/product-platform

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants