Codenav: use new occurrences API for symbol definitions by camdencheek · Pull Request #63217 · sourcegraph/sourcegraph-public-snapshot

camdencheek · 2024-06-11T23:30:21Z

This integrates the new occurrences API into the Svelte webapp. This fixes a number of issues where the syntax highlighting data is not an accurate way to determine hoverable tokens.

(In the Svelte app only:)
Fixes SRCH-582
Fixes SRCH-162 (validated)
Fixes SRCH-299 (validated)
Fixes SRCH-223 (not validated -- do we know of any groovy repos we have with precise indexing set up?)

Missing any?

Test plan

A bunch of manual testing that the files referenced in those issues now use the data from the occurrences API to provide more accurate hoverable token info. Once release tests are live, I plan to make a few end-to-end tests for all this. I attempted to make a playwright integration test for the hover ranges, but it turns out that it's quite difficult to target a specific text range in playwright. I've added it to my list for our next pairing time.

camdencheek · 2024-06-20T17:23:49Z

+    // The raw occurrences as returned by the API
+    occurrences: Occurrence[]


We don't actually use the non-flattened occurrences yet, but I wanted to maintain it here so we can use that higher-fidelity overlapping set of occurrences once we have a UI that can support that.

varungandhi-src

Once release tests are live, I plan to make a few end-to-end tests for all this

I think this would be good. I'm a bit worried about this silently regressing due to a change in seemingly unrelated logic somewhere.

varungandhi-src · 2024-06-24T01:25:29Z

+                            new Position(occ.range.end.line, occ.range.end.character)
+                        ),
+                        undefined,
+                        occ.symbol ?? undefined,


There is a small optimization opportunity here; if you interned the symbol strings in a map/set, that would likely help reduce memory usage, especially for large files.

Not sure I'm following. Wouldn't the symbol string of occurrences be unique within a file, meaning interning them would have no effect? Or would I need to do something like trim the common prefix (like I seem to remember we do for storage)?

Occurrences that are definitions/references to the same semantic symbol should have the same symbol string. E.g. if you have:

package abc func Loop() { Loop() }

You'll end up with two copies of a symbol string like scip-go gomod <pkg> v0.1.0 "github.com/sourcegraph/abc".abc/Loop() (the exact format may be different).

This is how the fast path for locals code navigation in the client code works today (albeit, the data there is coming from Tree-sitter, and not from precise data, so the string lengths are shorter). https://sourcegraph.sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/client/web/src/enterprise/codeintel/useSearchBasedCodeIntel.ts?L122:10-122:44

Ah! Got it -- thanks!

varungandhi-src · 2024-06-24T01:42:17Z

 // when two ranges overlap, we pick the syntax kind of the occurrence with the
 // shortest distance between start/end.
-function nonOverlappingOccurrences(occurrences: Occurrence[]): Occurrence[] {
+export function nonOverlappingOccurrences(occurrences: Occurrence[]): Occurrence[] {


Not exactly part of this PR, but:

Is the first sort call inside this function actually needed for the new call-site?

It seems a bit odd that this function is calling sort again at the end.

Maybe this is covered by unit tests, but I'm having trouble justifying that the logic in this function does what it is supposed to do:

First, it is sorting the occurrences by range in descending order (sort() followed by reverse())

It is iterating backwards through the occurrences (perhaps so that it can use pop/push instead of raw index manipulation)

While iterating, it seems to hit this in the overlapping case, which seems odd

if (current.range.isOverlapping(next.range)) { pushResult(current.withEndPosition(next.range.start)) stack.push(current.withStartPosition(next.range.end))

Here, it will end up materializing a fake range with start = next.range.end and end = current.range.start which will be a zero or negative length range, since we expect that current.range.start <= next.range.start <= next.range.end

varungandhi-src · 2024-06-24T01:59:33Z

 }

+// Returns the occurrence at this position based on data from the GraphQL occurrences API.
+function scipOccurrenceAtPosition(data: CodeGraphData[], position: Position): Occurrence | undefined {


Thoughts on adding some tests for this to make sure the index logic works correctly in all cases?

varungandhi-src

Accidentally hit Comment instead of Approve.

The SCIP-related logic in this PR looks good, and I'm happy that we can fix the issue with hoverability affecting multiple languages.

The logic in the nonOverlappingOccurrences doesn't make me feel terribly confident, I suspect it might be simpler to replace it with an O(N) approach with index manipulation (no sorting since the input is guaranteed to be sorted), but I don't have a concrete suggestion, and that function wasn't introduced as part of this PR.

fkling · 2024-06-24T12:38:55Z

-                if (!precise && markdownContents.length > 0 && !isInteractiveOccurrence(occurrence)) {
-                    return null
-                }
-                if (markdownContents === '' && isInteractiveOccurrence(occurrence)) {


Just double checking: I assume this won't suddenly cause 'No hover information' to be shown for e.g. keywords in the React app because er still return early in line 254 if we don't have an occurrence at that position anyway.

camdencheek · 2024-06-24T22:42:46Z

FYI, holding off on merge because of GRAPH-705

varungandhi-src · 2024-06-25T00:26:22Z

+                            new Position(occ.range.end.line, occ.range.end.character)
+                        ),
+                        undefined,
+                        occ.symbol ?? undefined,


Occurrences that are definitions/references to the same semantic symbol should have the same symbol string. E.g. if you have:

package abc func Loop() { Loop() }

You'll end up with two copies of a symbol string like scip-go gomod <pkg> v0.1.0 "github.com/sourcegraph/abc".abc/Loop() (the exact format may be different).

This is how the fast path for locals code navigation in the client code works today (albeit, the data there is coming from Tree-sitter, and not from precise data, so the string lengths are shorter). https://sourcegraph.sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/client/web/src/enterprise/codeintel/useSearchBasedCodeIntel.ts?L122:10-122:44

fkling · 2024-06-26T11:34:38Z

How exactly do you want this facet to work? Usually facets enable multiple places to provide input, which is then aggregated somehow (either by default or in a custom way). E.g. if you want to allow input to be provided from multiple sources, I'd change the definition to

export const codeGraphData = Facet.define<CodeGraphData>()

and then provide the input as

$: codeGraph = codeGraphDataFacet.computeN([], () => codeGraphData)

(at least I think that works)

If having static: true is important then we can't use compute I think. In which case, if you still want to allow multiple inputs, we should do

export const codeGraphData = Facet.define<CodeGraphData[], CodeGraphData[]>({ static: true, combine: values => values.flat() })

Of course if you only want to accept the most high pri input then what you have here is fine, though not obvious without a doc comment.

I don't think having static: true is important, and right now we should only ever have one value for this facet, so I didn't bother making it work for multiple inputs (though we possibly should in the future). Will fixup and add a comment to clarify.

more wip format remove console.log

This is a bit of refactoring as a followup to #63217. The goal is to unify how we use `Occurrences` so that we can use the same interface for both syntax highlighting data and occurrence data from the new API. An additional goal is to encode more of the invariants in the type system so it's more difficult to use incorrectly.

cla-bot Bot added the cla-signed label Jun 11, 2024

camdencheek force-pushed the cc/occurrences-api branch 2 times, most recently from 8aca5c7 to 7389f41 Compare June 17, 2024 19:47

camdencheek changed the title ~~Cc/occurrences api~~ Svelte: use new occurrences API for symbol definitions Jun 18, 2024

camdencheek commented Jun 18, 2024

View reviewed changes

varungandhi-src mentioned this pull request Jun 19, 2024

chore: Add more docs for occurrences GraphQL API #63338

Merged

camdencheek force-pushed the cc/occurrences-api branch from 6c2e464 to 5424f4e Compare June 20, 2024 17:18

camdencheek commented Jun 20, 2024

View reviewed changes

camdencheek changed the title ~~Svelte: use new occurrences API for symbol definitions~~ Codenav: use new occurrences API for symbol definitions Jun 20, 2024

camdencheek commented Jun 21, 2024

View reviewed changes

Comment thread client/web/src/repo/blob/codemirror/occurrence-utils.ts

camdencheek marked this pull request as ready for review June 21, 2024 15:42

camdencheek requested review from a team and varungandhi-src June 21, 2024 15:42

varungandhi-src reviewed Jun 24, 2024

View reviewed changes

varungandhi-src approved these changes Jun 24, 2024

View reviewed changes

fkling reviewed Jun 24, 2024

View reviewed changes

varungandhi-src approved these changes Jun 25, 2024

View reviewed changes

camdencheek mentioned this pull request Jun 25, 2024

Chore: refactoring occurrence indexing #63473

Merged

fkling reviewed Jun 26, 2024

View reviewed changes

camdencheek enabled auto-merge (squash) June 28, 2024 00:09

camdencheek added 9 commits June 27, 2024 18:10

wip

aa2caf4

more wip format remove console.log

extract fetch logic

ecdbfeb

fixup

4cc79b8

format

aa819a4

bump default size

071e958

simplify

3204c9d

remove console.log

6d05b5f

simplify interactivity

74ea0fc

wipo

429bd34

camdencheek added 10 commits June 27, 2024 18:10

format

9d9a3ce

add comments

0a262dd

push isInteractiveOccurrence down

34b3caf

always interactive and binary search

bb8d26c

revert conversion to function

2e7ecd5

bazel configure

20f139a

appease eslint

4463909

paginate to fetch full contents

96a02c1

put it behind an experimental feature

1898b28

add comment

60b19a0

camdencheek force-pushed the cc/occurrences-api branch from 729fd8b to 60b19a0 Compare June 28, 2024 00:10

camdencheek merged commit 6e57dfe into main Jun 28, 2024

camdencheek deleted the cc/occurrences-api branch June 28, 2024 00:26

		// The raw occurrences as returned by the API
		occurrences: Occurrence[]

Conversation

camdencheek commented Jun 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test plan

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

varungandhi-src left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

varungandhi-src left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

camdencheek commented Jun 24, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fkling Jun 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

camdencheek commented Jun 11, 2024 •

edited

Loading

fkling Jun 26, 2024 •

edited

Loading