Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Codenav: use new occurrences API for symbol definitions#63217

Merged
camdencheek merged 19 commits into
mainfrom
cc/occurrences-api
Jun 28, 2024
Merged

Codenav: use new occurrences API for symbol definitions#63217
camdencheek merged 19 commits into
mainfrom
cc/occurrences-api

Conversation

@camdencheek

@camdencheek camdencheek commented Jun 11, 2024

Copy link
Copy Markdown
Member

This integrates the new occurrences API into the Svelte webapp. This fixes a number of issues where the syntax highlighting data is not an accurate way to determine hoverable tokens.

(In the Svelte app only:)
Fixes SRCH-582
Fixes SRCH-162 (validated)
Fixes SRCH-299 (validated)
Fixes SRCH-223 (not validated -- do we know of any groovy repos we have with precise indexing set up?)

Missing any?

Test plan

A bunch of manual testing that the files referenced in those issues now use the data from the occurrences API to provide more accurate hoverable token info. Once release tests are live, I plan to make a few end-to-end tests for all this. I attempted to make a playwright integration test for the hover ranges, but it turns out that it's quite difficult to target a specific text range in playwright. I've added it to my list for our next pairing time.

@cla-bot cla-bot Bot added the cla-signed label Jun 11, 2024
@camdencheek camdencheek force-pushed the cc/occurrences-api branch 2 times, most recently from 8aca5c7 to 7389f41 Compare June 17, 2024 19:47
@camdencheek camdencheek changed the title Cc/occurrences api Svelte: use new occurrences API for symbol definitions Jun 18, 2024
Comment thread client/web/src/repo/blob/codemirror/occurrence-utils.ts Outdated
Comment thread client/web/src/repo/blob/codemirror/occurrence-utils.ts Outdated
Comment thread client/web/src/repo/blob/codemirror/occurrence-utils.ts Outdated
Comment on lines +12 to +13
// The raw occurrences as returned by the API
occurrences: Occurrence[]

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't actually use the non-flattened occurrences yet, but I wanted to maintain it here so we can use that higher-fidelity overlapping set of occurrences once we have a UI that can support that.

@camdencheek camdencheek changed the title Svelte: use new occurrences API for symbol definitions Codenav: use new occurrences API for symbol definitions Jun 20, 2024
Comment thread client/web/src/repo/blob/codemirror/occurrence-utils.ts
@camdencheek camdencheek marked this pull request as ready for review June 21, 2024 15:42
@camdencheek camdencheek requested review from a team and varungandhi-src June 21, 2024 15:42

@varungandhi-src varungandhi-src left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once release tests are live, I plan to make a few end-to-end tests for all this

I think this would be good. I'm a bit worried about this silently regressing due to a change in seemingly unrelated logic somewhere.

new Position(occ.range.end.line, occ.range.end.character)
),
undefined,
occ.symbol ?? undefined,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a small optimization opportunity here; if you interned the symbol strings in a map/set, that would likely help reduce memory usage, especially for large files.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I'm following. Wouldn't the symbol string of occurrences be unique within a file, meaning interning them would have no effect? Or would I need to do something like trim the common prefix (like I seem to remember we do for storage)?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Occurrences that are definitions/references to the same semantic symbol should have the same symbol string. E.g. if you have:

package abc

func Loop() {
    Loop()
}

You'll end up with two copies of a symbol string like scip-go gomod <pkg> v0.1.0 "github.com/sourcegraph/abc".abc/Loop() (the exact format may be different).

This is how the fast path for locals code navigation in the client code works today (albeit, the data there is coming from Tree-sitter, and not from precise data, so the string lengths are shorter). https://sourcegraph.sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/client/web/src/enterprise/codeintel/useSearchBasedCodeIntel.ts?L122:10-122:44

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Got it -- thanks!

// when two ranges overlap, we pick the syntax kind of the occurrence with the
// shortest distance between start/end.
function nonOverlappingOccurrences(occurrences: Occurrence[]): Occurrence[] {
export function nonOverlappingOccurrences(occurrences: Occurrence[]): Occurrence[] {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly part of this PR, but:

  1. Is the first sort call inside this function actually needed for the new call-site?
  2. It seems a bit odd that this function is calling sort again at the end.
  3. Maybe this is covered by unit tests, but I'm having trouble justifying that the logic in this function does what it is supposed to do:
  • First, it is sorting the occurrences by range in descending order (sort() followed by reverse())
  • It is iterating backwards through the occurrences (perhaps so that it can use pop/push instead of raw index manipulation)
    • While iterating, it seems to hit this in the overlapping case, which seems odd
            if (current.range.isOverlapping(next.range)) {
                pushResult(current.withEndPosition(next.range.start))
                stack.push(current.withStartPosition(next.range.end))

Here, it will end up materializing a fake range with start = next.range.end and end = current.range.start which will be a zero or negative length range, since we expect that current.range.start <= next.range.start <= next.range.end

Comment thread client/web/src/repo/blob/codemirror/codeintel/api.ts
Comment thread client/web/src/repo/blob/codemirror/codeintel/occurrences.ts
}

// Returns the occurrence at this position based on data from the GraphQL occurrences API.
function scipOccurrenceAtPosition(data: CodeGraphData[], position: Position): Occurrence | undefined {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on adding some tests for this to make sure the index logic works correctly in all cases?

@varungandhi-src varungandhi-src left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidentally hit Comment instead of Approve.

The SCIP-related logic in this PR looks good, and I'm happy that we can fix the issue with hoverability affecting multiple languages.

The logic in the nonOverlappingOccurrences doesn't make me feel terribly confident, I suspect it might be simpler to replace it with an O(N) approach with index manipulation (no sorting since the input is guaranteed to be sorted), but I don't have a concrete suggestion, and that function wasn't introduced as part of this PR.

Comment on lines -278 to -281
if (!precise && markdownContents.length > 0 && !isInteractiveOccurrence(occurrence)) {
return null
}
if (markdownContents === '' && isInteractiveOccurrence(occurrence)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checking: I assume this won't suddenly cause 'No hover information' to be shown for e.g. keywords in the React app because er still return early in line 254 if we don't have an occurrence at that position anyway.

@camdencheek

Copy link
Copy Markdown
Member Author

FYI, holding off on merge because of GRAPH-705

new Position(occ.range.end.line, occ.range.end.character)
),
undefined,
occ.symbol ?? undefined,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Occurrences that are definitions/references to the same semantic symbol should have the same symbol string. E.g. if you have:

package abc

func Loop() {
    Loop()
}

You'll end up with two copies of a symbol string like scip-go gomod <pkg> v0.1.0 "github.com/sourcegraph/abc".abc/Loop() (the exact format may be different).

This is how the fast path for locals code navigation in the client code works today (albeit, the data there is coming from Tree-sitter, and not from precise data, so the string lengths are shorter). https://sourcegraph.sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/client/web/src/enterprise/codeintel/useSearchBasedCodeIntel.ts?L122:10-122:44

Comment on lines 19 to 25

@fkling fkling Jun 26, 2024

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How exactly do you want this facet to work? Usually facets enable multiple places to provide input, which is then aggregated somehow (either by default or in a custom way). E.g. if you want to allow input to be provided from multiple sources, I'd change the definition to

export const codeGraphData = Facet.define<CodeGraphData>()

and then provide the input as

$: codeGraph = codeGraphDataFacet.computeN([], () => codeGraphData)

(at least I think that works)

If having static: true is important then we can't use compute I think. In which case, if you still want to allow multiple inputs, we should do

export const codeGraphData = Facet.define<CodeGraphData[], CodeGraphData[]>({
    static: true,
    combine: values => values.flat()
})

Of course if you only want to accept the most high pri input then what you have here is fine, though not obvious without a doc comment.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think having static: true is important, and right now we should only ever have one value for this facet, so I didn't bother making it work for multiple inputs (though we possibly should in the future). Will fixup and add a comment to clarify.

@camdencheek camdencheek enabled auto-merge (squash) June 28, 2024 00:09
@camdencheek camdencheek merged commit 6e57dfe into main Jun 28, 2024
@camdencheek camdencheek deleted the cc/occurrences-api branch June 28, 2024 00:26
camdencheek added a commit that referenced this pull request Jun 28, 2024
This is a bit of refactoring as a followup to #63217. The goal is to
unify how we use `Occurrences` so that we can use the same interface for
both syntax highlighting data and occurrence data from the new API. An
additional goal is to encode more of the invariants in the type system
so it's more difficult to use incorrectly.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants