Codenav: use new occurrences API for symbol definitions#63217
Conversation
8aca5c7 to
7389f41
Compare
6c2e464 to
5424f4e
Compare
| // The raw occurrences as returned by the API | ||
| occurrences: Occurrence[] |
There was a problem hiding this comment.
We don't actually use the non-flattened occurrences yet, but I wanted to maintain it here so we can use that higher-fidelity overlapping set of occurrences once we have a UI that can support that.
varungandhi-src
left a comment
There was a problem hiding this comment.
Once release tests are live, I plan to make a few end-to-end tests for all this
I think this would be good. I'm a bit worried about this silently regressing due to a change in seemingly unrelated logic somewhere.
| new Position(occ.range.end.line, occ.range.end.character) | ||
| ), | ||
| undefined, | ||
| occ.symbol ?? undefined, |
There was a problem hiding this comment.
There is a small optimization opportunity here; if you interned the symbol strings in a map/set, that would likely help reduce memory usage, especially for large files.
There was a problem hiding this comment.
Not sure I'm following. Wouldn't the symbol string of occurrences be unique within a file, meaning interning them would have no effect? Or would I need to do something like trim the common prefix (like I seem to remember we do for storage)?
There was a problem hiding this comment.
Occurrences that are definitions/references to the same semantic symbol should have the same symbol string. E.g. if you have:
package abc
func Loop() {
Loop()
}You'll end up with two copies of a symbol string like scip-go gomod <pkg> v0.1.0 "github.com/sourcegraph/abc".abc/Loop() (the exact format may be different).
This is how the fast path for locals code navigation in the client code works today (albeit, the data there is coming from Tree-sitter, and not from precise data, so the string lengths are shorter). https://sourcegraph.sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/client/web/src/enterprise/codeintel/useSearchBasedCodeIntel.ts?L122:10-122:44
There was a problem hiding this comment.
Ah! Got it -- thanks!
| // when two ranges overlap, we pick the syntax kind of the occurrence with the | ||
| // shortest distance between start/end. | ||
| function nonOverlappingOccurrences(occurrences: Occurrence[]): Occurrence[] { | ||
| export function nonOverlappingOccurrences(occurrences: Occurrence[]): Occurrence[] { |
There was a problem hiding this comment.
Not exactly part of this PR, but:
- Is the first
sortcall inside this function actually needed for the new call-site? - It seems a bit odd that this function is calling
sortagain at the end. - Maybe this is covered by unit tests, but I'm having trouble justifying that the logic in this function does what it is supposed to do:
- First, it is sorting the occurrences by range in descending order (
sort()followed byreverse()) - It is iterating backwards through the occurrences (perhaps so that it can use pop/push instead of raw index manipulation)
- While iterating, it seems to hit this in the overlapping case, which seems odd
if (current.range.isOverlapping(next.range)) {
pushResult(current.withEndPosition(next.range.start))
stack.push(current.withStartPosition(next.range.end))
Here, it will end up materializing a fake range with start = next.range.end and end = current.range.start which will be a zero or negative length range, since we expect that current.range.start <= next.range.start <= next.range.end
| } | ||
|
|
||
| // Returns the occurrence at this position based on data from the GraphQL occurrences API. | ||
| function scipOccurrenceAtPosition(data: CodeGraphData[], position: Position): Occurrence | undefined { |
There was a problem hiding this comment.
Thoughts on adding some tests for this to make sure the index logic works correctly in all cases?
varungandhi-src
left a comment
There was a problem hiding this comment.
Accidentally hit Comment instead of Approve.
The SCIP-related logic in this PR looks good, and I'm happy that we can fix the issue with hoverability affecting multiple languages.
The logic in the nonOverlappingOccurrences doesn't make me feel terribly confident, I suspect it might be simpler to replace it with an O(N) approach with index manipulation (no sorting since the input is guaranteed to be sorted), but I don't have a concrete suggestion, and that function wasn't introduced as part of this PR.
| if (!precise && markdownContents.length > 0 && !isInteractiveOccurrence(occurrence)) { | ||
| return null | ||
| } | ||
| if (markdownContents === '' && isInteractiveOccurrence(occurrence)) { |
There was a problem hiding this comment.
Just double checking: I assume this won't suddenly cause 'No hover information' to be shown for e.g. keywords in the React app because er still return early in line 254 if we don't have an occurrence at that position anyway.
|
FYI, holding off on merge because of GRAPH-705 |
| new Position(occ.range.end.line, occ.range.end.character) | ||
| ), | ||
| undefined, | ||
| occ.symbol ?? undefined, |
There was a problem hiding this comment.
Occurrences that are definitions/references to the same semantic symbol should have the same symbol string. E.g. if you have:
package abc
func Loop() {
Loop()
}You'll end up with two copies of a symbol string like scip-go gomod <pkg> v0.1.0 "github.com/sourcegraph/abc".abc/Loop() (the exact format may be different).
This is how the fast path for locals code navigation in the client code works today (albeit, the data there is coming from Tree-sitter, and not from precise data, so the string lengths are shorter). https://sourcegraph.sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/client/web/src/enterprise/codeintel/useSearchBasedCodeIntel.ts?L122:10-122:44
There was a problem hiding this comment.
How exactly do you want this facet to work? Usually facets enable multiple places to provide input, which is then aggregated somehow (either by default or in a custom way). E.g. if you want to allow input to be provided from multiple sources, I'd change the definition to
export const codeGraphData = Facet.define<CodeGraphData>()and then provide the input as
$: codeGraph = codeGraphDataFacet.computeN([], () => codeGraphData)(at least I think that works)
If having static: true is important then we can't use compute I think. In which case, if you still want to allow multiple inputs, we should do
export const codeGraphData = Facet.define<CodeGraphData[], CodeGraphData[]>({
static: true,
combine: values => values.flat()
})Of course if you only want to accept the most high pri input then what you have here is fine, though not obvious without a doc comment.
There was a problem hiding this comment.
I don't think having static: true is important, and right now we should only ever have one value for this facet, so I didn't bother making it work for multiple inputs (though we possibly should in the future). Will fixup and add a comment to clarify.
729fd8b to
60b19a0
Compare
This is a bit of refactoring as a followup to #63217. The goal is to unify how we use `Occurrences` so that we can use the same interface for both syntax highlighting data and occurrence data from the new API. An additional goal is to encode more of the invariants in the type system so it's more difficult to use incorrectly.
This integrates the new occurrences API into the Svelte webapp. This fixes a number of issues where the syntax highlighting data is not an accurate way to determine hoverable tokens.
(In the Svelte app only:)
Fixes SRCH-582
Fixes SRCH-162 (validated)
Fixes SRCH-299 (validated)
Fixes SRCH-223 (not validated -- do we know of any groovy repos we have with precise indexing set up?)
Missing any?
Test plan
A bunch of manual testing that the files referenced in those issues now use the data from the occurrences API to provide more accurate hoverable token info. Once release tests are live, I plan to make a few end-to-end tests for all this. I attempted to make a playwright integration test for the hover ranges, but it turns out that it's quite difficult to target a specific text range in playwright. I've added it to my list for our next pairing time.