Ranking: standardize ctags kind names before scoring #674
    type SymbolKind uint8

    const (
        Accessor SymbolKind = iota
Working on this PR got me thinking ... it'd be nice to just use SCIP as the exchange format instead of ctags output, and have a tool to map universal-ctags onto SCIP. That would get us closer to an actual spec, unlike the ctags output which has inconsistent naming and an unknown universe of values.
100% agree with this. I think this was discussed when scip-ctags was added, and it was seen as part of the end goal.
            t.Fatal(err)
        }

        examplePython, err := os.ReadFile("./testdata/example.py")
In a follow-up, I'll try to pull in SCIP ctags so we can run the same tests using that binary.
SCIP ctags can output different kind names than universal-ctags (for example `typeAlias` instead of `talias`). This change makes sure we handle different names for the same kind. To do so, it refactors the logic so we first match strings to standard kinds, then decide how these are scored for each language. That way, you don't need to remember to cover all the possible kind names each time you adjust scoring for a new language. Also added basic tests for Ruby and Python to ensure we don't accidentally change the scoring.
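The two-step approach described above (normalize the raw kind string first, then score per language) might look roughly like the following. This is only a sketch under assumptions: `parseKind`, `scoreKind`, the reduced set of kinds, and all score values are hypothetical and are not taken from the actual change.

```go
package main

import (
	"fmt"
	"strings"
)

// SymbolKind is a standardized kind, independent of which ctags
// implementation produced the raw kind string.
// The set of kinds here is a reduced, hypothetical one.
type SymbolKind uint8

const (
	Other SymbolKind = iota
	Class
	Function
	Method
	TypeAlias
)

// parseKind maps a raw kind name onto a standard kind. Every spelling of a
// kind is listed in one place, so scoring code never sees raw strings.
func parseKind(raw string) SymbolKind {
	switch strings.ToLower(raw) {
	case "class", "classes":
		return Class
	case "function", "func", "functions":
		return Function
	case "method", "methods":
		return Method
	case "typealias", "talias":
		return TypeAlias
	default:
		return Other
	}
}

// scoreKind decides how a standard kind is scored for a language.
// The numbers are made up for illustration.
func scoreKind(language string, kind SymbolKind) float64 {
	switch language {
	case "python":
		switch kind {
		case Class:
			return 10
		case Function, Method:
			return 8
		}
	case "ruby":
		switch kind {
		case Class:
			return 10
		case Method:
			return 9
		}
	}
	// Language-independent fallback.
	if kind == TypeAlias {
		return 5
	}
	return 1
}

func main() {
	// Both spellings normalize to the same kind, so they score identically.
	for _, raw := range []string{"typeAlias", "talias"} {
		fmt.Println(raw, scoreKind("python", parseKind(raw)))
	}
}
```

With this split, supporting a new spelling only touches `parseKind`, and adjusting scoring for a new language only touches `scoreKind`.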
To implement `select:symbol.enum` filters, we look at each symbol's ctags 'kind' and check if it matches the filter value `enum`. We accidentally didn't include 'enum' in this match logic, so all these symbols were filtered away. This PR fixes that, and adds a few improvements:

* Use a shared map between `symbol.LSPKind` and `symbol.SelectKind`, to avoid drift between these two conversions.
* Audit the ctags mapping from sourcegraph/zoekt#674 and add other missing kinds (besides enum)

Closes SPLF-178
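As a rough illustration of the shared-map idea, a single table can drive both conversions so that a kind added for one is automatically available to the other. Everything below is a hypothetical sketch: `kindInfo`, `ctagsKinds`, `lspKind`, `selectKind`, and `matchesSelect` are stand-in names, and the entries are examples rather than the real mapping.

```go
package main

import (
	"fmt"
	"strings"
)

// kindInfo holds both conversions for a single ctags kind (hypothetical type).
type kindInfo struct {
	lspKind    string // name used for the LSP-style symbol kind
	selectKind string // value accepted by a select:symbol.<kind> filter
}

// ctagsKinds is the single source of truth for both conversions.
// The entries are illustrative only.
var ctagsKinds = map[string]kindInfo{
	"class":    {lspKind: "Class", selectKind: "class"},
	"enum":     {lspKind: "Enum", selectKind: "enum"},
	"function": {lspKind: "Function", selectKind: "function"},
}

// lspKind returns the LSP-style kind name, or "Unknown" if unmapped.
func lspKind(ctagsKind string) string {
	if info, ok := ctagsKinds[strings.ToLower(ctagsKind)]; ok {
		return info.lspKind
	}
	return "Unknown"
}

// selectKind returns the select filter value, or "" if unmapped.
func selectKind(ctagsKind string) string {
	return ctagsKinds[strings.ToLower(ctagsKind)].selectKind
}

// matchesSelect reports whether a symbol with the given ctags kind passes
// a select:symbol.<filter> filter.
func matchesSelect(ctagsKind, filter string) bool {
	got := selectKind(ctagsKind)
	return got != "" && got == strings.ToLower(filter)
}

func main() {
	fmt.Println(matchesSelect("enum", "enum"), matchesSelect("class", "enum"))
}
```

Because both lookups read the same table, forgetting a kind such as `enum` in only one of the conversions is no longer possible in this sketch.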
Relates to https://github.com/sourcegraph/sourcegraph/issues/57659