Skip to content

[ty] Use Salsa to cache workspace symbols#20030

Merged
BurntSushi merged 9 commits intomainfrom
ag/workspace-symbol-caching
Aug 23, 2025
Merged

[ty] Use Salsa to cache workspace symbols#20030
BurntSushi merged 9 commits intomainfrom
ag/workspace-symbol-caching

Conversation

@BurntSushi
Copy link
Member

This PR pretty much does what astral-sh/ty#998 suggests: it turns the
symbol gathering at the file level into a Salsa query while being
careful not to make the query string a dependency of that query. This
lets Salsa cache the symbols on a per-file basis, while we do the
filtering in a separate aggregation step.

This PR also includes a (small) bonus optimization using a regex for
query filtering. :-)

Fixes astral-sh/ty#998

@github-actions
Copy link
Contributor

github-actions bot commented Aug 21, 2025

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

@github-actions
Copy link
Contributor

github-actions bot commented Aug 21, 2025

mypy_primer results

No ecosystem changes detected ✅
No memory usage changes detected ✅

@AlexWaygood AlexWaygood added server Related to the LSP server ty Multi-file analysis & type inference labels Aug 21, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Aug 21, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Do we have any numbers on the difference between a few requests like (a) no query (b) with small and large query string?

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are ways on how we could improve the caching further (or, at least, reduce the caching overhead) and concurrency should make this much faster.

It does hurt us a little that we garbage collect parsed ASTs when workspace diagnostics are enabled, but that's out of scope for this change

BurntSushi added a commit that referenced this pull request Aug 22, 2025
This is a pretty naive approach, but it makes cold times for listing
workspace symbols in home-assistant under 1s on my machine.

Courtesy of Micha:
#20030 (comment)
@BurntSushi BurntSushi force-pushed the ag/workspace-symbol-caching branch from e195b9d to 78d87ea Compare August 22, 2025 15:57
Useful for ad hoc debugging, but it's also useful to have permanently to
enable serendipitous discovery of performance problems.
This is prep work for turning this into a Salsa query.
Specifically, we don't want the Salsa query to be
dependent upon the query string.
There is a small amount of subtlety to this matching routine,
and it could be implemented in a faster way. So let's right some
tests for what we have to ensure we don't break anything when
we optimize it.
While this doesn't typically matter, when ty returns a very
large list of symbols, this can have an impact. Specifically,
when searching `async` in home-assistant, this gets times
closer to 500ms versus closer to 600ms before this change.
It looks like an overall ~50ms improvement (so around 10%),
but variance is all over the place and I didn't do any
statistical tests.

But this does make intuitive sense. Previously, we were
allocating intermediate strings, doing UTF-8 decoding and
consulting Unicode casing tables. Now we're just doing what
is likely a single DFA scan. In effect, we front load all
of the Unicode junk into regex compilation.
BurntSushi added a commit that referenced this pull request Aug 23, 2025
This is a pretty naive approach, but it makes cold times for listing
workspace symbols in home-assistant under 1s on my machine.

Courtesy of Micha:
#20030 (comment)
@BurntSushi BurntSushi force-pushed the ag/workspace-symbol-caching branch 2 times, most recently from 17e7b0f to fd8109c Compare August 23, 2025 12:48
@BurntSushi
Copy link
Member Author

Do we have any numbers on the difference between a few requests like (a) no query (b) with small and large query string?

After the initial cold run, requests generally take a few dozen milliseconds. There isn't a ton of variation with respect to query string length (which is what I'd expect I think).

The slowest part of the process here is when we return a lot of symbols and VS Code needs to update its drop-down UI. There's a fairly sizeable delay there (on the order of a couple seconds). But to VS Code's credit, if you keep typing, it's pretty good about stopping what it was doing and moving on to the next request.

It overall feels pretty good. It'd be nice if VS Code was a little snappier, but I think the only thing we could do there is return fewer symbols. And I'm not sure if that's a good idea.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thank you for figuring out this tricky representation. This required more work than I expected

let mut last_end: usize = 0;
for (tree, child_symbol_ids) in self.symbols.iter().zip(children_ids) {
let start = last_end;
let end = start + child_symbol_ids.len();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit surprised this works. Is it guaranteed that we add all symbols (children) in breadth-first and not depth-first order? Or am I overlooking something here that makes this possible?

We use a similar representation for Scope in the SemanticIndex where each Scope stores a Range<FileScopeId>. However, that range doesn't capture the Scope's children; it's the Scope's descendants. Finding the Scope's children requires iterating over the descendants and skipping all those with a different parent.

It's not fully clear to me why we don't have the descendants problem here. Maybe it's worth adding a comment why start + child_symbol_ids.len() works.

pub(crate) struct DescendantsIter<'a> {
next_id: FileScopeId,
descendants: std::slice::Iter<'a, Scope>,
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can add a comment. It works because we guarantee all parents are added before their children in the visitor. I think I wrote a comment about that somewhere, but I should make it more prominent.

All this code is doing is taking the Map<SymbolId, Vec<SymbolId>> representation created above and flattening it.

Basically, this splits the implementation into two pieces:
the first piece does the traversal and finds *all* symbols
across the workspace. The second piece does filtering based
on a user provided query string. Only the first piece is
cached by Salsa.

This brings warm "workspace symbols" requests down from
500-600ms to 100-200ms.
This is a pretty naive approach, but it makes cold times for listing
workspace symbols in home-assistant under 1s on my machine.

Courtesy of Micha:
#20030 (comment)
In effect, we make the Salsa query aspect keyed only on whether we want
global symbols. We move everything else (hierarchical and querying) to
an aggregate step *after* the query.

This was a somewhat involved change since we want to return a flattened
list from visiting the source while also preserving enough information
to reform the symbols into a hierarchical structure that the LSP
expects. But I think overall the API has gotten simpler and we encode
more invariants into the type system. (For example, previously you got a
runtime assertion if you tried to provide a query string while enabling
hierarchical mode. But now that's prevented by construction.)
This makes use of early continue/return to keep rightward drift under
control. (I also find it easier to read.)
@BurntSushi BurntSushi force-pushed the ag/workspace-symbol-caching branch from fd8109c to 428a532 Compare August 23, 2025 16:40
@BurntSushi BurntSushi merged commit e723765 into main Aug 23, 2025
38 checks passed
BurntSushi added a commit that referenced this pull request Aug 23, 2025
This is a pretty naive approach, but it makes cold times for listing
workspace symbols in home-assistant under 1s on my machine.

Courtesy of Micha:
#20030 (comment)
@BurntSushi BurntSushi deleted the ag/workspace-symbol-caching branch August 23, 2025 16:53
second-ed pushed a commit to second-ed/ruff that referenced this pull request Sep 9, 2025
This is a pretty naive approach, but it makes cold times for listing
workspace symbols in home-assistant under 1s on my machine.

Courtesy of Micha:
astral-sh#20030 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

server Related to the LSP server ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Workspace symbols is very slow

4 participants