[ty] Use Salsa to cache workspace symbols by BurntSushi · Pull Request #20030 · astral-sh/ruff

BurntSushi · 2025-08-21T18:59:03Z

This PR pretty much does what astral-sh/ty#998 suggests: it turns the
symbol gathering at the file level into a Salsa query while being
careful not to make the query string a dependency of that query. This
lets Salsa cache the symbols on a per-file basis, while we do the
filtering in a separate aggregation step.

This PR also includes a (small) bonus optimization using a regex for
query filtering. :-)

Fixes astral-sh/ty#998

github-actions · 2025-08-21T19:00:52Z

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

github-actions · 2025-08-21T19:03:00Z

`mypy_primer` results

No ecosystem changes detected ✅
No memory usage changes detected ✅

github-actions · 2025-08-21T19:10:17Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

dhruvmanila

Looks great! Do we have any numbers on the difference between a few requests like (a) no query (b) with small and large query string?


MichaReiser

I think there are ways we could improve the caching further (or, at least, reduce the caching overhead), and concurrency should make this much faster.

It does hurt us a little that we garbage collect parsed ASTs when workspace diagnostics are enabled, but that's out of scope for this change.


This is a pretty naive approach, but it makes cold times for listing workspace symbols in home-assistant under 1s on my machine. Courtesy of Micha: #20030 (comment)

Useful for ad hoc debugging, but it's also useful to have permanently to enable serendipitous discovery of performance problems.

This is prep work for turning this into a Salsa query. Specifically, we don't want the Salsa query to be dependent upon the query string.

There is a small amount of subtlety to this matching routine, and it could be implemented in a faster way. So let's write some tests for what we have to ensure we don't break anything when we optimize it.

While this doesn't typically matter, when ty returns a very large list of symbols, this can have an impact. Specifically, when searching `async` in home-assistant, this gets times closer to 500ms versus closer to 600ms before this change. It looks like an overall ~50ms improvement (so around 10%), but variance is all over the place and I didn't do any statistical tests. But this does make intuitive sense. Previously, we were allocating intermediate strings, doing UTF-8 decoding and consulting Unicode casing tables. Now we're just doing what is likely a single DFA scan. In effect, we front load all of the Unicode junk into regex compilation.
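To make the "before" cost concrete, here is a minimal sketch of a naive matcher. It assumes (the thread does not spell this out) that a query matches when its characters appear in order in the symbol name, ignoring case. The function name `naive_query_matches` is hypothetical and is not taken from the PR.

```rust
/// Hypothetical naive matcher (an assumption, not the PR's code):
/// returns true when every character of `query` appears in `symbol`
/// in the same order, ignoring case.
fn naive_query_matches(query: &str, symbol: &str) -> bool {
    // Each call allocates two intermediate strings and consults the
    // Unicode casing tables for every character.
    let query = query.to_lowercase();
    let symbol = symbol.to_lowercase();

    // `any` advances the shared iterator, so each query character must be
    // found strictly after the position where the previous one matched.
    let mut symbol_chars = symbol.chars();
    query.chars().all(|q| symbol_chars.any(|s| s == q))
}
```

Under that assumption, `naive_query_matches("asy", "AsyncIterator")` is true and `naive_query_matches("ya", "async")` is false. Compiling the query once into a case-insensitive regex moves the casing work out of this per-symbol loop, which is the effect the commit message describes.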


BurntSushi · 2025-08-23T12:52:36Z

> Do we have any numbers on the difference between a few requests like (a) no query (b) with small and large query string?

After the initial cold run, requests generally take a few dozen milliseconds. There isn't a ton of variation with respect to query string length (which is what I'd expect I think).

The slowest part of the process here is when we return a lot of symbols and VS Code needs to update its drop-down UI. There's a fairly sizeable delay there (on the order of a couple seconds). But to VS Code's credit, if you keep typing, it's pretty good about stopping what it was doing and moving on to the next request.

It overall feels pretty good. It'd be nice if VS Code was a little snappier, but I think the only thing we could do there is return fewer symbols. And I'm not sure if that's a good idea.

MichaReiser

Nice! Thank you for figuring out this tricky representation. This required more work than I expected

MichaReiser · 2025-08-23T13:04:50Z

crates/ty_ide/src/symbols.rs

+        let mut last_end: usize = 0;
+        for (tree, child_symbol_ids) in self.symbols.iter().zip(children_ids) {
+            let start = last_end;
+            let end = start + child_symbol_ids.len();


I'm a bit surprised this works. Is it guaranteed that we add all symbols (children) in breadth-first and not depth-first order? Or am I overlooking something here that makes this possible?

We use a similar representation for Scope in the SemanticIndex where each Scope stores a Range<FileScopeId>. However, that range doesn't capture the Scope's children; it's the Scope's descendants. Finding the Scope's children requires iterating over the descendants and skipping all those with a different parent.

It's not fully clear to me why we don't have the descendants problem here. Maybe it's worth adding a comment why start + child_symbol_ids.len() works.

ruff/crates/ty_python_semantic/src/semantic_index.rs, lines 612 to 615 in 4ac2b2c:

    pub(crate) struct DescendantsIter<'a> {
        next_id: FileScopeId,
        descendants: std::slice::Iter<'a, Scope>,
    }
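To illustrate the descendants-versus-children distinction raised above, here is a small sketch with hypothetical types (a simplified `Scope` with a `parent` and a `descendants` range, using `usize` in place of `FileScopeId`). It is not the code from `semantic_index.rs`, only the general idea of filtering a descendants range down to direct children.

```rust
use std::ops::Range;

/// Simplified, hypothetical stand-in for a scope stored in pre-order.
struct Scope {
    parent: Option<usize>,
    /// Covers *all* descendants, not just direct children.
    descendants: Range<usize>,
}

/// Direct children are the descendants whose parent is `id`; every other
/// entry in the range belongs to a deeper level and is skipped.
fn direct_children(scopes: &[Scope], id: usize) -> Vec<usize> {
    scopes[id]
        .descendants
        .clone()
        .filter(|&d| scopes[d].parent == Some(id))
        .collect()
}
```

With a module scope at index 0 whose descendants are `1..4`, only the entries whose `parent` is `Some(0)` come back, so a nested method scope is skipped.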

Yeah I can add a comment. It works because we guarantee all parents are added before their children in the visitor. I think I wrote a comment about that somewhere, but I should make it more prominent.

All this code is doing is taking the Map<SymbolId, Vec<SymbolId>> representation created above and flattening it.
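A rough sketch of one way such a flattening can avoid the descendants problem, assuming each symbol's range indexes into a separate flat list of child IDs rather than into the symbol list itself. All names here (`FlatSymbol`, `FlatSymbols`, `flatten`, `child_names`) are hypothetical and the real layout in `symbols.rs` may differ.

```rust
use std::ops::Range;

type SymbolId = usize;

struct FlatSymbol {
    name: String,
    /// Range into `FlatSymbols::child_ids`, not into `symbols`.
    children: Range<usize>,
}

struct FlatSymbols {
    symbols: Vec<FlatSymbol>,
    child_ids: Vec<SymbolId>,
}

/// Flattens a per-symbol list of child IDs into one side table.
/// `children_of[i]` holds the direct children of symbol `i`.
fn flatten(names: Vec<String>, children_of: Vec<Vec<SymbolId>>) -> FlatSymbols {
    let mut symbols = Vec::with_capacity(names.len());
    let mut child_ids = Vec::new();
    let mut last_end = 0;
    for (name, kids) in names.into_iter().zip(children_of) {
        let start = last_end;
        let end = start + kids.len();
        // Child IDs are appended contiguously, so `start..end` is exact
        // no matter what order the symbols themselves were visited in.
        child_ids.extend(kids);
        symbols.push(FlatSymbol { name, children: start..end });
        last_end = end;
    }
    FlatSymbols { symbols, child_ids }
}

impl FlatSymbols {
    fn child_names(&self, id: SymbolId) -> Vec<&str> {
        let range = self.symbols[id].children.clone();
        self.child_ids[range]
            .iter()
            .map(|&child| self.symbols[child].name.as_str())
            .collect()
    }
}
```

In this sketch the symbols can be stored in depth-first order (`Outer`, `method`, `inner`, `other`) and `child_names(0)` still yields only `method` and `other`, because the range never touches the symbol list directly.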

Basically, this splits the implementation into two pieces: the first piece does the traversal and finds *all* symbols across the workspace. The second piece does filtering based on a user provided query string. Only the first piece is cached by Salsa. This brings warm "workspace symbols" requests down from 500-600ms to 100-200ms.
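The two-piece split can be sketched without Salsa at all. The example below uses a plain `HashMap` keyed by file and revision as a stand-in for Salsa's memoization, and treats whitespace-separated words as "symbols"; both are simplifying assumptions, and `SymbolCache`, `file_symbols`, and `search` are hypothetical names. The point it tries to show is that the query string never becomes part of the cache key.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a Salsa-cached per-file query.
#[derive(Default)]
struct SymbolCache {
    /// file id -> (revision the symbols were computed at, symbols)
    per_file: HashMap<u32, (u64, Vec<String>)>,
    /// Counts how often the expensive gathering step actually ran.
    gather_calls: usize,
}

impl SymbolCache {
    /// Piece one: gather all symbols in a file. Cached per file and
    /// recomputed only when the file's revision changes.
    fn file_symbols(&mut self, file: u32, revision: u64, source: &str) -> &[String] {
        let stale = self
            .per_file
            .get(&file)
            .map_or(true, |(cached_rev, _)| *cached_rev != revision);
        if stale {
            self.gather_calls += 1;
            let symbols = source.split_whitespace().map(str::to_string).collect();
            self.per_file.insert(file, (revision, symbols));
        }
        &self.per_file[&file].1
    }

    /// Piece two: filter by the user's query. Never cached, so a new
    /// query string does not invalidate anything.
    fn search(&mut self, files: &[(u32, u64, &str)], query: &str) -> Vec<String> {
        let mut hits = Vec::new();
        for &(file, revision, source) in files {
            let symbols = self.file_symbols(file, revision, source);
            hits.extend(symbols.iter().filter(|name| name.contains(query)).cloned());
        }
        hits
    }
}
```

Running `search` twice with different queries over the same files should leave `gather_calls` unchanged after the first run, which mirrors the warm-request improvement described above.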


In effect, we make the Salsa query aspect keyed only on whether we want global symbols. We move everything else (hierarchical and querying) to an aggregate step *after* the query. This was a somewhat involved change since we want to return a flattened list from visiting the source while also preserving enough information to reform the symbols into a hierarchical structure that the LSP expects. But I think overall the API has gotten simpler and we encode more invariants into the type system. (For example, previously you got a runtime assertion if you tried to provide a query string while enabling hierarchical mode. But now that's prevented by construction.)
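As an illustration of "prevented by construction", here is one possible shape for such an API. It is a sketch under the assumption that an enum is used; the PR may well encode the invariant differently (for example with separate methods), and `SymbolsMode` and `describe` are hypothetical names.

```rust
/// Hypothetical request shape: a query string can only be attached to the
/// flat variant, so "hierarchical plus query" cannot be expressed at all.
#[derive(Debug, PartialEq)]
enum SymbolsMode<'q> {
    Hierarchical,
    Flat { query: Option<&'q str> },
}

/// Every representable state is handled; there is no branch left that
/// would need a runtime assertion.
fn describe(mode: &SymbolsMode) -> String {
    match mode {
        SymbolsMode::Hierarchical => "hierarchical, unfiltered".to_string(),
        SymbolsMode::Flat { query: None } => "flat, unfiltered".to_string(),
        SymbolsMode::Flat { query: Some(q) } => format!("flat, filtered by {q:?}"),
    }
}
```

The design choice is that an invalid combination fails at compile time instead of tripping an assertion while serving a request.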

This makes use of early continue/return to keep rightward drift under control. (I also find it easier to read.)


BurntSushi requested review from AlexWaygood, MichaReiser, carljm, dcreager and sharkdp as code owners August 21, 2025 18:59

AlexWaygood added the server (Related to the LSP server) and ty (Multi-file analysis & type inference) labels Aug 21, 2025

BurntSushi removed request for AlexWaygood, carljm, dcreager and sharkdp August 21, 2025 19:05

dhruvmanila approved these changes Aug 22, 2025


MichaReiser approved these changes Aug 22, 2025


BurntSushi added a commit that referenced this pull request Aug 22, 2025

[ty] Parallelize workspace symbols

78d87ea

This is a pretty naive approach, but it makes cold times for listing workspace symbols in home-assistant under 1s on my machine. Courtesy of Micha: #20030 (comment)
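A naive parallelization of per-file symbol gathering might look roughly like the sketch below. It uses `std::thread::scope` so the example stays dependency-free; the actual commit may use a different mechanism (such as a thread pool), and `symbols_in` is a toy, hypothetical extractor that only recognizes lines starting with `def` or `class`.

```rust
use std::thread;

/// Toy, hypothetical symbol extractor: picks up names following `def` or
/// `class` at the start of a line. A real implementation walks the AST.
fn symbols_in(source: &str) -> Vec<String> {
    source
        .lines()
        .filter_map(|line| {
            let line = line.trim_start();
            let rest = line
                .strip_prefix("def ")
                .or_else(|| line.strip_prefix("class "))?;
            let name: String = rest
                .chars()
                .take_while(|c| c.is_alphanumeric() || *c == '_')
                .collect();
            if name.is_empty() { None } else { Some(name) }
        })
        .collect()
}

/// Splits the files into roughly equal chunks and gathers each chunk on
/// its own scoped thread, preserving the original file order.
fn gather_all(files: &[String], workers: usize) -> Vec<String> {
    let workers = workers.max(1);
    // Round up, and never pass 0 to `chunks` (which would panic).
    let chunk_len = ((files.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = files
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .flat_map(|src| symbols_in(src))
                        .collect::<Vec<String>>()
                })
            })
            .collect();
        // Joining in spawn order keeps results deterministic.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Because each file is processed independently, this kind of split needs no shared mutable state, which is what makes even a naive version effective for cold runs.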

BurntSushi force-pushed the ag/workspace-symbol-caching branch from e195b9d to 78d87ea Compare August 22, 2025 15:57

BurntSushi added 5 commits August 23, 2025 08:34

[ty] Add a TODO for linting on todo!

ec2bc15

[ty] Add debug trace for workspace symbol elapsed time

1541736

Useful for ad hoc debugging, but it's also useful to have permanently to enable serendipitous discovery of performance problems.
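An elapsed-time trace of this kind can be sketched with `std::time::Instant`. The helper below is hypothetical and prints with `eprintln!` to stay self-contained; the actual commit presumably emits through the project's logging or tracing setup instead.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f`, reports how long it took on stderr,
/// and returns both the result and the measured duration.
fn timed<T>(label: &str, f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let output = f();
    let elapsed = start.elapsed();
    eprintln!("{label} took {elapsed:?}");
    (output, elapsed)
}
```

Wrapping the request handler body in something like `timed("workspace symbols", || ...)` leaves the measurement permanently in place, so a slow request shows up in the logs without anyone having to go looking for it.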

[ty] Move query filtering outside of symbol visitor

5de94a7

This is prep work for turning this into a Salsa query. Specifically, we don't want the Salsa query to be dependent upon the query string.

[ty] Add some unit tests for "query matches symbol"

436b5cd

There is a small amount of subtlety to this matching routine, and it could be implemented in a faster way. So let's write some tests for what we have to ensure we don't break anything when we optimize it.

BurntSushi added a commit that referenced this pull request Aug 23, 2025

[ty] Parallelize workspace symbols

d57d87b


BurntSushi force-pushed the ag/workspace-symbol-caching branch 2 times, most recently from 17e7b0f to fd8109c Compare August 23, 2025 12:48

MichaReiser approved these changes Aug 23, 2025


BurntSushi added 2 commits August 23, 2025 12:19

[ty] Parallelize workspace symbols

35beaf8


BurntSushi added 2 commits August 23, 2025 12:26

[ty] Lightly refactor document symbols AST visitor

428a532

This makes use of early continue/return to keep rightward drift under control. (I also find it easier to read.)

BurntSushi force-pushed the ag/workspace-symbol-caching branch from fd8109c to 428a532 Compare August 23, 2025 16:40

BurntSushi merged commit e723765 into main Aug 23, 2025
38 checks passed

BurntSushi added a commit that referenced this pull request Aug 23, 2025

[ty] Parallelize workspace symbols

f407f12


BurntSushi deleted the ag/workspace-symbol-caching branch August 23, 2025 16:53

MichaReiser mentioned this pull request Dec 11, 2025

[ty] Fix workspace symbols to return members too #21926

Merged
