[ty] Add support for the LSP protocol's "type hierarchy" feature#23566
[ty] Add support for the LSP protocol's "type hierarchy" feature#23566BurntSushi merged 7 commits intomainfrom
Conversation
|
Demo with a small project: 2026-02-25T14.43.10-05.00.mp4Demo on Home Assistant (notice that asking for all subtypes for 2026-02-25T14.44.53-05.00.mp4 |
f054016 to
8f4c478
Compare
Typing conformance resultsNo changes detected ✅ |
|
I assigned @dhruvmanila to this PR for review since it's involved with LSP. But I also assigned @AlexWaygood because of the stuff in |
|
Memory usage reportMemory usage unchanged ✅ |
d455f07 to
cbf3b4d
Compare
This comment was marked as off-topic.
This comment was marked as off-topic.
cbf3b4d to
0dd2212
Compare
dhruvmanila
left a comment
There was a problem hiding this comment.
Looks great!
I did glance through the semantic changes but it would still be great to have @AlexWaygood take a look as well.
| /// The file containing the class definition. | ||
| pub file: ruff_db::files::File, | ||
| /// The range covering the full class definition header. | ||
| pub full_range: ruff_db::files::FileRange, | ||
| /// The range of the class name (for selection/focus). | ||
| pub selection_range: ruff_db::files::FileRange, | ||
| } |
There was a problem hiding this comment.
Should we just use TextRange for full_range and selection_range because the file is already present on the struct?
There was a problem hiding this comment.
Ah nice catch. Yeah that's much simpler.
| db: &dyn Db, | ||
| class_literal: ClassLiteral<'_>, | ||
| ) -> TypeHierarchyClass { | ||
| let name = class_literal.name(db).to_string(); |
There was a problem hiding this comment.
Can we use the Name struct here instead of .to_string()?
There was a problem hiding this comment.
Fair. I don't think it ends up reducing allocs though. By the time we build this, all of our filtering has already been done. And we eventually need to convert this to a String at the LSP boundary anyway. We'd probably need to define our own LSP wire types to truly eliminate the extra alloc.
I did make this change though for consistency reasons. I agree it's still nice to use Name when we can.
| Type::Union(union) => union | ||
| .elements(db) | ||
| .iter() | ||
| .find_map(|elem| extract_class_literal(db, *elem)), |
There was a problem hiding this comment.
Should we return a list of TypeHierarchyItems here instead of picking the first one in reponse to textDocument/prepareTypeHierarchy? Or, is that not the correct way to think about this from an editor perspective?
There was a problem hiding this comment.
Ok, it might get a bit complicated to handle Option<impl Iterator<Item = ClassLiteral>> where extract_class_literal is being used so we can leave this for a follow-up improvement.
| // For the dynamic cases, we special case a variable definition | ||
| // like this: | ||
| // | ||
| // Dynamic = type("Dynamic", (object,), {}) | ||
| // | ||
| // In this case, the range for the element we return will correspond to | ||
| // the left hand side of the variable assignment. This works better as | ||
| // an "anchor" point because it avoids ambiguity with asking for the | ||
| // type hierarchy of `type` itself. | ||
| // | ||
| // If there is not a variable definition, then we fall back to the | ||
| // class definition's "header" range, which will be the `type` (or | ||
| // `namedtuple`) call. Subsequent type hierarchy requests will then | ||
| // (likely incorrectly) return the type hierarchy for `type` itself. | ||
| ClassLiteral::Dynamic(dynamic_class) => { |
There was a problem hiding this comment.
I think the current implementation is perfectly fine but just a random thought where we could possibly use the string value to represent this type hierarchy symbol if there's no assignment and that wouldn't have any subtypes (haven't looked whether it's feasible in terms of follow-up requests). I think this might be a more work for little gains so would avoid until user feedback.
| // Skip files that don't contain the class name. This avoids expensive | ||
| // semantic analysis for files that can't possibly contain a subclass | ||
| // of the target. We can't do this when looking for subtypes of | ||
| // `object` since `object` can be implicit. | ||
| if !target_is_object && !source_text(db, file).contains(target_name.as_str()) { | ||
| continue; |
| // through" it. But what if the user actually wants the type | ||
| // hierarchy for `type`? Maybe we should only recognize the | ||
| // idiom when the cursor is on the `"Dynamic"` string literal. | ||
| // ---AG | ||
| let test = cursor_test( |
There was a problem hiding this comment.
Yeah, this seems reasonable, we can check whether the cursor is on type vs on the "Dynamic" string literal.
| /// This tests that we don't currently respect `__all__` when returning | ||
| /// subtypes. | ||
| #[test] | ||
| fn subtypes_all_not_respected() { |
There was a problem hiding this comment.
I'm not sure if __all__ should be respected in this case, I'd consider this request to return all subtypes irrespective of whether it's a public symbol or not but then we're not considering private modules so an argument can be made to consider __all__ as well. I guess we can defer it based on user feedback.
There was a problem hiding this comment.
Yeah I think waiting on user feedback is wise here. The current "ignore private/test modules but also don't respect __all__" seems to be what pylance does.
| "); | ||
| } | ||
|
|
||
| fn snapshot(db: &dyn Db, items: &[TypeHierarchyItem]) -> String { |
There was a problem hiding this comment.
nit: might be useful to use diagnostic to highlight the class which would allow us to snapshot the range values as well, not urgent and can be pushed as follow-up and could even be a "help wanted" issue
There was a problem hiding this comment.
Yeah I can file an issue about this.
| /// Note that this implements the `BackgroundRequestHandler` because the | ||
| /// request might be for a symbol in a document that is not open in the current | ||
| /// session. | ||
| pub(crate) struct TypeHierarchySupertypesRequestHandler; |
There was a problem hiding this comment.
I think we should have separate files (to follow the current convention e.g., prepare_rename and rename) for each of these request handler, so all in all 3 files each for prepare, subtype and supertype request. The common hierarchy_handler can be moved outside the requests directory similar to server/api/semantic_tokens.rs.
There was a problem hiding this comment.
Oh interesting. I actually originally had 3 separate files to follow the convention, but I found it a lot nicer to put all three in the same file since they are so closely related (and share helpers). Do you feel strongly about this? If not, I'd like to keep it this way. The helpers are highly specific to these request handlers and it feels weird to me to move them somewhere else away from the trait impls.
There was a problem hiding this comment.
I don't feel too strongly, it's mainly for consistency. It should also be easier to go directly to the file for a specific request using a fuzzing finder but that can easily be done by searching for the symbol as well. In addition to the rename traits, the diagnostics are also split this way and uses the server/api/diagnostics.rs as the common file containing shared code.
There was a problem hiding this comment.
Blech, okay. I went back and looked and yeah, this pattern is indeed quite strong. I don't want to be the one to break it, so I've split them back out into separate modules. I've also added a module comment to ty_server::server::api::requests documenting the pattern.
I'm still not really a fan of this though. I would argue that the semantic tokens request handlers should also probably be merged. But nothing else really sticks out to me. The shared code for diagnostics, for example, is used in more request handlers than just the diagnostic ones. And the request handlers for diagnostics vs workspace diagnostics are quite different. So them getting their own modules seems fine.
| // We don't actually know which project the request | ||
| // came from, so just look for results across all | ||
| // projects. | ||
| let mut items = vec![]; |
There was a problem hiding this comment.
I think we can avoid resolving the item location by using the data field in the prepareTypeHierarchy response to store the file URI:
/**
* A data entry field that is preserved between a type hierarchy prepare and
* supertypes or subtypes requests. It could also be used to identify the
* type hierarchy in the server, helping improve the performance on
* resolving supertypes and subtypes.
*/
data?: [LSPAny](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#lspAny);We do this in ruff-lsp between code action and the code action resolve request. You can basically pass in any information you want between the prepare and the actual request.
There was a problem hiding this comment.
Yeah I saw the arbitrary data blob, but it wasn't obvious to me that it would be helpful here? In particular, TypeHierarchyItem already has the file URI (and an offset), so we don't need to additionally store it in a data blob. The resolve_item_location routine is basically just mapping that file URI back into our internal types (either a system path or a vendored path).
BurntSushi
left a comment
There was a problem hiding this comment.
Holding off on pushing my changes based on your feedback because I rebased and don't want to mess up Alex's review if he's in the middle of one. :)
| /// This tests that we don't currently respect `__all__` when returning | ||
| /// subtypes. | ||
| #[test] | ||
| fn subtypes_all_not_respected() { |
There was a problem hiding this comment.
Yeah I think waiting on user feedback is wise here. The current "ignore private/test modules but also don't respect __all__" seems to be what pylance does.
| /// Note that this implements the `BackgroundRequestHandler` because the | ||
| /// request might be for a symbol in a document that is not open in the current | ||
| /// session. | ||
| pub(crate) struct TypeHierarchySupertypesRequestHandler; |
There was a problem hiding this comment.
Oh interesting. I actually originally had 3 separate files to follow the convention, but I found it a lot nicer to put all three in the same file since they are so closely related (and share helpers). Do you feel strongly about this? If not, I'd like to keep it this way. The helpers are highly specific to these request handlers and it feels weird to me to move them somewhere else away from the trait impls.
| // We don't actually know which project the request | ||
| // came from, so just look for results across all | ||
| // projects. | ||
| let mut items = vec![]; |
There was a problem hiding this comment.
Yeah I saw the arbitrary data blob, but it wasn't obvious to me that it would be helpful here? In particular, TypeHierarchyItem already has the file URI (and an offset), so we don't need to additionally store it in a data blob. The resolve_item_location routine is basically just mapping that file URI back into our internal types (either a system path or a vendored path).
| "); | ||
| } | ||
|
|
||
| fn snapshot(db: &dyn Db, items: &[TypeHierarchyItem]) -> String { |
There was a problem hiding this comment.
Yeah I can file an issue about this.
| /// The file containing the class definition. | ||
| pub file: ruff_db::files::File, | ||
| /// The range covering the full class definition header. | ||
| pub full_range: ruff_db::files::FileRange, | ||
| /// The range of the class name (for selection/focus). | ||
| pub selection_range: ruff_db::files::FileRange, | ||
| } |
There was a problem hiding this comment.
Ah nice catch. Yeah that's much simpler.
| db: &dyn Db, | ||
| class_literal: ClassLiteral<'_>, | ||
| ) -> TypeHierarchyClass { | ||
| let name = class_literal.name(db).to_string(); |
There was a problem hiding this comment.
Fair. I don't think it ends up reducing allocs though. By the time we build this, all of our filtering has already been done. And we eventually need to convert this to a String at the LSP boundary anyway. We'd probably need to define our own LSP wire types to truly eliminate the extra alloc.
I did make this change though for consistency reasons. I agree it's still nice to use Name when we can.
crates/ty_ide/src/type_hierarchy.rs
Outdated
| /// The range covering the full class definition. | ||
| pub full_range: FileRange, | ||
| /// The range of the class name (for selection/focus). | ||
| pub selection_range: FileRange, |
|
I haven't started a review yet but it is on my list for today! One question about this:
Does this account for the possibility that classes might be aliased? E.g.
class Foo: ...
from a import Foo as Bar
__all__ = ["Bar"]
from b import Bar
class Baz(Bar): ...The string "Foo" appears nowhere in the source for For real-world examples of where this could cause issues: certain builtin "exception classes" in Python are actually just aliases to ruff/crates/ty_vendored/vendor/typeshed/stdlib/builtins.pyi Lines 4614 to 4617 in 0e19fc9 |
0dd2212 to
88046e8
Compare
|
Ah okay, just pushed then. :) @AlexWaygood It indeed does not account for that. There are tests documenting that limitation. I noted in the OP that I tried to fix this, but there are challenges. Perf being one of them. The other being that my approach ended up with a lot of false positives. I felt like there were higher priority things to work on, and as far as I can tell, pylance has the same limitation. (This manifests in other ways too. e.g., It means we don't discover subtypes that are created through |
88046e8 to
ca7cc1a
Compare
This brings in astral-sh/lsp-types#2, which is necessary to advertise an LSP server's capability to provide type hierarchy information.
These were previously only used in auto-import, but we'll want to use them for filtering subtypes too. So put them in a more central location. I think these codify pretty solid ecosystem conventions, but I've added some cautionary language to the docs for these methods.
…y" feature Most of the interesting logic is inside of `ty_python_semantic`. We include a light wrapper API along with tests in `ty_ide`, which I think follows our existing convention for this sort of thing. Some of the tests demonstrate limitations in the current approach. I believe all such limitations are present in pylance as well. So this should bring us to parity at minimum.
In some cases, the LSP client will send us file paths corresponding to a vendored file in typeshed. We really want to treat this as a `VendoredPath` and not a `SystemPath`. In particular, if we treat it as the latter, we can end up with two different interned `File` values for the same file. And this leads to disastrous things (like definitions no longer being equivalent).
I think the main interesting piece here is mapping file URLs back to their appropriate vendored or system file path type. Otherwise, this mostly just wraps the API exposed in `ty_ide`. We also add a few "sanity check" end-to-end tests, but leave the bulk of the testing responsibility to `ty_ide`.
ca7cc1a to
112240c
Compare
|
@AlexWaygood I'm going to bring this in, but I'm more than happy to do follow-ups. :-) |
The LSP protocol's "type hierarchy" support comes in the form of three
distinct requests:
textDocument/prepareTypeHierarchytypeHierarchy/supertypestypeHierarchy/subtypesThe idea is that
textDocument/prepareTypeHierarchyis called witha particular document offset and that the server returns a non-empty
response if it's valid to ask for super- or sub-types for that
location. The client can then use that information as input to one
of either
typeHierarchy/supertypesortypeHierarchy/subtypestorequest supertypes or subtypes, respectively. Note that these requests
are only for direct supertypes or subtypes.
The main implementation challenge here was discovering subtypes. In
particular, this requires a full scan of the environment and so can
take quite a bit of time on large projects. pylance in particular seems
to take a similar approach, although it is quite a bit slower than
ty. This PR does come with a basic optimization where we won't load a
module for parsing/checking unless it literally contains the class name
we're interested in.
One thing I did attempt was to account for re-exports of classes and also
for subtypes to find dynamic classes. For example:
If one puts the cursor on
Baseand asks for its subtypes, ty willnot include
Dynamic. It's possible to implement this, but 1) perfsuffered quite a bit on bigger projects and 2) I ended up with a lot of
false positives. Probably (2) is resolvable, but I think that's best
served for future work. (I think there are other higher priority things
to work on. And pylance doesn't handle this case either.)
Closes astral-sh/ty#534