Remove AST-node dependency from FunctionType and ClassType#14087
Remove AST-node dependency from FunctionType and ClassType#14087MichaReiser merged 6 commits intomainfrom
FunctionType and ClassType#14087Conversation
ScopeId::node to Scope to avoid depending on AST nodeType
817ee80 to
7908514
Compare
|
079e46a to
1fc3dc4
Compare
CodSpeed Performance ReportMerging #14087 will improve performances by 12.36%Comparing Summary
Benchmarks breakdown
|
|
Lol what |
1fc3dc4 to
a1a709f
Compare
TypeFunctionType and ClassType
a1a709f to
6d5f492
Compare
| /// would depend on the function's AST and rerun for every change in that file. | ||
| pub fn return_ty(self, db: &'db dyn Db) -> Type<'db> { | ||
| let scope = self.body_scope(db); | ||
| let function_stmt_node = scope.node(db).expect_function(); |
There was a problem hiding this comment.
The issue without this being a salsa query is that the calling-query suddenly depends on the function's ast (and, therefore, the file).
In the case of our benchmark: _parser.py calls match_to_datetime which is defined in _re.py. Now, it shouldn't be necessary to recheck _parser.py because no type in _re.py changed. However, Salsa had to recheck every callsite because the return_type query accessed the definition.kind(db) which is the AST node of the match_to_datetime and its AST has changed (not really, but we use no_eq...)
I verified that making return_ty is the reason for the perf improvement by removing the #[salsa::tracked].
0b80226 to
6d5f492
Compare
f93cea2 to
4bb6209
Compare
4bb6209 to
a4f8f95
Compare
crates/red_knot_python_semantic/src/semantic_index/definition.rs
Outdated
Show resolved
Hide resolved
crates/red_knot_python_semantic/src/semantic_index/expression.rs
Outdated
Show resolved
Hide resolved
a4f8f95 to
51a9983
Compare
AlexWaygood
left a comment
There was a problem hiding this comment.
LGTM other than my remaining docs nits and one further optional comment:
carljm
left a comment
There was a problem hiding this comment.
Code changes here look great, thanks for sorting this out!
My main concern is that I think the guidance here is going to prove quite difficult to remember and follow manually and solely via code review; it's just very easy for these things to leak. It seems like it should be possible to write tests that are similar to what happen in our benchmark, just verifying that if you make a no-op change to one file, it doesn't cause re-execution of queries in another file that shouldn't need to re-execute. I think a few tests like that could go a really long way in helping us avoid regression here.
crates/red_knot_python_semantic/src/semantic_index/definition.rs
Outdated
Show resolved
Hide resolved
# Conflicts: # crates/red_knot_python_semantic/src/types.rs
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
|
@carljm I agree that a test would be nice. I added one for call inference although I'm not sure how valuable it is because it is rather fragile (a slight change to inference will break the test itself). I don't know how to write the ideal test that:
The problem here is that salsa doesn't provide the necessary reflection API to answer the second question. We could log all events but that test is likely going to break with every inference test AND it's kind of impossible to tell from a I don't mind adding more tests if you have suggestions on how to test this, but I'll leave it at what I have for now. I do think that our benchmarks do a decent job at this anyway. I would hope that we would investigate a 12% perf regression ;) I don't think there's a way to test deserialization in a meaningful before salsa adds persistent caching support. |
f3bec2d to
26cd7ed
Compare
26cd7ed to
969461f
Compare
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
|
The added test looks good to me, thank you! |
Summary
This PR improves Red Knot's incremental computation by:
FunctionType::return_typeandClassType::explicit_basessalsa queries. I suspect this is where the main performance improvement comes from. Making these two functions salsa queries prevents that the class or function's node (accessed throughdefinition.node) is marked as a dependency at the callee side. Instead, the callee only depends on whatever the return type or bases are.FunctionType::definitionandClassType::definitionand instead usesemantic_index.definition(..)to lookup the type's definition lazily. This ensures that deserializing a type doesn't require deserializing the AST as well.ScopeId::nodeand move it toScope::node. Same as forFunctionType::definition. Moving theNodefrom theScopeIdto the definition allows deserializing theScopeIdwithout deserializing the AST node. This is important because bothFunctionTypeandClassTypehave aScopeIdfield.Fixes #14054
Test Plan
cargo test