Use a single node hierarchy to track statements and expressions#6709
Use a single node hierarchy to track statements and expressions#6709charliermarsh merged 1 commit intomainfrom
Conversation
| pub enum NodeRef<'a> { | ||
| Stmt(&'a Stmt), | ||
| Expr(&'a Expr), | ||
| } |
There was a problem hiding this comment.
@MichaReiser - I really want this API to be capable of returning &Stmt and &Expr so that we can separate this refactor (and some follow-up refactors) from sweeping changes to the Ruff internals to use ExpressionRef etc. everywhere, which we should probably do but don't need to be coupled to this change. I can't figure out how to do this with AnyNodeRef -- is it possible?
There was a problem hiding this comment.
This, unfortunately, isn't possible (and the reason why ExpressionRef exists). It is impossible to go from e.g. a &StmtIf to &Stmt::If
There was a problem hiding this comment.
Can you expand on why / how this motivates ExpressionRef? My understanding is that ExpressionRef has the same problem -- you can't go from ExpressionRef::BoolOp to &Expr, only &ExprBoolOp.
There was a problem hiding this comment.
It's correct that you can't go from ExpressionRef to &Expr. But ExpressionRef allows you to go from AnyNodeRef to ExprRef. Meaning, ExpressionRef gives you a union that is limited to the variants of Expr, but can be converted to from any node.
There was a problem hiding this comment.
Okay this makes sense, but it still has the same problem, which is that if I use AnyNodeRef here, I have to make extensive changes to the rest of the codebase to use StatementRef, ExpressionRef, etc. instead of &Stmt, &Expr, etc.
PR Check ResultsEcosystem✅ ecosystem check detected no changes. BenchmarkLinuxWindows |
34aa4f7 to
eae37e0
Compare
Try defining our own AnyNode Try changing
eae37e0 to
ff51957
Compare
|
No significant changes before / after: |
|
Hyperfine benchmarking shows maybe a small decrease in performance (one or two milliseconds on a ~208ms baseline) but candidly it's hard to tell what's noise and what's real. |
Summary
This PR is a follow-up to the suggestion in #6345 (comment) to use a single stack to store all statements and expressions, rather than using separate vectors for each, which gives us something closer to a full-fidelity chain. (We can then generalize this concept to include all other AST nodes too.)
This is in part made possible by the removal of the hash map from
&StmttoStatementId(#6694), which makes it much cheaper to store these using a single interface (since doing so no longer introduces the requirement that we hash all expressions).I'll follow-up with some profiling, but a few notes on how the data requirements have changed:
BranchIdfor every expression, not just every statement, so that's an extrau32.NodeIdon every snapshot, rather than separateStatementIdandExpressionIdIDs, so that's one feweru32for each snapshot.current_statement()etc. now have to iterate up the node hierarchy until they identify the first statement.Test Plan
cargo test