Skip to content

Use separate structs for expression and statement tracking#6351

Merged
charliermarsh merged 2 commits intomainfrom
charlie/expressions-ii
Aug 7, 2023
Merged

Use separate structs for expression and statement tracking#6351
charliermarsh merged 2 commits intomainfrom
charlie/expressions-ii

Conversation

@charliermarsh
Copy link
Member

Summary

This PR fixes the performance degradation introduced in #6345. Instead of using the generic Nodes structs, we now use separate Statement and Expression structs. Importantly, we can avoid tracking a bunch of state for expressions that we need for parents: we don't need to track reference-to-ID pointers (we just have no use-case for this -- I'd actually like to remove this from statements too, but we need it for branch detection right now), we don't need to track depth, etc.

In my testing, this entirely removes the regression on all-rules, and gets us down to 2ms slower on the default rules (as a crude hyperfine benchmark, so this is within margin of error IMO).

No behavioral changes.

@charliermarsh charliermarsh force-pushed the charlie/expressions-ii branch from 2a0c25b to 30c42ef Compare August 4, 2023 18:42
@charliermarsh charliermarsh added the internal An internal refactor or improvement label Aug 4, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Aug 4, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.03     10.1±0.44ms     4.0 MB/sec    1.00      9.8±0.30ms     4.2 MB/sec
formatter/numpy/ctypeslib.py               1.00  1846.7±96.06µs     9.0 MB/sec    1.02  1881.6±83.02µs     8.8 MB/sec
formatter/numpy/globals.py                 1.00   216.3±11.89µs    13.6 MB/sec    1.02   220.9±15.12µs    13.4 MB/sec
formatter/pydantic/types.py                1.00      3.9±0.15ms     6.6 MB/sec    1.00      3.9±0.15ms     6.6 MB/sec
linter/all-rules/large/dataset.py          1.03     13.1±0.74ms     3.1 MB/sec    1.00     12.8±0.52ms     3.2 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.3±0.12ms     5.1 MB/sec    1.02      3.4±0.12ms     4.9 MB/sec
linter/all-rules/numpy/globals.py          1.00   495.3±29.62µs     6.0 MB/sec    1.00   494.9±35.73µs     6.0 MB/sec
linter/all-rules/pydantic/types.py         1.03      6.3±0.29ms     4.1 MB/sec    1.00      6.1±0.37ms     4.2 MB/sec
linter/default-rules/large/dataset.py      1.06      6.8±0.28ms     6.0 MB/sec    1.00      6.4±0.25ms     6.4 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.06  1418.1±65.61µs    11.7 MB/sec    1.00  1333.2±49.80µs    12.5 MB/sec
linter/default-rules/numpy/globals.py      1.08    178.5±9.56µs    16.5 MB/sec    1.00    165.6±8.18µs    17.8 MB/sec
linter/default-rules/pydantic/types.py     1.05      3.0±0.11ms     8.6 MB/sec    1.00      2.8±0.11ms     9.0 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00     10.2±0.14ms     4.0 MB/sec    1.00     10.1±0.17ms     4.0 MB/sec
formatter/numpy/ctypeslib.py               1.01  1948.2±39.26µs     8.5 MB/sec    1.00  1922.7±23.46µs     8.7 MB/sec
formatter/numpy/globals.py                 1.01    218.2±9.50µs    13.5 MB/sec    1.00    217.1±8.95µs    13.6 MB/sec
formatter/pydantic/types.py                1.01      4.2±0.06ms     6.0 MB/sec    1.00      4.2±0.06ms     6.1 MB/sec
linter/all-rules/large/dataset.py          1.00     13.1±0.19ms     3.1 MB/sec    1.01     13.2±0.16ms     3.1 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.5±0.07ms     4.7 MB/sec    1.01      3.6±0.04ms     4.7 MB/sec
linter/all-rules/numpy/globals.py          1.00   435.4±11.61µs     6.8 MB/sec    1.00    433.6±6.36µs     6.8 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.0±0.10ms     4.3 MB/sec    1.02      6.1±0.08ms     4.2 MB/sec
linter/default-rules/large/dataset.py      1.00      6.9±0.09ms     5.9 MB/sec    1.02      7.0±0.09ms     5.8 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00  1423.5±20.05µs    11.7 MB/sec    1.00  1418.0±18.38µs    11.7 MB/sec
linter/default-rules/numpy/globals.py      1.02    163.4±3.50µs    18.1 MB/sec    1.00    160.5±2.57µs    18.4 MB/sec
linter/default-rules/pydantic/types.py     1.01      3.1±0.04ms     8.3 MB/sec    1.00      3.0±0.05ms     8.4 MB/sec

@charliermarsh charliermarsh force-pushed the charlie/expressions-ii branch from 30c42ef to 9b3216b Compare August 4, 2023 23:03

/// An [`Expr`] AST node in a program, along with a pointer to its parent expression (if any).
#[derive(Debug)]
struct Expression<'a> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name here will conflict with Expr if we go ahead and rename it to Expression.

Maybe ExpressionWithParent. The name doesn't seem as important as it is an internal type.

#[derive(Debug, Default)]
pub struct Statements<'a> {
nodes: IndexVec<StatementId, Statement<'a>>,
node_to_id: FxHashMap<RefEquality<'a, Stmt>, StatementId>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we remove RefEquality if we use a TextRange here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so... I'll do this separately.

}

/// Return the parent of the given statement.
pub fn parent(&self, statement: &'a Stmt) -> Option<&'a Stmt> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The parent and parent_id methods are asymetric in that one accepts a StatementId and the other Stmt. Are we back to statement_parent 😆 ? Or resolve_parent or require two steps: statement_id(stmt) and then call parent?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha. I will fix this in a follow-up since this API already exists on Nodes (now removed).

Comment on lines +95 to +94
impl<'a> IndexMut<StatementId> for Statements<'a> {
#[inline]
fn index_mut(&mut self, index: StatementId) -> &mut Self::Output {
&mut self.nodes[index].node
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the mutability (and indexing)? Same for Expr.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No! Good call.

Comment on lines +25 to +26
/// The depth of this node in the tree.
depth: u32,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a frequent operation or could we compute the depth by counting the ancestor chain`?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will revisit in a follow-up PR, since this already exists on Nodes (now removed).

Base automatically changed from charlie/expressions to main August 7, 2023 13:42
charliermarsh added a commit that referenced this pull request Aug 7, 2023
## Summary

When we iterate over the AST for analysis, we often process nodes in a
"deferred" manner. For example, if we're analyzing a function, we push
the function body onto a deferred stack, along with a snapshot of the
current semantic model state. Later, when we analyze the body, we
restore the semantic model state from the snapshot. This ensures that we
know the correct scope, hierarchy of statement parents, etc., when we go
to analyze the function body.

Historically, we _haven't_ included the _expression_ hierarchy in the
model snapshot -- so we track the current expression parents in the
visitor, but we never save and restore them when processing deferred
nodes. This can lead to subtle bugs, in that methods like
`expr_parent()` aren't guaranteed to be correct, if you're in a deferred
visitor.

This PR migrates expression tracking to mirror statement tracking
exactly. So we push all expressions onto an `IndexVec`, and include the
current expression on the snapshot. This ensures that `expr_parent()`
and related methods are "always correct" rather than "sometimes
correct".

There's a performance cost here, both at runtime and in terms of memory
consumption (we now store an additional pointer for every expression).
In my hyperfine testing, it's about a 1% performance decrease for
all-rules on CPython (up to 533.8ms, from 528.3ms) and a 4% performance
decrease for default-rules on CPython (up to 212ms, from 204ms).
However... I think this is worth it given the incorrectness of our
current approach. In the future, we may want to reconsider how we do
these upward traversals (e.g., with something like a red-green tree).
(**Note**: in #6351, the slowdown
seems to be entirely removed.)
@charliermarsh charliermarsh force-pushed the charlie/expressions-ii branch 2 times, most recently from 4499d39 to a960cf5 Compare August 7, 2023 15:02
@charliermarsh charliermarsh force-pushed the charlie/expressions-ii branch 2 times, most recently from c953b52 to 9121d7b Compare August 7, 2023 15:06
@charliermarsh charliermarsh enabled auto-merge (squash) August 7, 2023 15:06
@charliermarsh charliermarsh force-pushed the charlie/expressions-ii branch from 9121d7b to d3e3c90 Compare August 7, 2023 15:17
@charliermarsh charliermarsh merged commit b21abe0 into main Aug 7, 2023
@charliermarsh charliermarsh deleted the charlie/expressions-ii branch August 7, 2023 15:27
@charliermarsh charliermarsh mentioned this pull request Aug 7, 2023
charliermarsh added a commit that referenced this pull request Aug 7, 2023
charliermarsh added a commit that referenced this pull request Aug 7, 2023
## Summary

See discussion in
#6351 (comment). We
can remove `RefEquality` entirely and instead use a text offset for
statement keys, since no two statements can start at the same text
offset.

## Test Plan

`cargo test`
durumu pushed a commit to durumu/ruff that referenced this pull request Aug 12, 2023
## Summary

When we iterate over the AST for analysis, we often process nodes in a
"deferred" manner. For example, if we're analyzing a function, we push
the function body onto a deferred stack, along with a snapshot of the
current semantic model state. Later, when we analyze the body, we
restore the semantic model state from the snapshot. This ensures that we
know the correct scope, hierarchy of statement parents, etc., when we go
to analyze the function body.

Historically, we _haven't_ included the _expression_ hierarchy in the
model snapshot -- so we track the current expression parents in the
visitor, but we never save and restore them when processing deferred
nodes. This can lead to subtle bugs, in that methods like
`expr_parent()` aren't guaranteed to be correct, if you're in a deferred
visitor.

This PR migrates expression tracking to mirror statement tracking
exactly. So we push all expressions onto an `IndexVec`, and include the
current expression on the snapshot. This ensures that `expr_parent()`
and related methods are "always correct" rather than "sometimes
correct".

There's a performance cost here, both at runtime and in terms of memory
consumption (we now store an additional pointer for every expression).
In my hyperfine testing, it's about a 1% performance decrease for
all-rules on CPython (up to 533.8ms, from 528.3ms) and a 4% performance
decrease for default-rules on CPython (up to 212ms, from 204ms).
However... I think this is worth it given the incorrectness of our
current approach. In the future, we may want to reconsider how we do
these upward traversals (e.g., with something like a red-green tree).
(**Note**: in astral-sh#6351, the slowdown
seems to be entirely removed.)
durumu pushed a commit to durumu/ruff that referenced this pull request Aug 12, 2023
…#6351)

## Summary

This PR fixes the performance degradation introduced in
astral-sh#6345. Instead of using the
generic `Nodes` structs, we now use separate `Statement` and
`Expression` structs. Importantly, we can avoid tracking a bunch of
state for expressions that we need for parents: we don't need to track
reference-to-ID pointers (we just have no use-case for this -- I'd
actually like to remove this from statements too, but we need it for
branch detection right now), we don't need to track depth, etc.

In my testing, this entirely removes the regression on all-rules, and
gets us down to 2ms slower on the default rules (as a crude hyperfine
benchmark, so this is within margin of error IMO).

No behavioral changes.
durumu pushed a commit to durumu/ruff that referenced this pull request Aug 12, 2023
durumu pushed a commit to durumu/ruff that referenced this pull request Aug 12, 2023
## Summary

See discussion in
astral-sh#6351 (comment). We
can remove `RefEquality` entirely and instead use a text offset for
statement keys, since no two statements can start at the same text
offset.

## Test Plan

`cargo test`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

internal An internal refactor or improvement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants