## Conversation
CodSpeed Performance Report: merging #3238 will not alter performance.
@Boshen Can you read the first comment on this PR? I was wondering how much I should improve the performance to make it acceptable.
@rzvxa Instead of doing the check inside the rule, are you able to mark the statement as unreachable during the CFG building phase inside the semantic builder? For example, add either of the following:

```rust
pub struct Semantic<'a> {
    pub unreachable: Vec<Span>, // all the unreachable spans
}
```

In the rule you just print out these spans in one go, a rule with only one line of code lol. Or:

```rust
pub struct SemanticNode<'a> {
    pub unreachable: bool,
}
```

Think outside of the box, we have the whole compiler pipeline, we can precompute a lot of stuff!
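The first variant above could look roughly like this. This is a minimal sketch under assumed names: `Span`, `Cfg`, and `unreachable_spans` are hypothetical stand-ins for illustration, not the real oxc API. The semantic builder computes the unreachable blocks once, and the lint rule reduces to a single loop over the stored spans.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for oxc's `Span` type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span { start: u32, end: u32 }

// Toy CFG model: one span and one successor list per basic block.
struct Cfg {
    spans: Vec<Span>,
    succs: Vec<Vec<usize>>,
}

// One DFS from the entry block, done once during semantic building;
// every block that is never visited contributes its span to the
// precomputed `unreachable` list.
fn unreachable_spans(cfg: &Cfg, entry: usize) -> Vec<Span> {
    let mut visited = HashSet::new();
    let mut stack = vec![entry];
    while let Some(b) = stack.pop() {
        if visited.insert(b) {
            stack.extend(cfg.succs[b].iter().copied());
        }
    }
    (0..cfg.spans.len())
        .filter(|b| !visited.contains(b))
        .map(|b| cfg.spans[b])
        .collect()
}

fn main() {
    // Block 2 is dead code, e.g. statements after a `return` in block 1.
    let cfg = Cfg {
        spans: vec![
            Span { start: 0, end: 10 },
            Span { start: 11, end: 20 },
            Span { start: 21, end: 30 },
        ],
        succs: vec![vec![1], vec![], vec![]],
    };
    let dead = unreachable_spans(&cfg, 0);
    // The lint rule itself is now just one loop over precomputed spans.
    assert_eq!(dead, vec![Span { start: 21, end: 30 }]);
    println!("unreachable spans: {dead:?}");
}
```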
Doesn't adding this to the semantic mean we pay that cost even in places where it isn't necessary? Not to mention the amount of code we'd have to add to the semantic builder to compute and determine reachability. With this much performance regression even I don't like my current results (and if it were my project I wouldn't allow it to merge), but having it in the semantic doesn't feel right to me either. Maybe we can achieve better performance by rewriting the rule to carry state between visits. For example, if we visit an infinite loop we know for sure everything after that loop is unreachable, but in my current implementation we can't remember that information between nodes, so we do the work all over again.
Yes, a …
In TypeScript, it marks nodes as unreachable via …
We already have an unreachable instruction that marks the first unreachable block, but it doesn't propagate; we only use it to filter out the subgraph when walking through the flow. I assume they are doing all of our checks, if not more, to set these flags correctly. I'll give the TypeScript approach a read before making any major changes, but I have a hunch that whatever approach we use to implement this rule, it should get rid of our current redundant revisits, either by running once or by precomputing the information (which in itself is another kind of run-once, happening in the core instead of the linter crate). That all brings me to the question I was going to ask before seeing this comment: considering that this information can be reused but comes at the cost of more memory and overhead for everything using the semantics, do you think having it there is better overall?
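The TypeScript-style per-node flag idea can be sketched as follows. This is a toy model under assumed names (`Block`, `propagate_flags`), not oxc's or TypeScript's actual types: blocks that hold an unreachable terminator are seeded, one forward walk refuses to pass through them, and everything never reached gets the flag set.

```rust
// A block is seeded `unreachable` when the builder emits an Unreachable
// instruction into it (after `return`, `throw`, an infinite loop, ...).
// `Block` here is a hypothetical model, not oxc's real basic block type.
struct Block {
    succs: Vec<usize>,
    unreachable: bool, // seed on input, propagated result on output
}

// Walk forward from the entry; a block stays reachable only if we can
// get to it without passing through a seeded-unreachable block. Every
// block we never reach gets the flag set.
fn propagate_flags(blocks: &mut [Block], entry: usize) {
    let mut reached = vec![false; blocks.len()];
    let mut stack = vec![entry];
    while let Some(b) = stack.pop() {
        if reached[b] || blocks[b].unreachable {
            continue;
        }
        reached[b] = true;
        stack.extend(blocks[b].succs.iter().copied());
    }
    for (b, block) in blocks.iter_mut().enumerate() {
        block.unreachable = !reached[b];
    }
}

fn main() {
    // 0 -> 1 (seeded unreachable) -> 2: the flag propagates to block 2.
    let mut blocks = vec![
        Block { succs: vec![1], unreachable: false },
        Block { succs: vec![2], unreachable: true },
        Block { succs: vec![], unreachable: false },
    ];
    propagate_flags(&mut blocks, 0);
    let flags: Vec<bool> = blocks.iter().map(|b| b.unreachable).collect();
    assert_eq!(flags, vec![false, true, true]);
    println!("flags: {flags:?}");
}
```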
The sad part is that all of our DFS passes are only 8% of the performance regression; the other …
We'll make CFG construction optional at some point. Regarding the performance regression, I think we can accept around 10-15% given that we are doing a lot of work for the CFG.
Just to be clear, do you mean it is acceptable if we keep it in the semantics, or are you talking about this rule alone?
Overall performance regression of 10 to 15%. We can keep the unreachable query in the linter rule if that's what you prefer; I was only suggesting that maybe it's better to precompute it in the semantic builder. I'll let you make the decision 😁
closes #621

[no-unreachable](https://github.com/eslint/eslint/blob/069aa680c78b8516b9a1b568519f1d01e74fb2a2/lib/rules/no-unreachable.js#L196)
[oxlint-ecosystem-ci result](https://github.com/rzvxa/oxlint-ecosystem-ci/actions/runs/9406195143/job/25909079029)

This rule is done, but since it runs for every possible statement and does quite a bit of work on each one to determine whether it is 100% reachable, the performance in my opinion is kind of abysmal. I'll try to work it out. I know Biome does 2 types of checks to simplify the rule for some nodes; however, they have a lot more false negatives than our implementation.

##### Here is one example of those [false negatives](https://biomejs.dev/playground/?code=ZgB1AG4AYwB0AGkAbwBuACAAeAAoACkAIAB7ACAAZABvACAAewAgAGEAKAApADsAIAB9ACAAdwBoAGkAbABlACgAdAByAHUAZQApADsAIABiACgAKQA7ACAAfQA%3D)

-------------

### Update 1:

I've benchmarked this rule using only the simplified reachability checks and it was around 5% faster. To be honest, it isn't much of an improvement, especially considering that we can only use this check for a small portion of nodes, and even that comes with newly introduced checks which would lessen the performance gain further.

Most of the performance regression is because of allocations during our depth-first search: since we have to store both the visited and the finished nodes, we end up with a bunch of rapid-fire allocations back to back. At the moment I don't have a great idea of how to improve it; we may have to implement our own graph that uses arenas underneath.

Given that this rule is the most extensive use case of control flow (it doesn't come with a limited scope like the already-implemented property and constructor rules), this performance drop might be reasonable to some extent.
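On the allocation problem from Update 1, one possible direction (a hedged sketch with hypothetical names, not oxc's implementation) is to hoist the DFS scratch buffers out of the per-statement query, so they are allocated once and reused for every reachability check instead of being freshly allocated back to back:

```rust
// Reusable scratch space for repeated DFS queries: the visited set and
// the traversal stack are allocated once and reused, avoiding the
// rapid-fire per-query allocations. All names here are hypothetical.
struct DfsScratch {
    visited: Vec<bool>,
    stack: Vec<usize>,
}

impl DfsScratch {
    fn new(block_count: usize) -> Self {
        Self {
            visited: vec![false; block_count],
            stack: Vec::with_capacity(block_count),
        }
    }

    /// Is `to` reachable from `from` in the successor lists `succs`?
    /// Repeated calls reuse the buffers and do no further allocation
    /// (beyond occasional stack growth).
    fn reaches(&mut self, succs: &[Vec<usize>], from: usize, to: usize) -> bool {
        for v in &mut self.visited {
            *v = false; // cheap reset instead of a fresh allocation
        }
        self.stack.clear();
        self.stack.push(from);
        while let Some(b) = self.stack.pop() {
            if b == to {
                return true;
            }
            if !std::mem::replace(&mut self.visited[b], true) {
                self.stack.extend(succs[b].iter().copied());
            }
        }
        false
    }
}

fn main() {
    // 0 -> 1 -> 2, and block 3 is disconnected.
    let succs = vec![vec![1], vec![2], vec![], vec![]];
    let mut scratch = DfsScratch::new(succs.len());
    assert!(scratch.reaches(&succs, 0, 2));
    assert!(!scratch.reaches(&succs, 0, 3));
    println!("ok");
}
```

An arena-backed graph, as suggested above, would go one step further by also removing the per-edge allocations; this sketch only addresses the per-query ones.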
-------------

### Update 2:

I reworked my approach in 2 ways. First, I used @Boshen's suggestion inspired by TypeScript and kept some of the reachability information in the basic block structure instead of calculating it on the fly. This is done by propagating the `Unreachable` edge and `Unreachable` instruction throughout subgraphs. That helped with the performance for sure, but the next part is what amazed me: going from something near `O(n!)` in the worst-case scenario to `O(n^2)` in the worst-case scenario. By changing the approach, instead of checking the reachability of each statement we do it in 3 passes: first, a pass over the entire CFG to query all reachable-but-suspicious cases; then another pass over each of these suspicious subgraphs to determine reachability with higher confidence; finally, we iterate over all of the appropriate nodes and check their reachability status against the information collected in the 2 previous passes.

With these 2 changes this rule went from `-24%` to `~-2%`. This performance gain doesn't come for free, though; it increases the likelihood of false positives/negatives. But as long as we are passing our `ecosystem-ci` it should be fine, and we can always sacrifice some performance to check for edge cases if there are any.

[new oxlint-ecosystem-ci result](https://github.com/rzvxa/oxlint-ecosystem-ci/actions/runs/9490791181)
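The propagation step described in Update 2 can be sketched as a worklist pass. The shapes and names below are assumptions for illustration, not the actual oxc code: once a block is marked `Unreachable`, any successor whose predecessors are all unreachable is marked too. Each block is enqueued at most once and each step scans one predecessor list, so the worst case is quadratic in the number of blocks rather than anything factorial.

```rust
use std::collections::VecDeque;

// Worklist propagation of the `Unreachable` marker through subgraphs.
// `unreachable` is seeded with the blocks that hold an Unreachable
// instruction; the pass extends it to everything dominated by them.
fn propagate_unreachable(
    succs: &[Vec<usize>],
    preds: &[Vec<usize>],
    mut unreachable: Vec<bool>,
) -> Vec<bool> {
    let mut queue: VecDeque<usize> =
        (0..succs.len()).filter(|&b| unreachable[b]).collect();
    while let Some(b) = queue.pop_front() {
        for &s in &succs[b] {
            // A successor becomes unreachable only once *all* of its
            // incoming edges come from unreachable blocks.
            if !unreachable[s] && preds[s].iter().all(|&p| unreachable[p]) {
                unreachable[s] = true;
                queue.push_back(s);
            }
        }
    }
    unreachable
}

fn main() {
    // 0 -> 1 is live; 2 -> 3 is a dead chain seeded at block 2.
    let succs = vec![vec![1], vec![], vec![3], vec![]];
    let preds = vec![vec![], vec![0], vec![], vec![2]];
    let out = propagate_unreachable(&succs, &preds, vec![false, false, true, false]);
    assert_eq!(out, vec![false, false, true, true]);
    println!("unreachable: {out:?}");
}
```

The all-predecessors check is what keeps merge points (e.g. the block after an `if`/`else`) reachable as long as at least one live path leads into them.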


