Skip to content

Use accumulator for diagnostics#14116

Closed
MichaReiser wants to merge 2 commits intomainfrom
micha/accumulator-diagnostics
Closed

Use accumulator for diagnostics#14116
MichaReiser wants to merge 2 commits intomainfrom
micha/accumulator-diagnostics

Conversation

@MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Nov 5, 2024

Summary

Use a Salsa accumulator for diagnostics to avoid double-emitting diagnostics for unpack assignments.

We now use a single CompileDiagnostic accumulator that stores all diagnostics. That includes IO errors, parse errors, and type check errors. I decided that we're beyond where String annotations are fun. That's why I introduced a Diagnostic trait with 5 implementations:

  • ParseDiagnostic for parse errors
  • SourceTextDiagnostic for IO errors
  • TypeCheckDiagnostic for type inference errors
  • RevealedTypeDiagnostic for reveal_type (info severity)
  • CompileDiagnostic which is a cheap cloneable any diagnostic wrapper that implements salsa accumulator.

Using a structured diagnostic has the advantage of no longer needing manual parsing in the LSP.

A nice side effect of this is that mdtests now support tests with syntax errors because parse-errors are just another diagnostic :)

Performance regression

I expect this to hit performance pretty bad, especially the incremental benchmark. The problem is that there's currently no way to run an accumulator concurrently. That's why I had to resort to a hack where we run type checking in parallel, and then collect the diagnostics. This has the downside that Salsa first runs all queries (in parallel) but then has to iterate over all of them again to collect the diagnostics (even if there are none!). The queries are all cached, but it is still expensive because the query dependency tree is somewhat large.

This overhead is especially noticeable in the cache case. We'll need salsa-rs/salsa#568 to do the accumulation and checking in one go.

Test Plan

cargo test

Comment on lines +39 to +40
#[salsa::tracked]
pub fn check_types(db: &dyn Db, file: File) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to be a salsa query so that the mdtest framework can get the diagnostics (accumulator require a query)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the code here is copied over from ruff_server.

@MichaReiser MichaReiser force-pushed the micha/accumulator-diagnostics branch from 3dd3ee3 to 24d6e87 Compare November 5, 2024 19:37
@codspeed-hq
Copy link

codspeed-hq bot commented Nov 5, 2024

CodSpeed Performance Report

Merging #14116 will degrade performances by 32.2%

Comparing micha/accumulator-diagnostics (371e5fc) with main (4ece8e5)

Summary

❌ 1 regressions
✅ 31 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main micha/accumulator-diagnostics Change
red_knot_check_file[incremental] 4.2 ms 6.3 ms -32.2%

@MichaReiser MichaReiser force-pushed the micha/accumulator-diagnostics branch 2 times, most recently from d9ec46b to ec9a12b Compare November 5, 2024 19:50
@MichaReiser MichaReiser marked this pull request as ready for review November 5, 2024 19:51
@MichaReiser MichaReiser added the ty Multi-file analysis & type inference label Nov 5, 2024
@github-actions
Copy link
Contributor

github-actions bot commented Nov 5, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@MichaReiser MichaReiser force-pushed the micha/accumulator-diagnostics branch from ec9a12b to 522f3d4 Compare November 5, 2024 20:16
Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments from skimming. I'll try to look at this more in-depth tomorrow!

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me, thank you! I think this is definitely an improvement in terms of reliable handling of diagnostics (to avoid duplicating or accidentally missing them); hopefully parallel-db brings back most of the regression.

@@ -0,0 +1,175 @@
use ruff_text_size::TextRange;
use salsa::Accumulator as _;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of the as _ here? It seems to be just as happy for me locally without it.

Suggested change
use salsa::Accumulator as _;
use salsa::Accumulator;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both work, for as long as there's no other type named Accumulator. Using as _ imports that type "unnamed". You can't reference the type itself. That's often sufficient for traits because all that is needed is to bring the trait into the scope to call that trait's functions.

@MichaReiser
Copy link
Member Author

I just noticed that we can now support invalid-syntax in mdtests. I ported the exception handler with invalid syntax test case to demonstrate it

@MichaReiser MichaReiser force-pushed the micha/accumulator-diagnostics branch from 522f3d4 to 27d2a8e Compare November 6, 2024 07:45
@MichaReiser MichaReiser force-pushed the micha/accumulator-diagnostics branch from 27d2a8e to c9e4d72 Compare November 6, 2024 08:07
@MichaReiser MichaReiser force-pushed the micha/accumulator-diagnostics branch from c9e4d72 to 371e5fc Compare November 6, 2024 08:09
@MichaReiser
Copy link
Member Author

I don't think this is the way to go. The problem is that salsa's accumulator are O(tree) and not O(changes) which introduces a significant cost. They are very convenient but I'm concerned about the incremental performance in very large mono repository...

I have a fix specific for unpacking and I'll look into extracting some of the improvements I made in this PR but I'm more convinced now that accumulators, as they are today, are not a good fit with our very deeply nested queries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants