Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatter: Recycle parsed AST and tokens in between rule invocations when no correction was applied to improve performance #1462

Merged
merged 5 commits into from Apr 28, 2020

Conversation

@bergmeister
Copy link
Collaborator

bergmeister commented Apr 26, 2020

PR Summary

This saves on the expensive calls to the Parser and the rule itself, which are called at the moment for every rule invocation AND after every applied DiagnosticRecord. With this PR, parsing is skipped and the previous result is used if a rule did not return a DiagnosticRecord that changed the analysed script definition string. If there is not DiagnosticRecord returned from a rule during formatting, this simple change can dramatically speed up the formatting time: In the case of a pre-formatted file (always using PowerShell's build.psm1 for reference), formatting is 3 times faster. In reality, I guess it's only around 50%-100% depending on how well formatter the script already is. Because I had to change publicly exposed methods, it is technically speaking a breaking change but I think we can live with that.

Since I haven't written this core logic, I can only guess it was written like this because applying a change invalidates the previous analysis and this explains why one cannot simply run the rules in parallel. However, I imagine, one could write code to not re-parse at all between applying a list of DiagnosticRecords by adapting the line and column numbers (which would also allow for parallelisation of the rules). At the moment one could probably even improve performance further by stopping rule analysis once the first object comes back due to this re-analysis pattern. I've actually tried that locally but it would've broken a few tests so maybe not as easy as it seems to be therefore let's leave that for later now though.

PR Checklist

@@ -1684,22 +1698,21 @@ private static Range SnapToEdges(EditableText text, Range range)
}

private static IEnumerable<CorrectionExtent> GetCorrectionExtentsForFix(
IEnumerable<CorrectionExtent> correctionExtents)
List<CorrectionExtent> correctionExtents)

This comment has been minimized.

@bergmeister

bergmeister Apr 26, 2020 Author Collaborator

If one goes up in the call tree of the Fix methods then one sees that the passed through correctionExtents object is already of type List here, therefore this allows us to drop calls to .ToList() and .Count() here.

/// <returns></returns>
public IEnumerable<DiagnosticRecord> AnalyzeScriptDefinition(string scriptDefinition, bool skipVariableAnalysis = false)
public IEnumerable<DiagnosticRecord> AnalyzeScriptDefinition(string scriptDefinition, out ScriptBlockAst scriptAst, out Token[] scriptTokens, bool skipVariableAnalysis = false)

This comment has been minimized.

@bergmeister

bergmeister Apr 26, 2020 Author Collaborator

Technically speaking this is a breaking change but I don't think anyone depends on it (at least publicly on GitHub) and even if they did it should be easy for them to adopt. If it is of concerns, an alternative to not have a breaking change is to provide a wrapping layer like this

 public IEnumerable<DiagnosticRecord> AnalyzeScriptDefinition(string scriptDefinition, bool skipVariableAnalysis = false)
{
    AnalyzeScriptDefinition(scriptDefinition, out _, out _, skipVariableAnalysis:skipVariableAnalysis )
}
/// <param name="skipVariableAnalysis">Whether to skip variable analysis.</param>
/// <returns>The same instance of `EditableText` that was passed to the method, but the instance encapsulates the fixed script text. This helps in chaining the Fix method.</returns>
public EditableText Fix(EditableText text, Range range, out Range updatedRange, out bool fixesWereApplied, bool skipVariableAnalysis = false)
public EditableText Fix(EditableText text, Range range, bool skipParsing, out Range updatedRange, out bool fixesWereApplied, ref ScriptBlockAst scriptAst, ref Token[] scriptTokens, bool skipVariableAnalysis = false)

This comment has been minimized.

@bergmeister

bergmeister Apr 26, 2020 Author Collaborator

Technically this is a breaking change as well but I think this class shouldn't have been public but rather internal in the first place. Again, we could still provide a wrapping layer for legacy purposed but it'd just leave the change as-is.

This comment has been minimized.

@rjmholt

rjmholt Apr 27, 2020 Member

I think this is acceptable; we've proceeded on the idea that the cmdlets are the real API so far and I think we can continue with that in the 1.x timeframe

/// <param name="skipVariableAnalysis">Whether to skip variable analysis.</param>
/// <returns>The same instance of `EditableText` that was passed to the method, but the instance encapsulates the fixed script text. This helps in chaining the Fix method.</returns>
public EditableText Fix(EditableText text, Range range, out Range updatedRange, out bool fixesWereApplied, bool skipVariableAnalysis = false)
public EditableText Fix(EditableText text, Range range, bool skipParsing, out Range updatedRange, out bool fixesWereApplied, ref ScriptBlockAst scriptAst, ref Token[] scriptTokens, bool skipVariableAnalysis = false)

This comment has been minimized.

@rjmholt

rjmholt Apr 27, 2020 Member

I think this is acceptable; we've proceeded on the idea that the cmdlets are the real API so far and I think we can continue with that in the 1.x timeframe

Christoph Bergmeister
@bergmeister bergmeister merged commit 3e6987b into PowerShell:master Apr 28, 2020
12 checks passed
12 checks passed
PSScriptAnalyzer-CI Build #20200427.8 succeeded
Details
PSScriptAnalyzer-CI (Build Full_Build) Build Full_Build succeeded
Details
PSScriptAnalyzer-CI (Test Ubuntu_16_04) Test Ubuntu_16_04 succeeded
Details
PSScriptAnalyzer-CI (Test Ubuntu_18_04) Test Ubuntu_18_04 succeeded
Details
PSScriptAnalyzer-CI (Test Windows_Server2016_PowerShell_5_1) Test Windows_Server2016_PowerShell_5_1 succeeded
Details
PSScriptAnalyzer-CI (Test Windows_Server2016_PowerShell_Core) Test Windows_Server2016_PowerShell_Core succeeded
Details
PSScriptAnalyzer-CI (Test Windows_Server2019_PowerShell_5_1) Test Windows_Server2019_PowerShell_5_1 succeeded
Details
PSScriptAnalyzer-CI (Test Windows_Server2019_PowerShell_Core) Test Windows_Server2019_PowerShell_Core succeeded
Details
PSScriptAnalyzer-CI (Test macOS_10_14_Mojave) Test macOS_10_14_Mojave succeeded
Details
PSScriptAnalyzer-CI (Test macOS_10_15_Catalina) Test macOS_10_15_Catalina succeeded
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
license/cla All CLA requirements met.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.