Skip to content

Issue #10934: OutOfMemory Error following ANTLR4 upgrade#10999

Merged
romani merged 1 commit into
checkstyle:masterfrom
nrmancuso:issue-10934-3
Jan 15, 2022
Merged

Issue #10934: OutOfMemory Error following ANTLR4 upgrade#10999
romani merged 1 commit into
checkstyle:masterfrom
nrmancuso:issue-10934-3

Conversation

@nrmancuso

@nrmancuso nrmancuso commented Dec 3, 2021

Copy link
Copy Markdown
Contributor

Closes #10934 .

Report from investigation of issue here: https://github.com/nmancus1/checkstyle-antlr4-memory-report

Further investigation has led me to the current implementation, and has disproven some results of my previous investigation. In order to keep memory usage down, clearing the lexer DFA is not as important as my original research concluded. Also, I have found that it is not critical to instantiate a new PredictionContextCache as some suggested in the ANTLR issues that I read through.

The root of the problem is that ANTLR DFA states are basically stored in an unbounded cache; as long as we keep finding syntax that is different than what has been previously parsed, the number of DFA states (and heap size) continues to grow without bound as we parse more files. The suggested approach of clearing the DFA states upon every parsing operation (every file) had much too high of a time performance cost.

The current implementation (clearing the DFA states every 500 files) gets us to a very good state performance wise; memory consumption is reasonable (stays within all specified JVM heap sizes), and parsing speed is nearly back to ANTLR2 times.


You can see the history of all no-exception-openjdk16 task execution here (most relevant real life metric of performance of changes in this PR):

https://github.com/checkstyle/checkstyle/actions/workflows/no-exception-workflow.yml?query=branch%3Aissue-10934-3

Note that file counter dfa clearing was added 12/29; you can see that most subsequent runs are equal to, or less than 30 minutes.


Some metrics from CI:

CI task average of 5 runs on master PR
no-exception-test.sh guava-with-google-checks 5:44 3:38
no-exception-test.sh guava-with-sun-checks 4:33 3:41
no-error-pmd 10:23 9:23
java11-verify 4:37 5:18
no-error-xwiki 4:28 3:56
no-exception-only-javadoc 6:12 5:41
no-exception-lucene-and-others-javadoc 7:50 6:15

CI tasks always have some variance on run times, but averaging out the master run times of these tasks gives us a good idea of how long certain executions should take.


Diff Regression projects: https://gist.githubusercontent.com/nmancus1/179fb374eb149c629322508baadc6ce1/raw/8f926d3450ba23172df94f39fd6901a355550d7a/projects.properties

New module config: https://gist.githubusercontent.com/nmancus1/ead52600003f218fb17823bc59ea596f/raw/a6cd0780e9f6e13296380c8328f7c064baa6c95e/config.xml

@nrmancuso nrmancuso self-assigned this Dec 3, 2021
@nrmancuso nrmancuso force-pushed the issue-10934-3 branch 2 times, most recently from 72bfc86 to 8c03584 Compare December 3, 2021 16:57
@nrmancuso nrmancuso marked this pull request as draft December 3, 2021 17:26
@rnveach rnveach self-requested a review December 3, 2021 17:36
@nrmancuso nrmancuso changed the title Issue #10934: OutOfMemory Error following ANTL4 upgrade Issue #10934: OutOfMemory Error following ANTLR4 upgrade Dec 3, 2021
@nrmancuso

Copy link
Copy Markdown
Contributor Author

Github, generate report

@github-actions

github-actions Bot commented Dec 5, 2021

Copy link
Copy Markdown
Contributor

Report generation job failed on phase "make_report", step "Generate report".
Link: https://github.com/checkstyle/checkstyle/actions/runs/1541120220

@nrmancuso

Copy link
Copy Markdown
Contributor Author

Github, generate report

@github-actions

github-actions Bot commented Dec 5, 2021

Copy link
Copy Markdown
Contributor

Report generation job failed on phase "make_report", step "Generate report".
Link: https://github.com/checkstyle/checkstyle/actions/runs/1541197083

@nrmancuso

Copy link
Copy Markdown
Contributor Author

Github, generate report

@github-actions

github-actions Bot commented Dec 5, 2021

Copy link
Copy Markdown
Contributor

Report generation job failed on phase "make_report", step "Generate report".
Link: https://github.com/checkstyle/checkstyle/actions/runs/1541209916

@nrmancuso

Copy link
Copy Markdown
Contributor Author

Github, generate report

@github-actions

github-actions Bot commented Dec 5, 2021

Copy link
Copy Markdown
Contributor

Report generation job failed on phase "make_report", step "Generate report".
Link: https://github.com/checkstyle/checkstyle/actions/runs/1541241529

@nrmancuso

Copy link
Copy Markdown
Contributor Author

Github, generate report

@github-actions

github-actions Bot commented Dec 5, 2021

Copy link
Copy Markdown
Contributor

Report generation job failed on phase "make_report", step "Generate report".
Link: https://github.com/checkstyle/checkstyle/actions/runs/1541570395

@nrmancuso

Copy link
Copy Markdown
Contributor Author

Github, generate report

@github-actions

github-actions Bot commented Dec 5, 2021

Copy link
Copy Markdown
Contributor

Report generation job failed on phase "make_report", step "Generate report".
Link: https://github.com/checkstyle/checkstyle/actions/runs/1541601281

@nrmancuso

Copy link
Copy Markdown
Contributor Author

Github, generate report

@github-actions

github-actions Bot commented Dec 5, 2021

Copy link
Copy Markdown
Contributor

Report generation job failed on phase "make_report", step "Generate report".
Link: https://github.com/checkstyle/checkstyle/actions/runs/1541636995

@nrmancuso

Copy link
Copy Markdown
Contributor Author

Github, generate report

@github-actions

github-actions Bot commented Dec 5, 2021

Copy link
Copy Markdown
Contributor

Report generation job failed on phase "make_report", step "Generate report".
Link: https://github.com/checkstyle/checkstyle/actions/runs/1541669153

@nrmancuso

nrmancuso commented Dec 11, 2021

Copy link
Copy Markdown
Contributor Author

Github, generate report

This will never run to completion in github action (when generating diff report), because we are running this PR branch (which will allow execution of diff.groovy to complete because of memory optimization) against master (which will cause OOM).

We know that report generation on newer openjdks with lower memory will complete in github action once this is merged because of https://github.com/checkstyle/checkstyle/runs/4422848467?check_suite_focus=true

I will try to generate report in single mode.

@nrmancuso

Copy link
Copy Markdown
Contributor Author

Github, generate report

@github-actions

Copy link
Copy Markdown
Contributor

@nrmancuso

nrmancuso commented Dec 11, 2021

Copy link
Copy Markdown
Contributor Author

Single report generation is successful, so once we fix #10934, we will be able to complete checkstyle/contribution#527 by using adding the openjdk project in this PR description. We will also be able to add openjdk16 project to https://github.com/checkstyle/contribution/blob/master/checkstyle-tester/projects-to-test-on-for-github-action.properties for github action diff report generation.

@rnveach

rnveach commented Dec 11, 2021

Copy link
Copy Markdown
Contributor

This will never run to completion in github action

Can you run this locally and give the regression more memory?

* @param input the token stream to parse
* @param dummyArg dummy argument
*/
public JavaLanguageParser(TokenStream input, boolean dummyArg) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to make similar changes to the Javadoc? Why?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to make similar changes to the Javadoc?

Assuming you mean Javadoc grammar here, we certainly could if we needed to; since this issue was reported for 9.X update, I only targeted the Java grammar.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you mean Javadoc grammar

Yes.

if we needed to

This is what my question is connected to. If we need further analysis on if Javadoc grammar needs this, then lets create an issue for it it. If there is no harm to adding it just to be sure we don't have to worry about it later, then I am fine with doing it too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking on this, I believe that this may hurt Javadoc grammar performance more than helping it. Consider post at http://roman-ivanov.blogspot.com/2016/02/performance-problem-while-using-checks.html, where we can see that Javadoc parsing slows Checkstyle execution significantly. In this case, DFA clearing will likely slow parsing even more. I can open an issue if you would like, but the Javadoc grammar has much more severe issues than memory usage; I would address them before employing any memory hacks.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok if you are since you know this stuff more. I just want to make sure it was looked at.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nmancus1 , please update issue description (if we have issue to this) to look at upadate like in this PR to possibly improve javadoc speed.

update of javadoc is not in scope of this PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nrmancuso nrmancuso marked this pull request as ready for review December 31, 2021 20:31
@snazy

snazy commented Jan 11, 2022

Copy link
Copy Markdown

Just run into this issue after upgrading checkstyle in a big maven project from 8.x to 9.2.1.
After looking into a heap dump, it seems that the static field _decisionToDFA in JavaLangugageParser is the issue here.
Since Maven keeps the plugin around for the whole build, the object tree via _decisionToDFA just grows until it OOMs.

I tried @nmancus1 issue-10934-3 branch and the build passed without hitting an OOM as with checkstyle 8.x. Also didn't notice any regression of the build time. So I think, the issue's fixed with the change in that branch.

FYI a screenshot from the heap dump w/ 9.2.1:
image

@romani

romani commented Jan 12, 2022

Copy link
Copy Markdown
Member

GitHub , rebase

@nrmancuso nrmancuso force-pushed the issue-10934-3 branch 2 times, most recently from 68458bb to 890a83e Compare January 12, 2022 06:14
public JavaLanguageParser(TokenStream input, boolean dummyArg) {
super(input);
_interp = new ParserATNSimulator(this, _ATN , _decisionToDFA, _sharedContextCache);
if (++fileCounter % 500 == 0) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nmancus1 I'm ok with this constant, but what if we turn this into a parameter? Instead the dummyArg there can be dfaLimit or something like that. It is even possible to add a property to the TreeWalker to make it configurable by the user.

@romani

romani commented Jan 12, 2022

Copy link
Copy Markdown
Member

Better to avoid extension of Treewalker properties.
We can use system variable if we want it to be configured, but I do not see much benefit now

@nrmancuso nrmancuso force-pushed the issue-10934-3 branch 2 times, most recently from 88ac134 to a9b5f88 Compare January 12, 2022 16:15

@romani romani left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

items:

Comment thread src/main/java/com/puppycrawl/tools/checkstyle/JavaParser.java Outdated
@rnveach rnveach assigned romani and unassigned rnveach Jan 14, 2022
@nrmancuso nrmancuso force-pushed the issue-10934-3 branch 3 times, most recently from 70c2c54 to 7d567c6 Compare January 14, 2022 03:11

@romani romani left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Item

Comment thread src/main/java/com/puppycrawl/tools/checkstyle/JavaParser.java Outdated
@romani

romani commented Jan 15, 2022

Copy link
Copy Markdown
Member

Let's merge as CI pass

@romani romani merged commit 6778867 into checkstyle:master Jan 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OutOfMemory following CheckStyle 9.X upgrade

6 participants