[kotlin] Fix #6659: Prevent parser hang via InterruptibleParserATNSimulator and parse timeout#6660
Conversation
Introduce InterruptibleParserATNSimulator which overrides closureCheckingStopState() to check Thread.interrupted() and throw ParseCancelledException. This lets Future.cancel(true) actually stop an ANTLR parse that has entered exponential ATN state expansion. PmdKotlinParser now: - Gives each parse its own fresh DFA[] and PredictionContextCache via InterruptibleParserATNSimulator, preventing cross-file ATN state accumulation (static shared fields in generated KotlinParser). - Runs each parse in a daemon thread via PARSE_EXECUTOR with a configurable timeout (system property pmd.kotlin.parseTimeoutSeconds, default 30s). Files exceeding the timeout are skipped with a warning and a ParseException is thrown.
…e guide Add two bullet points to step 6 of adding_a_new_antlr_based_language.md: - ATN state isolation: explain the static DFA/PredictionContextCache accumulation problem and how to fix it with a fresh simulator per parse. - Parse timeout: recommend wrapping the parse in a daemon thread with a configurable timeout as a safety net against infinite parse loops. Both reference PmdKotlinParser as the implementation example.
…rpreter is on Recognizer not Parser)
|
Compared to main: (comment created at 2026-05-25 15:06:13+00:00 for 6314289) |
adangel
left a comment
There was a problem hiding this comment.
Thanks for the PR.
Please make this solution contained within pmd-kotlin for now. I was actually aiming at a general solution like I described now in #6684, because PLSQL is Javacc based, not ANTLR. So, until we know of a general solution, I prefer to have this contained with in Kotlin for now.
| int numDecisions = KotlinParser._ATN.getNumberOfDecisions(); | ||
| DFA[] decisionToDfa = new DFA[numDecisions]; | ||
| for (int i = 0; i < numDecisions; i++) { | ||
| decisionToDfa[i] = new DFA(KotlinParser._ATN.getDecisionState(i), i); | ||
| } | ||
| return new InterruptibleParserATNSimulator( | ||
| parser, KotlinParser._ATN, decisionToDfa, new PredictionContextCache()); |
There was a problem hiding this comment.
You can access the _decisionToDFA field in KotlinParser - PmdKotlinParser and the generated KotlinParser live in the same package.
| int numDecisions = KotlinParser._ATN.getNumberOfDecisions(); | |
| DFA[] decisionToDfa = new DFA[numDecisions]; | |
| for (int i = 0; i < numDecisions; i++) { | |
| decisionToDfa[i] = new DFA(KotlinParser._ATN.getDecisionState(i), i); | |
| } | |
| return new InterruptibleParserATNSimulator( | |
| parser, KotlinParser._ATN, decisionToDfa, new PredictionContextCache()); | |
| return new InterruptibleParserATNSimulator( | |
| parser, KotlinParser._ATN, KotlinParser._decisionToDFA, KotlinParser._sharedContextCache); |
There was a problem hiding this comment.
This turned out to introduce thread contention. Needed unshared DFA and contextCache per file/simulator to prevent this.
There was a problem hiding this comment.
Hm... this is maybe worth to investigate in the future. The default behavior of ANTLR is, to use a shared (static) cache...
Maybe the cache is poisoned by the difficult to parse files?
Not sharing the cache might increase the memory footprint...
…kage-private Use Thread.currentThread().isInterrupted() to avoid clearing the interrupted flag. General pmd-core solution tracked in pmd#6684.
…llel lock contention Sharing KotlinParser._decisionToDFA and _sharedContextCache across parallel PMD threads causes severe monitor contention (all threads serialize on the same DFA HashMap and PredictionContextCache lock). Under parallel parsing of complex Kotlin files this effectively single-threads all parsing and can cause apparent hangs. Use fresh instances per parse instead.
KotlinHandler and PmdKotlinParser previously had public no-arg constructors. Add them back delegating to the new constructors with the default timeout from KotlinLanguageProperties.PARSE_TIMEOUT_SECONDS.
Note future improvement: introduce KotlinLanguageProcessor to own the executor lifecycle and supply timeout via task.getLanguageProcessor(), removing constructor injection from PmdKotlinParser.
|
@adangel I finished the review updates. Question remaining: how far to go with the properties setup and language processor to be more in line with the Java setup. There is only one timeout property right now, but it seems a tighter integration might be appropriate once more language properties in the language bundle pop up. And it might better fit the current Suggestion: introduce A follow-up improvement worth considering: introduce This would allow:
Not blocking for the current PR but a clean follow-up to align with the pmd-java architecture. |
adangel
left a comment
There was a problem hiding this comment.
Thanks!
I'll push some changes now and merge it.
Basically, I've introduced now the KotlinLanguageProcessor and made the language property experimental for now.
Fixes #6659
Changes
pmd-core:InterruptibleParserATNSimulatorNew class in
net.sourceforge.pmd.lang.ast.impl.antlr4that overridesclosureCheckingStopStateto checkThread.interrupted()at every recursion level of the ANTLR ATN closure loop. When interrupted, throwsParseCancelledExceptionto unwind the stack immediately. Usable by all ANTLR-based languages (Swift, PL/SQL, etc.).pmd-kotlin:PmdKotlinParserDFA[]andPredictionContextCacheper file (per-file ATN state isolation)ExecutorServicewith configurable timeout (default 30s,-Dpmd.kotlin.parseTimeoutSeconds=N)InterruptibleParserATNSimulatorso the timeout fires reliably inside the ATN loopParseCancelledExceptionis wrapped in aParseExceptionwith a clear WARN logDocs
Step 6 of
adding_a_new_antlr_based_language.mdupdated with ATN state isolation and timeout guidance.Verification
Validated on a real-world Android project (413 Kotlin files). jstack confirmed the
kotlin-parserthread is deeply inside the recursiveclosureCheckingStopStateloop with the override present at every level. With a 5s timeout, 86 files triggered clean timeout warnings; with the default 30s all files parsed successfully.Notes
mainand can be merged in any order.LexerATNSimulator) is linear and not affected.