Skip to content

[kotlin] Fix #6659: Prevent parser hang via InterruptibleParserATNSimulator and parse timeout#6660

Merged
adangel merged 16 commits into
pmd:mainfrom
stokpop:feature/kotlin-parser-timeout
May 25, 2026
Merged

[kotlin] Fix #6659: Prevent parser hang via InterruptibleParserATNSimulator and parse timeout#6660
adangel merged 16 commits into
pmd:mainfrom
stokpop:feature/kotlin-parser-timeout

Conversation

@stokpop

@stokpop stokpop commented May 11, 2026

Copy link
Copy Markdown
Contributor

Fixes #6659

Changes

pmd-core: InterruptibleParserATNSimulator

New class in net.sourceforge.pmd.lang.ast.impl.antlr4 that overrides closureCheckingStopState to check Thread.interrupted() at every recursion level of the ANTLR ATN closure loop. When interrupted, throws ParseCancelledException to unwind the stack immediately. Usable by all ANTLR-based languages (Swift, PL/SQL, etc.).

pmd-kotlin: PmdKotlinParser

  • Fresh DFA[] and PredictionContextCache per file (per-file ATN state isolation)
  • Daemon ExecutorService with configurable timeout (default 30s, -Dpmd.kotlin.parseTimeoutSeconds=N)
  • Uses InterruptibleParserATNSimulator so the timeout fires reliably inside the ATN loop
  • ParseCancelledException is wrapped in a ParseException with a clear WARN log

Docs

Step 6 of adding_a_new_antlr_based_language.md updated with ATN state isolation and timeout guidance.

Verification

Validated on a real-world Android project (413 Kotlin files). jstack confirmed the kotlin-parser thread is deeply inside the recursive closureCheckingStopState loop with the override present at every level. With a 5s timeout, 86 files triggered clean timeout warnings; with the default 30s all files parsed successfully.

Notes

stokpop added 2 commits May 11, 2026 13:28
Introduce InterruptibleParserATNSimulator which overrides
closureCheckingStopState() to check Thread.interrupted() and throw
ParseCancelledException. This lets Future.cancel(true) actually stop
an ANTLR parse that has entered exponential ATN state expansion.

PmdKotlinParser now:
- Gives each parse its own fresh DFA[] and PredictionContextCache via
  InterruptibleParserATNSimulator, preventing cross-file ATN state
  accumulation (static shared fields in generated KotlinParser).
- Runs each parse in a daemon thread via PARSE_EXECUTOR with a
  configurable timeout (system property pmd.kotlin.parseTimeoutSeconds,
  default 30s). Files exceeding the timeout are skipped with a warning
  and a ParseException is thrown.
…e guide

Add two bullet points to step 6 of adding_a_new_antlr_based_language.md:
- ATN state isolation: explain the static DFA/PredictionContextCache
  accumulation problem and how to fix it with a fresh simulator per parse.
- Parse timeout: recommend wrapping the parse in a daemon thread with a
  configurable timeout as a safety net against infinite parse loops.
Both reference PmdKotlinParser as the implementation example.
@stokpop stokpop changed the title Fix #6659: prevent Kotlin parser hang via InterruptibleParserATNSimulator and parse timeout [kotlin] fix #6659: prevent parser hang via InterruptibleParserATNSimulator and parse timeout May 11, 2026
@pmd-actions-helper

pmd-actions-helper Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

Documentation Preview

Compared to main:
This changeset changes 0 violations,
introduces 0 new violations, 0 new errors and 0 new configuration errors,
removes 0 violations, 0 errors and 0 configuration errors.
There are 0 changed duplications, 0 new duplications and 0 removed duplications.
There are 0 changed CPD errors, 0 new CPD errors and 0 removed CPD errors.

Regression Tester Report

(comment created at 2026-05-25 15:06:13+00:00 for 6314289)

@adangel adangel left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

Please make this solution contained within pmd-kotlin for now. I was actually aiming at a general solution like I described now in #6684, because PLSQL is Javacc based, not ANTLR. So, until we know of a general solution, I prefer to have this contained with in Kotlin for now.

Comment thread pmd-kotlin/src/main/java/net/sourceforge/pmd/lang/kotlin/ast/PmdKotlinParser.java Outdated
Comment on lines +82 to +88
int numDecisions = KotlinParser._ATN.getNumberOfDecisions();
DFA[] decisionToDfa = new DFA[numDecisions];
for (int i = 0; i < numDecisions; i++) {
decisionToDfa[i] = new DFA(KotlinParser._ATN.getDecisionState(i), i);
}
return new InterruptibleParserATNSimulator(
parser, KotlinParser._ATN, decisionToDfa, new PredictionContextCache());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can access the _decisionToDFA field in KotlinParser - PmdKotlinParser and the generated KotlinParser live in the same package.

Suggested change
int numDecisions = KotlinParser._ATN.getNumberOfDecisions();
DFA[] decisionToDfa = new DFA[numDecisions];
for (int i = 0; i < numDecisions; i++) {
decisionToDfa[i] = new DFA(KotlinParser._ATN.getDecisionState(i), i);
}
return new InterruptibleParserATNSimulator(
parser, KotlinParser._ATN, decisionToDfa, new PredictionContextCache());
return new InterruptibleParserATNSimulator(
parser, KotlinParser._ATN, KotlinParser._decisionToDFA, KotlinParser._sharedContextCache);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This turned out to introduce thread contention. Needed unshared DFA and contextCache per file/simulator to prevent this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... this is maybe worth to investigate in the future. The default behavior of ANTLR is, to use a shared (static) cache...
Maybe the cache is poisoned by the difficult to parse files?
Not sharing the cache might increase the memory footprint...

Comment thread docs/pages/pmd/devdocs/major_contributions/adding_a_new_antlr_based_language.md Outdated
stokpop added 7 commits May 15, 2026 15:07
…kage-private

Use Thread.currentThread().isInterrupted() to avoid clearing the interrupted flag.
General pmd-core solution tracked in pmd#6684.
…llel lock contention

Sharing KotlinParser._decisionToDFA and _sharedContextCache across parallel
PMD threads causes severe monitor contention (all threads serialize on the
same DFA HashMap and PredictionContextCache lock). Under parallel parsing of
complex Kotlin files this effectively single-threads all parsing and can cause
apparent hangs. Use fresh instances per parse instead.
KotlinHandler and PmdKotlinParser previously had public no-arg constructors.
Add them back delegating to the new constructors with the default timeout
from KotlinLanguageProperties.PARSE_TIMEOUT_SECONDS.
Note future improvement: introduce KotlinLanguageProcessor to own
the executor lifecycle and supply timeout via task.getLanguageProcessor(),
removing constructor injection from PmdKotlinParser.
@stokpop

stokpop commented May 18, 2026

Copy link
Copy Markdown
Contributor Author

@adangel I finished the review updates.

Question remaining: how far to go with the properties setup and language processor to be more in line with the Java setup. There is only one timeout property right now, but it seems a tighter integration might be appropriate once more language properties in the language bundle pop up. And it might better fit the current ExecutorService:

Suggestion: introduce KotlinLanguageProcessor

A follow-up improvement worth considering: introduce KotlinLanguageProcessor extends BatchLanguageProcessor<KotlinLanguageProperties> (similar to JavaLanguageProcessor).

This would allow:

  • The ExecutorService to be an instance field with proper lifecycle management (shutdown() via AutoCloseable.close()), rather than a static daemon-thread pool
  • PmdKotlinParser to read the timeout directly from task.getLanguageProcessor() instead of via constructor injection — removing the need for the PmdKotlinParser(int) constructor

Not blocking for the current PR but a clean follow-up to align with the pmd-java architecture.

@stokpop stokpop requested a review from adangel May 18, 2026 07:18
@adangel adangel changed the title [kotlin] fix #6659: prevent parser hang via InterruptibleParserATNSimulator and parse timeout [kotlin] Fix #6659: Prevent parser hang via InterruptibleParserATNSimulator and parse timeout May 25, 2026

@adangel adangel left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I'll push some changes now and merge it.
Basically, I've introduced now the KotlinLanguageProcessor and made the language property experimental for now.

Comment thread pmd-kotlin/src/main/java/net/sourceforge/pmd/lang/kotlin/ast/PmdKotlinParser.java Outdated
Comment thread pmd-kotlin/src/main/java/net/sourceforge/pmd/lang/kotlin/ast/PmdKotlinParser.java Outdated
Comment thread pmd-kotlin/src/main/java/net/sourceforge/pmd/lang/kotlin/ast/PmdKotlinParser.java Outdated
Comment thread pmd-kotlin/src/main/java/net/sourceforge/pmd/lang/kotlin/ast/PmdKotlinParser.java Outdated
@adangel adangel added this to the 7.25.0 milestone May 25, 2026
@adangel adangel merged commit c93f482 into pmd:main May 25, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[kotlin] Parser hangs on complex files due to unbounded ATN prediction loop

2 participants