Problem
When parsing complex Kotlin files (e.g. large Jetpack Compose files with deeply nested lambda expressions), the ANTLR parser can fall into exponential ATN prediction inside ParserATNSimulator.closureCheckingStopState. This recursion has no interruption points, so the thread spins indefinitely and PMD hangs until the process is killed.
Root cause: the ANTLR 4 ATN closure algorithm performs recursive PredictionContext.merge calls that can grow without bound on adversarial input. The generated parser also uses static shared DFA[] and PredictionContextCache fields, so ATN state from one file accumulates and affects subsequent files.
Solution
1. InterruptibleParserATNSimulator (in pmd-core)
New class overriding closureCheckingStopState to check Thread.interrupted() at every recursion level. When interrupted, throws ParseCancelledException to unwind the ATN stack immediately. Placed in pmd-core so it is reusable by all ANTLR-based languages (Kotlin, Swift, PL/SQL, etc.).
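The interrupt check described above can be illustrated without ANTLR on the classpath. The following is a minimal sketch, not the actual InterruptibleParserATNSimulator: the class name InterruptibleRecursionSketch and the method countNodes are hypothetical, the recursion is a stand-in for the closure computation, and java.util.concurrent.CancellationException stands in for the cancellation exception thrown by the real class. In the real override, the same check would presumably sit at the top of closureCheckingStopState before delegating to the superclass.

```java
import java.util.concurrent.CancellationException;

/**
 * Hypothetical stand-in for a deeply recursive computation such as the
 * ATN closure. It is not ANTLR code; it only shows the interrupt check.
 */
final class InterruptibleRecursionSketch {

    private InterruptibleRecursionSketch() {
    }

    /** Recursively counts the nodes of a complete binary tree of the given depth. */
    static long countNodes(int depth) {
        // Checked on every recursion level, so an interrupt is noticed
        // no matter how deep or how wide the recursion has become.
        // Thread.interrupted() also clears the flag, which is acceptable
        // here because the exception below carries the cancellation upward.
        if (Thread.interrupted()) {
            throw new CancellationException("interrupted at depth " + depth);
        }
        if (depth == 0) {
            return 1;
        }
        // Two recursive calls per level: the work doubles with each level,
        // which is why an uninterruptible version could run for a very long time.
        return 1 + countNodes(depth - 1) + countNodes(depth - 1);
    }
}
```

Because the exception is unchecked, it unwinds every recursive frame at once without any of the intermediate levels having to handle it.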
2. Per-file ATN state isolation
PmdKotlinParser creates fresh DFA[] and PredictionContextCache per file, preventing unbounded state accumulation across files.
3. Parse timeout
A daemon ExecutorService runs each parse with a configurable timeout (default 30s, system property pmd.kotlin.parseTimeoutSeconds). On timeout the parser thread is interrupted, ParseCancelledException is thrown to unwind the parse, and a ParseException with a clear message is reported and logged at WARN level.
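The timeout mechanism can be sketched with the standard library alone. This is an illustration under assumptions, not the code in this PR: the class ParseTimeoutSketch and its methods are hypothetical, IllegalStateException stands in for PMD's ParseException, and WARN logging is omitted. The thread name kotlin-parser and the property pmd.kotlin.parseTimeoutSeconds are taken from this description.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of running a parse task with a timeout. */
final class ParseTimeoutSketch {

    // Daemon threads do not keep the JVM alive if a worker is still busy at exit.
    private static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "kotlin-parser");
        t.setDaemon(true);
        return t;
    });

    private ParseTimeoutSketch() {
    }

    /** Reads the timeout from the system property, falling back to 30 seconds. */
    static long timeoutSeconds() {
        return Long.getLong("pmd.kotlin.parseTimeoutSeconds", 30L);
    }

    /** Runs the task on a worker thread and interrupts it if the timeout elapses. */
    static <T> T runWithTimeout(Callable<T> parse, long timeout, TimeUnit unit) {
        Future<T> future = EXECUTOR.submit(parse);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // cancel(true) interrupts the worker; the parse only stops if it
            // actually checks the interrupt flag.
            future.cancel(true);
            throw new IllegalStateException("Parse timed out after " + timeout + " " + unit, e);
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            throw new IllegalStateException("Interrupted while waiting for parse", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Parse failed", e.getCause());
        }
    }
}
```

A caller might invoke it as ParseTimeoutSketch.runWithTimeout(task, ParseTimeoutSketch.timeoutSeconds(), TimeUnit.SECONDS). Note that the timeout alone does not stop a spinning thread: cancel(true) only sets the interrupt flag, which is why the interrupt check inside the recursion is needed as well.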
Verification
Validated on a real-world Android project (413 Kotlin files). jstack confirmed that the kotlin-parser thread was deep inside the recursive closureCheckingStopState calls and that the override fires at every level. With a 5s timeout, 86 files triggered clean timeout warnings; with the default 30s, all files parsed successfully.
Notes
The lexer simulator (LexerATNSimulator) is linear and not affected.