[kotlin] Fix #6608: Improve kotlin parser error handling#6650
Conversation
|
Compared to main: (comment created at 2026-05-22 19:13:48+00:00 for 169fd69) |
adangel
left a comment
There was a problem hiding this comment.
Thanks!
I would really throw a LexException when there are lexer errors. And fail fast.
Maybe we should add a note to
The Swift Example (PmdSwiftParser) seems to be incomplete, we should probably also add there an error listener to the lexer.
Note: As you see in the TODO comment, we don't throw an exception yet, as the test sample swift code we have in our test resources cannot be parsed by the grammar...
Note 2: When this is merged, we'll get visibility of errors when parsing Kotlin code, that we didn't have before. There might be an increase on kotlin related issues in PMD...
| public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, | ||
| int line, int charPositionInLine, | ||
| String msg, RecognitionException e) { | ||
| LOG.warn("Lexer error at {}:{}:{}: {}", task.getFileId().getOriginalPath(), line, charPositionInLine, msg); |
There was a problem hiding this comment.
I think, we should throw a LexException here as well. If the lexer failed, the tokens are not correct, and so the parse tree won't be correct and so anything on the analyzed file is unknown.
There was a problem hiding this comment.
Now throws LexException to fail fast
There was a problem hiding this comment.
We should either log the error or throw an exception, but do not both - because when throwing an exception, its likely that this exception will be handled in one way or the other - we would end up in logging the same error multiple times.
-> Please remove the LOG.warn call here
|
Sounds good to fail fast. I will add the Also added implementation notes in the md docs for adding a new ANTLR base language. Updating the Swift code to behave similar can be in another issue/PR? Collect all errors ideaThe thing missing might be that currently all errors are at least reported, and with failing fast on first lexer issue only the first one is reported. That might give multiple cycles to devs to solve issues (like we saw with the $ replacement case). We can also collect all Lex and Parser exceptions for one file and report all the issues in one For example: for BadCode.kt: package nl.stokpop
fun xor1(a: Int, b: Int) = (a ^ b)
fun xor2(a: Int, b: Int) = (a ^ b)
fun broken( { }Good idea to bundle the errors? If so I can provide a second PR for this when this is merged. |
- Add custom ANTLR error listeners in PmdKotlinParser - Lexer errors: log WARN with file location and throw LexException - Parser errors: log WARN with file location and throw ParseException - Short WARN message (strip verbose expected-token list), full at DEBUG - Simplify KotlinParsingHelper: remove duplicate custom parseImpl override - Add tests: syntaxErrorThrowsParseException, lexerErrorThrowsLexException
19db0e6 to
ff6d2f4
Compare
- Expand step 6 with error handling guidance: LexException for lexer errors, ParseException for parser errors - Replace PmdSwiftParser reference with PmdKotlinParser as the reference implementation
adangel
left a comment
There was a problem hiding this comment.
Thanks for the update.
However, we should not both log and throw an exception. The exception will end up in the report as processing error.
Good idea to bundle the errors?
I like the idea. Be aware, that some errors reported that way might be consequential errors.
| implementation that you need to extend to create your own adapter. See | ||
| [`PmdKotlinParser`](https://github.com/stokpop/pmd/blob/main/pmd-kotlin/src/main/java/net/sourceforge/pmd/lang/kotlin/ast/PmdKotlinParser.java) | ||
| as the reference implementation. | ||
| * **Error handling**: ANTLR by default swallows errors silently. You must register custom |
There was a problem hiding this comment.
| * **Error handling**: ANTLR by default swallows errors silently. You must register custom | |
| * **Error handling**: ANTLR by default prints errors to stderr only and otherwise ignores them. You must register custom |
| as the reference implementation. | ||
| * **Error handling**: ANTLR by default swallows errors silently. You must register custom | ||
| `BaseErrorListener`s on both the lexer and the parser to surface errors properly: | ||
| * **Lexer errors** (unrecognized tokens): log a `WARN` and throw a |
There was a problem hiding this comment.
We should either LOG or throw an exception, but not both...
| public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, | ||
| int line, int charPositionInLine, | ||
| String msg, RecognitionException e) { | ||
| LOG.warn("Lexer error at {}:{}:{}: {}", task.getFileId().getOriginalPath(), line, charPositionInLine, msg); |
There was a problem hiding this comment.
We should either log the error or throw an exception, but do not both - because when throwing an exception, its likely that this exception will be handled in one way or the other - we would end up in logging the same error multiple times.
-> Please remove the LOG.warn call here
| LOG.warn("Parser error at {}:{}:{}: {}", task.getFileId().getOriginalPath(), line, charPositionInLine, shortMessage(msg)); | ||
| LOG.debug("Parser error at {}:{}:{}: {}", task.getFileId().getOriginalPath(), line, charPositionInLine, msg); |
There was a problem hiding this comment.
Same, we should not both log and throw. Please remove the loggings.
Also: I would provide the full message to the parse exception and not remove any information fro the message.
|
@stokpop I've updated the PR to collect the errors and throw only one exception per file (and remove the extra logging). |
adangel
left a comment
There was a problem hiding this comment.
Thanks for the PR.
I've updated the PR to collect the errors and throw only one exception per file (and remove the extra logging).
If this proves useful, then we can think about moving the (private inner) class "AntlrErrorListener" to pmd-core.
improve kotlin parser error handling
Report lexer and parser errors via regular logging instead to std err.
Add reference to file and location.
Makes a difference between lexer and parser errors.
Throws ParserError to be used in reports and testing.
Related issues
Ready?
./mvnw clean verifypasses (checked automatically by github actions)