According to Checkstyle's limitations, Checkstyle does not support UTF-8 characters, and indeed the grammar defines an overly simple regex for recognizing identifiers. It's not quite true that Checkstyle doesn't support any UTF-8, though: UTF-8 characters in strings or comments are matched by the . regex and are therefore silently accepted.
Per the JLS, section 3.8, identifiers are richer than the simple [A-Za-z_$][0-9A-Za-z_$]* regex defined in JavadocLexer.g4. I think a closer regex that's compatible with ANTLR's lexer rule elements would be:
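// Start: letters, letter numbers, currency symbols, connector punctuation (no digits).
fragment JavaIdentStart: [\p{L}\p{Nl}\p{Sc}\p{Pc}];
// Part: everything in Start, plus decimal digits and combining marks.
fragment JavaIdentPart: [\p{L}\p{Nl}\p{Nd}\p{Sc}\p{Pc}\p{Mn}\p{Mc}];
fragment Identifier: JavaIdentStart JavaIdentPart*;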
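To illustrate the gap, here is a minimal Java sketch that compares the ASCII-only regex quoted above with the character tests the JLS defers to (Character.isJavaIdentifierStart and Character.isJavaIdentifierPart). The class and method names are hypothetical and not part of Checkstyle, and the check looks only at characters; it does not exclude keywords or the literals true, false and null.

```java
import java.util.regex.Pattern;

public class IdentifierCheck {
    // The ASCII-only pattern quoted above from JavadocLexer.g4.
    private static final Pattern ASCII_IDENT =
            Pattern.compile("[A-Za-z_$][0-9A-Za-z_$]*");

    // True if the whole string matches the ASCII-only identifier regex.
    static boolean matchesAsciiRule(String s) {
        return ASCII_IDENT.matcher(s).matches();
    }

    // True if every code point is acceptable per the JDK's identifier tests.
    // Hypothetical helper: character-level only, keywords are not rejected.
    static boolean isJlsIdentifierChars(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Walk by code point so supplementary characters are handled correctly.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"foo_1", "caf\u00e9", "\u03c0", "1abc"};
        for (String s : samples) {
            System.out.println(s
                    + " ascii=" + matchesAsciiRule(s)
                    + " jls=" + isJlsIdentifierChars(s));
        }
    }
}
```

Names such as café or π pass the JDK character tests but fail the ASCII regex, which is exactly the class of identifier the current grammar cannot recognize. A leading digit is rejected by both.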
I see that there are prior issues similar to this (mainly #4562), but I guess my question is, if it's technically feasible and not too difficult to fix the definition of Identifier above to be more faithful to the JLS, why should this limitation continue? If this is simply a hard rule that checkstyle insists upon, feel free to close this issue as a duplicate, but it seems like an odd inconsistency to enforce.