A grammar containing Unicode code point references for newlines (carriage return and line feed) may produce invalid Java code.
Sample grammar:
```antlr
grammar Demo;

linebreak
    : LF
    | CR
    ;

CR : '\u000D';
LF : '\u000A';
```
Snippet from resulting DemoLexer.java:
```java
private static final String[] _LITERAL_NAMES = {
    null, "'\u000D'", "'\u000A'"
};
private static final String[] _SYMBOLIC_NAMES = {
    null, "CR", "LF"
};
```
This fails to compile. The Java compiler translates Unicode escapes into the characters they denote before it tokenizes the source (JLS §3.3), so `"\u000D"` is equivalent to a literal carriage return in the middle of a string literal, and a line terminator is not allowed inside a string literal. Instead of `\u000D` and `\u000A`, ANTLR should emit `\r` and `\n`.
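A minimal sketch of the kind of escaping a code generator could apply when writing grammar literals into Java string literals. `JavaLiteralEscaper.escape` is a hypothetical helper for illustration, not part of ANTLR's API. CR and LF get the short escapes `\r` and `\n`; `"` and `\` are also escaped directly, because a Unicode escape for either of them would likewise turn into a live delimiter before lexing. Other control or non-ASCII characters still use a `\uXXXX` escape, which is harmless for them since they may legally appear raw inside a string literal.

```java
public final class JavaLiteralEscaper {

    // Hypothetical helper: escape text so it can be placed between double quotes
    // in generated Java source without breaking compilation.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\r': sb.append("\\r"); break;   // carriage return
                case '\n': sb.append("\\n"); break;   // line feed
                case '\t': sb.append("\\t"); break;
                case '"':  sb.append("\\\""); break;  // would end the literal
                case '\\': sb.append("\\\\"); break;  // would start an escape
                default:
                    if (c < 0x20 || c > 0x7E) {
                        // Remaining control and non-ASCII chars: a backslash-u escape is
                        // safe here, because after translation they are ordinary
                        // characters inside the literal (only CR, LF, quote and
                        // backslash are special, and those are handled above).
                        sb.append(String.format("\\u%04X", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Literal texts of the CR and LF tokens from the Demo grammar, quoted as ANTLR does.
        String[] literals = { "'" + '\r' + "'", "'" + '\n' + "'" };
        StringBuilder line = new StringBuilder("null");
        for (String lit : literals) {
            line.append(", \"").append(escape(lit)).append('"');
        }
        System.out.println(line); // prints: null, "'\r'", "'\n'"
    }
}
```

With this escaping, the `_LITERAL_NAMES` entries for `CR` and `LF` come out as `"'\r'"` and `"'\n'"`, which compile.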