
IllegalTokenText reports false positives for Unicode whitespace characters without escape sequences #18790

@vivek-0509

Description

I have read the check documentation: https://checkstyle.org/checks/coding/illegaltokentext.html
I have downloaded the latest CLI from: https://checkstyle.org/cmdline.html#Download_and_Run
I have executed the CLI and shown its output below, as the CLI describes the problem better than 1,000 words

How it works now:

vivek@Viveks-MacBook-Air checkstyle % cat > /tmp/TestEscapes.java << 'EOF'
class TestEscapes {
  void test() {
    // These have NO special escape sequence, so they should NOT be violations (currently FALSE POSITIVES)
    final String r1 = "\u000b"; 
    final String r2 = "\u001c"; 
    final String r3 = "\u001D"; 
    final String r4 = "\u1680"; 
    final String r5 = "\u2000"; 
    final String r6 = "\u3000"; 

    // These SHOULD be violations (have valid escape replacements)
    final String a1 = "\u0008"; // Should use \b
    final String a2 = "\u0009"; // Should use \t
    final String a3 = "\u0020"; // Should use \s
  }
}
EOF

vivek@Viveks-MacBook-Air checkstyle % RUN_LOCALE="-Duser.language=en -Duser.country=US"
java $RUN_LOCALE -jar target/checkstyle-*-all.jar -c src/main/resources/google_checks.xml /tmp/TestEscapes.java
Starting audit...
[WARN] /tmp/TestEscapes.java:4:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:5:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:6:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:7:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:8:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:9:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:13:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
Audit done.

Is your feature request related to a problem? Please describe.

The IllegalTokenText check, as configured in google_checks.xml, reports false positives for Unicode escapes of characters that have no corresponding special escape sequence in Java.

Per JLS §3.10.7, the only valid escape sequences are:

EscapeSequence:
\ b (backspace BS, Unicode \u0008)
\ s (space SP, Unicode \u0020)
\ t (horizontal tab HT, Unicode \u0009)
\ n (linefeed LF, Unicode \u000a)
\ f (form feed FF, Unicode \u000c)
\ r (carriage return CR, Unicode \u000d)
\ [LineTerminator](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-LineTerminator) (line continuation, no Unicode representation)
\ " (double quote ", Unicode \u0022)
\ ' (single quote ', Unicode \u0027)
\ \ (backslash \, Unicode \u005c)
[OctalEscape](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalEscape) (octal value, Unicode \u0000 to \u00ff)
OctalEscape:
\ [OctalDigit](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalDigit)
\ [OctalDigit](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalDigit) [OctalDigit](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalDigit)
\ [ZeroToThree](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-ZeroToThree) [OctalDigit](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalDigit) [OctalDigit](https://docs.oracle.com/javase/specs/jls/se21/html/jls-3.html#jls-OctalDigit)
OctalDigit:
(one of)
0 1 2 3 4 5 6 7
ZeroToThree:
(one of)
0 1 2 3

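Characters such as \u000b (vertical tab), \u2000 (en quad), and \u3000 (ideographic space) have NO special escape sequence in Java. The only escape alternative for \u000b is an octal escape such as \013, which the check's own message also discourages, and \u2000 and \u3000 lie outside the octal range entirely. Flagging them tells users to use an escape that doesn't exist.
Additionally, the Unicode escapes \u0008 (backspace, \b) and \u0020 (space, \s) are NOT currently flagged, even though their octal forms \010 and \040 already are. They SHOULD be.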
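To make the distinction concrete, here is a small sketch that encodes the JLS §3.10.7 named escapes as a lookup table and asks which of the characters used in TestEscapes.java have one. The class name SpecialEscapes is illustrative only (it is not part of Checkstyle). Octal escapes and the line-continuation escape are deliberately left out, since the check's message already steers users away from octal values and a line continuation does not stand for a single character.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper class: the JLS 3.10.7 named escapes as a lookup table.
final class SpecialEscapes {

    // Octal escapes and the line-continuation escape are deliberately left out.
    static final Map<Character, String> NAMED = Map.of(
            '\b', "\\b",
            '\t', "\\t",
            '\n', "\\n",
            '\f', "\\f",
            '\r', "\\r",
            ' ', "\\s",
            '"', "\\\"",
            '\'', "\\'",
            '\\', "\\\\");

    // Returns the named escape for c, or empty if Java has none.
    static Optional<String> namedEscape(char c) {
        return Optional.ofNullable(NAMED.get(c));
    }

    public static void main(String[] args) {
        // The characters used in TestEscapes.java, in source order.
        char[] samples = {0x000B, 0x001C, 0x001D, 0x1680, 0x2000, 0x3000, 0x0008, 0x0009, 0x0020};
        for (char c : samples) {
            System.out.printf("U+%04X -> %s%n", (int) c, namedEscape(c).orElse("(none)"));
        }
    }
}
```

Run as a single source file (JDK 11 or later): only U+0008, U+0009 and U+0020 print an escape (\b, \t and \s); the other six print (none).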

Describe the solution you'd like

Update the format property of the IllegalTokenText module in google_checks.xml from:
\\u00(09|0(a|A)|0(b|B)|(0|1)(c|C)|(0|1)(d|D)|1(d|D)|1(e|E)|1(f|F)|22|27|5(C|c))|\\u1680|\\u3000|\\u20(00|0(a|A)|28|29|(2|5)(f|F))|\\(0(10|11|12|14|15|40|42|47)|134)

To:
\\u00(08|09|0(a|A)|0(c|C)|0(d|D)|20|22|27|5(C|c))|\\(0(10|11|12|14|15|40|42|47)|134)
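One way to sanity-check the proposed pattern is to run it over every four-digit Unicode escape (both lowercase and uppercase hex) and compare its verdict with the set of code points that have a named escape. This sketch assumes that IllegalTokenText reports a token when Matcher.find() succeeds on the literal's raw source text, and it only exercises the Unicode half of the pattern, not the octal half. The class name ProposedRegexAudit is hypothetical. Note that every backslash in the XML value is doubled again inside a Java string literal.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical audit class: checks the Unicode half of the proposed pattern exhaustively.
final class ProposedRegexAudit {

    // Proposed IllegalTokenText format from the issue; regex backslashes doubled for Java.
    static final Pattern PROPOSED = Pattern.compile(
            "\\\\u00(08|09|0(a|A)|0(c|C)|0(d|D)|20|22|27|5(C|c))"
            + "|\\\\(0(10|11|12|14|15|40|42|47)|134)");

    // Code points with a named escape per JLS 3.10.7: BS, HT, LF, FF, CR, SP, ", ', backslash.
    static final Set<Integer> NAMED =
            Set.of(0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x20, 0x22, 0x27, 0x5C);

    // Counts (code point, hex case) pairs where the regex verdict and the JLS table disagree.
    static int mismatches() {
        int count = 0;
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            String[] spellings = {String.format("%04x", cp), String.format("%04X", cp)};
            for (String hex : spellings) {
                // Raw literal text: quote, backslash, 'u', four hex digits, quote.
                String token = "\"\\u" + hex + "\"";
                boolean flagged = PROPOSED.matcher(token).find();
                if (flagged != NAMED.contains(cp)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + mismatches());
    }
}
```

For the proposed pattern this should print mismatches: 0. Swapping the current pattern into PROPOSED gives a non-zero count, reflecting the false positives and misses described above.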

Expected CLI output after the fix:

Starting audit...
[WARN] /tmp/TestEscapes.java:12:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:13:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
[WARN] /tmp/TestEscapes.java:14:23: Consider using special escape sequence instead of octal value or Unicode escaped value. [IllegalTokenText]
Audit done.

Only \u0008, \u0009, and \u0020 are flagged (lines 12-14), because they have valid escape replacements (\b, \t, \s).
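The difference between the two patterns on the test file can also be reproduced without building Checkstyle. The sketch below feeds the raw source text of each literal from TestEscapes.java to both the current and the proposed pattern, again assuming the check flags a token whenever Matcher.find() succeeds on that text. The class name EscapeRegexComparison is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper class: compares the current and proposed IllegalTokenText patterns.
final class EscapeRegexComparison {

    // Current pattern quoted in the issue; each regex backslash is doubled for Java.
    static final Pattern OLD = Pattern.compile(
            "\\\\u00(09|0(a|A)|0(b|B)|(0|1)(c|C)|(0|1)(d|D)|1(d|D)|1(e|E)|1(f|F)|22|27|5(C|c))"
            + "|\\\\u1680|\\\\u3000|\\\\u20(00|0(a|A)|28|29|(2|5)(f|F))"
            + "|\\\\(0(10|11|12|14|15|40|42|47)|134)");

    // Proposed pattern from the issue.
    static final Pattern NEW = Pattern.compile(
            "\\\\u00(08|09|0(a|A)|0(c|C)|0(d|D)|20|22|27|5(C|c))"
            + "|\\\\(0(10|11|12|14|15|40|42|47)|134)");

    // Raw source text of a string literal holding one Unicode escape, quotes included.
    static String literal(String hex) {
        return "\"\\u" + hex + "\"";
    }

    // Assumption: the check reports a token when find() succeeds anywhere in its text.
    static boolean flagged(Pattern pattern, String hex) {
        return pattern.matcher(literal(hex)).find();
    }

    public static void main(String[] args) {
        // The nine literals from TestEscapes.java, in source order.
        String[] hexes = {"000b", "001c", "001D", "1680", "2000", "3000", "0008", "0009", "0020"};
        for (String hex : hexes) {
            System.out.printf("U+%s  old=%-5b new=%b%n",
                    hex.toUpperCase(), flagged(OLD, hex), flagged(NEW, hex));
        }
    }
}
```

This should report old=true for the six whitespace literals and for U+0009 but old=false for U+0008 and U+0020, matching the first CLI run, while new=true appears only for U+0008, U+0009 and U+0020, matching the expected output.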
