Consider handling of non-printable characters when showing code/diffs #10841

@carljm

Search keyword used: "non-printable"

Python source code containing non-printable characters (e.g. ^H or backspace, ^Z or substitute, ^[ or escape; see https://github.com/astral-sh/ruff/blob/main/crates/ruff_linter/resources/test/fixtures/pylint/invalid_characters.py for examples) can produce unpredictable and confusing output if we print it as-is.

Here's a minimal example of the current behavior:

```
➜ python3 -c "print('b = \'\x08\'')" > bad.py

➜ cat bad.py
b = '

➜ cat -v bad.py
b = '^H'

➜ xxd bad.py
00000000: 6220 3d20 2708 270a                      b = '.'.

➜ cargo run -- check --diff --no-cache --preview --select PLE2510 bad.py
    Finished dev [unoptimized + debuginfo] target(s) in 0.09s
     Running `target/debug/ruff check --diff --no-cache --preview --select PLE2510 bad.py`
--- bad.py
+++ bad.py
@@ -1 +1 @@
-b = '
+b = '\b'

Would fix 1 error.
```

Here the diff is confusing because the original string appears to contain only a single quote character rather than a matching pair. In fact it has two quotes, but the intervening backspace moves the cursor back one column, so the second quote is drawn over the first. (This is the same behavior shown by cat without -v, which simply lets control characters take effect.)
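The effect can be reproduced without a real terminal. Below is a minimal sketch of how a terminal lays out one line when it honors backspace. The helper name `render_like_terminal` is hypothetical, and the model ignores every control character except backspace, so it is an illustration rather than a faithful terminal emulator.

```python
def render_like_terminal(line: str) -> str:
    """Approximate what a terminal shows for one line that contains backspaces."""
    cells = []   # characters currently visible, one per column
    cursor = 0   # column where the next character will be written
    for ch in line:
        if ch == "\x08":
            # Backspace only moves the cursor left; it erases nothing by itself.
            cursor = max(cursor - 1, 0)
        else:
            if cursor < len(cells):
                cells[cursor] = ch   # overwrite whatever was in this column
            else:
                cells.append(ch)
            cursor += 1
    return "".join(cells)


# The contents of bad.py (without the trailing newline):
print(render_like_terminal("b = '\x08'"))
```

For the contents of bad.py this prints `b = '`, matching the `cat bad.py` output: the second quote lands in the column of the first.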

Interestingly, this is also the same as the default behavior of both diff and git diff, which means we may want to be cautious about diverging from it. But we could consider doing some kind of escaping of non-printable characters before outputting code frames or diffs.
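As one possible shape for such escaping, here is a rough sketch that uses caret notation in the style of `cat -v`. It is written in Python for illustration only and is not Ruff's implementation; the function name `escape_control_chars` is hypothetical, and the choice to leave tab and newline untouched is an assumption borrowed from `cat -v`. It covers only ASCII control characters and DEL, not other invisible Unicode characters.

```python
def escape_control_chars(text: str) -> str:
    """Replace ASCII control characters with caret notation, as `cat -v` does."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\n\t":
            # Assumption: keep line structure and indentation readable.
            out.append(ch)
        elif code < 0x20:
            # C0 controls map to ^@ .. ^_ (for example 0x08 -> ^H, 0x1B -> ^[).
            out.append("^" + chr(code + 0x40))
        elif code == 0x7F:
            out.append("^?")  # DEL
        else:
            out.append(ch)
    return "".join(out)


print(escape_control_chars("b = '\x08'"))
```

For the contents of bad.py this prints `b = '^H'`, the same rendering as `cat -v bad.py` above, so both quotes stay visible in the removed line of the diff.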


Labels: bug (Something isn't working)
