[eradicate] ignore # language= in commented-out-code rule (ERA001)#14069
[eradicate] ignore # language= in commented-out-code rule (ERA001)#14069charliermarsh merged 1 commit intoastral-sh:mainfrom
eradicate] ignore # language= in commented-out-code rule (ERA001)#14069Conversation
|
Do you mind pushing the test cases? I only see the code changes. For reference: https://www.jetbrains.com/help/pycharm/using-language-injections.html |
96f0c10 to
b94339b
Compare
|
Sorry @charliermarsh , forgot to force-push after I amended the commit with the test cases. |
|
No prob, thanks for the PR. |
|
And looks like I need to fix the formatting... Working on it. |
b94339b to
0d9abfe
Compare
Fixes one common case cited on astral-sh#6019 Should work with the examples from Jetbrains documentation: https://www.jetbrains.com/help/pycharm/using-language-injections.html
0d9abfe to
58922e4
Compare
I've added some more test cases and fixed the regexp with the additional cases described here. |
|
|
Thanks and welcome to the project. |
| static ALLOWLIST_REGEX: LazyLock<Regex> = LazyLock::new(|| { | ||
| Regex::new( | ||
| r"^(?i)(?:pylint|pyright|noqa|nosec|region|endregion|type:\s*ignore|fmt:\s*(on|off)|isort:\s*(on|off|skip|skip_file|split|dont-add-imports(:\s*\[.*?])?)|mypy:|SPDX-License-Identifier:|(?:en)?coding[:=][ \t]*([-_.a-zA-Z0-9]+))", | ||
| r"^(?i)(?:pylint|pyright|noqa|nosec|region|endregion|type:\s*ignore|fmt:\s*(on|off)|isort:\s*(on|off|skip|skip_file|split|dont-add-imports(:\s*\[.*?])?)|mypy:|SPDX-License-Identifier:|language=[a-zA-Z](?: ?[-_.a-zA-Z0-9]+)+(?:\s+prefix=\S+)?(?:\s+suffix=\S+)?|(?:en)?coding[:=][ \t]*([-_.a-zA-Z0-9]+))", |
There was a problem hiding this comment.
This nested quantifier doesn't look so nice: [a-zA-Z](?: ?[-_.a-zA-Z0-9]+)+. The optional space at the start of the group leads to multiple ways of matching the same text.
I know Rust's regex engine guarantees linear time, but this is nevertheless a bad pattern. Prefer [a-zA-Z][-_.a-zA-Z0-9]*(?: [-_.a-zA-Z0-9]+)* instead.
Technically, language IDs can be anything, so it's perhaps better to drop the leading character requirement: [-_.a-zA-Z0-9]+(?: [-_.a-zA-Z0-9]+)*.
There was a problem hiding this comment.
Should we just use language=? All language identifiers will start with that, and false positives seem somewhat rare / not a huge deal here.
There was a problem hiding this comment.
@InSyncWithFoo maybe I was extra cautious about false positives.
Technically, language IDs can be anything,
Good catch! Maybe we can just require a non-space character after the equals sign. This should take care of most false positives anyway, as it's uncommon for Python users to omit spaces around assignment operators.
The only thing that worries me is about detecting a commented-out named parameter on a multi-line method invocation or declaration. But maybe I'm just being overzealous.
def check_spelling(
text,
# language=DEFAULT_LANGUAGE
) ...There was a problem hiding this comment.
A quick search shows that matching on language= alone would lead to many false negatives.
Language IDs are also displayed to the user as part of the UI, so I doubt they would contain non-ASCII characters. I would say limiting to [-_.a-zA-Z0-9] and spaces is the most balanced heuristic.
There was a problem hiding this comment.
[...] it's uncommon for Python users to omit spaces around assignment operators.
Don't forget keyword arguments, which are recommended to be written with no spaces around the equal sign:
# frobnicate(
# language='en' # Could also be `language=EN` with a predefined constant `EN`
# )I think both this and the false negative you mention are well within the acceptable error margin.
There was a problem hiding this comment.
So @InSyncWithFoo , will you write the PR with the new regexp and test cases, or should I?
There was a problem hiding this comment.
Turns out language IDs in comments must be normalized.
This is JSON Lines (one value on each line, comments allowed):
This is pure JSON (one single top-level value, comments disallowed):
jsonlines must be used even though autocompletion popups use json lines:
Notably, this rule applies to some languages but not others (though Ruff doesn't need to care about this):




Fixes one common case cited on #6019
Summary
The
commented-out-coderule (ERA001) fromeradicateis currently flagging a very common idiom that marks Python strings as another language, to help with syntax highlighting:This PR adds this idiom to the list of allowed exceptions to the rule.
Test Plan
I've added some additional test cases.