Description
This issue affects the expected sorting of imports in Python and hence (incorrectly) triggers an I001 error.
When there is a folder that has the same name as a module, it is (possibly incorrectly) identified as SourceMatch. Ruff then categorizes the module as Known(FirstParty) and adapts the expected sorting accordingly.
This happens commonly, but not exclusively, when using the wandb library because it creates a wandb folder, as discovered in ChartBoost/ruff-action#20.
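The behavior described above can be modeled with a short Python sketch. This is only a simplified illustration of name-based matching as observed in the debug output, not Ruff's actual implementation; the function names `matches_source` and `categorize_by_name` are hypothetical, and standard-library detection is left out.

```python
import tempfile
from pathlib import Path


def matches_source(src_root: Path, module_name: str) -> bool:
    """Hypothetical model: match on the name alone, without looking inside."""
    base = module_name.split(".")[0]  # 'torch.nn' -> 'torch'
    candidate = src_root / base
    # Any directory (or .py file) with the module's name counts as a match,
    # even if the directory contains no Python code at all.
    return candidate.is_dir() or candidate.with_suffix(".py").is_file()


def categorize_by_name(src_root: Path, module_name: str) -> str:
    """Return 'first-party' on a source match, 'third-party' otherwise."""
    return "first-party" if matches_source(src_root, module_name) else "third-party"


# Demonstration in a throwaway directory that stands in for the project root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    before = categorize_by_name(root, "wandb")
    (root / "wandb").mkdir()  # e.g. a folder holding only run logs
    after = categorize_by_name(root, "wandb")
    print(before, "->", after)
```

Under this model, merely creating an empty `wandb` folder is enough to flip the category, which matches the symptom reported here.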
To reproduce the issue, create a Python file with these imports (I called it test.py):
import csv
import logging
import random
import sys
from typing import Any, Optional, TextIO

import torch
import torch.nn as nn
from torch import optim
from torcheval.metrics import (
    Metric,
    MulticlassAccuracy,
    MulticlassF1Score,
    MulticlassPrecision,
    MulticlassRecall,
)
from tqdm import tqdm

import wandb

In the initial scenario, there is a wandb directory:
% ls -d wandb/
wandb/

Running Ruff to check the import sorting:
% poetry run ruff check -v --select=I001 test.py
[2024-03-22][08:57:28][ruff::resolve][DEBUG] Using configuration file (via parent) at: /Users/carstenschnober/LAHTeR/workspace/document-segmentation/pyproject.toml
[2024-03-22][08:57:28][ruff::commands::check][DEBUG] Identified files to lint in: 2.013375ms
[2024-03-22][08:57:28][ruff::diagnostics][DEBUG] Checking: /Users/carstenschnober/LAHTeR/workspace/document-segmentation/test.py
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'torch.nn' as Known(ThirdParty) (NoMatch)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'csv' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'random' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'sys' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'logging' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'torch' as Known(ThirdParty) (NoMatch)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'wandb' as Known(FirstParty) (SourceMatch("/Users/carstenschnober/LAHTeR/workspace/document-segmentation"))
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'torch' as Known(ThirdParty) (NoMatch)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'torcheval.metrics' as Known(ThirdParty) (NoMatch)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'typing' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][08:57:28][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'tqdm' as Known(ThirdParty) (NoMatch)
[2024-03-22][08:57:28][ruff::commands::check][DEBUG] Checked 1 files in: 765.709µs
All checks passed!

The checks pass; wandb has been categorized as Known(FirstParty).
Now remove the wandb directory:
% mv wandb wandb.bak
% ls -d wandb/
ls: wandb/: No such file or directory

Running the same Ruff check now triggers an I001 error on the same file, categorizing wandb as Known(ThirdParty). The module categorization is cached, so I remove the .ruff_cache directory first to reproduce the error:
% rm -r .ruff_cache
% poetry run ruff check -v --select=I001 test.py
[2024-03-22][09:06:42][ruff::resolve][DEBUG] Using configuration file (via parent) at: /Users/carstenschnober/LAHTeR/workspace/document-segmentation/pyproject.toml
[2024-03-22][09:06:42][ruff::commands::check][DEBUG] Identified files to lint in: 1.959083ms
[2024-03-22][09:06:42][ruff::diagnostics][DEBUG] Checking: /Users/carstenschnober/LAHTeR/workspace/document-segmentation/test.py
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'torch.nn' as Known(ThirdParty) (NoMatch)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'csv' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'random' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'sys' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'logging' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'torch' as Known(ThirdParty) (NoMatch)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'wandb' as Known(FirstParty) (SourceMatch("/Users/carstenschnober/LAHTeR/workspace/document-segmentation"))
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'torch' as Known(ThirdParty) (NoMatch)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'torcheval.metrics' as Known(ThirdParty) (NoMatch)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'typing' as Known(StandardLibrary) (KnownStandardLibrary)
[2024-03-22][09:06:42][ruff_linter::rules::isort::categorize][DEBUG] Categorized 'tqdm' as Known(ThirdParty) (NoMatch)
[2024-03-22][09:06:42][ruff::commands::check][DEBUG] Checked 1 files in: 1.004834ms
test.py:1:1: I001 [*] Import block is un-sorted or un-formatted
Found 1 error.
[*] 1 fixable with the `--fix` option.

This is the sorting that Ruff now expects, as generated by re-running the check above with --fix:
import csv
import logging
import random
import sys
from typing import Any, Optional, TextIO

import torch
import torch.nn as nn
import wandb
from torch import optim
from torcheval.metrics import (
    Metric,
    MulticlassAccuracy,
    MulticlassF1Score,
    MulticlassPrecision,
    MulticlassRecall,
)
from tqdm import tqdm

This configuration option fixes the issue properly (see ChartBoost/ruff-action#20 (comment)):
[tool.ruff.lint.isort]
known-third-party = ["wandb"]

However, it is difficult for users to identify the issue and fix the configuration accordingly. I think a better solution would be to have a more robust source directory detection.
A heuristic such as checking for __init__.py, or more generally for the presence of *.py files, might be a solid starting point for Python.
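To make the proposal concrete, here is a minimal Python sketch of such a check. It is an illustration of the idea only, not a patch for Ruff (which is written in Rust); the function name `looks_like_python_source` is hypothetical, and the decision to look only at the directory's direct children is an assumption made for this example.

```python
from pathlib import Path


def looks_like_python_source(src_root: Path, module_name: str) -> bool:
    """Hypothetical stricter check: require actual Python code, not just a name."""
    base = module_name.split(".")[0]

    # A single-file module, e.g. <src_root>/wandb.py
    if (src_root / f"{base}.py").is_file():
        return True

    package_dir = src_root / base
    if not package_dir.is_dir():
        return False

    # A regular package is marked by __init__.py.
    if (package_dir / "__init__.py").is_file():
        return True

    # Namespace packages have no __init__.py, so fall back to requiring
    # at least one .py file directly inside the directory.
    return any(p.is_file() for p in package_dir.glob("*.py"))
```

The fallback deliberately avoids a recursive search: a data folder may contain Python files somewhere deep inside it without being importable as a package, and a recursive walk could also be slow on large directories. With this check, a `wandb` folder that contains only run data would no longer be treated as a source match.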