pass autopep8 first by MarcoGorelli · Pull Request #807 · nbQA-dev/nbQA

MarcoGorelli · 2023-04-01T09:59:09Z

closes #806

nbqa/__main__.py

nbqa/replace_source.py

tests/test_ipython_magics.py

tests/tools/test_ruff_works.py

MarcoGorelli · 2023-04-02T09:55:31Z

nbqa/__main__.py

+            with open(file_name) as fd:
+                for line in fd:
+                    if line == CODE_SEPARATOR:
+                        parsed_cells.append(current_cell)
+                        current_cell = []
+                    current_cell.append(line)
+                parsed_cells.append("".join(current_cell))
+            if parsed_cells:
+                # skip the initial empty one
+                parsed_cells = parsed_cells[1:]


way too complicated, just return result from first pass?

MarcoGorelli · 2023-04-02T15:24:08Z

seems to be fine, from doing some integration tests

MarcoGorelli · 2023-04-02T18:13:54Z

seems fine, let's ship it and see what happens

## Summary Add support for applying auto-fixes in Jupyter Notebook. ### Solution Cell offsets are the boundaries for each cell in the concatenated source code. They are represented using `TextSize`. It includes the start and end offset as well, thus creating a range for each cell. These offsets are updated using the `SourceMap` markers. ### SourceMap `SourceMap` contains markers constructed from each edits which tracks the original source code position to the transformed positions. The following drawing might make it clear: ![SourceMap visualization](https://github.com/astral-sh/ruff/assets/67177269/3c94e591-70a7-4b57-bd32-0baa91cc7858) The center column where the dotted lines are present are the markers included in the `SourceMap`. The `Notebook` looks at these markers and updates the cell offsets after each linter loop. If you notice closely, the destination takes into account all of the markers before it. The index is constructed only when required as it's only used to render the diagnostics. So, a `OnceCell` is used for this purpose. The cell offsets, cell content and the index will be updated after each iteration of linting in the mentioned order. The order is important here as the content is updated as per the new offsets and index is updated as per the new content. ## Limitations ### 1 Styling rules such as the ones in `pycodestyle` will not be applicable everywhere in Jupyter notebook, especially at the cell boundaries. Let's take an example where a rule suggests to have 2 blank lines before a function and the cells contains the following code: ```python import something # --- def first(): pass def second(): pass ``` (Again, the comment is only to visualize cell boundaries.) In the concatenated source code, the 2 blank lines will be added but it shouldn't actually be added when we look in terms of Jupyter notebook. It's as if the function `first` is at the start of a file. `nbqa` solves this by recording newlines before and after running `autopep8`, then running the tool and restoring the newlines at the end (refer nbQA-dev/nbQA#807). ## Test Plan Three commands were run in order with common flags (`--select=ALL --no-cache --isolated`) to isolate which stage the problem is occurring: 1. Only diagnostics 2. Fix with diff (`--fix --diff`) 3. Fix (`--fix`) ### https://github.com/facebookresearch/segment-anything ``` ------------------------------------------------------------------------------- Jupyter Notebooks 3 0 0 0 0 |- Markdown 3 98 0 94 4 |- Python 3 513 468 4 41 (Total) 611 468 98 45 ------------------------------------------------------------------------------- ``` ```console $ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/segment-anything/**/*.ipynb --fix ... Found 180 errors (89 fixed, 91 remaining). ``` ### https://github.com/openai/openai-cookbook ``` ------------------------------------------------------------------------------- Jupyter Notebooks 65 0 0 0 0 |- Markdown 64 3475 12 2507 956 |- Python 65 9700 7362 1101 1237 (Total) 13175 7374 3608 2193 =============================================================================== ``` ```console $ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/openai-cookbook/**/*.ipynb --fix error: Failed to parse /path/to/openai-cookbook/examples/vector_databases/Using_vector_databases_for_embeddings_search.ipynb:cell 4:29:18: unexpected token '-' ... Found 4227 errors (2165 fixed, 2062 remaining). ``` ### https://github.com/tensorflow/docs ``` ------------------------------------------------------------------------------- Jupyter Notebooks 150 0 0 0 0 |- Markdown 1 55 0 46 9 |- Python 1 402 289 60 53 (Total) 457 289 106 62 ------------------------------------------------------------------------------- ``` ```console $ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/tensorflow-docs/**/*.ipynb --fix error: Failed to parse /path/to/tensorflow-docs/site/en/guide/extension_type.ipynb:cell 80:1:1: unexpected token Indent error: Failed to parse /path/to/tensorflow-docs/site/en/r1/tutorials/eager/custom_layers.ipynb:cell 20:1:1: unexpected token Indent error: Failed to parse /path/to/tensorflow-docs/site/en/guide/data.ipynb:cell 175:5:14: unindent does not match any outer indentation level error: Failed to parse /path/to/tensorflow-docs/site/en/r1/tutorials/representation/unicode.ipynb:cell 30:1:1: unexpected token Indent ... Found 12726 errors (5140 fixed, 7586 remaining). ``` ### https://github.com/tensorflow/models ``` ------------------------------------------------------------------------------- Jupyter Notebooks 46 0 0 0 0 |- Markdown 1 11 0 6 5 |- Python 1 328 249 19 60 (Total) 339 249 25 65 ------------------------------------------------------------------------------- ``` ```console $ cargo run --all-features --bin ruff -- check --no-cache --isolated --select=ALL /path/to/tensorflow-models/**/*.ipynb --fix ... Found 4856 errors (2690 fixed, 2166 remaining). ``` resolves: #1218 fixes: #4556

MarcoGorelli added 2 commits April 1, 2023 10:54

pass autopep8 first

6eb7c22

clean up

6adfed4