Skip to content

Conversation

@aphedges
Copy link
Contributor

@aphedges aphedges commented Jun 24, 2022

Back in #1602, I proposed adding flake8 checks, and @jeffra said it was a good idea. I've put in two PRs (#1601 and #2050) that likely would have been caught by flake8, so I decided to finally get some of the work I had done locally into a better state and into a PR.

I don't have the testing infrastructure to run the full test suite locally, so I'll be using periodic pushes to this PR to fix problems as needed. Hopefully it doesn't take too long to complete. All CI now passes.

I changed a docstring but couldn't render the documentation to verify I didn't break anything. I think my almost complete lack of Ruby knowledge makes trying to fix the rendering errors not worth the effort.

I tried to fix as much code as I could, but I used # noqa: F821 to suppress some errors so the check can be used going forward. Here are the locations of my usage and when the error was introduced:

  • print_rank_0 was recently broken in 5349347 (2022-07-25).
  • unfused_mode was recently broken in 8164ea9 (2022-05-24).
  • _dim_to_name has always broken since it was added in 6996bb0 (2021-09-21). Broken code removed in cf587b8 (2022-07-26).
  • q_range has always broken since it was added in ed3de0c (2021-05-24). Broken code removed in 0f4f2f9 (2022-07-19).
  • profile has always broken since it was added in a128f34 (2021-04-07).
  • checkpoint, cuda, and resolved_archive_file have always broken since it was added in 734d899 (2020-05-29).

None of these errors, which would have shown up if the code were ever run, are reported as issues, so I suspect the code is neither tested nor used. I suggest someone with more knowledge of these components look into fixes or remove what might be dead code.

@mrwyattii
Copy link
Contributor

@aphedges This is great - is it ready for review?

@aphedges
Copy link
Contributor Author

@mrwyattii, not yet. I've been rebasing a local version periodically to fix new breakages, but I haven't pushed it yet. I also have a comment to finish writing up because I found some bugs that I've suppressed warnings for in favor of getting the checks implemented. I expect to finish my improvements and set it ready for review later this week.

@aphedges
Copy link
Contributor Author

@mrwyattii, it's all ready! I've pushed my latest changes, updated the PR description with some additional information, and set the PR as ready for review.

@aphedges aphedges marked this pull request as ready for review July 12, 2022 04:54
@aphedges aphedges changed the title [DRAFT] Add flake8 to pre-commit checks Add flake8 to pre-commit checks Jul 12, 2022
Copy link
Contributor

@mrwyattii mrwyattii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly looks good to me. I left some comments for changes that I'm unsure about if you could help me understand.

There are quite a few added #noqa for F401 and F821 - is this normal to need to add these when bringing flake into a project? I'd like to better understand why they are necessary.

@aphedges
Copy link
Contributor Author

There are quite a few added #noqa for F401 and F821 - is this normal to need to add these when bringing flake into a project? I'd like to better understand why they are necessary.

I personally think it depends on the codebase, and the reason for locally suppressing the two checks are different:

The usages of # noqa: F821 ("undefined name name") are discussed in the main comment (#2051 (comment)). In short, I decided to suppress some of these flake8 errors for undeclared variables that I was unable to fix myself. I was able to fix most of them (e.g., in here and in #2050), but I'd rather someone with more knowledge of the codebase fix these. They all involve actual code that currently will fail at runtime but probably hasn't been run in months or years, and the intended behavior is not clear to me from the surrounding code or the commit history. Please see the linked comment for a more detailed explanation.

I needed to refresh myself about my usage of # noqa: F401 ("module imported but unused"). The check is to remove unused imports, which in addition to being messy, can cause bugs like the one I fixed in #1601. In all but one of these cases, the import is used only to check whether a module can be imported, even though the import isn't used. There might be a better way to do this, but I'm not knowledgeable enough about the import system to do it myself. The only other case is special: https://github.com/microsoft/DeepSpeed/blob/bb1836a2f1868441323f32c5b7cb3a2fbcb34220/deepspeed/git_version_info.py#L3 The identifiers imported from .git_version_info_installed.py are only used for re-exporting from git_version_info.py. Most of these are the sole statement in a try block, and the pattern is common enough in this project and others that it might be worth changing flake8 itself, but that seems like a bit much. I definitely removed many more unused imports than I needed to add # noqa F401.

I also suppressed some checks in the pre-commit configuration itself. Some, like the E and W families, are mostly for code formatting. They would take some more work that I don't think is worth it, given that my focus here was on fixing current and preventing future bugs. In addition, this repository already uses yapf for code formatting checks, so they might have been redundant or contradictory. Other checks, such as F541 ("f-string without any placeholders") and F841 ("local variable name is assigned to but never used"), might be related to bugs or cause wasted resources, but they were too much effort for me to personally fix. There is also a risk of false positives in some cases, hence why I ignored unused imports in __init__.py, where such a construct is often intentional.

I don't intend this to be an absolutely comprehensive PR for flake8 integration. This is just meant to be a step to fix some problems and prevent the introduction of some more in the future. Other people can improve on the work I've done here if more or better checks are desired.

@aphedges
Copy link
Contributor Author

@mrwyattii, just following up on this. This kind of PR is annoying to rebase, so I'd like it get merged before one of the larger PRs makes it in.

@jeffra
Copy link
Collaborator

jeffra commented Jul 18, 2022

Hi @aphedges, thank you so much for doing this PR! I am looking it over now and overall it looks good, but just trying to make sure there aren't any subtle issues since it touches so many files. Let's try and merge it in this week but there are other PRs coming that we're working on today.

@aphedges
Copy link
Contributor Author

I completely understand that you want to carefully review this. I'll keep updating it as new PRs make it to master. For example, aa88137 introduced some new breakages, and I just pushed my rebased branch with fixes in a separate commit. Part of the reason I was hoping this could get merged sooner rather than later is that I need to do some guesswork about some of the breakages that frequently happen in the very large commits. There are multiple undeclared variables in deepspeed/module_inject/replace_module.py that I hope I fixed correctly, but I don't really know if I did.

@aphedges
Copy link
Contributor Author

I have rebased against master again to fix the merge conflict and fixed the new flake8 error introduced in ee7ea3b.

@aphedges
Copy link
Contributor Author

I have rebased against master again to fix the merge conflict and the new flake8 errors introduced in 0f4f2f9. The commit also removed two errors I was suppressing, so it wasn't all bad.

@aphedges
Copy link
Contributor Author

I have rebased against master again to fix the merge conflict and the new flake8 errors introduced in 69b7c97. The commit also introduced an error (calling Path.isfile() instead of Path.is_file()) that wasn't caught by flake8 but was caught by PyCharm.

@aphedges
Copy link
Contributor Author

I have fixed the new merge conflict caused by 844d9f3.

@aphedges
Copy link
Contributor Author

@jeffra, have you gotten around to reviewing yet?

@aphedges
Copy link
Contributor Author

I have rebased against master again to fix the merge conflict and the new flake8 errors introduced in 80d0a32.

@aphedges
Copy link
Contributor Author

I have rebased against master again to fix the merge conflict and the new flake8 errors introduced in 8413b7f and 5349347. The latter fix was more complex, so I needed to do some guessing and suppress an error I could not figure out how to fix myself.

The sooner this PR gets merged, the sooner PR authors can fix these mistakes themselves that will often cause their code to fail at runtime.

@aphedges
Copy link
Contributor Author

There was a test failure but I couldn't rerun it myself, so I rebased and force-pushed.

Copy link
Collaborator

@jeffra jeffra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @aphedges for all your hard work on this! It's very much appreciated :)

@jeffra jeffra merged commit 316c4a4 into deepspeedai:master Jul 25, 2022
@aphedges
Copy link
Contributor Author

aphedges commented Aug 3, 2022

@jeffra, thanks for merging it!

I needed to suppress some errors for undeclared variables I could not fix myself. All of the lines would cause a crash if they are run. Should I make issues for them so they aren't just lost in this PR? They are all bugs waiting to happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants