Add type annotations to the project and run mypy on CI#261
Add type annotations to the project and run mypy on CI#261dan-blanchard merged 1 commit intochardet:masterfrom jdufresne:mypy
Conversation
dan-blanchard
left a comment
There was a problem hiding this comment.
This would be a great contribution, but I think some of the types aren't quite right. Once those are fixed, we should be good to go though.
chardet/__init__.py
Outdated
|
|
||
|
|
||
| def detect(byte_str): | ||
| def detect(byte_str: str) -> ResultDict: |
There was a problem hiding this comment.
This should be Union[bytes, bytearray], not str. This function explicitly does not accept str.
chardet/chardistribution.py
Outdated
| self._freq_chars = 0 | ||
|
|
||
| def feed(self, char, char_len): | ||
| def feed(self, char: Sequence[int], char_len: int) -> None: |
There was a problem hiding this comment.
I think most of the places you have Sequence[int] should be Union[bytes, bytearray].
chardet/charsetgroupprober.py
Outdated
| return self._best_guess_prober.language | ||
|
|
||
| def feed(self, byte_str): | ||
| def feed(self, byte_str: bytes) -> int: |
There was a problem hiding this comment.
Every byte_str argument should be Union[bytes, bytearray].
chardet/charsetprober.py
Outdated
| self._state = None | ||
| _state: int | ||
|
|
||
| def __init__(self, lang_filter: int = 0) -> None: |
There was a problem hiding this comment.
lang_filter should be LanguageFilter. Although, the default of 0 probably makes that tricky. Honestly, all the enums in the chardet.enums module should probably be overhauled and made into proper Python 3 Enum or Flag types.
chardet/chardistribution.py
Outdated
| self._freq_chars = None | ||
| # Mapping table to get frequency order from char order (get from | ||
| # GetOrder()) | ||
| _char_to_freq_order: Tuple[int, ...] |
There was a problem hiding this comment.
Why are you getting rid of all the attribute initializations from __init__ (here and elsewhere)? In local testing, that would make it an AttributeError to try to access any of those.
chardet/eucjpprober.py
Outdated
| return "Japanese" | ||
|
|
||
| def feed(self, byte_str): | ||
| def feed(self, byte_str: bytes) -> int: |
There was a problem hiding this comment.
This int is really a ProbingState.
|
Thanks for the thorough review and great feedback! I believe I have applied all suggestions and this is ready for another round. |
dan-blanchard
left a comment
There was a problem hiding this comment.
Thanks for putting so much time into this project! This is mostly looking good. Just a few minor complaints at this point.
Also, you probably want to rebase from master, because I added a new prober file.
chardet/charsetgroupprober.py
Outdated
| self.probers = [] | ||
| self._best_guess_prober = None | ||
| self.probers: List[CharSetProber] = [] | ||
| self.active: Dict[CharSetProber, bool] = {} |
There was a problem hiding this comment.
Why do we need this dictionary when the probers already all have an active property?
There was a problem hiding this comment.
One was not defined in charsetprober.py, so mypy reported an error. I reverted this change and added the missing property to the parent class.
| filtered = bytearray() | ||
| in_tag = False | ||
| prev = 0 | ||
| buf = memoryview(buf).cast("c") |
There was a problem hiding this comment.
The way this was before was added specifically as a performance improvement in #252. Calling ord() every time will slow this down a little.
There was a problem hiding this comment.
I see, thanks. This was done to workaround typeshed bug python/typeshed#8182
I'll revert the change and then ignore the false positive.
chardet/enums.py
Outdated
|
|
||
|
|
||
| class ProbingState: | ||
| class ProbingState(Flag): |
There was a problem hiding this comment.
This (and the other ones in this file except for LanguageFilter) should inherit from Enum, because they're not meaningfully combinable by bitwise operations.
There was a problem hiding this comment.
That makes sense, I changed this to an Enum in the latest revision.
This helps consumers of chardet ensure that they are using the provided API correctly. The project includes a py.typed file for PEP-561 compliance. This also helps ensure internal correctness and consistency. Adding the static type checking caught at least one suspect pattern in chardet/hebrewprober.py where an int value was compared to string (probably leftover from Python 2 support). Type checking will run on all pull requests through GitHub actions and pre-commit.
This helps consumers of chardet ensure that they are using the provided
API correctly. The project includes a py.typed file for PEP-561
compliance.
This also helps ensure internal correctness and consistency. Adding the
static type checking caught at least one suspect pattern in
chardet/hebrewprober.py where an int value was compared to
string (probably leftover from Python 2 support).
Type checking will run on all pull requests through GitHub actions and
pre-commit.