Description
binaryornot 0.4.4 crashes when used with chardet 7.0.0. The new version of chardet can return {'encoding': None, 'confidence': 0.99}: high confidence, but no encoding. The guard condition in is_binary_string does not check for a None encoding, so None is passed directly to .decode(), which raises a TypeError.
Traceback
File ".../binaryornot/helpers.py", line 103, in is_binary_string
bytes_to_check.decode(encoding=detected_encoding['encoding'])
TypeError: decode() argument 'encoding' must be str, not None
During handling of the above exception, another exception occurred:
File ".../binaryornot/helpers.py", line 106, in is_binary_string
unicode(bytes_to_check, encoding=detected_encoding['encoding'])
NameError: name 'unicode' is not defined
Root Cause
In helpers.py, the guard condition before decoding is:
if (detected_encoding['confidence'] > 0.9 and
        detected_encoding['encoding'] != 'ascii'):
With chardet 7.0.0, detect() can return:
{'encoding': None, 'confidence': 0.99, 'language': ''}
Since None != 'ascii' evaluates to True, the code enters the if block and calls:
bytes_to_check.decode(encoding=None)  # TypeError
The except TypeError block then falls back to:
unicode(bytes_to_check, encoding=None)  # NameError: Python 2 only
So the error handler itself crashes instead of recovering gracefully.
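The failure chain can be reproduced without installing either library. The snippet below is only a sketch: the dictionary is a hand-written stand-in for the chardet result quoted above, and guard_passes and exception_chain are hypothetical helper names that mirror the logic described in this report, not functions from binaryornot.

```python
# Hand-written stand-in for the chardet 7.0.0 result quoted in this report.
detected_encoding = {'encoding': None, 'confidence': 0.99, 'language': ''}
bytes_to_check = b'\x00\x01\x02'


def guard_passes(detected):
    """Mirror of the current guard condition (hypothetical helper name)."""
    return detected['confidence'] > 0.9 and detected['encoding'] != 'ascii'


def exception_chain(data, detected):
    """Return the names of the exceptions raised by decode and its fallback."""
    raised = []
    try:
        data.decode(encoding=detected['encoding'])
    except TypeError as exc:
        raised.append(type(exc).__name__)
        try:
            # 'unicode' is a Python 2 builtin; on Python 3 the name is undefined.
            unicode(data, encoding=detected['encoding'])  # noqa: F821
        except NameError as inner:
            raised.append(type(inner).__name__)
    return raised


print(guard_passes(detected_encoding))
print(exception_chain(bytes_to_check, detected_encoding))
```

On Python 3 the guard lets the None encoding through, and both the decode call and its fallback raise, which matches the traceback above.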
Suggested Fix
Add detected_encoding['encoding'] is not None to the guard condition:
if (detected_encoding['confidence'] > 0.9 and
        detected_encoding['encoding'] is not None and
        detected_encoding['encoding'] != 'ascii'):
This is a minimal, surgical fix that preserves the existing logic while handling the new chardet behavior correctly.
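As a rough illustration of how the extended guard behaves, the condition can be pulled into a small standalone function and exercised with a few hand-written results. The names should_try_decode and can_decode are hypothetical and exist only for this sketch; the surrounding structure of is_binary_string is not reproduced here.

```python
def should_try_decode(detected_encoding):
    """Guard with the proposed None check (hypothetical helper name)."""
    return (detected_encoding['confidence'] > 0.9 and
            detected_encoding['encoding'] is not None and
            detected_encoding['encoding'] != 'ascii')


def can_decode(data, detected_encoding):
    """Attempt a decode only when the guard allows it."""
    if not should_try_decode(detected_encoding):
        return False
    try:
        data.decode(encoding=detected_encoding['encoding'])
    except (UnicodeDecodeError, LookupError):
        # Bad bytes for the codec, or an unknown codec name.
        return False
    return True


# Hand-written stand-ins for chardet results, not real detect() output.
none_result = {'encoding': None, 'confidence': 0.99, 'language': ''}
utf8_result = {'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}

print(should_try_decode(none_result))
print(can_decode('héllo'.encode('utf-8'), utf8_result))
```

With the None check in place, a result that carries no encoding skips the decode attempt entirely, so neither the TypeError nor the NameError from the fallback can occur on that path.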
Environment
binaryornot: 0.4.4
chardet: 7.0.0
Python: 3.12.3