IndexError on sending in ISO-8859-7 #124
Closed
Description
When I pass the following bytes to chardet.detect() (Python 3.5), chardet raises an IndexError:
b'\xcc\xe5 \xef\xec\xe9\xeb\xdf\xe1 \xf4\xe7\xf2'
In 2.3.0 this returned ISO-8859-7; in 3.0.0 and 3.0.1 it returns None. I'm not entirely sure which is the correct behaviour.
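To help decide which result is correct, the sample can be decoded with Python's built-in iso-8859-7 codec, using only the standard library and no chardet. Every byte maps to a Greek letter or a space, producing the Greek phrase "Με ομιλία της". This shows the bytes are valid ISO-8859-7 text. It does not rule out every other encoding: windows-1253 assigns the same letters to these particular byte values, so the sample alone cannot tell the two code pages apart.

```python
# Sample bytes copied from the report.
sample = b'\xcc\xe5 \xef\xec\xe9\xeb\xdf\xe1 \xf4\xe7\xf2'

# Decode with the standard-library ISO-8859-7 (Latin/Greek) codec.
text = sample.decode('iso-8859-7')

# Expected text: "Με ομιλία της", written with escapes to keep the source ASCII.
expected = "\u039c\u03b5 \u03bf\u03bc\u03b9\u03bb\u03af\u03b1 \u03c4\u03b7\u03c2"
assert text == expected

# windows-1253 maps these specific bytes to the same Greek letters.
assert sample.decode('cp1253') == text
```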
Traceback:
    encoding = chardet.detect(body)['encoding']
/opt/anaconda3/envs/env/lib/python3.5/site-packages/chardet/__init__.py:39: in detect
    return detector.close()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <chardet.universaldetector.UniversalDetector object at 0x7f177ce554a8>
    def close(self):
        """
        Stop analyzing the current document and come up with a final
        prediction.
        :returns: The ``result`` attribute, a ``dict`` with the keys
                  `encoding`, `confidence`, and `language`.
        """
        # Don't bother with checks if we're already done
        if self.done:
            return self.result
        self.done = True
        if not self._got_data:
            self.logger.debug('no data received!')
        # Default to ASCII if it is all we've seen so far
        elif self._input_state == InputState.PURE_ASCII:
            self.result = {'encoding': 'ascii',
                           'confidence': 1.0,
                           'language': ''}
        # If we have seen non-ASCII, return the best that met MINIMUM_THRESHOLD
        elif self._input_state == InputState.HIGH_BYTE:
            prober_confidence = None
            max_prober_confidence = 0.0
            max_prober = None
            for prober in self._charset_probers:
                if not prober:
                    continue
                prober_confidence = prober.get_confidence()
                if prober_confidence > max_prober_confidence:
                    max_prober_confidence = prober_confidence
                    max_prober = prober
            if max_prober and (max_prober_confidence > self.MINIMUM_THRESHOLD):
                charset_name = max_prober.charset_name
                lower_charset_name = max_prober.charset_name.lower()
                confidence = max_prober.get_confidence()
                # Use Windows encoding name instead of ISO-8859 if we saw any
                # extra Windows-specific bytes
                if lower_charset_name.startswith('iso-8859'):
                    if self._has_win_bytes:
                        charset_name = self.ISO_WIN_MAP.get(lower_charset_name,
                                                            charset_name)
                self.result = {'encoding': charset_name,
                               'confidence': confidence,
                               'language': max_prober.language}
        # Log all prober confidences if none met MINIMUM_THRESHOLD
        if self.logger.getEffectiveLevel() == logging.DEBUG:
            if self.result['encoding'] is None:
                self.logger.debug('no probers hit minimum threshold')
>               for prober in self._charset_probers[0].probers:
E               IndexError: list index out of range
/opt/anaconda3/envs/env/lib/python3.5/site-packages/chardet/universaldetector.py:271: IndexError
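The failing line is in the debug-only branch at the end of close(): self._charset_probers[0] raises IndexError whenever self._charset_probers is empty, and that branch runs only when the logger's effective level is DEBUG and no prober reached MINIMUM_THRESHOLD. This is consistent with detect() returning None, rather than crashing, when debug logging is off. The sketch below mimics the selection logic from the traceback using hypothetical stand-ins: make_prober, pick_result, and the 0.20 threshold value are assumptions for illustration, not chardet's API and not the project's actual fix. Its debug logging walks whatever probers exist instead of indexing the first one.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("close_sketch")
# Force DEBUG so the branch that crashed in the report is actually exercised.
logger.setLevel(logging.DEBUG)

MINIMUM_THRESHOLD = 0.20  # assumed value for this sketch


def make_prober(charset_name, language, confidence, children=()):
    """Hypothetical stand-in for a chardet prober.

    `children` plays the role of a group prober's `.probers` list.
    """
    return SimpleNamespace(charset_name=charset_name,
                           language=language,
                           get_confidence=lambda: confidence,
                           probers=list(children))


def pick_result(charset_probers):
    """Pick the most confident prober above MINIMUM_THRESHOLD, like close()."""
    result = {'encoding': None, 'confidence': 0.0, 'language': None}
    max_prober, max_confidence = None, 0.0
    for prober in charset_probers:
        confidence = prober.get_confidence()
        if confidence > max_confidence:
            max_prober, max_confidence = prober, confidence
    if max_prober and max_confidence > MINIMUM_THRESHOLD:
        result = {'encoding': max_prober.charset_name,
                  'confidence': max_confidence,
                  'language': max_prober.language}

    # Debug-only logging. Instead of indexing charset_probers[0] (which raises
    # IndexError on an empty list), walk whatever probers are present.
    if logger.isEnabledFor(logging.DEBUG) and result['encoding'] is None:
        logger.debug('no probers hit minimum threshold')
        for group in charset_probers:
            # Group stand-ins expose children via .probers; plain ones log themselves.
            for prober in group.probers or [group]:
                logger.debug('%s %s confidence = %s', prober.charset_name,
                             prober.language, prober.get_confidence())
    return result


# An empty prober list no longer crashes, even with debug logging on.
print(pick_result([]))  # {'encoding': None, 'confidence': 0.0, 'language': None}
```

Falling back to logging the group itself when it has no children is just one option; the point of the sketch is only that nothing indexes into a list that may be empty.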