I noticed a very large difference in execution time between calling chardet.detect on a complete bytes string and feeding the same data to UniversalDetector.feed in 1 MB chunks.
-
With a 100 MB file, composed only of "tests tests tests tests [....]":
chardet.detect takes ~64 seconds.
UniversalDetector.feed takes ~3 seconds.
-
With the previous file, to which I appended ~10 KB of MacRoman-encoded text (containing the character ’):
chardet.detect: I interrupted the execution after 20 minutes...
UniversalDetector.feed takes ~3 seconds.
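If you want to reproduce the inputs, something like this should be close (sizes are approximate; the exact files are not attached):

```python
# Rough reconstruction of the two test inputs described above.
ascii_part = b"tests " * (100 * 1024 * 1024 // 6)           # ~100 MB of pure ASCII
macroman_part = "tests \u2019 ".encode("mac_roman") * 1280  # ~10 KB; MacRoman encodes '’' as 0xD5
original_txt = ascii_part + macroman_part                   # the second (slow) test input
```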
In case you wonder what code I used, I compared the execution time of the following:
from chardet import detect
from chardet.universaldetector import UniversalDetector

CHUNK_SIZE = 1024 * 1024  # 1 MB

# Whole-buffer detection
print(detect(original_txt))

# Chunked detection (plain slicing instead of my private chunking helper)
detector = UniversalDetector()
for start in range(0, len(original_txt), CHUNK_SIZE):
    detector.feed(original_txt[start:start + CHUNK_SIZE])
detector.close()
print(detector.result)