Skip ASCII check until the non ASCII byte is found #81
If the 8 + 8 bytes ASCII check is unsuccessful, then there is no reason to repeat it until the non-ASCII byte is found.
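To illustrate the idea, here is a minimal sketch of the loop shape, not the code from this PR. The function `count_non_ascii` is a hypothetical helper made up for the example; it only counts bytes with the high bit set instead of validating anything. Once the 8 + 8 byte check fails, the inner loop steps over ASCII bytes one at a time until it reaches the non-ASCII byte, and only then is the block check tried again.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (an assumption for illustration, not part of the PR):
// counts the bytes in [data, data + len) that have the high bit set.
std::size_t count_non_ascii(const unsigned char* data, std::size_t len) {
    std::size_t pos = 0;
    std::size_t count = 0;
    while (pos < len) {
        // Fast path: test 8 + 8 bytes at once when at least 16 bytes remain.
        if (pos + 16 <= len) {
            std::uint64_t v1, v2;
            std::memcpy(&v1, data + pos, 8);
            std::memcpy(&v2, data + pos + 8, 8);
            if (((v1 | v2) & 0x8080808080808080ULL) == 0) {
                pos += 16;  // all 16 bytes are ASCII
                continue;
            }
        }
        // Slow path: the block check failed (or fewer than 16 bytes remain).
        // Skip ASCII bytes one by one WITHOUT retrying the block check,
        // since the retry would fail again until the non-ASCII byte is passed.
        while (pos < len && data[pos] < 0x80) {
            ++pos;
        }
        if (pos == len) {
            break;  // only ASCII bytes were left in the tail
        }
        ++count;  // data[pos] is the non-ASCII byte
        ++pos;
    }
    return count;
}
```

Without the inner loop, each ASCII byte in front of the non-ASCII one would trigger another two loads and a mask test that is known to fail.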
918186d to 0cbb098
What are your benchmark numbers supporting this optimization? By how much do you speed up the processing? Please include diverse data sources.
Master branch
This PR
The dataset is a 10 MiB file made of
Will run more benchmarks with a real-world dataset later.
I am concerned about such a synthetic dataset. Would you try again with the files from this repository?
Wikipedia Japanese main page (https://ja.wikipedia.org/wiki/メインページ):
Master branch
This PR

All of them?
Nvm, I did not see the
Master branch
This PR

Master branch
This PR

Master branch
This PR
Thanks. It does look convincing. Here are my own results (Apple M1, LLVM 12):
It is a bit surprising at first that it would help even with pure ASCII files, but it makes sense. Merging.