lemire commented Jan 28, 2024

Credit @Validark

Test results with GCC 12 on Icelake, using Arabic-Lipsum.utf8.txt:

I am getting a 2% to 4% gain in throughput. By itself, this may not be statistically meaningful, but the instruction count goes down by about the same amount, which tells us (with high confidence) that it is an actual optimization.

Before:

validate_utf8+haswell, input size: 81685, iterations: 20000, dataset: unicode_lipsum/lipsum/Arabic-Lipsum.utf8.txt
   1.113 ins/byte,    0.319 cycle/byte,   10.081 GB/s (0.5 %),     3.213 GHz,    3.493 ins/cycle 
   1.987 ins/char,    0.569 cycle/char,    5.648 Gc/s (0.5 %)     1.78 byte/char 
validate_utf8+westmere, input size: 81685, iterations: 20000, dataset: unicode_lipsum/lipsum/Arabic-Lipsum.utf8.txt
   2.426 ins/byte,    0.492 cycle/byte,    6.514 GB/s (2.8 %),     3.206 GHz,    4.931 ins/cycle 
   4.331 ins/char,    0.878 cycle/char,    3.650 Gc/s (2.8 %)     1.78 byte/char 

After:

validate_utf8+haswell, input size: 81685, iterations: 30000, dataset: unicode_lipsum/lipsum/Arabic-Lipsum.utf8.txt
   1.066 ins/byte,    0.306 cycle/byte,   10.485 GB/s (1.7 %),     3.213 GHz,    3.479 ins/cycle 
   1.903 ins/char,    0.547 cycle/char,    5.874 Gc/s (1.7 %)     1.78 byte/char 
validate_utf8+westmere, input size: 81685, iterations: 30000, dataset: unicode_lipsum/lipsum/Arabic-Lipsum.utf8.txt
   2.364 ins/byte,    0.484 cycle/byte,    6.628 GB/s (0.4 %),     3.206 GHz,    4.887 ins/cycle 
   4.219 ins/char,    0.863 cycle/char,    3.713 Gc/s (0.4 %)     1.78 byte/char 
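
For readers who want to check the claim that throughput and instruction count move by about the same amount, here is a minimal sketch of the arithmetic. The figures are copied from the output above; the `results` layout and the `relative_change` helper are illustrative names and are not part of the benchmark tool.

```python
# Figures copied from the benchmark output above (Arabic-Lipsum.utf8.txt).
# Each metric is stored as a (before, after) pair.
results = {
    "haswell": {"ins_per_byte": (1.113, 1.066), "gb_per_s": (10.081, 10.485)},
    "westmere": {"ins_per_byte": (2.426, 2.364), "gb_per_s": (6.514, 6.628)},
}


def relative_change(before, after):
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# Relative change per kernel and per metric, in percent.
summary = {
    kernel: {name: relative_change(b, a) for name, (b, a) in metrics.items()}
    for kernel, metrics in results.items()
}

for kernel, changes in summary.items():
    print(
        f"{kernel}: instructions/byte {changes['ins_per_byte']:+.1f}%, "
        f"throughput {changes['gb_per_s']:+.1f}%"
    )
```

On these numbers, the haswell kernel executes roughly 4% fewer instructions per byte and gains roughly 4% in throughput, while the westmere kernel executes roughly 2.6% fewer instructions per byte and gains a little under 2% in throughput.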
