Terrible performance regression due to commit "Replace macros by include files" #258
Closed
We have just merged a commit (replacing macros by include files) that significantly degrades stage 1 performance. Here are the timings on jsonexamples/twitter.json before and after that commit:
before:
simdjson$ git checkout bd9628df93851c2baf6316378a91a0b7dff32a22
Previous HEAD position was 349068d... Update README.md
HEAD is now at bd9628d... Producing a new release
dlemire@skylake:~/CVS/github/simdjson$ make parse && ./parse jsonexamples/twitter.json
g++-8 -msse4.2 -mpclmul -std=c++17 -Wall -Wextra -Wshadow -Iinclude -Ibenchmark/linux -O3 -o parse src/jsonioutil.cpp src/jsonparser.cpp src/simdjson.cpp src/stage1_find_marks.cpp src/stage2_build_tape.cpp src/parsedjson.cpp src/parsedjsoniterator.cpp benchmark/parse.cpp
number of iterations 1000
number of bytes 631515 number of structural chars 55264 ratio 0.088
mem alloc instructions: 1207 cycles: 815 (0.08 %) ins/cycles: 1.48 mis. branches: 4 (cycles/mis.branch 193.66) cache accesses: 28010 (failure 0)
mem alloc runs at 0.00 cycles per input byte.
stage 1 instructions: 1743028 cycles: 541350 (51.69 %) ins/cycles: 3.22 mis. branches: 1536 (cycles/mis.branch 352.38) cache accesses: 28010 (failure 0)
stage 1 runs at 0.86 cycles per input byte.
stage 2 instructions: 1490175 cycles: 505102 (48.23 %) ins/cycles: 2.95 mis. branches: 1662 (cycles/mis.branch 303.88) cache accesses: 51482 (failure 0)
stage 2 runs at 0.80 cycles per input byte and 9.14 cycles per structural character.
all stages: 1.66 cycles per input byte.
Estimated average frequency: 3.719 GHz.
Min: 0.000281597 bytes read: 631515 Gigabytes/second: 2.24262
after:
simdjson$ git checkout 2a24567370a7a96aacf2aa4ebf4548c817450516
Previous HEAD position was bd9628d... Producing a new release
HEAD is now at 2a24567... Replace macros by include files (#236) (#248)
dlemire@skylake:~/CVS/github/simdjson$ make parse && ./parse jsonexamples/twitter.json
g++-8 -msse4.2 -mpclmul -std=c++17 -Wall -Wextra -Wshadow -Iinclude -Ibenchmark/linux -O3 -o parse src/jsonioutil.cpp src/jsonparser.cpp src/simdjson.cpp src/stage1_find_marks.cpp src/stage2_build_tape.cpp src/parsedjson.cpp src/parsedjsoniterator.cpp benchmark/parse.cpp
number of iterations 1000
number of bytes 631515 number of structural chars 55264 ratio 0.088
mem alloc instructions: 1207 cycles: 750 (0.06 %) ins/cycles: 1.61 mis. branches: 2 (cycles/mis.branch 313.73) cache accesses: 29845 (failure 0)
mem alloc runs at 0.00 cycles per input byte.
stage 1 instructions: 2219082 cycles: 798002 (61.39 %) ins/cycles: 2.78 mis. branches: 1662 (cycles/mis.branch 479.89) cache accesses: 29845 (failure 0)
stage 1 runs at 1.26 cycles per input byte.
stage 2 instructions: 1490175 cycles: 501151 (38.55 %) ins/cycles: 2.97 mis. branches: 1546 (cycles/mis.branch 324.08) cache accesses: 54613 (failure 0)
stage 2 runs at 0.79 cycles per input byte and 9.07 cycles per structural character.
all stages: 2.06 cycles per input byte.
Estimated average frequency: 3.711 GHz.
Min: 0.000350329 bytes read: 631515 Gigabytes/second: 1.80263
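Stage 1 goes from 0.86 to 1.26 cycles per input byte while stage 2 is essentially unchanged, so overall throughput drops from 2.24 GB/s to 1.80 GB/s. This regression is bad enough to be considered a bug.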
cc @ioioioio