Terrible performance regression due to commit "Replace macros by include files" #258

@lemire

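We just merged a commit ("Replace macros by include files", 2a24567) that slows stage 1 considerably: on twitter.json, stage 1 goes from 0.86 to 1.26 cycles per input byte, and overall throughput falls from 2.24 GB/s to 1.80 GB/s.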

before:

simdjson$ git checkout bd9628df93851c2baf6316378a91a0b7dff32a22
Previous HEAD position was 349068d... Update README.md
HEAD is now at bd9628d... Producing a new release
dlemire@skylake:~/CVS/github/simdjson$ make parse && ./parse jsonexamples/twitter.json
g++-8 -msse4.2 -mpclmul  -std=c++17   -Wall -Wextra -Wshadow -Iinclude  -Ibenchmark/linux  -O3 -o parse src/jsonioutil.cpp src/jsonparser.cpp src/simdjson.cpp src/stage1_find_marks.cpp src/stage2_build_tape.cpp src/parsedjson.cpp src/parsedjsoniterator.cpp benchmark/parse.cpp
number of iterations 1000
number of bytes 631515 number of structural chars 55264 ratio 0.088
mem alloc instructions:       1207 cycles:        815 (0.08 %) ins/cycles: 1.48 mis. branches:          4 (cycles/mis.branch 193.66) cache accesses:      28010 (failure          0)
 mem alloc runs at 0.00 cycles per input byte.
stage 1 instructions:    1743028 cycles:     541350 (51.69 %) ins/cycles: 3.22 mis. branches:       1536 (cycles/mis.branch 352.38) cache accesses:      28010 (failure          0)
 stage 1 runs at 0.86 cycles per input byte.
stage 2 instructions:    1490175 cycles:     505102 (48.23 %) ins/cycles: 2.95 mis. branches:       1662  (cycles/mis.branch 303.88)  cache accesses:      51482 (failure          0)
 stage 2 runs at 0.80 cycles per input byte and 9.14 cycles per structural character.
 all stages: 1.66 cycles per input byte.
Estimated average frequency: 3.719 GHz.
Min:  0.000281597 bytes read: 631515 Gigabytes/second: 2.24262

after:

simdjson$ git checkout 2a24567370a7a96aacf2aa4ebf4548c817450516
Previous HEAD position was bd9628d... Producing a new release
HEAD is now at 2a24567... Replace macros by include files (#236) (#248)
dlemire@skylake:~/CVS/github/simdjson$ make parse && ./parse jsonexamples/twitter.json
g++-8 -msse4.2 -mpclmul  -std=c++17   -Wall -Wextra -Wshadow -Iinclude  -Ibenchmark/linux  -O3 -o parse src/jsonioutil.cpp src/jsonparser.cpp src/simdjson.cpp src/stage1_find_marks.cpp src/stage2_build_tape.cpp src/parsedjson.cpp src/parsedjsoniterator.cpp benchmark/parse.cpp
number of iterations 1000
number of bytes 631515 number of structural chars 55264 ratio 0.088
mem alloc instructions:       1207 cycles:        750 (0.06 %) ins/cycles: 1.61 mis. branches:          2 (cycles/mis.branch 313.73) cache accesses:      29845 (failure          0)
 mem alloc runs at 0.00 cycles per input byte.
stage 1 instructions:    2219082 cycles:     798002 (61.39 %) ins/cycles: 2.78 mis. branches:       1662 (cycles/mis.branch 479.89) cache accesses:      29845 (failure          0)
 stage 1 runs at 1.26 cycles per input byte.
stage 2 instructions:    1490175 cycles:     501151 (38.55 %) ins/cycles: 2.97 mis. branches:       1546  (cycles/mis.branch 324.08)  cache accesses:      54613 (failure          0)
 stage 2 runs at 0.79 cycles per input byte and 9.07 cycles per structural character.
 all stages: 2.06 cycles per input byte.
Estimated average frequency: 3.711 GHz.
Min:  0.000350329 bytes read: 631515 Gigabytes/second: 1.80263

It is bad enough to be considered a bug.
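
To put a number on the regression, here is a minimal sketch that computes the relative changes from the figures in the two runs above. The values are copied from the benchmark output; the helper names (`percent_change`, `cycles_per_byte`) are hypothetical and not part of simdjson or its benchmark tooling.

```python
# Figures copied from the two benchmark runs above (twitter.json).
INPUT_BYTES = 631515

before = {"stage1_cycles": 541350, "stage1_instructions": 1743028, "gb_per_s": 2.24262}
after = {"stage1_cycles": 798002, "stage1_instructions": 2219082, "gb_per_s": 1.80263}


def percent_change(old, new):
    """Relative change from old to new, in percent (positive means an increase)."""
    return 100.0 * (new - old) / old


def cycles_per_byte(cycles, nbytes=INPUT_BYTES):
    """Cycles spent per input byte, as reported by the benchmark."""
    return cycles / nbytes


# Compute the change for every metric that appears in both runs.
changes = {key: percent_change(before[key], after[key]) for key in before}

for key, value in changes.items():
    print(f"{key}: {value:+.1f}%")

print(f"stage 1 before: {cycles_per_byte(before['stage1_cycles']):.2f} cycles/byte")
print(f"stage 1 after:  {cycles_per_byte(after['stage1_cycles']):.2f} cycles/byte")
```

By this arithmetic, stage 1 needs roughly 47% more cycles and 27% more instructions, while stage 2 is essentially unchanged, so the extra instructions in stage 1 look like the first thing to investigate.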

cc @ioioioio
