Skip to content

Reorganize Chorba activation and refactor Chorba dispatch#2003

Closed
Dead2 wants to merge 2 commits intodevelopfrom
chorba-option
Closed

Reorganize Chorba activation and refactor Chorba dispatch#2003
Dead2 wants to merge 2 commits intodevelopfrom
chorba-option

Conversation

@Dead2
Copy link
Copy Markdown
Member

@Dead2 Dead2 commented Nov 11, 2025

First commit:

  • Disabling Chorba using CMake parameter WITH_CRC32_CHORBA=OFF will now only disable the crc32_chorba C fallback.

  • SSE2, SSE41 and pclmul variants will still be able to use their own Chorba-algorithm based code, but their fallback to
    the generic crc32_chorba C code in SSE2 and SSE41 will be disabled, reducing their performance on really big
    input buffers. But this allows the SSE2 and SSE41 functions to still be used in the first place, and they provide a huge win over crc32_braid for buffers over 72 bytes.

  • Remove the crc32_c function (and its file crc32_c.c), instead use the normal functable routing to select
    between crc32_braid and crc32_chorba.

Second commit:

  • Unify crc32_chorba, chorba_sse2 and chorba_sse41 dispatch functions.
  • Fixed alignment diff calculation in crc32_chorba.
  • Fixed length check to happen early, avoiding extra branches for too short lengths,
    this also allows removing one function call to crc32_braid_internal to handle those.
    Gbench shows ~0.15-0.25ns saved per call for lengths shorter than CHORBA_SMALL_THRESHOLD.
  • Avoid calculating aligned len if buffer is already aligned

@Dead2 Dead2 added optimization cleanup Improving maintainability or removing code. labels Nov 11, 2025
@Dead2
Copy link
Copy Markdown
Member Author

Dead2 commented Nov 11, 2025

Develop chorba:

   text    data     bss     dec     hex filename
 172705    1336       8  174049   2a7e1 libz-ng.so.2

crc32/generic_chorba/1             3.10 ns         3.10 ns    675690494
crc32/generic_chorba/8             18.9 ns         18.9 ns    110886773
crc32/generic_chorba/12            28.3 ns         28.3 ns     74274078
crc32/generic_chorba/16            37.3 ns         37.3 ns     56274779
crc32/generic_chorba/32            73.2 ns         73.2 ns     28700075
crc32/generic_chorba/64             136 ns          136 ns     15404068
crc32/generic_chorba/512            218 ns          218 ns      9651682
crc32/generic_chorba/4096           801 ns          801 ns      2621940
crc32/generic_chorba/32768         3744 ns         3741 ns       560932
crc32/generic_chorba/262144       42695 ns        42664 ns        49218
crc32/generic_chorba/4194304     473708 ns       473200 ns         4438
crc32/braid/1                      2.89 ns         2.89 ns    728547467
crc32/braid/8                      18.5 ns         18.5 ns    113315139
crc32/braid/12                     27.6 ns         27.6 ns     76220392
crc32/braid/16                     36.7 ns         36.7 ns     57207607
crc32/braid/32                     72.8 ns         72.8 ns     28862844
crc32/braid/64                      136 ns          136 ns     15487978
crc32/braid/512                     338 ns          338 ns      6219458
crc32/braid/4096                   1856 ns         1855 ns      1132360
crc32/braid/32768                 14198 ns        14189 ns       147956
crc32/braid/262144               112853 ns       112778 ns        18620
crc32/braid/4194304             1804162 ns      1802761 ns         1165
crc32/chorba_sse2/1                3.58 ns         3.58 ns    587932705
crc32/chorba_sse2/8                18.8 ns         18.7 ns    112066886
crc32/chorba_sse2/12               27.8 ns         27.8 ns     75499817
crc32/chorba_sse2/16               36.9 ns         36.9 ns     56961587
crc32/chorba_sse2/32               72.9 ns         72.8 ns     28839354
crc32/chorba_sse2/64                136 ns          136 ns     15454656
crc32/chorba_sse2/512               218 ns          218 ns      9646373
crc32/chorba_sse2/4096              767 ns          767 ns      2739492
crc32/chorba_sse2/32768            5147 ns         5143 ns       408270
crc32/chorba_sse2/262144          40222 ns        40194 ns        52243
crc32/chorba_sse2/4194304        473518 ns       473006 ns         4436
crc32/chorba_sse41/1               3.87 ns         3.87 ns    542077360
crc32/chorba_sse41/8               19.0 ns         19.0 ns    110902721
crc32/chorba_sse41/12              28.1 ns         28.1 ns     74842151
crc32/chorba_sse41/16              37.2 ns         37.2 ns     56492273
crc32/chorba_sse41/32              73.0 ns         72.9 ns     28795892
crc32/chorba_sse41/64               136 ns          136 ns     15438183
crc32/chorba_sse41/512              218 ns          217 ns      9654918
crc32/chorba_sse41/4096             767 ns          766 ns      2741171
crc32/chorba_sse41/32768           3158 ns         3156 ns       669681
crc32/chorba_sse41/262144         40220 ns        40192 ns        52247
crc32/chorba_sse41/4194304       474687 ns       474174 ns         4432
crc32/pclmulqdq/1                  2.83 ns         2.83 ns    742418583
crc32/pclmulqdq/8                  18.6 ns         18.5 ns    113264403
crc32/pclmulqdq/12                 27.6 ns         27.6 ns     76224065
crc32/pclmulqdq/16                 15.9 ns         15.9 ns    131754599
crc32/pclmulqdq/32                 18.4 ns         18.4 ns    114226316
crc32/pclmulqdq/64                 23.1 ns         23.1 ns     90810841
crc32/pclmulqdq/512                40.7 ns         40.7 ns     51632668
crc32/pclmulqdq/4096                177 ns          177 ns     11843701
crc32/pclmulqdq/32768              1268 ns         1267 ns      1657291
crc32/pclmulqdq/262144            10007 ns        10000 ns       209998
crc32/pclmulqdq/4194304          161074 ns       160909 ns        13051
crc32/vpclmulqdq/1                 2.86 ns         2.86 ns    735542979
crc32/vpclmulqdq/8                 18.6 ns         18.5 ns    113213237
crc32/vpclmulqdq/12                27.6 ns         27.5 ns     76254784
crc32/vpclmulqdq/16                13.6 ns         13.6 ns    154014073
crc32/vpclmulqdq/32                16.0 ns         15.9 ns    131629404
crc32/vpclmulqdq/64                20.2 ns         20.2 ns    104103404
crc32/vpclmulqdq/512               32.3 ns         32.2 ns     65145546
crc32/vpclmulqdq/4096              88.5 ns         88.5 ns     23732276
crc32/vpclmulqdq/32768              601 ns          601 ns      3494008
crc32/vpclmulqdq/262144            4760 ns         4757 ns       441285
crc32/vpclmulqdq/4194304         111769 ns       111656 ns        18808

PR chorba first commit

   text    data     bss     dec     hex filename
 172705    1336       8  174049   2a7e1 libz-ng.so.2
 
-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
crc32/braid/1                    2.93 ns         2.92 ns    719406136
crc32/braid/8                    18.5 ns         18.5 ns    113317008
crc32/braid/12                   27.6 ns         27.6 ns     76220173
crc32/braid/16                   36.8 ns         36.7 ns     57156932
crc32/braid/32                   72.8 ns         72.8 ns     28860763
crc32/braid/64                    136 ns          136 ns     15487645
crc32/braid/512                   338 ns          338 ns      6220800
crc32/braid/4096                 1856 ns         1855 ns      1132144
crc32/braid/32768               14199 ns        14190 ns       148129
crc32/braid/262144             112856 ns       112781 ns        18620
crc32/braid/4194304           1804209 ns      1802775 ns         1164
crc32/chorba_c/1                 3.18 ns         3.17 ns    659048456
crc32/chorba_c/8                 18.9 ns         18.9 ns    111080363
crc32/chorba_c/12                28.3 ns         28.3 ns     74239968
crc32/chorba_c/16                37.3 ns         37.3 ns     56319108
crc32/chorba_c/32                73.2 ns         73.2 ns     28693749
crc32/chorba_c/64                 137 ns          136 ns     15392666
crc32/chorba_c/512                218 ns          218 ns      9640366
crc32/chorba_c/4096               802 ns          801 ns      2621320
crc32/chorba_c/32768             3740 ns         3737 ns       562384
crc32/chorba_c/262144           42692 ns        42662 ns        49228
crc32/chorba_c/4194304         473657 ns       473137 ns         4435
crc32/chorba_sse2/1              3.58 ns         3.58 ns    586856341
crc32/chorba_sse2/8              18.8 ns         18.7 ns    112055775
crc32/chorba_sse2/12             27.8 ns         27.8 ns     75497218
crc32/chorba_sse2/16             36.9 ns         36.9 ns     56987277
crc32/chorba_sse2/32             72.9 ns         72.8 ns     28838736
crc32/chorba_sse2/64              136 ns          136 ns     15454577
crc32/chorba_sse2/512             218 ns          218 ns      9636752
crc32/chorba_sse2/4096            767 ns          767 ns      2739199
crc32/chorba_sse2/32768          5143 ns         5139 ns       408603
crc32/chorba_sse2/262144        40216 ns        40188 ns        52252
crc32/chorba_sse2/4194304      472758 ns       472251 ns         4453
crc32/chorba_sse41/1             3.90 ns         3.89 ns    540539383
crc32/chorba_sse41/8             19.0 ns         18.9 ns    110901743
crc32/chorba_sse41/12            28.0 ns         28.0 ns     74863248
crc32/chorba_sse41/16            37.2 ns         37.1 ns     56518383
crc32/chorba_sse41/32            73.0 ns         72.9 ns     28801059
crc32/chorba_sse41/64             136 ns          136 ns     15438072
crc32/chorba_sse41/512            218 ns          217 ns      9660084
crc32/chorba_sse41/4096           767 ns          766 ns      2740947
crc32/chorba_sse41/32768         3101 ns         3099 ns       680597
crc32/chorba_sse41/262144       40217 ns        40189 ns        52254
crc32/chorba_sse41/4194304     472276 ns       471762 ns         4450
crc32/pclmulqdq/1                2.83 ns         2.83 ns    742497836
crc32/pclmulqdq/8                18.6 ns         18.5 ns    113264558
crc32/pclmulqdq/12               27.6 ns         27.6 ns     76220698
crc32/pclmulqdq/16               15.9 ns         15.9 ns    131762314
crc32/pclmulqdq/32               18.4 ns         18.4 ns    114223427
crc32/pclmulqdq/64               23.1 ns         23.1 ns     90810289
crc32/pclmulqdq/512              40.7 ns         40.7 ns     51632732
crc32/pclmulqdq/4096              177 ns          177 ns     11843777
crc32/pclmulqdq/32768            1268 ns         1267 ns      1657558
crc32/pclmulqdq/262144          10006 ns        10000 ns       210008
crc32/pclmulqdq/4194304        160816 ns       160650 ns        13075
crc32/vpclmulqdq/1               2.86 ns         2.86 ns    735418521
crc32/vpclmulqdq/8               18.6 ns         18.5 ns    113220817
crc32/vpclmulqdq/12              27.6 ns         27.5 ns     76254241
crc32/vpclmulqdq/16              13.6 ns         13.6 ns    154058114
crc32/vpclmulqdq/32              16.0 ns         15.9 ns    131527717
crc32/vpclmulqdq/64              20.2 ns         20.2 ns    104094000
crc32/vpclmulqdq/512             32.2 ns         32.2 ns     65162256
crc32/vpclmulqdq/4096            88.5 ns         88.5 ns     23731126
crc32/vpclmulqdq/32768            601 ns          601 ns      3493667
crc32/vpclmulqdq/262144          4704 ns         4701 ns       446740
crc32/vpclmulqdq/4194304       112096 ns       111981 ns        18751

PR chorba w/second commit

   text    data     bss     dec     hex filename
 172641    1336       8  173985   2a7a1 libz-ng.so.2

---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
crc32/chorba_c/1                 3.06 ns         3.06 ns    644173372
crc32/chorba_c/8                 18.4 ns         18.4 ns    113878535
crc32/chorba_c/12                27.6 ns         27.6 ns     75787290
crc32/chorba_c/16                36.5 ns         36.5 ns     57539119
crc32/chorba_c/32                72.6 ns         72.6 ns     28897460
crc32/chorba_c/64                 136 ns          136 ns     15492882
crc32/chorba_c/512                217 ns          217 ns      9664754
crc32/chorba_c/4096               801 ns          801 ns      2625290
crc32/chorba_c/32768             3690 ns         3690 ns       566865
crc32/chorba_c/262144           42649 ns        42650 ns        49230
crc32/chorba_c/4194304         476330 ns       476337 ns         4456
crc32/chorba_sse2/1              3.14 ns         3.14 ns    685480799
crc32/chorba_sse2/8              18.6 ns         18.6 ns    112965504
crc32/chorba_sse2/12             27.8 ns         27.8 ns     75638620
crc32/chorba_sse2/16             36.6 ns         36.6 ns     57291437
crc32/chorba_sse2/32             72.8 ns         72.8 ns     28821955
crc32/chorba_sse2/64              136 ns          136 ns     15476053
crc32/chorba_sse2/512             218 ns          218 ns      9629954
crc32/chorba_sse2/4096            766 ns          766 ns      2742872
crc32/chorba_sse2/32768          5138 ns         5138 ns       408700
crc32/chorba_sse2/262144        40152 ns        40153 ns        52344
crc32/chorba_sse2/4194304      468526 ns       468532 ns         4503
crc32/chorba_sse41/1             3.34 ns         3.34 ns    628580403
crc32/chorba_sse41/8             19.0 ns         19.0 ns    109422843
crc32/chorba_sse41/12            27.8 ns         27.8 ns     74813569
crc32/chorba_sse41/16            37.1 ns         37.1 ns     56900526
crc32/chorba_sse41/32            72.8 ns         72.9 ns     28781865
crc32/chorba_sse41/64             136 ns          136 ns     15475719
crc32/chorba_sse41/512            217 ns          217 ns      9656945
crc32/chorba_sse41/4096           766 ns          767 ns      2736065
crc32/chorba_sse41/32768         3132 ns         3132 ns       682495
crc32/chorba_sse41/262144       40184 ns        40185 ns        52298
crc32/chorba_sse41/4194304     462848 ns       462854 ns         4434

Not much happening with first commit only, should be pretty much identical.
Second commit reduces time for lengths below 64 bytes.

@Dead2
Copy link
Copy Markdown
Member Author

Dead2 commented Nov 11, 2025

Develop no-chorba:

   text    data     bss     dec     hex filename
 144129    1336       8  145473   23841 libz-ng.so.2

-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
crc32/braid/1                  3.01 ns         3.01 ns    703427566
crc32/braid/8                  18.7 ns         18.7 ns    112540130
crc32/braid/12                 27.6 ns         27.6 ns     76147496
crc32/braid/16                 36.7 ns         36.7 ns     57235994
crc32/braid/32                 72.8 ns         72.7 ns     28886889
crc32/braid/64                  136 ns          136 ns     15494825
crc32/braid/512                 338 ns          338 ns      6218748
crc32/braid/4096               1856 ns         1855 ns      1132343
crc32/braid/32768             14197 ns        14188 ns       148136
crc32/braid/262144           112857 ns       112784 ns        18619
crc32/braid/4194304         1804420 ns      1803049 ns         1165
crc32/pclmulqdq/1              2.79 ns         2.79 ns    752358003
crc32/pclmulqdq/8              18.6 ns         18.5 ns    113245961
crc32/pclmulqdq/12             27.6 ns         27.5 ns     76247232
crc32/pclmulqdq/16             14.3 ns         14.3 ns    146951624
crc32/pclmulqdq/32             16.8 ns         16.8 ns    125282002
crc32/pclmulqdq/64             21.5 ns         21.5 ns     97768904
crc32/pclmulqdq/512            39.1 ns         39.1 ns     53748734
crc32/pclmulqdq/4096            188 ns          188 ns     11155424
crc32/pclmulqdq/32768          1444 ns         1443 ns      1454893
crc32/pclmulqdq/262144        11494 ns        11486 ns       182833
crc32/pclmulqdq/4194304      183134 ns       182946 ns        11477
crc32/vpclmulqdq/1             2.84 ns         2.84 ns    740692297
crc32/vpclmulqdq/8             18.6 ns         18.6 ns    113090677
crc32/vpclmulqdq/12            27.6 ns         27.6 ns     76146838
crc32/vpclmulqdq/16            13.7 ns         13.6 ns    153888446
crc32/vpclmulqdq/32            15.9 ns         15.9 ns    131837571
crc32/vpclmulqdq/64            20.2 ns         20.2 ns    104089360
crc32/vpclmulqdq/512           32.3 ns         32.3 ns     65030827
crc32/vpclmulqdq/4096          90.8 ns         90.7 ns     23148192
crc32/vpclmulqdq/32768          603 ns          603 ns      3482177
crc32/vpclmulqdq/262144        4704 ns         4701 ns       446710
crc32/vpclmulqdq/4194304     103496 ns       103390 ns        20311

PR no-chorba first commit

   text    data     bss     dec     hex filename
 157553    1336       8  158897   26cb1 libz-ng.so.2
 
---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
crc32/braid/1                    2.82 ns         2.82 ns    744440365
crc32/braid/8                    18.5 ns         18.5 ns    113691717
crc32/braid/12                   27.5 ns         27.5 ns     76384998
crc32/braid/16                   36.7 ns         36.7 ns     57225855
crc32/braid/32                   72.7 ns         72.7 ns     28886662
crc32/braid/64                    136 ns          136 ns     15494515
crc32/braid/512                   338 ns          338 ns      6218970
crc32/braid/4096                 1856 ns         1855 ns      1132324
crc32/braid/32768               14198 ns        14188 ns       148043
crc32/braid/262144             112884 ns       112809 ns        18615
crc32/braid/4194304           1804595 ns      1803154 ns         1165
crc32/chorba_sse2/1              3.35 ns         3.34 ns    627973238
crc32/chorba_sse2/8              19.0 ns         19.0 ns    110600734
crc32/chorba_sse2/12             27.8 ns         27.8 ns     75514061
crc32/chorba_sse2/16             37.1 ns         37.1 ns     56584340
crc32/chorba_sse2/32             72.9 ns         72.9 ns     28811706
crc32/chorba_sse2/64              136 ns          136 ns     15461133
crc32/chorba_sse2/512             218 ns          218 ns      9622575
crc32/chorba_sse2/4096            767 ns          767 ns      2738183
crc32/chorba_sse2/32768          5151 ns         5148 ns       407767
crc32/chorba_sse2/262144        40224 ns        40197 ns        52234
crc32/chorba_sse2/4194304      641229 ns       640563 ns         3278
crc32/chorba_sse41/1             3.78 ns         3.78 ns    556491394
crc32/chorba_sse41/8             18.9 ns         18.9 ns    111070516
crc32/chorba_sse41/12            28.0 ns         28.0 ns     74914657
crc32/chorba_sse41/16            37.2 ns         37.1 ns     56522834
crc32/chorba_sse41/32            72.9 ns         72.9 ns     28810936
crc32/chorba_sse41/64             136 ns          136 ns     15445452
crc32/chorba_sse41/512            218 ns          218 ns      9652240
crc32/chorba_sse41/4096           767 ns          766 ns      2740432
crc32/chorba_sse41/32768         3121 ns         3119 ns       677126
crc32/chorba_sse41/262144       40216 ns        40188 ns        52241
crc32/chorba_sse41/4194304     641297 ns       640635 ns         3278
crc32/pclmulqdq/1                2.82 ns         2.82 ns    745889798
crc32/pclmulqdq/8                18.5 ns         18.5 ns    113359607
crc32/pclmulqdq/12               27.6 ns         27.5 ns     76266031
crc32/pclmulqdq/16               15.9 ns         15.9 ns    131744681
crc32/pclmulqdq/32               18.4 ns         18.4 ns    114225006
crc32/pclmulqdq/64               23.1 ns         23.1 ns     90810622
crc32/pclmulqdq/512              40.7 ns         40.7 ns     51632596
crc32/pclmulqdq/4096              177 ns          177 ns     11842518
crc32/pclmulqdq/32768            1268 ns         1267 ns      1657372
crc32/pclmulqdq/262144          10008 ns        10001 ns       209957
crc32/pclmulqdq/4194304        160924 ns       160755 ns        13041
crc32/vpclmulqdq/1               2.84 ns         2.84 ns    739143133
crc32/vpclmulqdq/8               18.5 ns         18.5 ns    113317732
crc32/vpclmulqdq/12              27.5 ns         27.5 ns     76296747
crc32/vpclmulqdq/16              13.7 ns         13.7 ns    153442495
crc32/vpclmulqdq/32              16.0 ns         16.0 ns    131678365
crc32/vpclmulqdq/64              20.2 ns         20.2 ns    104117564
crc32/vpclmulqdq/512             32.2 ns         32.2 ns     65151011
crc32/vpclmulqdq/4096            88.6 ns         88.5 ns     23729204
crc32/vpclmulqdq/32768            601 ns          601 ns      3493756
crc32/vpclmulqdq/262144          4705 ns         4701 ns       446666
crc32/vpclmulqdq/4194304       111702 ns       111587 ns        18818

PR no-chorba w/second commit

   text    data     bss     dec     hex filename
 157625    1336       8  158969   26cf9 libz-ng.so.2

---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
crc32/chorba_sse2/1              3.11 ns         3.11 ns    679513589
crc32/chorba_sse2/8              18.5 ns         18.5 ns    113707388
crc32/chorba_sse2/12             27.5 ns         27.5 ns     76341139
crc32/chorba_sse2/16             36.5 ns         36.4 ns     57614530
crc32/chorba_sse2/32             72.7 ns         72.7 ns     28893747
crc32/chorba_sse2/64              136 ns          136 ns     15483436
crc32/chorba_sse2/512             218 ns          218 ns      9625579
crc32/chorba_sse2/4096            767 ns          767 ns      2738419
crc32/chorba_sse2/32768          5148 ns         5144 ns       408137
crc32/chorba_sse2/262144        40215 ns        40187 ns        52250
crc32/chorba_sse2/4194304      641262 ns       640596 ns         3278
crc32/chorba_sse41/1             3.60 ns         3.60 ns    583701252
crc32/chorba_sse41/8             18.8 ns         18.8 ns    111426580
crc32/chorba_sse41/12            27.9 ns         27.9 ns     75249478
crc32/chorba_sse41/16            36.9 ns         36.9 ns     56866388
crc32/chorba_sse41/32            72.9 ns         72.8 ns     28837437
crc32/chorba_sse41/64             136 ns          136 ns     15466446
crc32/chorba_sse41/512            218 ns          218 ns      9640132
crc32/chorba_sse41/4096           767 ns          767 ns      2739370
crc32/chorba_sse41/32768         3100 ns         3098 ns       677794
crc32/chorba_sse41/262144       40216 ns        40188 ns        52245
crc32/chorba_sse41/4194304     641289 ns       640609 ns         3279

First commit allows SSE2 and SSE4.1 optimized functions to be used, providing a huge speedup even without the chorba_c fallback included in the code.
First commit improves speed of long buffers in pclmulqdq.
Second commit improves speed of buffers shorter than 64 bytes for both chorba_sse* functions.

With this PR, you can also save 15KB on the library size if you disable chorba_c when compiling for modern systems that don't need that fallback. Without chorba_c, chorba_sse2 and chorba_sse41 still perform perfectly up to 256KB/512KB (32/64bit) buffers, above that they only perform at about 73% of the speed compared to with chorba_c. But they still perform at nearly 3x the speed compared to crc32_braid that was used for these cpus before version 2.3.0.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Nov 11, 2025

Walkthrough

Removed the generic crc32_c implementation and build objects; added a Chorba dispatcher crc32_chorba; remapped runtime/native CRC selection to prefer Chorba/braid variants; exposed Chorba x86 SSE entry points under ISA guards; updated headers, templates, arch callsites, tests, and benchmarks.

Changes

Cohort / File(s) Summary
Build system
CMakeLists.txt, Makefile.in, arch/generic/Makefile.in
Removed arch/generic/crc32_c.c from source lists and removed crc32_c.o/crc32_c.lo from OBJ/PIC_OBJ and targets.
Generic CRC implementation removed
arch/generic/crc32_c.c
Deleted the crc32_c(uint32_t, const uint8_t*, size_t) implementation and its includes.
Chorba dispatcher & generic headers
arch/generic/crc32_chorba_c.c, arch/generic/generic_functions.h
Added crc32_chorba() dispatcher; removed crc32_c prototype; added nondestructive Chorba variant prototypes; remapped native_crc32 to crc32_chorba (else crc32_braid when Chorba disabled).
x86 Chorba SSE implementations & declarations
arch/x86/chorba_sse2.c, arch/x86/chorba_sse41.c, arch/x86/x86_functions.h
Simplified ISA guards to rely on X86 feature macros; adjusted alignment/threshold routing and initialization; made Chorba SSE entry declarations/mappings unconditional when ISA macro present; added chorba_small_nondestructive_sse2 declaration.
x86 folding/templates
arch/x86/crc32_fold_pclmulqdq_tpl.h, arch/x86/crc32_pclmulqdq_tpl.h
Removed WITHOUT_CHORBA guards so Chorba folding (including fold_12) is unconditional; added CRC32_FOLD_COPY variant under COPY.
CRC macros & internals
crc32.h
Removed CHORBA_SMALL_THRESHOLD_32BIT; introduced conditional CHORBA_SMALL_THRESHOLD (72 if OPTIMAL_CMP==64 else 80); removed exposure of crc32_braid_internal prototype.
Function table wiring
functable.c
Switched generic fallback from crc32_c to crc32_braid; assigned crc32_chorba/crc32_chorba_sse* in relevant branches, removing prior WITHOUT_CHORBA guards in SSE regions.
Arch-specific callsite updates
arch/riscv/crc32_zbc.c, arch/s390/crc32-vx.c
Replaced calls to crc32_c(...) with crc32_braid(...); removed crc32_c extern in RISC-V and added arch_functions.h include.
Tests & benchmarks
test/benchmarks/benchmark_crc32.cc, test/test_crc32.cc
Replaced generic_chorba/generic benches/tests with chorba_c entries; registered SSE2/SSE41 Chorba benches/tests under ISA guards; simplified compile-time guards.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant native as native_crc32 (macro)
  participant disp as crc32_chorba / crc32_chorba_sse*
  participant chorba as Chorba_variants
  participant braid as crc32_braid

  Caller->>native: call(native_crc32, crc, buf, len)
  alt ISA-specific SSE enabled
    native->>disp: route to crc32_chorba_sse2 / crc32_chorba_sse41
  else generic
    native->>disp: route to crc32_chorba
  end

  disp->>disp: init c = (~crc)&0xffffffff, compute algn_diff
  alt len < algn_diff + CHORBA_SMALL_THRESHOLD
    disp->>braid: crc32_braid(c, buf, len)
  else
    disp->>braid: process head (unaligned) via crc32_braid
    disp->>chorba: select large/medium/small nondestructive variant for aligned body
    chorba-->>disp: processed aligned chunk(s)
    alt tail remains
      disp->>braid: process tail via crc32_braid
    end
  end
  disp-->>Caller: return (c ^ 0xffffffff)
Loading

Possibly related PRs

Suggested labels

Continuous Integration, bug

Suggested reviewers

  • KungFuJesus
  • nmoinvaz

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed The description provides detailed context for both commits, explaining the CMake parameter changes, function reorganization, performance optimizations, and specific improvements. It clearly relates to the changeset.
Title check ✅ Passed The title 'Reorganize Chorba activation and refactor Chorba dispatch' directly and specifically describes the main changes in the PR, which reorganize how the Chorba algorithm is activated and refactor the dispatch mechanisms across generic and architecture-specific implementations.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch chorba-option

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Dead2 Dead2 force-pushed the chorba-option branch 3 times, most recently from e2b7823 to d2da176 Compare November 11, 2025 22:36
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 780838b and d2da176.

📒 Files selected for processing (17)
  • CMakeLists.txt (0 hunks)
  • Makefile.in (0 hunks)
  • arch/generic/Makefile.in (0 hunks)
  • arch/generic/crc32_c.c (0 hunks)
  • arch/generic/crc32_chorba_c.c (1 hunks)
  • arch/generic/generic_functions.h (2 hunks)
  • arch/riscv/crc32_zbc.c (2 hunks)
  • arch/s390/crc32-vx.c (2 hunks)
  • arch/x86/chorba_sse2.c (2 hunks)
  • arch/x86/chorba_sse41.c (3 hunks)
  • arch/x86/crc32_fold_pclmulqdq_tpl.h (1 hunks)
  • arch/x86/crc32_pclmulqdq_tpl.h (0 hunks)
  • arch/x86/x86_functions.h (2 hunks)
  • crc32.h (1 hunks)
  • functable.c (3 hunks)
  • test/benchmarks/benchmark_crc32.cc (1 hunks)
  • test/test_crc32.cc (2 hunks)
💤 Files with no reviewable changes (5)
  • arch/generic/Makefile.in
  • CMakeLists.txt
  • Makefile.in
  • arch/generic/crc32_c.c
  • arch/x86/crc32_pclmulqdq_tpl.h
🧰 Additional context used
🧠 Learnings (21)
📓 Common learnings
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.
📚 Learning: 2025-02-21T01:42:40.488Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/s390/crc32-vx.c
  • arch/x86/x86_functions.h
  • arch/riscv/crc32_zbc.c
  • functable.c
  • arch/x86/chorba_sse2.c
  • arch/x86/chorba_sse41.c
  • test/benchmarks/benchmark_crc32.cc
  • test/test_crc32.cc
  • arch/generic/crc32_chorba_c.c
  • arch/generic/generic_functions.h
  • crc32.h
📚 Learning: 2025-01-23T22:01:53.422Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/s390/crc32-vx.c
  • arch/x86/x86_functions.h
  • arch/riscv/crc32_zbc.c
  • functable.c
  • arch/x86/chorba_sse2.c
  • arch/x86/chorba_sse41.c
  • test/benchmarks/benchmark_crc32.cc
  • test/test_crc32.cc
  • arch/generic/crc32_chorba_c.c
  • arch/generic/generic_functions.h
  • crc32.h
📚 Learning: 2025-02-21T01:44:03.996Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/x86_functions.h
  • arch/x86/chorba_sse2.c
  • arch/x86/chorba_sse41.c
  • test/test_crc32.cc
  • crc32.h
📚 Learning: 2025-02-21T01:41:50.358Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/s390/crc32-vx.c
  • arch/x86/x86_functions.h
  • arch/riscv/crc32_zbc.c
  • functable.c
  • arch/x86/chorba_sse2.c
  • arch/x86/chorba_sse41.c
  • test/benchmarks/benchmark_crc32.cc
  • test/test_crc32.cc
  • arch/generic/crc32_chorba_c.c
  • arch/generic/generic_functions.h
  • crc32.h
📚 Learning: 2025-02-23T16:49:52.043Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/s390/crc32-vx.c
  • arch/x86/x86_functions.h
  • arch/riscv/crc32_zbc.c
  • functable.c
  • arch/x86/chorba_sse2.c
  • arch/x86/chorba_sse41.c
  • test/benchmarks/benchmark_crc32.cc
  • test/test_crc32.cc
  • arch/generic/crc32_chorba_c.c
  • arch/generic/generic_functions.h
  • crc32.h
📚 Learning: 2025-02-23T16:51:54.545Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/x86_functions.h
  • arch/x86/chorba_sse41.c
  • test/test_crc32.cc
📚 Learning: 2024-10-04T03:17:24.773Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1778
File: chunkset_tpl.h:164-165
Timestamp: 2024-10-04T03:17:24.773Z
Learning: In `chunkset_tpl.h`, using `goto` in the `CHUNKMEMSET` function aids the compiler in inlining the function, so it should be retained.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/chorba_sse41.c
📚 Learning: 2024-10-29T02:22:52.846Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In `inffast_tpl.h`, when AVX512 is enabled, the branch involving `chunkcopy_safe` is intentionally eliminated to optimize performance.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/x86_functions.h
  • functable.c
  • arch/x86/chorba_sse41.c
📚 Learning: 2024-10-08T21:51:45.330Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1778
File: arch/x86/chunkset_avx2.c:160-171
Timestamp: 2024-10-08T21:51:45.330Z
Learning: In `arch/x86/chunkset_avx2.c`, within the `GET_HALFCHUNK_MAG` function, using a conditional branch to select between `_mm_loadl_epi64` and `_mm_loadu_si128` is not recommended because the branching cost outweighs the savings from the load.

Applied to files:

  • arch/x86/x86_functions.h
  • functable.c
  • arch/x86/chorba_sse2.c
  • arch/x86/chorba_sse41.c
📚 Learning: 2024-12-20T23:35:59.830Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1830
File: arch/generic/generic_functions.h:39-53
Timestamp: 2024-12-20T23:35:59.830Z
Learning: The `longest_match_unaligned_*` functions are templated using the LONGEST_MATCH macro in match_tpl.h, so their implementations are generated rather than explicitly defined.

Applied to files:

  • arch/x86/x86_functions.h
📚 Learning: 2024-10-08T19:37:14.998Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1797
File: inflate.c:84-86
Timestamp: 2024-10-08T19:37:14.998Z
Learning: When `INFLATE_ALLOW_INVALID_DISTANCE_TOOFAR_ARRR` is not defined, `state->sane` is completely removed from the codebase and does not need to be initialized.

Applied to files:

  • arch/x86/x86_functions.h
📚 Learning: 2025-06-09T16:46:28.468Z
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1921
File: arch/riscv/chunkset_rvv.c:97-102
Timestamp: 2025-06-09T16:46:28.468Z
Learning: In RISC-V chunkset_rvv.c CHUNKCOPY function, the alignment logic intentionally copies sizeof(chunk_t) bytes but advances pointers by align bytes to avoid unaligned access. This "re-reading" technique is a deliberate optimization to maintain proper alignment for subsequent operations, not a bug.

Applied to files:

  • arch/riscv/crc32_zbc.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2025-04-15T09:30:10.081Z
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1904
File: arch/riscv/Makefile.in:12-14
Timestamp: 2025-04-15T09:30:10.081Z
Learning: Feature detection modules like riscv_features.c should not be compiled with feature-specific flags (like RVVFLAG) because they need to be compilable on all systems regardless of feature support. These modules perform runtime detection and must initialize feature availability flags to zero on unsupported systems.

Applied to files:

  • arch/riscv/crc32_zbc.c
📚 Learning: 2025-06-10T07:38:03.297Z
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1921
File: arch/riscv/chunkset_rvv.c:103-104
Timestamp: 2025-06-10T07:38:03.297Z
Learning: In RISC-V chunkset_rvv.c CHUNKCOPY function, when dist < sizeof(chunk_t), the vl variable intentionally becomes 0, causing the while loop to not execute. This is correct behavior because copying full chunks is not safe when the distance is smaller than chunk size, and the function appropriately falls back to memcpy for handling remaining bytes.

Applied to files:

  • arch/riscv/crc32_zbc.c
📚 Learning: 2024-10-29T02:22:55.489Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1805
File: arch/x86/chunkset_avx512.c:32-34
Timestamp: 2024-10-29T02:22:55.489Z
Learning: In `arch/x86/chunkset_avx512.c`, the `gen_mask` function's `len` parameter cannot exceed 32 because it is only called on the remaining bytes from a 32-byte vector.

Applied to files:

  • arch/riscv/crc32_zbc.c
  • arch/x86/chorba_sse41.c
📚 Learning: 2024-10-08T19:37:14.998Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.

Applied to files:

  • arch/x86/chorba_sse2.c
📚 Learning: 2025-02-21T01:37:54.508Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:830-831
Timestamp: 2025-02-21T01:37:54.508Z
Learning: The alignment calculation `((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15` is safe from overflow as it's mathematically bounded between 1 and 16, making the use of uintptr_t appropriate for this pointer arithmetic.

Applied to files:

  • arch/x86/chorba_sse2.c
📚 Learning: 2024-12-22T20:40:03.280Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1833
File: test/test_crc32.cc:0-0
Timestamp: 2024-12-22T20:40:03.280Z
Learning: In the crc32_align test class (test/test_crc32.cc), the alignment offset parameter is always guaranteed to be non-negative because the test parameters are controlled and it never makes sense to have a negative offset.

Applied to files:

  • test/test_crc32.cc
📚 Learning: 2024-12-25T19:45:06.009Z
Learnt from: samrussell
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_braid_c.c:596-597
Timestamp: 2024-12-25T19:45:06.009Z
Learning: We should verify out-of-bounds pointer arithmetic in chorba_118960_nondestructive() and related loops where "input + i" is used, especially when len < expected block sizes.

Applied to files:

  • arch/generic/crc32_chorba_c.c
📚 Learning: 2024-10-07T22:00:02.180Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1802
File: chunkset_tpl.h:92-108
Timestamp: 2024-10-07T22:00:02.180Z
Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.

Applied to files:

  • crc32.h
🧬 Code graph analysis (3)
arch/x86/crc32_fold_pclmulqdq_tpl.h (1)
arch/x86/crc32_pclmulqdq_tpl.h (2)
  • fold_12 (168-208)
  • fold_4 (125-166)
test/benchmarks/benchmark_crc32.cc (1)
test/benchmarks/benchmark_main.cc (1)
  • cpu_features (16-16)
arch/generic/crc32_chorba_c.c (3)
arch/x86/chorba_sse2.c (2)
  • uint32_t (27-851)
  • uint32_t (853-880)
arch/x86/chorba_sse41.c (2)
  • uint32_t (59-309)
  • uint32_t (311-340)
arch/generic/crc32_braid_c.c (2)
  • uint32_t (63-213)
  • uint32_t (215-222)

@codecov
Copy link
Copy Markdown

codecov bot commented Nov 11, 2025

Codecov Report

❌ Patch coverage is 97.71689% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.27%. Comparing base (780838b) to head (b7d4138).
⚠️ Report is 2 commits behind head on develop.

Files with missing lines Patch % Lines
test/benchmarks/benchmark_crc32.cc 0.00% 0 Missing and 3 partials ⚠️
arch/riscv/crc32_zbc.c 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2003      +/-   ##
===========================================
- Coverage    83.31%   83.27%   -0.05%     
===========================================
  Files          161      160       -1     
  Lines        12960    12953       -7     
  Branches      3149     3145       -4     
===========================================
- Hits         10798    10786      -12     
+ Misses        1134     1101      -33     
- Partials      1028     1066      +38     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Dead2 Dead2 force-pushed the chorba-option branch 3 times, most recently from 76ec271 to 6c541fb Compare November 12, 2025 10:07
@Dead2 Dead2 requested a review from Copilot November 12, 2025 10:21
@Dead2
Copy link
Copy Markdown
Member Author

Dead2 commented Nov 12, 2025

32-bit MSVC does seem to fail with a segfault, not sure what is causing that.
https://github.com/zlib-ng/zlib-ng/actions/runs/19293711806/job/55170333036?pr=2003

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@mtl1979
Copy link
Copy Markdown
Collaborator

mtl1979 commented Nov 12, 2025

Only fails in Release build... Output is:

[  FAILED  ] 18 tests, listed below:
[  FAILED  ] crc32/crc32_variant.chorba_sse2/76, where GetParam() = 16-byte object <00-00 00-00 B0-35 0B-00 64-00 00-00 B9-1D C7-E7>
[  FAILED  ] crc32/crc32_variant.chorba_sse2/77, where GetParam() = 16-byte object <00-00 00-00 18-36 0B-00 64-00 00-00 77-27 A5-EA>
[  FAILED  ] crc32/crc32_variant.chorba_sse2/78, where GetParam() = 16-byte object <00-00 00-00 80-36 0B-00 64-00 00-00 48-20 47-CD>
[  FAILED  ] crc32/crc32_variant.chorba_sse2/143, where GetParam() = 16-byte object <D3-4F B3-74 B0-35 0B-00 64-00 00-00 60-80 F9-CC>
[  FAILED  ] crc32/crc32_variant.chorba_sse2/144, where GetParam() = 16-byte object <70-D7 1F-35 18-36 0B-00 64-00 00-00 12-53 B9-D8>
[  FAILED  ] crc32/crc32_variant.chorba_sse2/145, where GetParam() = 16-byte object <77-EF 5A-C4 80-36 0B-00 64-00 00-00 12-99 1C-BB>
[  FAILED  ] crc32/crc32_variant.chorba_sse2/146, where GetParam() = 16-byte object <77-EF 5A-C4 D8-3C 0B-00 58-02 00-00 5B-FA 8A-88>
[  FAILED  ] crc32/crc32_variant.chorba_sse2/147, where GetParam() = 16-byte object <00-00 00-00 A0-E5 0C-00 00-80 00-00 B2-26 77-21>
[  FAILED  ] crc32/crc32_variant.chorba_sse2/148, where GetParam() = 16-byte object <00-00 00-00 A0-E5 0C-00 00-40 00-00 F0-22 17-E8>
[  FAILED  ] crc32/crc32_variant.chorba_sse41/76, where GetParam() = 16-byte object <00-00 00-00 B0-35 0B-00 64-00 00-00 B9-1D C7-E7>
[  FAILED  ] crc32/crc32_variant.chorba_sse41/77, where GetParam() = 16-byte object <00-00 00-00 18-36 0B-00 64-00 00-00 77-27 A5-EA>
[  FAILED  ] crc32/crc32_variant.chorba_sse41/78, where GetParam() = 16-byte object <00-00 00-00 80-36 0B-00 64-00 00-00 48-20 47-CD>
[  FAILED  ] crc32/crc32_variant.chorba_sse41/143, where GetParam() = 16-byte object <D3-4F B3-74 B0-35 0B-00 64-00 00-00 60-80 F9-CC>
[  FAILED  ] crc32/crc32_variant.chorba_sse41/144, where GetParam() = 16-byte object <70-D7 1F-35 18-36 0B-00 64-00 00-00 12-53 B9-D8>
[  FAILED  ] crc32/crc32_variant.chorba_sse41/145, where GetParam() = 16-byte object <77-EF 5A-C4 80-36 0B-00 64-00 00-00 12-99 1C-BB>
[  FAILED  ] crc32/crc32_variant.chorba_sse41/146, where GetParam() = 16-byte object <77-EF 5A-C4 D8-3C 0B-00 58-02 00-00 5B-FA 8A-88>
[  FAILED  ] crc32/crc32_variant.chorba_sse41/147, where GetParam() = 16-byte object <00-00 00-00 A0-E5 0C-00 00-80 00-00 B2-26 77-21>
[  FAILED  ] crc32/crc32_variant.chorba_sse41/148, where GetParam() = 16-byte object <00-00 00-00 A0-E5 0C-00 00-40 00-00 F0-22 17-E8>

@samrussell
Copy link
Copy Markdown
Contributor

samrussell commented Nov 12, 2025 via email

@mtl1979
Copy link
Copy Markdown
Collaborator

mtl1979 commented Nov 12, 2025

these look like boundaries where we move to a larger polynomial and have a barely large-enough input buffer that we're probably overrunning, upping the thresholds would be a simple workaround (to >78 and >148 respectively)

If I comment out call to chorba_small_nondestructive_sse2, I get half the number of failures, so something in that code path goes wrong...

@Dead2
Copy link
Copy Markdown
Member Author

Dead2 commented Nov 12, 2025

these look like boundaries where we move to a larger polynomial and have a barely large-enough input buffer that we're probably overrunning, upping the thresholds would be a simple workaround (to >78 and >148 respectively)

I notice that I have used the wrong operator in the threshold check, >= instead of >, but that only explains a change from 73+ vs 72+. I fail to see any changes in this PR that would require it to increase to 78 or 148.

Now WITHOUT_CHORBA will only disable the crc32_chorba C fallback.

SSE2, SSE41 and pclmul variants will still be able to use their Chorba-algorithm based code,
but their fallback to the generic crc32_chorba C code in SSE2 and SSE41 will be disabled,
reducing their performance on really big input buffers.

Remove the crc32_c function (and its file crc32_c.c), instead use the normal functable
routing to select between crc32_braid and crc32_chorba.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2853356 and aa8069a.

📒 Files selected for processing (17)
  • CMakeLists.txt (0 hunks)
  • Makefile.in (0 hunks)
  • arch/generic/Makefile.in (0 hunks)
  • arch/generic/crc32_c.c (0 hunks)
  • arch/generic/crc32_chorba_c.c (1 hunks)
  • arch/generic/generic_functions.h (2 hunks)
  • arch/riscv/crc32_zbc.c (2 hunks)
  • arch/s390/crc32-vx.c (2 hunks)
  • arch/x86/chorba_sse2.c (2 hunks)
  • arch/x86/chorba_sse41.c (3 hunks)
  • arch/x86/crc32_fold_pclmulqdq_tpl.h (1 hunks)
  • arch/x86/crc32_pclmulqdq_tpl.h (0 hunks)
  • arch/x86/x86_functions.h (2 hunks)
  • crc32.h (1 hunks)
  • functable.c (3 hunks)
  • test/benchmarks/benchmark_crc32.cc (1 hunks)
  • test/test_crc32.cc (2 hunks)
💤 Files with no reviewable changes (5)
  • Makefile.in
  • arch/generic/Makefile.in
  • CMakeLists.txt
  • arch/x86/crc32_pclmulqdq_tpl.h
  • arch/generic/crc32_c.c
🚧 Files skipped from review as they are similar to previous changes (1)
  • arch/generic/crc32_chorba_c.c
🧰 Additional context used
🧠 Learnings (22)
📓 Common learnings
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.
📚 Learning: 2025-02-21T01:41:50.358Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.

Applied to files:

  • arch/s390/crc32-vx.c
  • arch/generic/generic_functions.h
  • test/test_crc32.cc
  • arch/x86/x86_functions.h
  • functable.c
  • crc32.h
  • test/benchmarks/benchmark_crc32.cc
  • arch/x86/chorba_sse41.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/riscv/crc32_zbc.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2025-02-21T01:42:40.488Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.

Applied to files:

  • arch/s390/crc32-vx.c
  • arch/generic/generic_functions.h
  • test/test_crc32.cc
  • arch/x86/x86_functions.h
  • functable.c
  • crc32.h
  • test/benchmarks/benchmark_crc32.cc
  • arch/x86/chorba_sse41.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/riscv/crc32_zbc.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2025-02-23T16:49:52.043Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.

Applied to files:

  • arch/s390/crc32-vx.c
  • arch/generic/generic_functions.h
  • test/test_crc32.cc
  • arch/x86/x86_functions.h
  • functable.c
  • crc32.h
  • test/benchmarks/benchmark_crc32.cc
  • arch/x86/chorba_sse41.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/riscv/crc32_zbc.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2025-01-23T22:01:53.422Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.

Applied to files:

  • arch/s390/crc32-vx.c
  • arch/generic/generic_functions.h
  • test/test_crc32.cc
  • arch/x86/x86_functions.h
  • functable.c
  • crc32.h
  • test/benchmarks/benchmark_crc32.cc
  • arch/x86/chorba_sse41.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/riscv/crc32_zbc.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2024-12-22T20:40:03.280Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1833
File: test/test_crc32.cc:0-0
Timestamp: 2024-12-22T20:40:03.280Z
Learning: In the crc32_align test class (test/test_crc32.cc), the alignment offset parameter is always guaranteed to be non-negative because the test parameters are controlled and it never makes sense to have a negative offset.

Applied to files:

  • test/test_crc32.cc
  • arch/x86/chorba_sse2.c
📚 Learning: 2025-02-21T01:44:03.996Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.

Applied to files:

  • test/test_crc32.cc
  • arch/x86/x86_functions.h
  • crc32.h
  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2025-02-23T16:51:54.545Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.

Applied to files:

  • arch/x86/x86_functions.h
  • arch/x86/chorba_sse41.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
📚 Learning: 2024-10-08T21:51:45.330Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1778
File: arch/x86/chunkset_avx2.c:160-171
Timestamp: 2024-10-08T21:51:45.330Z
Learning: In `arch/x86/chunkset_avx2.c`, within the `GET_HALFCHUNK_MAG` function, using a conditional branch to select between `_mm_loadl_epi64` and `_mm_loadu_si128` is not recommended because the branching cost outweighs the savings from the load.

Applied to files:

  • arch/x86/x86_functions.h
  • functable.c
  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2024-12-20T23:35:59.830Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1830
File: arch/generic/generic_functions.h:39-53
Timestamp: 2024-12-20T23:35:59.830Z
Learning: The `longest_match_unaligned_*` functions are templated using the LONGEST_MATCH macro in match_tpl.h, so their implementations are generated rather than explicitly defined.

Applied to files:

  • arch/x86/x86_functions.h
📚 Learning: 2024-10-29T02:22:52.846Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In `inffast_tpl.h`, when AVX512 is enabled, the branch involving `chunkcopy_safe` is intentionally eliminated to optimize performance.

Applied to files:

  • arch/x86/x86_functions.h
  • functable.c
  • arch/x86/chorba_sse41.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
📚 Learning: 2024-10-08T19:37:14.998Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1797
File: inflate.c:84-86
Timestamp: 2024-10-08T19:37:14.998Z
Learning: When `INFLATE_ALLOW_INVALID_DISTANCE_TOOFAR_ARRR` is not defined, `state->sane` is completely removed from the codebase and does not need to be initialized.

Applied to files:

  • arch/x86/x86_functions.h
📚 Learning: 2024-12-25T19:45:06.009Z
Learnt from: samrussell
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_braid_c.c:596-597
Timestamp: 2024-12-25T19:45:06.009Z
Learning: We should verify out-of-bounds pointer arithmetic in chorba_118960_nondestructive() and related loops where "input + i" is used, especially when len < expected block sizes.

Applied to files:

  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2024-10-07T21:23:13.401Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1802
File: chunkset_tpl.h:135-135
Timestamp: 2024-10-07T21:23:13.401Z
Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.

Applied to files:

  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2024-12-27T10:06:16.184Z
Learnt from: samrussell
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_braid_c.c:215-217
Timestamp: 2024-12-27T10:06:16.184Z
Learning: There's a business requirement to use stack allocation instead of heap for the Chorba buffer.

Applied to files:

  • arch/x86/chorba_sse41.c
📚 Learning: 2024-10-29T02:22:55.489Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1805
File: arch/x86/chunkset_avx512.c:32-34
Timestamp: 2024-10-29T02:22:55.489Z
Learning: In `arch/x86/chunkset_avx512.c`, the `gen_mask` function's `len` parameter cannot exceed 32 because it is only called on the remaining bytes from a 32-byte vector.

Applied to files:

  • arch/x86/chorba_sse41.c
  • arch/riscv/crc32_zbc.c
📚 Learning: 2024-10-07T21:18:37.806Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-07T21:18:37.806Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.

Applied to files:

  • arch/x86/chorba_sse41.c
📚 Learning: 2025-06-09T16:46:28.468Z
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1921
File: arch/riscv/chunkset_rvv.c:97-102
Timestamp: 2025-06-09T16:46:28.468Z
Learning: In RISC-V chunkset_rvv.c CHUNKCOPY function, the alignment logic intentionally copies sizeof(chunk_t) bytes but advances pointers by align bytes to avoid unaligned access. This "re-reading" technique is a deliberate optimization to maintain proper alignment for subsequent operations, not a bug.

Applied to files:

  • arch/riscv/crc32_zbc.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2025-04-15T09:30:10.081Z
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1904
File: arch/riscv/Makefile.in:12-14
Timestamp: 2025-04-15T09:30:10.081Z
Learning: Feature detection modules like riscv_features.c should not be compiled with feature-specific flags (like RVVFLAG) because they need to be compilable on all systems regardless of feature support. These modules perform runtime detection and must initialize feature availability flags to zero on unsupported systems.

Applied to files:

  • arch/riscv/crc32_zbc.c
📚 Learning: 2025-06-10T07:38:03.297Z
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1921
File: arch/riscv/chunkset_rvv.c:103-104
Timestamp: 2025-06-10T07:38:03.297Z
Learning: In RISC-V chunkset_rvv.c CHUNKCOPY function, when dist < sizeof(chunk_t), the vl variable intentionally becomes 0, causing the while loop to not execute. This is correct behavior because copying full chunks is not safe when the distance is smaller than chunk size, and the function appropriately falls back to memcpy for handling remaining bytes.

Applied to files:

  • arch/riscv/crc32_zbc.c
📚 Learning: 2024-10-08T19:37:14.998Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1802
File: arch/x86/chunkset_avx2.c:168-168
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.

Applied to files:

  • arch/x86/chorba_sse2.c
📚 Learning: 2025-02-21T01:37:54.508Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:830-831
Timestamp: 2025-02-21T01:37:54.508Z
Learning: The alignment calculation `((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15` is safe from overflow as it's mathematically bounded between 1 and 16, making the use of uintptr_t appropriate for this pointer arithmetic.

Applied to files:

  • arch/x86/chorba_sse2.c
🧬 Code graph analysis (1)
arch/x86/crc32_fold_pclmulqdq_tpl.h (1)
arch/x86/crc32_pclmulqdq_tpl.h (2)
  • fold_12 (168-208)
  • fold_4 (125-166)
🔇 Additional comments (5)
arch/x86/x86_functions.h (2)

27-27: LGTM: Chorba SSE declarations now unconditional.

Removing the WITHOUT_CHORBA guard allows the SSE2 and SSE4.1 Chorba implementations to remain available even when the generic C fallback is disabled, consistent with the PR's objective.

Also applies to: 37-37


104-105: native_crc32 mapping is consistent between static defines and runtime dispatch.

Verification confirms alignment: arch/x86/x86_functions.h lines 105 and 120 define native_crc32 to crc32_chorba_sse2 and crc32_chorba_sse41 respectively, matching the functable.c runtime assignments at lines 125 and 147. The precedence order (SSE2 base, SSE4.1 override) is consistent across both static preprocessor mappings and dynamic function table dispatch.

arch/x86/chorba_sse2.c (1)

854-856: LGTM: Unified initialization and alignment calculation.

The refactored approach computing c and algn_diff upfront simplifies the control flow and aligns with the PR's goal of unifying dispatch logic.

test/benchmarks/benchmark_crc32.cc (1)

65-73: LGTM: Benchmark registrations updated consistently.

The refactored benchmark registrations properly separate chorba_c, chorba_sse2, and chorba_sse41 variants with appropriate feature guards and runtime CPU checks, aligning with the broader reorganization.

test/test_crc32.cc (1)

294-296: LGTM: Test registrations updated consistently.

The refactored test guards now rely directly on feature macros (X86_SSE2, X86_SSE41) rather than Chorba-specific guards, consistent with the unconditional SSE declarations in x86_functions.h.

Note: PR objectives document 18 failing crc32_variant tests on SSE2/SSE4.1 in release builds, concentrated at threshold boundaries. These failures may be related to the threshold condition flagged in chorba_sse2.c.

Also applies to: 317-322

- Fixed alignment diff calculation in crc32_chorba.
- Fixed length check to happen early, avoiding extra branches for too short lengths,
this also allows removing one function call to crc32_braid_internal to handle those.
Gbench shows ~0.15-0.25ns saved per call for lengths shorter than CHORBA_SMALL_THRESHOLD.
- Avoid calculating aligned len if buffer is already aligned
@Dead2 Dead2 changed the title Reorganize Chorba activation. Reorganize Chorba activation and refactor Chorba dispatch Nov 12, 2025

Z_INTERNAL uint32_t crc32_braid_internal(uint32_t c, const uint8_t *buf, size_t len);
#if OPTIMAL_CMP == 64
# define CHORBA_SMALL_THRESHOLD 72
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need two spaces after #.

@Dead2 Dead2 marked this pull request as draft November 13, 2025 19:06
@Dead2
Copy link
Copy Markdown
Member Author

Dead2 commented Nov 13, 2025

Closing in favor of #2004 and #2006

@Dead2 Dead2 closed this Nov 13, 2025
@Dead2 Dead2 deleted the chorba-option branch November 29, 2025 22:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cleanup Improving maintainability or removing code. optimization

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants