use memcpy/memcmp for unaligned reads by matijaskala · Pull Request #934 · zlib-ng/zlib-ng

matijaskala · 2021-04-24T00:29:01Z

that way any future optimizers won't make false
assumptions that those reads are actually aligned

compiler optimizes out memcpy/memcmp calls so
generated assembly code remains identical

mtl1979 · 2021-04-24T00:37:25Z

-    *(uint32_t *)(&s->pending_buf[s->pending]) = (lld >> 32) & 0xffffffff;
-    s->pending += 4;
+    memcpy(&s->pending_buf[s->pending], lld, 8);
+    s->pending += 8;


This change doesn't make sense... First two cases would need to be combined for little-endian systems.

Has this been resolved? Seems like if the compiler doesn't support 64 bit unaligned it can split it up into 2 x 32-bit reads or perhaps just do memcpy.

mtl1979 · 2021-04-24T03:52:04Z

-        uint32_t sv = *(uint32_t *)src0;
-        uint32_t mv = *(uint32_t *)src1;
+        uint32_t sv;
+        uint32_t mv;


As both variables have same type and they are not assigned a value that depend on each other, they can be declared on same line. Only when the variables are members of a struct, they should be declared on separate lines.

mtl1979 · 2021-04-24T03:52:17Z

-        uint64_t sv = *(uint64_t *)src0;
-        uint64_t mv = *(uint64_t *)src1;
+        uint64_t sv;
+        uint64_t mv;


As both variables have same type and they are not assigned a value that depend on each other, they can be declared on same line.

Looks like this was resolved?

mtl1979 · 2021-04-24T03:53:22Z

-    put_byte(s, ((dw >> 16) & 0xff));
-    put_byte(s, ((dw >> 8) & 0xff));
-    put_byte(s, (dw & 0xff));
+#  error No endian defined


Should be No endianess defined.

I copied the text from functable.c

I haven't reviewed all files... There is three long-time contributors and only two need to review a pull request before it gets merged.

codecov · 2021-04-26T18:33:13Z

Codecov Report

Merging #934 (2f1f56d) into develop (92107a9) will decrease coverage by 1.48%.
The diff coverage is 68.96%.

❗ Current head 2f1f56d differs from pull request most recent head af7c8ad. Consider uploading reports for the commit af7c8ad to get more accurate results

@@             Coverage Diff             @@
##           develop     #934      +/-   ##
===========================================
- Coverage    77.95%   76.47%   -1.49%     
===========================================
  Files           85       78       -7     
  Lines         8700     8250     -450     
  Branches      1396     1354      -42     
===========================================
- Hits          6782     6309     -473     
- Misses        1369     1409      +40     
+ Partials       549      532      -17

Flag	Coverage Δ
macos_clang	`68.26% <8.00%> (-0.80%)`	⬇️
macos_gcc	`67.18% <0.00%> (-0.62%)`	⬇️
ubuntu_clang	`69.30% <40.00%> (+1.16%)`	⬆️
ubuntu_clang_debug	`68.71% <40.00%> (+1.26%)`	⬆️
ubuntu_clang_inflate_allow_invalid_dist	`69.05% <40.00%> (+1.13%)`	⬆️
ubuntu_clang_inflate_strict	`69.30% <40.00%> (+1.16%)`	⬆️
ubuntu_clang_mmap	`69.30% <40.00%> (+1.15%)`	⬆️
ubuntu_clang_msan	`69.30% <40.00%> (+1.16%)`	⬆️
ubuntu_clang_pigz	`35.33% <52.00%> (+3.42%)`	⬆️
ubuntu_clang_pigz_no_optim	`37.79% <56.00%> (-0.25%)`	⬇️
ubuntu_clang_pigz_no_threads	`34.98% <52.00%> (+3.42%)`	⬆️
ubuntu_clang_reduced_mem	`?`
ubuntu_gcc	`68.58% <44.44%> (+1.41%)`	⬆️
ubuntu_gcc_aarch64	`68.78% <10.00%> (-1.63%)`	⬇️
ubuntu_gcc_aarch64_compat_no_opt	`67.01% <0.00%> (-1.78%)`	⬇️
ubuntu_gcc_aarch64_no_acle	`67.61% <0.00%> (-1.86%)`	⬇️
ubuntu_gcc_aarch64_no_neon	`67.96% <0.00%> (-1.73%)`	⬇️
ubuntu_gcc_armhf	`68.76% <18.18%> (-1.63%)`	⬇️
ubuntu_gcc_armhf_compat_no_opt	`66.98% <0.00%> (-1.79%)`	⬇️
ubuntu_gcc_armhf_no_acle	`68.93% <18.18%> (-1.70%)`	⬇️
ubuntu_gcc_armhf_no_neon	`69.16% <18.18%> (-1.66%)`	⬇️
ubuntu_gcc_armsf	`68.76% <16.66%> (-1.64%)`	⬇️
ubuntu_gcc_armsf_compat_no_opt	`66.98% <0.00%> (-1.79%)`	⬇️
ubuntu_gcc_compat_no_opt	`68.42% <0.00%> (-1.73%)`	⬇️
ubuntu_gcc_mingw_i686	`0.00% <0.00%> (ø)`
ubuntu_gcc_mingw_x86_64	`0.00% <0.00%> (ø)`
ubuntu_gcc_no_avx2	`68.64% <0.00%> (-1.75%)`	⬇️
ubuntu_gcc_no_ctz	`?`
ubuntu_gcc_no_ctzll	`?`
ubuntu_gcc_no_pclmulqdq	`66.59% <0.00%> (+1.25%)`	⬆️
ubuntu_gcc_no_sse2	`67.82% <0.00%> (-0.11%)`	⬇️
ubuntu_gcc_no_sse4	`66.79% <0.00%> (-0.51%)`	⬇️
ubuntu_gcc_o3	`68.17% <0.00%> (+0.36%)`	⬆️
ubuntu_gcc_osb	`70.88% <0.00%> (+1.47%)`	⬆️
ubuntu_gcc_pigz	`35.17% <58.82%> (+3.20%)`	⬆️
ubuntu_gcc_pigz_aarch64	`?`
ubuntu_gcc_ppc	`68.86% <100.00%> (+0.52%)`	⬆️
ubuntu_gcc_ppc64	`69.73% <100.00%> (-1.83%)`	⬇️
ubuntu_gcc_ppc64le	`68.43% <10.00%> (-1.52%)`	⬇️
ubuntu_gcc_ppc_no_power8	`?`
ubuntu_gcc_s390x	`67.75% <100.00%> (-2.44%)`	⬇️
ubuntu_gcc_sparc64	`70.02% <100.00%> (-2.00%)`	⬇️
win64_gcc	`70.09% <13.04%> (∅)`
win64_gcc_compat_no_opt	`73.50% <25.00%> (∅)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
chunkset.c	`100.00% <ø> (+40.90%)`	⬆️
compare258.c	`54.63% <30.76%> (-34.03%)`	⬇️
deflate.h	`39.13% <100.00%> (+5.79%)`	⬆️
insert_string_tpl.h	`100.00% <100.00%> (ø)`
match_tpl.h	`100.00% <100.00%> (+24.61%)`	⬆️
zutil_p.h	`40.00% <0.00%> (-31.43%)`	⬇️
inftrees.c	`82.73% <0.00%> (-15.11%)`	⬇️
arch/s390/dfltcc_detail.h	`0.00% <0.00%> (-12.50%)`	⬇️
deflate_stored.c	`86.20% <0.00%> (-9.20%)`	⬇️
... and 46 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 92107a9...af7c8ad. Read the comment docs.

nmoinvaz · 2021-06-12T22:08:43Z

I think this PR is good. I'm not sure what is left to do on it. Benchmarks? It allows the compiler to determine if unaligned read/writes are allowed by using memcpy/memcmp. Also gets rid of my convolution I had in chunkset.c.

Dead2 · 2021-06-13T08:07:40Z

It looked to me like @mtl1979 had requested changes, but perhaps he can clarify.

mtl1979 · 2021-06-13T10:51:19Z

@@ -33,7 +33,7 @@ Z_INTERNAL Pos QUICK_INSERT_STRING(deflate_state *const s, const uint32_t str) {
    uint32_t val, hm, h = 0;

 #ifdef UNALIGNED_OK


Instead of checking if UNALIGNED_OK is defined, we would need to check if system is little-endian or big-endian.

mtl1979 · 2021-06-13T10:51:31Z

@@ -68,7 +68,7 @@ Z_INTERNAL void INSERT_STRING(deflate_state *const s, const uint32_t str, uint32
        uint32_t val, hm, h = 0;

 #ifdef UNALIGNED_OK


Instead of checking if UNALIGNED_OK is defined, we would need to check if system is little-endian or big-endian.

mtl1979

Seems work-in-progress. Needs some logic changes.

nmoinvaz · 2021-06-19T05:17:06Z

This PR seems ok to me. Just should probably be benchmarked so to make sure no degradation.

nmoinvaz · 2021-06-21T06:30:16Z

Using -DWITH_OPTIM=ON

ZLIB-NG `834e7d8`

 Tool: minigzip-ng-develop.exe Levels: 1-9
 Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     44.809%      1.015/1.022/1.030/0.005        0.660/0.670/0.674/0.003      104,067,822
 2     35.391%      1.678/1.700/1.710/0.009        0.673/0.686/0.692/0.005       82,195,416
 3     33.698%      2.320/2.338/2.348/0.009        0.655/0.661/0.666/0.003       78,262,436
 4     32.989%      2.618/2.639/2.649/0.007        0.637/0.648/0.654/0.005       76,616,183
 5     32.483%      2.781/2.803/2.815/0.009        0.638/0.644/0.649/0.004       75,441,984
 6     32.318%      3.392/3.414/3.425/0.009        0.632/0.640/0.645/0.004       75,058,801
 7     32.061%      4.654/4.675/4.688/0.010        0.633/0.644/0.650/0.005       74,461,300
 8     31.978%      7.149/7.192/7.212/0.014        0.635/0.643/0.650/0.005       74,269,032
 9     31.961%      9.729/9.761/9.784/0.016        0.638/0.645/0.650/0.003       74,230,342

 avg1  34.188%                        3.949                          0.654
 tot                               1066.332                        176.447      714,603,316

ZLIB-NG PR 934 `2f1f56d`

 Tool: minigzip-ng-pr934.exe Levels: 1-9
 Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     44.809%      1.018/1.031/1.039/0.006        0.665/0.688/0.739/0.026      104,067,822
 2     35.391%      1.685/1.706/1.721/0.010        0.676/0.688/0.694/0.005       82,195,416
 3     33.698%      2.329/2.375/2.436/0.032        0.652/0.666/0.676/0.007       78,262,436
 4     32.989%      2.628/2.661/2.679/0.014        0.639/0.652/0.658/0.006       76,616,183
 5     32.483%      2.781/2.840/2.915/0.042        0.635/0.647/0.653/0.005       75,441,984
 6     32.318%      3.410/3.450/3.532/0.039        0.632/0.643/0.650/0.005       75,058,801
 7     32.061%      4.684/4.743/4.833/0.055        0.630/0.647/0.654/0.006       74,461,300
 8     31.978%      7.186/7.265/7.379/0.068        0.637/0.646/0.652/0.004       74,269,032
 9     31.961%      9.765/9.845/9.977/0.079        0.634/0.648/0.655/0.005       74,230,342

 avg1  34.188%                        3.991                          0.659
 tot                               1077.460                        177.822      714,603,316

nmoinvaz · 2021-06-21T15:56:53Z

Second run using -DWITH_OPTIM=ON:

ZLIB-NG `834e7d8`

 Tool: minigzip-ng-develop.exe Levels: 1-9
 Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     44.809%      1.016/1.025/1.032/0.005        0.662/0.669/0.674/0.004      104,067,822
 2     35.391%      1.679/1.696/1.704/0.007        0.675/0.683/0.688/0.003       82,195,416
 3     33.698%      2.313/2.338/2.348/0.009        0.647/0.660/0.666/0.004       78,262,436
 4     32.989%      2.618/2.638/2.647/0.008        0.641/0.648/0.654/0.004       76,616,183
 5     32.483%      2.789/2.801/2.811/0.006        0.637/0.643/0.648/0.004       75,441,984
 6     32.318%      3.388/3.407/3.420/0.009        0.633/0.641/0.647/0.004       75,058,801
 7     32.061%      4.643/4.668/4.682/0.011        0.636/0.645/0.650/0.004       74,461,300
 8     31.978%      7.152/7.180/7.195/0.011        0.630/0.643/0.649/0.005       74,269,032
 9     31.961%      9.719/9.752/9.764/0.011        0.637/0.646/0.652/0.004       74,230,342

 avg1  34.188%                        3.945                          0.653
 tot                               1065.164                        176.269      714,603,316

ZLIB-NG `2f1f56d`

 Tool: minigzip-ng-pr934.exe Levels: 1-9
 Runs: 70         Trim worst: 40

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     44.809%      1.013/1.022/1.027/0.004        0.658/0.670/0.677/0.005      104,067,822
 2     35.391%      1.682/1.699/1.708/0.006        0.672/0.683/0.687/0.004       82,195,416
 3     33.698%      2.328/2.344/2.353/0.006        0.650/0.661/0.667/0.004       78,262,436
 4     32.989%      2.627/2.645/2.655/0.009        0.639/0.649/0.653/0.003       76,616,183
 5     32.483%      2.792/2.808/2.815/0.006        0.635/0.643/0.647/0.004       75,441,984
 6     32.318%      3.396/3.415/3.424/0.008        0.630/0.641/0.646/0.004       75,058,801
 7     32.061%      4.667/4.694/4.704/0.008        0.633/0.642/0.648/0.004       74,461,300
 8     31.978%      7.185/7.207/7.217/0.008        0.634/0.643/0.649/0.004       74,269,032
 9     31.961%      9.752/9.777/9.792/0.011        0.637/0.643/0.649/0.003       74,230,342

 avg1  34.188%                        3.957                          0.653
 tot                               1068.295                        176.240      714,603,316

Dead2 · 2021-06-25T19:01:10Z

This needs a little rebase.

Dead2 · 2021-06-25T19:45:14Z

Baseline `21050e0` x86-64

   text    data     bss     dec     hex filename
 115241    1416      32  116689   1c7d1 libz-ng.so.2

Activated multiple generated file mode. Source: ../silesia-small.tar
Level 1: /tmp/deflatebench-1.tmp  465.2 MiB   487,825,920 B
Level 2: /tmp/deflatebench-2.tmp  270.1 MiB   283,253,760 B
Level 3: /tmp/deflatebench-3.tmp  225.1 MiB   236,044,800 B
Level 4: /tmp/deflatebench-4.tmp  180.1 MiB   188,835,840 B
Level 5: /tmp/deflatebench-5.tmp  165.1 MiB   173,099,520 B
Level 6: /tmp/deflatebench-6.tmp  135.1 MiB   141,626,880 B
Level 7: /tmp/deflatebench-7.tmp  105.1 MiB   110,154,240 B
Level 8: /tmp/deflatebench-8.tmp   90.0 MiB    94,417,920 B
Level 9: /tmp/deflatebench-9.tmp   75.0 MiB    78,681,600 B

 Tool: minigzip   Levels: 1-9
 Runs: 40         Trim worst: 20

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.165%      2.892/2.928/2.946/0.017        1.661/1.669/1.678/0.005      264,231,515
 2     43.875%      2.974/3.002/3.015/0.010        0.984/1.002/1.017/0.010      124,277,644
 3     42.382%      3.003/3.018/3.032/0.009        0.786/0.812/0.825/0.011      100,039,412
 4     41.669%      2.718/2.737/2.743/0.007        0.622/0.634/0.645/0.006       78,685,354
 5     41.215%      2.868/2.911/2.926/0.016        0.568/0.578/0.585/0.006       71,342,944
 6     41.036%      2.763/2.776/2.785/0.007        0.441/0.469/0.478/0.009       58,118,026
 7     40.781%      2.717/2.735/2.742/0.007        0.358/0.367/0.371/0.004       44,921,649
 8     40.702%      2.996/3.009/3.017/0.007        0.297/0.313/0.321/0.007       38,429,965
 9     40.698%      2.783/2.795/2.801/0.005        0.247/0.259/0.266/0.006       32,021,745

 avg1  42.947%                        2.879                          0.678
 tot                                518.191                        122.066      812,068,254

PR #934 `2f1f56d` x86-64

   text    data     bss     dec     hex filename
 116330    1424      32  117786   1cc1a libz-ng.so.2

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.165%      2.831/2.882/2.900/0.019        1.655/1.676/1.685/0.008      264,231,515
 2     43.875%      3.382/3.393/3.407/0.007        0.990/1.006/1.017/0.010      124,277,644
 3     42.382%      3.550/3.572/3.587/0.011        0.775/0.810/0.825/0.013      100,039,412
 4     41.669%      3.482/3.505/3.516/0.010        0.621/0.637/0.648/0.009       78,685,354
 5     41.215%      3.640/3.676/3.687/0.012        0.550/0.579/0.590/0.010       71,342,944
 6     41.036%      3.728/3.744/3.755/0.008        0.457/0.470/0.477/0.005       58,118,026
 7     40.781%      4.407/4.415/4.424/0.005        0.351/0.366/0.375/0.006       44,921,649
 8     40.702%      5.151/5.169/5.177/0.007        0.301/0.312/0.317/0.005       38,429,965
 9     40.698%      4.891/4.900/4.906/0.004        0.250/0.261/0.266/0.004       32,021,745

 avg1  42.947%                        3.918                          0.680
 tot                                705.152                        122.341      812,068,254

This PR unfortunately makes compression 36% slower than the baseline, and it adds quite a bit to the compiled code size (text).

This was tested with GCC 4.8.5 and Glibc 2.17

mscastanho · 2021-06-28T13:13:43Z

This was tested with GCC 4.8.5 and Glibc 2.17

@Dead2 Did you have a chance to test this with a newer GCC+glibc? glibc 2.17 caught my attention, as it is ~8y old. The implementation of core string functions (like memcmp and memcpy) are frequently revisited on glibc, so I wouldn't be surprised if faster implementations are available on newer glibc versions.

Of course this wouldn't change the situation for people using zlib-ng with an older toolchain. But I just wanted to add more info to the discussion anyways =)

mtl1979 · 2021-06-28T13:21:01Z

We don't officially support anything that is 8 years old... Even Visual C++ 2013 is too old for us.. GCC versions from 4.8 and 4.9 had optimizer bugs that I have mentioned a long time ago, not to mention they don't support instruction sets introduced after their release.

On my development machine I have gcc 6.5 and clang 6.0 as oldest version and Visual C++ 2015 for Windows builds. For more recent compiler tests I use gcc 11, clang 12 and Visual C++ 2022.

Dead2 · 2021-06-28T13:39:20Z

@mtl1979 I have officially stated multiple times that RHEL 7 and derived distros are to be supported as a lower-end. The above test was done on such a machine. The compiler not supporting newer features is no problem if the features are optional, but this PR would be much harder to make optional. MSVC is a (very) special case due to them not wanting to really support C until recently.

@mscastanho I suspect that is the problem (that is why I also posted those details), but I have not had the time to run any more tests yet.
But I suspect this could also cause problems for those using uclibc/musl or other clibs, and that makes it all that much harder.

mtl1979 · 2021-06-28T13:58:21Z

@Dead2 I don't care what is the oldest operating system we (as contributors or developers of zlib-ng) use or support...

Officially we support almost all 32-bit and 64-bit operating systems that can run either gcc, clang or Visual C++ compiler... Other compilers might work, but due to bugs or missing features all compilers that are more recent than 6 years or so might not be usable, for example early builds of Visual C++ 2017 and 2019 which were unusable due to critical bugs that were eventually fixed in later builds.

RHEL 7 has gcc 5-8 as optional packages.

I still run BeOS 5.0.3 and Windows Vista on my older computers, both are quite old operating systems.

Dead2 · 2021-06-29T20:46:49Z

Baseline `21050e0` aarch64

   text    data     bss     dec     hex filename
 105234    1576      16  106826   1a14a libz-ng.so.2

Level 1: /tmp/deflatebench-1.tmp   75.0 MiB    78,681,600 B
Level 2: /tmp/deflatebench-2.tmp   45.0 MiB    47,208,960 B
Level 3: /tmp/deflatebench-3.tmp   45.0 MiB    47,208,960 B
Level 4: /tmp/deflatebench-4.tmp   30.0 MiB    31,472,640 B
Level 5: /tmp/deflatebench-5.tmp   30.0 MiB    31,472,640 B
Level 6: /tmp/deflatebench-6.tmp   30.0 MiB    31,472,640 B
Level 7: /tmp/deflatebench-7.tmp   15.0 MiB    15,736,320 B
Level 8: /tmp/deflatebench-8.tmp   15.0 MiB    15,736,320 B
Level 9: /tmp/deflatebench-9.tmp   15.0 MiB    15,736,320 B

 Tool: minigzip   Levels: 1-9
 Runs: 40         Trim worst: 20

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.166%      2.729/2.769/2.790/0.019        0.974/0.998/1.014/0.014       42,618,858
 2     43.879%      2.821/2.856/2.872/0.012        0.583/0.601/0.613/0.009       20,714,938
 3     42.383%      3.722/3.745/3.766/0.016        0.568/0.588/0.600/0.010       20,008,755
 4     41.660%      2.782/2.802/2.818/0.011        0.350/0.376/0.390/0.012       13,111,391
 5     41.216%      3.001/3.035/3.052/0.014        0.339/0.371/0.388/0.013       12,971,746
 6     41.039%      3.511/3.541/3.555/0.013        0.345/0.369/0.379/0.010       12,916,025
 7     40.778%      2.260/2.273/2.286/0.009        0.170/0.181/0.187/0.004        6,416,924
 8     40.704%      2.785/2.805/2.817/0.010        0.149/0.177/0.191/0.011        6,405,244
 9     40.696%      2.997/3.021/3.034/0.009        0.165/0.180/0.190/0.008        6,404,084

 avg1  42.947%                        2.983                          0.427
 tot                                536.961                         76.853      141,567,965

PR #934 aarch64

   text    data     bss     dec     hex filename
 105234    1576      16  106826   1a14a libz-ng.so.2

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     54.166%      2.708/2.764/2.791/0.024        0.938/0.997/1.014/0.017       42,618,858
 2     43.879%      2.820/2.850/2.870/0.014        0.582/0.602/0.616/0.010       20,714,938
 3     42.383%      3.717/3.750/3.774/0.019        0.559/0.585/0.593/0.008       20,008,755
 4     41.660%      2.775/2.800/2.819/0.015        0.357/0.378/0.387/0.008       13,111,391
 5     41.216%      3.008/3.038/3.065/0.017        0.350/0.372/0.381/0.008       12,971,746
 6     41.039%      3.510/3.533/3.554/0.013        0.347/0.370/0.383/0.012       12,916,025
 7     40.778%      2.252/2.279/2.292/0.011        0.165/0.180/0.188/0.007        6,416,924
 8     40.704%      2.772/2.798/2.814/0.013        0.157/0.176/0.184/0.007        6,405,244
 9     40.696%      3.000/3.025/3.035/0.009        0.160/0.174/0.186/0.006        6,404,084

 avg1  42.947%                        2.982                          0.426
 tot                                536.761                         76.709      141,567,965

This is aarch64 and GCC 8.3.0. So no slowdown there.

Dead2 · 2021-10-12T12:29:18Z

Proposal for discussion:
What do you think about making our own zmemcpy/zmemcmp that on older/unoptimized libc will default to the current way of doing things, but on modern/optimized libc/gcc it is just an alias for the built-in.

Implementing that should be pretty easy, performance would be best of both worlds, cleaner code than today, easily maintainable and also easy to get rid of when/if we decide to in a few years.
The biggest challenge is probably making the detection/selection logic that decides when to use what.

mtl1979 · 2021-10-12T12:35:06Z

@Dead2 Because memcpy() is implemented inside the compiler, making own version that would actually be better would require pretty much rewriting the code generation part of compiler... This is because different architectures and even processors have different internal bus widths and compiler can choose the best implementation depending on both bus width and data length/alignment.

Dead2 · 2021-10-12T12:48:54Z

@mtl1979 I am not talking about implementing full memcpy/memcmp, but rather macros that resolve into the same cast+assignments and such that we use today. My proposed names might be misleading though.

Something like this:

// Pseudocode
if OLD:
    define zmemcmp_2(str1, str2) *(uint16_t *)str1 != *(uint16_t)str2
else:
    define zmemcmp_2(str1, str2) memcmp(str1, str2, 2)
endif

mtl1979 · 2021-10-12T12:56:52Z

@Dead2 That would just overlap what we already use UNALIGNED_OK for... Implementing memcmp() is a lot easier than implementing memcpy().

that way any future optimizers won't make false assumptions that those reads are actually aligned compiler optimizes out memcpy/memcmp calls so generated assembly code remains identical keep old behaviour with gcc <8

nmoinvaz · 2021-10-12T18:54:27Z

@Dead2 I like your idea about macros.

Dead2 · 2022-01-08T17:29:44Z

This is obsoleted by #1086 and #1102

mtl1979 reviewed Apr 24, 2021

View reviewed changes

matijaskala requested a review from mtl1979 April 24, 2021 01:18

mtl1979 reviewed Apr 24, 2021

View reviewed changes

Dead2 added the Next Devel Targeting next devel release (does not mean accepted) label Apr 27, 2021

Dead2 requested a review from mtl1979 June 13, 2021 08:07

mtl1979 reviewed Jun 13, 2021

View reviewed changes

mtl1979 suggested changes Jun 13, 2021

View reviewed changes

Dead2 added the Rebase needed Please do a 'git rebase develop yourbranch' label Jun 25, 2021

use memcpy/memcmp for unaligned reads

af7c8ad

that way any future optimizers won't make false assumptions that those reads are actually aligned compiler optimizes out memcpy/memcmp calls so generated assembly code remains identical keep old behaviour with gcc <8

nmoinvaz mentioned this pull request Oct 27, 2021

AVX2 optimizations #1053

Merged

nmoinvaz mentioned this pull request Dec 28, 2021

Use memcpy/memcmp for unaligned reads. #1086

Closed

Dead2 closed this Jan 8, 2022

		@@ -33,7 +33,7 @@ Z_INTERNAL Pos QUICK_INSERT_STRING(deflate_state *const s, const uint32_t str) {
		uint32_t val, hm, h = 0;

		#ifdef UNALIGNED_OK

		@@ -68,7 +68,7 @@ Z_INTERNAL void INSERT_STRING(deflate_state *const s, const uint32_t str, uint32
		uint32_t val, hm, h = 0;

		#ifdef UNALIGNED_OK

Uh oh!

Conversation

matijaskala commented Apr 24, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Apr 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

nmoinvaz commented Jun 12, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Dead2 commented Jun 13, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mtl1979 left a comment

Choose a reason for hiding this comment

Uh oh!

nmoinvaz commented Jun 19, 2021

Uh oh!

nmoinvaz commented Jun 21, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ZLIB-NG 834e7d8

ZLIB-NG PR 934 2f1f56d

Uh oh!

nmoinvaz commented Jun 21, 2021

ZLIB-NG 834e7d8

ZLIB-NG 2f1f56d

Uh oh!

Dead2 commented Jun 25, 2021

Uh oh!

Dead2 commented Jun 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Baseline 21050e0 x86-64

PR #934 2f1f56d x86-64

Uh oh!

mscastanho commented Jun 28, 2021

Uh oh!

mtl1979 commented Jun 28, 2021

Uh oh!

Dead2 commented Jun 28, 2021

Uh oh!

mtl1979 commented Jun 28, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Dead2 commented Jun 29, 2021

Baseline 21050e0 aarch64

PR #934 aarch64

Uh oh!

Dead2 commented Oct 12, 2021

Uh oh!

mtl1979 commented Oct 12, 2021

Uh oh!

Dead2 commented Oct 12, 2021

Uh oh!

mtl1979 commented Oct 12, 2021

Uh oh!

nmoinvaz commented Oct 12, 2021

Uh oh!

Dead2 commented Jan 8, 2022

Uh oh!

Reviewers

codecov Bot commented Apr 26, 2021 •

edited

Loading

nmoinvaz commented Jun 12, 2021 •

edited

Loading

nmoinvaz commented Jun 21, 2021 •

edited

Loading

ZLIB-NG `834e7d8`

ZLIB-NG PR 934 `2f1f56d`

ZLIB-NG `834e7d8`

ZLIB-NG `2f1f56d`

Dead2 commented Jun 25, 2021 •

edited

Loading

Baseline `21050e0` x86-64

PR #934 `2f1f56d` x86-64

mtl1979 commented Jun 28, 2021 •

edited

Loading

Baseline `21050e0` aarch64