on x86-64 architecture. It speeds up performance by about 7% on my Linux box with a Core i7. The original loop is legal to vectorize; gcc 4.7.* and 4.8.* somehow fail to catch this case. There is still room to squeeze performance out of the vectorized code. However, since these loops now account for about 1.5% of execution time, it is not worthwhile to squeeze out more via hand-written assembly. The original loops are guarded with "#ifdef NOT_TWEAK_COMPILER". By default, the modified version is picked up unless the code is compiled explicitly with -DNOT_TWEAK_COMPILER.
…the speedup arises from:
o. Remove the conditional branch in the loop
o. Remove some indirect memory accesses:
The memory accesses to "s->prev_length" and "s->strstart" cannot be
promoted to registers because the compiler is not able to disambiguate
them from the store operation in INSERT_STRING().
o. Convert a non-countable loop into a countable loop.
I'm not sure how much this change contributes; in general, a countable
loop is much easier to optimize than a non-countable one.
See the header of "contrib/amd64/longest-match.inc" for details on the
conditional directives for target architectures.
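The branch-removal and countable-loop points above can be illustrated with a generic sketch. This is hypothetical code, not the actual longest_match() from this patch: a scan loop whose trip count depends on the data defeats GCC's vectorizer, while a fixed-trip-count rewrite does not.

```c
#include <assert.h>
#include <stddef.h>

/* Non-countable: the trip count depends on the data, so the
   vectorizer in gcc 4.7/4.8 gives up on this loop. */
static size_t match_len_noncountable(const unsigned char *a,
                                     const unsigned char *b, size_t max) {
    size_t n = 0;
    while (n < max && a[n] == b[n])
        n++;
    return n;
}

/* Countable rewrite: scan all `max` bytes unconditionally (no early
   exit) and keep the first mismatch position as a reduction; the
   comparison can then be if-converted and vectorized. */
static size_t match_len_countable(const unsigned char *a,
                                  const unsigned char *b, size_t max) {
    size_t i, first = max;
    for (i = 0; i < max; i++)
        if (a[i] != b[i] && i < first)
            first = i;
    return first;
}
```

Both return the length of the common prefix; only the second shape is friendly to auto-vectorization.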
…hange sees about 5% speedup for compression at levels 3, 5, and 6. Level 1 sees a significantly higher speedup and level 9 a significantly lower one. Decompression shows about a 20% speedup. The file "contrib/amd64/crc32-pclmul_asm.S" is taken from Linux kernel 3.14 with minor changes. In the near future, this file will be replaced with a C implementation.
(about 0% to 3% depending on the compression level and input). I guess
the speedup likely arises from the following:
1) "s->pending" is now loaded once and stored once. In the original
implementation, it had to be loaded and stored twice because the
compiler isn't able to disambiguate "s->pending" from
"s->pending_buf[]".
2) better code generation:
2.1) no instructions are needed to extract two bytes from a short.
2.2) fewer registers are needed.
2.3) stores to adjacent bytes are merged into a single store, albeit
at the cost of a potentially unaligned access.
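Point 2.3 is the classic byte-store merge. A minimal sketch of the two code shapes (illustrative names, not zlib's actual put_short macro):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise: needs shift/mask instructions and two separate stores. */
static void put_short_bytes(uint8_t *buf, uint16_t w) {
    buf[0] = (uint8_t)(w & 0xff);
    buf[1] = (uint8_t)(w >> 8);
}

/* Merged: the compiler emits a single (possibly unaligned) 16-bit
   store. Assumes a little-endian target, as on x86-64. */
static void put_short_merged(uint8_t *buf, uint16_t w) {
    memcpy(buf, &w, sizeof w);
}
```

On x86-64 the unaligned-store penalty is small, so one wide store wins over two narrow ones.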
It changes the hash and longest-match functions to always match at least 4 bytes, instead of the 3 the spec allows. This allows faster and more effective compression because there are fewer hash collisions, and false matches are discarded faster. In theory it could lead to worse compression in some cases, but for the most common files there is an improvement at all levels, in both speed and compression ratio.
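A sketch of the 4-byte hashing idea (illustrative, not this patch's exact code): hashing four input bytes at once means colliding entries must already share a longer prefix, so false matches are rejected earlier.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASH_BITS 15  /* zlib's default hash table is 2^15 entries */

/* Multiplicative hash over 4 input bytes (hypothetical helper).
   The unaligned 4-byte load is done via memcpy for portability. */
static uint32_t hash4(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);                       /* load 4 bytes */
    return (v * 2654435761u) >> (32 - HASH_BITS);  /* Knuth multiplier */
}
```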
…ons we do not require (FASTEST, 64K, NOT_TWEAK_COMPILER). Remove contribs we don't require; integrate the hash function and longest_match functions into deflate.c. Improve output buffer performance by using a 64-bit buffer instead of a 16-bit one. All in all, a ~10% performance gain for level 4 and ~5% for level 5.
Refactor trees.c, deflate.c and deflate.h for gcc x86-64.
1) MSVC doesn't have __builtin_expect. Just ignore likely/unlikely and hope the compiler optimizes for the best. 2) MSVC also doesn't have __builtin_ctzl. Implement it using _BitScanForward().
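A sketch of the portability shims this describes (macro and function names are illustrative); the GCC branch is included so the sketch compiles on both toolchains:

```c
#include <assert.h>

#ifdef _MSC_VER
#  include <intrin.h>
/* 1) No __builtin_expect: define the hint macros away. */
#  define likely(x)   (x)
#  define unlikely(x) (x)
/* 2) No __builtin_ctzl: count trailing zeros via _BitScanForward.
   x must be nonzero, same precondition as __builtin_ctzl. */
static unsigned long ctzl(unsigned long x) {
    unsigned long index;
    _BitScanForward(&index, x);
    return index;
}
#else
#  define likely(x)   __builtin_expect(!!(x), 1)
#  define unlikely(x) __builtin_expect(!!(x), 0)
static unsigned long ctzl(unsigned long x) {
    return (unsigned long)__builtin_ctzl(x);
}
#endif
```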
- MSVC (2010) only supports C89 for C files, so all variable declarations need to be moved to the top of blocks.
- MSVC also doesn't support GCC's typeof. There's no real need for it in deflate.c anyway, so I removed it.
Changes to allow compilation using MSVC 2010
* Add aarch64 support, with over 2x faster compression than generic zlib
* Support for OS X
* Support for clang
* Better configuration (no longer need to pass CFLAGS)
Support for aarch64 with crc extension
* faster crc32 of last <8 bytes on aarch64
On FreeBSD amd64 is named... 'amd64', not 'x86_64'. This is what 'uname -m' prints.
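A configure-style sketch of the normalization this implies (hypothetical helper, not the project's actual configure code):

```shell
# Map the kernel's machine name to a canonical architecture name.
# FreeBSD's `uname -m` reports amd64; Linux reports x86_64 for the same CPU.
arch_name() {
  case "$1" in
    amd64|x86_64) echo x86_64 ;;
    *)            echo "$1"   ;;
  esac
}

arch_name "$(uname -m)"
```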
* .type and .size are ELF/COFF-specific, so drop them
* the .globl + .hidden equivalent for macOS is .private_extern
* symbol names are not automatically prefixed in assembly on macOS, so we need to add the _ prefix ourselves
* PS-237 Don't disable target optimizations when cross-compiling. AFAICT there's no reason to check the host platform here; the test only runs the compiler, it doesn't execute the output.
* Add misc intermediate files to .gitignore.
configure mostly generates this file by copying from zconf.h.in.
* Replace GPL CRC with BSD CRC (zlib-ng/zlib-ng#42), for validation see https://github.com/neurolabusc/simd_crc
* Westmere detection
* Update configure for Westmere (InsightSoftwareConsortium/ITK#416)
* Use cpu_has_pclmul() to autodetect CPU hardware (InsightSoftwareConsortium/ITK#416)
* Remove GPL code
* Improve support for compiling on Windows (https://github.com/ningfei/zlib)
* Import ucm.cmake from https://github.com/ningfei/zlib
* crc32_simd as separate file (#18)
* Atomic and SKIP_CPUID_CHECK (#18)
* Remove unused code
* Only allow the compiler to use CLMUL instructions in the crc_simd unit (intel/zlib#25)
* Atomic does not compile on Ubuntu 14.04
* Allow "cmake -DSKIP_CPUID_CHECK=ON .." to skip the SIMD CRC support check at execution time
* Restore Windows compilation using "nmake -f win32\Makefile.msc"
* zconf.h required for Windows nmake
* Support CRC intrinsics for ARM CPUs
* Requested changes (#19)
Add SIMD NEON implementation of the adler32 checksum.
Inflate speed increased by ~10% with the Silesia corpus on ARMv8. Tested with
a modified zpipe.c running inflate and deflate on a 100M stream.
Based on the adler32-simd patch from Noel Gordon for the chromium fork of zlib.
17bbb3d73c84 ("zlib adler_simd.c")
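For reference, this is the checksum being vectorized; a minimal scalar Adler-32 in C (the NEON version computes the same two running sums lane-wise):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u  /* largest prime below 2^16 */

/* Scalar Adler-32: sum `a` of all bytes plus 1, and sum `b` of the
   running `a` values, packed as (b << 16) | a. Production code defers
   the expensive % by batching, which is what makes SIMD worthwhile. */
static uint32_t adler32_scalar(const unsigned char *buf, size_t len) {
    uint32_t a = 1, b = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}
```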
…e) (#22)
* When windowBits is zero, the size of the sliding window comes from the zlib header. The allowed values of the four-bit field are 0..7, but when windowBits is zero, values greater than 7 are permitted and acted upon, resulting in large, mostly unused memory allocations. This fix rejects such invalid zlib headers.
* Add an option to not compute or check check values. The undocumented (except in these commit comments) function inflateValidate(strm, check) can be called after inflateInit(), inflateInit2(), or inflateReset2() with check equal to zero to turn off the check-value (CRC-32 or Adler-32) computation and comparison. Calling it with check not equal to zero turns checking back on. It should only be called immediately after the init or reset function. inflateReset() does not change the state, so a previous inflateValidate() setting remains in effect. This also turns off validation of the gzip header CRC when present. It should only be used when a zlib or gzip stream has already been checked, and repeated decompressions of the same stream no longer need to be validated.
* Verify that the state has been initialized, that it is the expected type of state (deflate or inflate), and that at least the first several bytes of the internal state have not been clobbered.
* Use macros to represent magic numbers. This combines two patches which improve the readability and maintainability of the code by turning magic numbers into #defines. Based on Chris Blume's (cblume@chromium) patches for zlib chromium: 8888511 - "Zlib: Use defines for inffast", b9c1566 - "Share inffast names in zlib". These patches are needed when introducing the chunked SIMD NEON enhancements.
Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
* Port inflate chunk SIMD NEON patches for cloudflare. Based on 2 patches from the zlib chromium fork:
  * Adenilson Cavalcanti (adenilson.cavalcanti@arm.com) 3060dcb - "zlib: inflate using wider loads and stores"
  * Noel Gordon (noel@chromium.org) 64ffef0 - "Improve zlib inflate speed by using SSE2 chunk copy"
  The two patches combined provide around a 5-25% increase in inflate performance, depending on the workload, when checked with a modified zpipe.c and the Silesia corpus.
  Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
* Increase inflate speed: read decode input into a uint64_t. Update the chunk-copy code with a wide input data reader, which consumes input in 64-bit (8-byte) chunks, and update inflate_fast_chunk_() to use the wide reader. Based on Noel Gordon's (noel@chromium.org) patch for the zlib chromium fork: 8a8edc1 - "Increase inflate speed: read decoder input into a uint64_t". This patch provides a 7-10% inflate performance improvement when tested with a modified zpipe.c and the Silesia corpus.
  Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
Co-authored-by: Mark Adler <madler@alumni.caltech.edu>
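The wide-reader idea can be sketched as follows (names are illustrative, not the fork's actual inflate state): load 8 input bytes per refill instead of one byte per bit request.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const unsigned char *next;  /* input cursor */
    uint64_t hold;              /* bit accumulator */
    unsigned bits;              /* valid bits in hold */
} bitreader;

/* Top up the accumulator to at least 56 valid bits with one 8-byte
   load (memcpy keeps it alignment-safe). Assumes a little-endian
   target and at least 8 readable bytes; real code bounds-checks. */
static void refill(bitreader *br) {
    uint64_t chunk;
    memcpy(&chunk, br->next, sizeof chunk);
    br->hold |= chunk << br->bits;
    br->next += (63 - br->bits) >> 3;  /* bytes actually consumed */
    br->bits |= 56;                    /* now 56..63 bits are valid */
}
```

One refill then feeds several Huffman decode steps before the next load, which is where the speedup comes from.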
* Add SIMD SSSE3 implementation of the adler32 checksum
Based on the adler32-simd patch from Noel Gordon for the chromium fork of zlib.
17bbb3d73c84 ("zlib adler_simd.c")
Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
* Port inflate chunk SIMD SSE2 improvements for cloudflare
Based on 2 patches from zlib chromium fork:
* Adenilson Cavalcanti (adenilson.cavalcanti@arm.com)
3060dcb - "zlib: inflate using wider loads and stores"
* Noel Gordon (noel@chromium.org)
64ffef0 - "Improve zlib inflate speed by using SSE2 chunk copy"
The improvement in inflate performance is around 15-35%, based
on the workload, when checked with a modified zpipe.c and the
Silesia corpus.
Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
* Fix architecture macro on MSVC. Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
* Update '.gitignore'. Also remove 'zconf.h'; it will be generated during the build.
* Add 'build' subdir to .gitignore
* Fix architecture macro on MSVC.
* Update CMakeLists.txt: clean up, add an option to set the runtime (useful for MSVC), add 'SSE4.2' and 'AVX' options. Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
* Define '_LARGEFILE64_SOURCE' by default in CMakeLists.txt. Also remove the fseeko check. Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
* Set 'CMAKE_MACOSX_RPATH' to TRUE. Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
* Update SSE4.2 and PCLMUL settings in CMakeLists.txt. Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
* Add 'HAVE_HIDDEN' to CMakeLists.txt. Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
* Fix building a DLL with MSVC. Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
* Fix zlib.pc file generation. Signed-off-by: Ningfei Li <ningfei.li@gmail.com>
* Refine CMake settings.
* Change CMake functions to lowercase.
* Fix zlib.pc
* Fix crc32_pclmul_le_16 type.
* Set the POSITION_INDEPENDENT_CODE flag to ON by default.
* Replace GPL CRC with BSD CRC (zlib-ng/zlib-ng#42), for validation see https://github.com/neurolabusc/simd_crc
* Westmere detection
* Update configure for Westmere (InsightSoftwareConsortium/ITK#416)
* Use cpu_has_pclmul() to autodetect CPU hardware (InsightSoftwareConsortium/ITK#416)
* Remove GPL code
* Improve support for compiling on Windows (https://github.com/ningfei/zlib)
* Import ucm.cmake from https://github.com/ningfei/zlib
* crc32_simd as separate file (#18)
* Atomic and SKIP_CPUID_CHECK (#18)
* Suppress some MSVC warnings.
* Remove unused code
* Fix ucm_set_runtime when only C is enabled for the project.
* Remove configured header file.
* Unify zconf.h template.
* Only allow the compiler to use CLMUL instructions in the crc_simd unit (intel/zlib#25)
* Atomic does not compile on Ubuntu 14.04
* Allow "cmake -DSKIP_CPUID_CHECK=ON .." to skip the SIMD CRC support check at execution time
* Restore Windows compilation using "nmake -f win32\Makefile.msc"
* Do not set SOVERSION on Cygwin.
* Fix file permissions.
* Refine the PCLMUL CMake option.
* zconf.h required for Windows nmake
* Do not set the visibility flag on Cygwin.
* Fix compiling the DLL resource.
* Fix compiling with MinGW. SSE4.2 and PCLMUL are also supported.
* Fix zconf.h for Windows.
* Support CRC intrinsics for ARM CPUs
* Add gzip -k option to minigzip (to aid benchmarks)
* Support Intel and ARMv8 optimizations in CMake.
* Clean up.
* Fix compiling on Apple M1. check_c_compiler_flag detects SSE2, SSE3, SSE4.2 and PCLMUL but compiling fails. Workaround fix; needs further investigation.
* Restore MSVC warnings.
Co-authored-by: neurolabusc <rorden@sc.edu>
Co-authored-by: Chris Rorden <rorden@mailbox.sc.edu>
This ports a21a4e8 - "Handling undefined behavior in inffast_chunk" from the Chromium fork of zlib.
Port of the patch for CVE-2018-25032 from madler@5c44459 This bug was reported by Danilo Ramos of Eideticom, Inc. It has lain in wait 13 years before being found! The bug was introduced in zlib 1.2.2.2, with the addition of the Z_FIXED option. That option forces the use of fixed Huffman codes. For rare inputs with a large number of distant matches, the pending buffer into which the compressed data is written can overwrite the distance symbol table which it overlays. That results in corrupted output due to invalid distances, and can result in out-of-bound accesses, crashing the application. The fix here combines the distance buffer and literal/length buffers into a single symbol buffer. Now three bytes of pending buffer space are opened up for each literal or length/distance pair consumed, instead of the previous two bytes. This assures that the pending buffer cannot overwrite the symbol table, since the maximum fixed code compressed length/distance is 31 bits, and since there are four bytes of pending space for every three bytes of symbol space. Co-authored-by: Mark Adler <madler@alumni.caltech.edu>
A windowBits value of 0, 16, or 32 gets the window bits from the zlib header. However there is no zlib header for 16, or for 32 when the input is gzip. This commit sets the window bits for inflate to 15 if a gzip stream is detected and windowBits was 16 or 32. Co-authored-by: Mark Adler <madler@alumni.caltech.edu>
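The described behavior reduces to a small rule; a sketch (hypothetical helper, not the actual inflate code):

```c
#include <assert.h>

/* Per the commit message: windowBits of 0, 16, or 32 means "take the
   window size from the zlib header", but gzip streams carry no window
   size, so once a gzip stream is detected with windowBits 16 or 32
   the window is pinned to the 15-bit maximum. */
static int effective_window_bits(int windowBits, int is_gzip) {
    if (is_gzip && (windowBits == 16 || windowBits == 32))
        return 15;
    return windowBits;
}
```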
* Fix a bug when getting a gzip header extra field with inflate(). If the extra field was larger than the space the user provided with inflateGetHeader(), and if multiple calls of inflate() delivered the extra header data, then there could be a buffer overflow of the provided space. This commit assures that provided space is not exceeded. * Fix extra field processing bug that dereferences NULL state->head. The recent commit to fix a gzip header extra field processing bug introduced the new bug fixed here. Co-authored-by: Mark Adler <fork@madler.net>
Port the change for Chromium issue 1302606 which avoids writing padding bytes for debug purposes except for debug builds. In particular, this ensures that the JDK `InflaterInputStream::read(byte[] b, int off, int len)` contract can be satisfied by this implementation. See: - https://bugs.chromium.org/p/chromium/issues/detail?id=1302606 - https://chromium.googlesource.com/chromium/src/+/36be3c7a465f00f4edefccc10bba00bbebe1843d%5E%21/ This is complementary to the work Volker Simonis is doing upstream on making `InflaterInputStream` more tolerant of this kind of behavior, providing compatibility for earlier JDK releases. - https://bugs.openjdk.org/browse/JDK-8281962 - https://bugs.openjdk.org/browse/JDK-8283758 Signed-off-by: Danny Thomas <dannyt@netflix.com>
Cherry-picked from: madler@9aaec95
This is the [CVE-2016-9841](https://github.com/advisories/GHSA-5f8r-846v-423w) fix.