CVE-2016-9841#36

Closed
azherebtsov wants to merge 44 commits into master from gcc.amd64

Conversation

@azherebtsov

Cherry-picked from: madler@9aaec95

This is the fix for [CVE-2016-9841](https://github.com/advisories/GHSA-5f8r-846v-423w).

shuxinyang and others added 30 commits March 9, 2014 17:20
on x86-64 architecture. This speeds up performance by some 7% on my Linux box
with a Core i7 architecture.

  The original loop is legal to vectorize; gcc 4.7.* and 4.8.*
somehow fail to catch this case. There is still room to squeeze more
out of the vectorized code. However, since these loops now account for
about 1.5% of execution time, it is not worthwhile to squeeze out that
performance via hand-written assembly.

  The original loops are guarded with "#ifdef NOT_TWEAK_COMPILER". By
default, the modified version is picked up unless the code is compiled
explicitly with -DNOT_TWEAK_COMPILER.
…the speedup arises from:
    o. Remove the conditional branch in the loop
    o. Remove some indirect memory accesses:
       The memory accesses to "s->prev_length" and "s->strstart" cannot be
       promoted to registers because the compiler is not able to
       disambiguate them from the store operation in INSERT_STRING().
    o. Convert the non-countable loop to a countable loop.
       I'm not sure if this change really contributes; in general, a
       countable loop is much easier to optimize than a non-countable one.
See the header of "contrib/amd64/longest-match.inc" for details
conditional-directive for target architecture.
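As a rough illustration of the NOT_TWEAK_COMPILER guard pattern described above (this is not the actual contrib/amd64 code, and sum_bytes is a hypothetical function), the original loop is kept under the ifdef while the default build gets a countable rewrite with an explicit induction variable, which vectorizers handle much more readily:

```c
#include <stddef.h>

/* Illustrative only: sum `len` bytes of a buffer. */
unsigned sum_bytes(const unsigned char *buf, size_t len)
{
    unsigned sum = 0;
#ifdef NOT_TWEAK_COMPILER
    /* original style: terminate on a moving end pointer */
    const unsigned char *end = buf + len;
    while (buf != end)
        sum += *buf++;
#else
    /* countable rewrite: explicit trip count, easy to vectorize */
    size_t i;
    for (i = 0; i < len; i++)
        sum += buf[i];
#endif
    return sum;
}
```

Both variants compute the same result; the point is only that the counted form gives the compiler an obvious trip count to vectorize over.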
…hange

sees about a 5% speedup for compression at levels 3, 5, and 6. Levels 1 and 9
see significantly higher and lower speedups, respectively. Decompression shows
about a 20% speedup.

The file "contrib/amd64/crc32-pclmul_asm.S" is taken from the Linux kernel 3.14
with minor changes. In the near future, this file will be replaced with a C
implementation.
(about 0% to 3% depending on the compression level and input). I guess
the speedup likely arises from the following facts:

 1) "s->pending" is now loaded once and stored once. In the original
    implementation, it needed to be loaded and stored twice, as the
    compiler isn't able to disambiguate "s->pending" and
    "s->pending_buf[]".

 2) better code generation:
   2.1) no instructions are needed for extracting two bytes from a short.
   2.2) fewer registers are needed.
   2.3) stores to adjacent bytes are merged into a single store, albeit
        at the cost of a potentially unaligned access.
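A minimal sketch of the merged-store idea in 2.3 (illustrative, not the fork's actual code; function names are hypothetical): instead of extracting and storing the two bytes of a short separately, write them with a single 16-bit store via memcpy, which compilers lower to one possibly-unaligned store on x86-64:

```c
#include <string.h>
#include <stdint.h>

/* two separate byte stores (original style) */
void put_short_bytes(unsigned char *buf, uint16_t v)
{
    buf[0] = (unsigned char)(v & 0xff);
    buf[1] = (unsigned char)(v >> 8);
}

/* one merged 16-bit store; memcpy avoids alignment UB.
 * Assumes a little-endian host, as on x86-64. */
void put_short_merged(unsigned char *buf, uint16_t v)
{
    memcpy(buf, &v, sizeof v);
}
```

On a little-endian machine the two produce identical bytes; the merged form simply needs no shift/mask instructions.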
…ast 4 bytes, instead of the 3 the spec allows. This allows for faster and more effective compression because there are fewer hash collisions, and false matches are discarded faster.

In theory this could lead to worse compression in some cases, but for the most common files there is an improvement at all levels, both in speed and in compression ratio.
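As a hedged illustration of the 4-byte-minimum idea (zlib's actual hash function and constants are different; hash4 is a made-up name), hashing four input bytes instead of three means the whole minimum match participates in the key, so fewer unrelated positions collide:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative 4-byte hash: read 32 bits of input and mix them down
 * to hash_bits bits. The multiplier is an arbitrary odd constant. */
uint32_t hash4(const unsigned char *p, unsigned hash_bits)
{
    uint32_t v;
    memcpy(&v, p, 4);                  /* all 4 bytes feed the key */
    return (v * 2654435761u) >> (32 - hash_bits);
}
```

With a 3-byte hash, two positions agreeing only on their first three bytes land in the same chain and must be rejected by the match loop; a 4-byte key filters most of those out up front.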
…ons we do not require (FASTEST, 64K, NOT_TWEAK_COMPILER). Remove contribs we don't require, integrate the hash func and longest_match funcs into deflate.c.

Improve output buffer performance by using a 64-bit buffer instead of a 16-bit one.
All in all, ~10% performance gain for level 4 and ~5% performance gain for level 5.
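A minimal sketch of a widened bit buffer (illustrative only; the actual deflate bit buffer carries more state and these names are hypothetical): accumulating pending bits in a uint64_t lets many bit-append calls run before any flush, whereas a 16-bit buffer forces a flush on almost every call:

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint64_t bit_buf;        /* pending bits, LSB first */
    unsigned bit_cnt;        /* number of valid bits in bit_buf */
    unsigned char out[64];   /* toy output buffer */
    size_t out_len;
} bitwriter;

/* Append the `len` (<= 57) low bits of `bits`, flushing whole bytes. */
void put_bits(bitwriter *w, uint64_t bits, unsigned len)
{
    w->bit_buf |= bits << w->bit_cnt;
    w->bit_cnt += len;
    while (w->bit_cnt >= 8) {
        w->out[w->out_len++] = (unsigned char)w->bit_buf;
        w->bit_buf >>= 8;
        w->bit_cnt -= 8;
    }
}
```

The wide accumulator is why the gain is largest at the faster levels, where the per-symbol bookkeeping dominates.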
Refactor trees.c, deflate.c and deflate.h for gcc x86-64.
1) MSVC doesn't have __builtin_expect.  Just ignore
   likely/unlikely and hope the compiler optimizes for the best.
2) MSVC also doesn't have __builtin_ctzl.  Implement using
   _BitScanForward().
-MSVC (2010) only supports C89 for c files.  All variable
 declarations need to be moved to the top of blocks.
-MSVC also doesn't support GCC's typeof.  No real need to call it
 in deflate.c anyway, so I removed it.
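A hedged sketch of the MSVC portability shims described above (macro and function names are illustrative, not necessarily what the fork uses): likely/unlikely collapse to no-ops under MSVC, and __builtin_ctzl is emulated with _BitScanForward:

```c
#include <stdint.h>

#if defined(_MSC_VER)
#  include <intrin.h>
#  define likely(x)   (x)
#  define unlikely(x) (x)
unsigned ctzl_compat(unsigned long v)
{
    unsigned long idx;
    _BitScanForward(&idx, v);   /* v must be nonzero */
    return (unsigned)idx;
}
#else
#  define likely(x)   __builtin_expect(!!(x), 1)
#  define unlikely(x) __builtin_expect(!!(x), 0)
unsigned ctzl_compat(unsigned long v)
{
    return (unsigned)__builtin_ctzl(v);
}
#endif
```

Both branches agree on semantics for nonzero inputs: the index of the lowest set bit.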
Changes to allow compilation using MSVC 2010
* Add aarch64 support, with over 2x faster compression than generic zlib
* Support for OS X
* Support for clang
* Better configuration (no longer need to pass CFLAGS)
Support for aarch64 with crc extension
* faster crc32 of last <8 bytes on aarch64
On FreeBSD amd64 is named... 'amd64', not 'x86_64'. This is what
'uname -m' prints.
* .type and .size are ELF/COFF specific so drop them
* .globl + .hidden equivalent for macOS is .private_extern
* symbol names get a leading _ on macOS, so we need to add the _ prefix
* PS-237 Don't disable target optimizations when cross-compiling.

AFAICT there's no reason to check the host platform here. The test only runs the compiler; it doesn't execute the output.

* Add misc intermediate files to .gitignore.
configure mostly generates this file by copying from zconf.h.in.
neurolabusc and others added 14 commits February 26, 2020 12:17
* Replace GPL CRC with BSD CRC (zlib-ng/zlib-ng#42), for validation see https://github.com/neurolabusc/simd_crc

* Westmere detection

* Update configure for Westmere (InsightSoftwareConsortium/ITK#416)

* use cpu_has_pclmul() to autodetect CPU hardware (InsightSoftwareConsortium/ITK#416)

* remove gpl code

* Improve support for compiling using Windows (https://github.com/ningfei/zlib)

* Import ucm.cmake from https://github.com/ningfei/zlib

* crc32_simd as separate file (#18)

* atomic and SKIP_CPUID_CHECK (#18)

* Remove unused code

* Only allow compiler to use clmul instructions to crc_simd unit (intel/zlib#25)

* Atomic does not compile on Ubuntu 14.04

* Allow "cmake -DSKIP_CPUID_CHECK=ON .." to not check SIMD CRC support on execution.

* Restore Windows compilation using "nmake -f win32\Makefile.msc"

* zconf.h required for Windows nmake

* support crc intrinsics for ARM CPUs

* Requested changes (#19)
Add SIMD NEON implementation of the adler32 checksum.

Inflate speed increased by ~10% with the Silesia corpus for ARMv8. Tested with
a modified zpipe.c to run inflate, deflate for a stream of size 100M.

Based on the adler32-simd patch from Noel Gordon for the chromium fork of zlib.
17bbb3d73c84 ("zlib adler_simd.c")
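For reference, the checksum the NEON code vectorizes is plain Adler-32; a minimal scalar version (without zlib's deferred-modulo batching, which is the main thing the SIMD variants exploit) looks like:

```c
#include <stdint.h>
#include <stddef.h>

#define ADLER_BASE 65521u   /* largest prime smaller than 2^16 */

uint32_t adler32_scalar(const unsigned char *buf, size_t len)
{
    uint32_t a = 1, b = 0;  /* a: running byte sum, b: sum of a's */
    size_t i;
    for (i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_BASE;
        b = (b + a) % ADLER_BASE;
    }
    return (b << 16) | a;
}
```

Because the modulo can be deferred for thousands of bytes without overflow, the sums vectorize cleanly, which is where the ~10% inflate speedup comes from.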
…e) (#22)

* When windowBits is zero, the size of the sliding window comes from
the zlib header. The allowed values of the four-bit field are
0..7, but when windowBits is zero, values greater than 7 are
permitted and acted upon, resulting in large, mostly unused memory
allocations. This fix rejects such invalid zlib headers.
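The check described above boils down to rejecting a zlib header whose CINFO nibble exceeds 7; a hedged sketch of just that validation (not inflate's actual code):

```c
/* A zlib stream starts with the CMF byte: low nibble = compression
 * method (8 = deflate), high nibble = CINFO = log2(window size) - 8,
 * whose valid range is 0..7 (i.e. windows of 256 bytes to 32K). */
int zlib_header_cinfo_ok(unsigned char cmf)
{
    unsigned method = cmf & 0x0f;
    unsigned cinfo  = cmf >> 4;
    return method == 8 && cinfo <= 7;
}
```

Before the fix, a CINFO above 7 when windowBits was zero would drive an oversized window allocation instead of being rejected.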

* Add option to not compute or check check values.

The undocumented (except in these commit comments) function
inflateValidate(strm, check) can be called after an inflateInit(),
inflateInit2(), or inflateReset2() with check equal to zero to
turn off the check value (CRC-32 or Adler-32) computation and
comparison. Calling with check not equal to zero turns checking
back on. This should only be called immediately after the init or
reset function. inflateReset() does not change the state, so a
previous inflateValidate() setting will remain in effect.

This also turns off validation of the gzip header CRC when
present.

This should only be used when a zlib or gzip stream has already
been checked, and repeated decompressions of the same stream no
longer need to be validated.

* This verifies that the state has been initialized, that it is the
expected type of state, deflate or inflate, and that at least the
first several bytes of the internal state have not been clobbered.

* Use macros to represent magic numbers

This combines two patches which help in improving the readability and
maintainability of the code by making magic numbers into #defines.

Based on Chris Blume's (cblume@chromium) patches for zlib chromium:
8888511 - "Zlib: Use defines for inffast"
b9c1566 - "Share inffast names in zlib"

These patches are needed when introducing the chunk SIMD NEON enhancements.

Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>

* Port inflate chunk SIMD NEON patches for cloudflare

Based on 2 patches from zlib chromium fork:

* Adenilson Cavalcanti (adenilson.cavalcanti@arm.com)
  3060dcb - "zlib: inflate using wider loads and stores"

* Noel Gordon (noel@chromium.org)
  64ffef0 - "Improve zlib inflate speed by using SSE2 chunk copy"

The two patches combined provide around 5-25% increase in inflate
performance, based on the workload, when checked with a modified
zpipe.c and the Silesia corpus.

Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>

* Increase inflate speed: read decode input into a uint64_t

Update the chunk-copy code with a wide input data reader, which consumes
input in 64-bit (8 byte) chunks. Update inflate_fast_chunk_() to use the
wide reader.

Based on Noel Gordon's (noel@chromium.org) patch for the zlib chromium fork
8a8edc1 - "Increase inflate speed: read decoder input into a uint64_t"

This patch provides 7-10% inflate performance improvement when tested with a
modified zpipe.c and the Silesia corpus.
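A sketch of the wide-reader idea (illustrative; inflate's real bit-accumulator management is more involved, and these helper names are hypothetical): load 8 input bytes at once into a uint64_t, then peel bits from it, instead of topping up the bit buffer one byte at a time:

```c
#include <stdint.h>
#include <string.h>

/* Refill: load 8 input bytes as one 64-bit chunk.
 * memcpy avoids alignment UB; assumes a little-endian host. */
uint64_t load64le(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Consume the `n` (1..63) low bits from the chunk, LSB-first. */
uint64_t take_bits(uint64_t *chunk, unsigned n)
{
    uint64_t bits = *chunk & ((1ull << n) - 1);
    *chunk >>= n;
    return bits;
}
```

One wide load can feed several Huffman decode steps before the next refill, which is where the 7-10% comes from.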

Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>

Co-authored-by: Mark Adler <madler@alumni.caltech.edu>
* Add SIMD SSSE3 implementation of the adler32 checksum

Based on the adler32-simd patch from Noel Gordon for the chromium fork of zlib.
17bbb3d73c84 ("zlib adler_simd.c")

Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>

* Port inflate chunk SIMD SSE2 improvements for cloudflare

Based on 2 patches from zlib chromium fork:

* Adenilson Cavalcanti (adenilson.cavalcanti@arm.com)
  3060dcb - "zlib: inflate using wider loads and stores"

* Noel Gordon (noel@chromium.org)
  64ffef0 - "Improve zlib inflate speed by using SSE2 chunk copy"

The improvement in inflate performance is around 15-35%, based
on the workload, when checked with a modified zpipe.c and the
Silesia corpus.

Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
* Fix architecture macro on MSVC.

Signed-off-by: Ningfei Li <ningfei.li@gmail.com>

* Update '.gitignore'.

Also remove 'zconf.h', it will be generated during building.

* Add 'build' subdir to .gitignore

* Fix architecture macro on MSVC.

* Update CMakeLists.txt

Clean up.
Add option to set runtime (useful for MSVC).
Add 'SSE4.2' and 'AVX' option.

Signed-off-by: Ningfei Li <ningfei.li@gmail.com>

* Define '_LARGEFILE64_SOURCE' by default in CMakeLists.txt.

Also remove checking fseeko.

Signed-off-by: Ningfei Li <ningfei.li@gmail.com>

* Set 'CMAKE_MACOSX_RPATH' to TRUE.

Signed-off-by: Ningfei Li <ningfei.li@gmail.com>

* Update SSE4.2 and PCLMUL setting in CMakeLists.txt

Signed-off-by: Ningfei Li <ningfei.li@gmail.com>

* Add 'HAVE_HIDDEN' to CMakeLists.txt

Signed-off-by: Ningfei Li <ningfei.li@gmail.com>

* Fix building dll on MSVC.

Signed-off-by: Ningfei Li <ningfei.li@gmail.com>

* Fix zlib.pc file generation.

Signed-off-by: Ningfei Li <ningfei.li@gmail.com>

* Refine cmake settings.

* Change cmake function to lowercase.

* Fix zlib.pc

* Fix crc32_pclmul_le_16 type.

* Set POSITION_INDEPENDENT_CODE flag to ON by default.

* Suppress some MSVC warnings.

* Remove unused code

* Fix ucm_set_runtime when only C is enabled for the project.

* Removed configured header file.

* Unify zconf.h template.

* Only allow compiler to use clmul instructions to crc_simd unit (intel/zlib#25)

* Atomic does not compile on Ubuntu 14.04

* Allow "cmake -DSKIP_CPUID_CHECK=ON .." to not check SIMD CRC support on execution.

* Restore Windows compilation using "nmake -f win32\Makefile.msc"

* Do not set SOVERSION on Cygwin.

* Fix file permission.

* Refine PCLMUL CMake option.

* zconf.h required for Windows nmake

* Do not set visibility flag on Cygwin.

* Fix compiling dll resource.

* Fix compiling using MinGW.

SSE4.2 and PCLMUL are also supported.

* Fix zconf.h for Windows.

* support crc intrinsics for ARM CPUs

* Add gzip -k option to minigzip (to aid benchmarks)

* Support Intel and ARMv8 optimization in CMake.

* Clean up.

* Fix compiling on Apple M1.

check_c_compiler_flag detects SSE2, SSE3, SSE42 and PCLMUL, but compiling
fails. Workaround fix; needs further investigation.

* Restore MSVC warnings.

Co-authored-by: neurolabusc <rorden@sc.edu>
Co-authored-by: Chris Rorden <rorden@mailbox.sc.edu>
This ports a21a4e8 - "Handling undefined behavior in inffast_chunk" from the Chromium fork of zlib.
Port of the patch for CVE-2018-25032 from
madler@5c44459

This bug was reported by Danilo Ramos of Eideticom, Inc. It has
lain in wait 13 years before being found! The bug was introduced
in zlib 1.2.2.2, with the addition of the Z_FIXED option. That
option forces the use of fixed Huffman codes. For rare inputs with
a large number of distant matches, the pending buffer into which
the compressed data is written can overwrite the distance symbol
table which it overlays. That results in corrupted output due to
invalid distances, and can result in out-of-bound accesses,
crashing the application.

The fix here combines the distance buffer and literal/length
buffers into a single symbol buffer. Now three bytes of pending
buffer space are opened up for each literal or length/distance
pair consumed, instead of the previous two bytes. This assures
that the pending buffer cannot overwrite the symbol table, since
the maximum fixed code compressed length/distance is 31 bits, and
since there are four bytes of pending space for every three bytes
of symbol space.
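The sizing argument above can be checked with small arithmetic (a sketch under the commit's stated numbers, not zlib's actual buffer code): each literal or length/distance pair consumes 3 bytes of symbol space and opens up 4 bytes of pending space, and 4 pending bytes (32 bits) always cover the worst-case 31-bit fixed-code emission:

```c
/* Worst-case fixed Huffman emission per length/distance pair, in bits,
 * per the commit message above. */
#define MAX_FIXED_BITS 31

/* Per symbol consumed: bytes of pending space freed up. */
#define PENDING_BYTES  4

/* The pending space opened per symbol, in bits, must cover the
 * largest possible compressed emission for that symbol. */
int pending_cannot_overrun(void)
{
    return PENDING_BYTES * 8 >= MAX_FIXED_BITS;
}
```

Under the old two-bytes-per-symbol layout this inequality failed (16 < 31), which is exactly the overwrite the CVE describes.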

Co-authored-by: Mark Adler <madler@alumni.caltech.edu>
A windowBits value of 0, 16, or 32 gets the window bits from the
zlib header.  However there is no zlib header for 16, or for 32
when the input is gzip.  This commit sets the window bits for
inflate to 15 if a gzip stream is detected and windowBits was 16
or 32.
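The windowBits interpretation described above can be sketched as follows (illustrative logic with a hypothetical helper name, not inflate's actual state machine):

```c
/* windowBits of 0, 16, or 32 means "take the size from the header":
 * 16 = gzip only, 32 = auto-detect zlib or gzip. Per the fix above,
 * a detected gzip stream falls back to the full 15-bit window, since
 * gzip headers carry no window-size field. */
int effective_window_bits(int windowBits, int stream_is_gzip)
{
    if (windowBits == 16 || windowBits == 32)
        return stream_is_gzip ? 15 : 0;   /* 0: read from zlib header */
    return windowBits;
}
```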

Co-authored-by: Mark Adler <madler@alumni.caltech.edu>
* Fix a bug when getting a gzip header extra field with inflate().

If the extra field was larger than the space the user provided with
inflateGetHeader(), and if multiple calls of inflate() delivered
the extra header data, then there could be a buffer overflow of the
provided space. This commit assures that provided space is not
exceeded.
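The fix amounts to clamping each call's copy to the space remaining in the user-provided buffer; a hedged sketch with hypothetical names (this is not inflate.c's code):

```c
#include <string.h>

/* Copy one inflate() call's worth of extra-field bytes into the space
 * the user registered via inflateGetHeader(), never past dst_max.
 * `written` is how many extra-field bytes were produced by earlier
 * calls; the NULL check mirrors the follow-up fix mentioned below. */
void copy_extra(unsigned char *dst, unsigned dst_max, unsigned written,
                const unsigned char *src, unsigned len)
{
    if (dst != NULL && written < dst_max) {
        unsigned n = len;
        if (n > dst_max - written)
            n = dst_max - written;        /* clamp to remaining space */
        memcpy(dst + written, src, n);
    }
}
```

The pre-fix bug was, in effect, copying `len` bytes unconditionally, so a multi-call extra field could run past the provided space.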

* Fix extra field processing bug that dereferences NULL state->head.

The recent commit to fix a gzip header extra field processing bug
introduced the new bug fixed here.

Co-authored-by: Mark Adler <fork@madler.net>
Port the change for Chromium issue 1302606 which avoids writing padding bytes for debug purposes except for debug builds. In particular, this ensures that the JDK `InflaterInputStream::read(byte[] b, int off, int len)` contract can be satisfied by this implementation. See:

- https://bugs.chromium.org/p/chromium/issues/detail?id=1302606
- https://chromium.googlesource.com/chromium/src/+/36be3c7a465f00f4edefccc10bba00bbebe1843d%5E%21/

This is complementary to the work Volker Simonis is doing upstream on making `InflaterInputStream` more tolerant of this kind of behavior, providing compatibility for earlier JDK releases.

- https://bugs.openjdk.org/browse/JDK-8281962
- https://bugs.openjdk.org/browse/JDK-8283758

Signed-off-by: Danny Thomas <dannyt@netflix.com>
@azherebtsov azherebtsov closed this Jan 4, 2023