
Always run CMake compiler feature tests without LTO. #1622

Merged
Dead2 merged 1 commit into develop from no-lto-for-tests on Dec 24, 2023

Dead2 (Member) commented Dec 23, 2023

NOLTOFLAG is only used by tests; this makes sure it is always set for GCC/Clang, so that CMake tests for compiler features never run with LTO.

codecov bot commented Dec 23, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (a0356fa) 83.02% compared to head (edea531) 83.04%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1622      +/-   ##
===========================================
+ Coverage    83.02%   83.04%   +0.01%     
===========================================
  Files          133      133              
  Lines        10895    10895              
  Branches      2816     2816              
===========================================
+ Hits          9046     9048       +2     
  Misses        1146     1146              
+ Partials       703      701       -2     
Flag Coverage Δ
macos_clang 42.97% <ø> (ø)
macos_gcc 74.50% <ø> (ø)
ubuntu_clang 81.91% <ø> (ø)
ubuntu_clang_debug 81.58% <ø> (ø)
ubuntu_clang_inflate_allow_invalid_dist 81.57% <ø> (ø)
ubuntu_clang_inflate_strict 81.90% <ø> (ø)
ubuntu_clang_mmap 82.23% <ø> (ø)
ubuntu_clang_pigz 13.69% <ø> (ø)
ubuntu_clang_pigz_no_optim 11.29% <ø> (ø)
ubuntu_clang_pigz_no_threads 13.64% <ø> (ø)
ubuntu_clang_reduced_mem 82.31% <ø> (ø)
ubuntu_clang_toolchain_riscv ∅ <ø> (∅)
ubuntu_gcc 75.08% <ø> (ø)
ubuntu_gcc_aarch64 77.25% <ø> (ø)
ubuntu_gcc_aarch64_compat_no_opt 75.47% <ø> (ø)
ubuntu_gcc_aarch64_no_acle 76.00% <ø> (ø)
ubuntu_gcc_aarch64_no_neon 76.00% <ø> (ø)
ubuntu_gcc_armhf 77.04% <ø> (ø)
ubuntu_gcc_armhf_compat_no_opt 75.44% <ø> (ø)
ubuntu_gcc_armhf_no_acle 76.96% <ø> (ø)
ubuntu_gcc_armhf_no_neon 77.11% <ø> (ø)
ubuntu_gcc_armsf 74.43% <ø> (ø)
ubuntu_gcc_armsf_compat_no_opt 73.90% <ø> (ø)
ubuntu_gcc_benchmark 73.21% <ø> (ø)
ubuntu_gcc_compat_no_opt 76.68% <ø> (ø)
ubuntu_gcc_compat_sprefix 73.54% <ø> (ø)
ubuntu_gcc_m32 73.20% <ø> (ø)
ubuntu_gcc_mingw_i686 73.46% <ø> (ø)
ubuntu_gcc_mingw_x86_64 73.47% <ø> (ø)
ubuntu_gcc_mips 74.76% <ø> (ø)
ubuntu_gcc_mips64 74.78% <ø> (ø)
ubuntu_gcc_no_avx2 74.15% <ø> (ø)
ubuntu_gcc_no_ctz 74.44% <ø> (ø)
ubuntu_gcc_no_ctzll 74.43% <ø> (ø)
ubuntu_gcc_no_pclmulqdq 74.06% <ø> (ø)
ubuntu_gcc_no_sse2 74.33% <ø> (ø)
ubuntu_gcc_no_sse42 74.01% <ø> (ø)
ubuntu_gcc_o1 73.97% <ø> (ø)
ubuntu_gcc_osb ∅ <ø> (∅)
ubuntu_gcc_pigz 37.67% <ø> (+0.02%) ⬆️
ubuntu_gcc_pigz_aarch64 38.87% <ø> (-0.03%) ⬇️
ubuntu_gcc_ppc 73.72% <ø> (ø)
ubuntu_gcc_ppc64 74.17% <ø> (ø)
ubuntu_gcc_ppc64_power9 74.34% <ø> (ø)
ubuntu_gcc_ppc64le 74.23% <ø> (ø)
ubuntu_gcc_ppc64le_novsx 74.55% <ø> (ø)
ubuntu_gcc_ppc64le_power9 74.12% <ø> (ø)
ubuntu_gcc_ppc_no_power8 74.43% <ø> (ø)
ubuntu_gcc_s390x 74.59% <ø> (ø)
ubuntu_gcc_s390x_dfltcc 71.71% <ø> (ø)
ubuntu_gcc_s390x_dfltcc_compat 73.82% <ø> (ø)
ubuntu_gcc_s390x_no_crc32 74.39% <ø> (ø)
ubuntu_gcc_sparc64 74.58% <ø> (ø)
ubuntu_gcc_sprefix 73.21% <ø> (ø)
win64_gcc 73.84% <ø> (ø)
win64_gcc_compat_no_opt 74.50% <ø> (ø)

Flags with carried forward coverage won't be shown.

nmoinvaz (Member) commented Dec 23, 2023

I'm not seeing NOLTOFLAG being used in check_compiler_... checks. Also, will this work when -DCMAKE_C_FLAGS="-flto=thin" is passed at CMake configure time? My reading of the CMake docs for check_c_source_compiles is that CMAKE_C_FLAGS is prepended to CMAKE_REQUIRED_FLAGS.

Dead2 (Member, Author) commented Dec 23, 2023

> I'm not seeing NOLTOFLAG being used in check_compiler_... checks. Also, will this work when -DCMAKE_C_FLAGS="-flto=thin" is passed at CMake configure time? My reading of the CMake docs for check_c_source_compiles is that CMAKE_C_FLAGS is prepended to CMAKE_REQUIRED_FLAGS.

Prepending is ok, as long as we disable it later on the command line.

But you are right, I will revise the patch.
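
The ordering argument above can be sketched as follows. This is a minimal, hypothetical example, not the patch itself: it assumes the user configured with `-DCMAKE_C_FLAGS="-flto=thin"`, that `NOLTOFLAG` expands to `-fno-lto`, and that the compiler is GCC or Clang, which generally let the last of two conflicting `-flto`/`-fno-lto` options win. The project name and the result variable `HAVE_TRIVIAL_NOLTO_SKETCH` are made up for illustration.

```cmake
cmake_minimum_required(VERSION 3.10)
project(nolto_sketch C)

include(CheckCSourceCompiles)

# Assumption: -fno-lto is how LTO is switched off for GCC/Clang.
set(NOLTOFLAG "-fno-lto")

# If CMAKE_C_FLAGS (e.g. "-flto=thin") ends up before CMAKE_REQUIRED_FLAGS on
# the compiler command line, the later -fno-lto should apply to this check.
set(CMAKE_REQUIRED_FLAGS "${NOLTOFLAG}")
check_c_source_compiles(
    "int main(void) { return 0; }"
    HAVE_TRIVIAL_NOLTO_SKETCH)

# Clear the variable so the flag does not leak into later checks.
set(CMAKE_REQUIRED_FLAGS)
```

Clearing `CMAKE_REQUIRED_FLAGS` after each check keeps the no-LTO flag confined to the feature tests, which matches the stated intent that it is only used by tests.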

 endif()
 # Check whether compiler supports loading 4 neon vecs into a register range
-set(CMAKE_REQUIRED_FLAGS "${NEONFLAG}")
+set(CMAKE_REQUIRED_FLAGS "${NEONFLAG} ${NATIVEFLAG} ${ZNOLTOFLAG}")

I noticed that this test lacked the NATIVEFLAG, and I don't think it should, since the -march is conditional above.

IOW: If NativeInst is enabled, this would always fail unless -march=native is already part of cflags from outside of CMake.
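
A rough sketch of the situation described in the two review comments above, under stated assumptions. The option name `WITH_NATIVE_SKETCH`, the concrete `-march` values, and the result variable are hypothetical; only the variable names `NEONFLAG`, `NATIVEFLAG`, and `ZNOLTOFLAG` come from the diff. It assumes an ARM target whose `arm_neon.h` provides `vld1q_u8_x4`.

```cmake
cmake_minimum_required(VERSION 3.10)
project(neon_check_sketch C)

include(CheckCSourceCompiles)

# Hypothetical switch standing in for the native-instructions setting.
option(WITH_NATIVE_SKETCH "Build with -march=native (illustrative only)" OFF)

if(WITH_NATIVE_SKETCH)
    # With native builds no explicit NEON -march is chosen, so NEONFLAG is empty.
    set(NATIVEFLAG "-march=native")
    set(NEONFLAG "")
else()
    set(NATIVEFLAG "")
    set(NEONFLAG "-march=armv8-a+simd")  # assumed value
endif()
set(ZNOLTOFLAG "-fno-lto")               # assumed value

# Passing NATIVEFLAG as well means the check still gets a -march in native mode.
set(CMAKE_REQUIRED_FLAGS "${NEONFLAG} ${NATIVEFLAG} ${ZNOLTOFLAG}")
check_c_source_compiles(
    "#include <arm_neon.h>
    int main(void) {
        uint8_t buf[64] = {0};
        uint8x16x4_t v = vld1q_u8_x4(buf);
        return vgetq_lane_u8(v.val[0], 0);
    }"
    HAVE_NEON_X4_LOAD_SKETCH)
set(CMAKE_REQUIRED_FLAGS)
```

In this sketch, dropping `${NATIVEFLAG}` from `CMAKE_REQUIRED_FLAGS` would leave the native-mode check with no `-march` at all, which is the failure mode the comment points out.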

Dead2 merged commit 0b080ed into develop Dec 24, 2023
Dead2 deleted the no-lto-for-tests branch January 3, 2024 13:13
Dead2 mentioned this pull request Jan 7, 2024
thesamesam added a commit to thesamesam/zlib-ng that referenced this pull request Jan 9, 2025
Some of zlib-ng's configure tests define a function expecting it to be compiled but
don't call that function, or don't use its return value. This is risky with
LTO where the whole thing may be optimised out, which has happened before:
* zlib-ng#1616
* zlib-ng#1622
* https://gitlab.kitware.com/cmake/cmake/-/issues/26103

Closes: zlib-ng#1841
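
The risk described in this commit message can be illustrated with a check whose `main` depends on the probed function's result. This is a complementary idea, not what the commit does (the commit disables LTO for the checks); the function name `probe` and the result variable are hypothetical, and `__builtin_ctz` is assumed to be available, as it is on GCC and Clang.

```cmake
cmake_minimum_required(VERSION 3.10)
project(used_result_sketch C)

include(CheckCSourceCompiles)

check_c_source_compiles(
    "#include <stdint.h>
    /* The probed builtin feeds the program's exit status, so the call
       cannot simply be discarded as unused code. */
    static uint32_t probe(uint32_t x) { return (uint32_t)__builtin_ctz(x); }
    int main(int argc, char **argv) {
        (void)argv;
        return (int)probe((uint32_t)argc);
    }"
    HAVE_BUILTIN_CTZ_SKETCH)
```

Deriving the input from `argc` keeps the value unknown at compile time, so the optimizer has less room to fold the call away.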
Dead2 pushed a commit that referenced this pull request Jan 19, 2025
fneddy pushed a commit to fneddy/zlib-ng that referenced this pull request Jan 23, 2025
rkausch-fender added a commit to cclsoftware/zlib-ng that referenced this pull request Jun 13, 2025
commit 860e4cf
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Sun Feb 9 13:19:01 2025 +0100

    2.2.4 Release

commit 43b2703
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Sun Jan 26 21:31:36 2025 +0200

    Fix shift overflow in inflate and send_code.

commit 287c4dc
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Sun Feb 2 21:05:37 2025 -0500

    Fix an unfortunate bug with Visual Studio 2015

    Evidently this instruction, despite the intrinsic having a register operand,
    is a memory-register instruction. There seems to be no alignment requirement
    for the source operand. Because of this, compilers when not optimized are doing
    the unaligned load and then dumping back to the stack to do the broadcasting load.
    In doing this, MSVC seems to be dumping to the stack with an aligned move at an
    unaligned address, causing a segfault.  GCC does not seem to make this mistake, as
    it stashes to an aligned address.

    If we're on Visual Studio 2015, let's just do the longer 9 cycle sequence of a 128
    bit load followed by a vinserti128. This _should_ fix this (issue zlib-ng#1861).

commit a3c0430
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Jan 29 18:46:34 2025 +0100

    Fix -Wmaybe-uninitialized warnings in benchmarks.

commit 057104f
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Jan 29 16:54:36 2025 +0100

    Add uncompress benchmark

commit a0fa247
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Sun Jan 26 15:05:24 2025 +0100

    s390x: Add workaround to install custom Clang 19.1.5 rpms to actions-runner
    image in order to avoid the VX compiler bug in older clang versions.

commit 05305ed
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Fri Jan 24 01:45:41 2025 +0500

    Remove unused include directories

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit 69a60bf
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Fri Jan 24 01:45:26 2025 +0500

    Rename "arch/power/fallback_builtins.h" to avoid possible conflict with "fallback_builtins.h" in zlib-ng sources directory

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit 7701ce9
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Sun Jan 26 13:19:08 2025 +0200

    [abicheck] Regenerate ABI files for zlib
    * Generate using Ubuntu 24.04.1 LTS to fix mismatch in function signatures of gzseek() and gztell()

commit 5e3510e
Author: Eduard Stefes <eduard.stefes@ibm.com>
Date:   Tue Jan 21 10:48:07 2025 +0100

    Disable CRC32-VX Extension for some Clang versions
    We have to disable the CRC32-VX implementation for some Clang versions
    (18 <= version < 19.1.2) that generate bad code for the IBM S390 VGFMA intrinsics.

commit 8cebc9c
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Thu Jan 23 23:25:09 2025 +0500

    Increase cmake workflow timeout

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit 608871e
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Mon Jan 20 10:26:51 2025 -0800

    Use Ubuntu 20.04 for PPC64LE tests due to broken qemu.

commit 62d52a5
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Thu Jan 9 15:47:06 2025 -0800

    Use Ubuntu 22.04 for AARCH64 tests

    It seems that qemu might be failing. Tests on Raspberry Pi 5 with Ubuntu 24.04
    appear to work just fine.

commit b7dc018
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Sun Jan 5 08:01:41 2025 -0800

    Add missing compiler-rt libraries for Ubuntu 24. zlib-ng#1840

commit a95ee9e
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Wed Jan 1 16:20:17 2025 -0800

    Ignore gcovr parser errors.

commit bdfe700
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Wed Jan 1 14:41:27 2025 -0800

    Don't pin gcovr version any longer. zlib-ng#1840

commit 2ffbbdb
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Sat Jan 4 22:05:25 2025 -0800

    Use correct version of gcov for cross-compilers.

commit 6286088
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Thu Jan 2 15:17:33 2025 -0800

    Use Ubuntu 24 crossbuild-essential packages.

commit fbba9cb
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Wed Jan 1 14:46:59 2025 -0800

    Remove package qemu for Ubuntu 24. zlib-ng#1840

commit 7077052
Author: Nathan Moinvaziri <nathan@nathanm.com>
Date:   Wed Jan 1 14:38:12 2025 -0800

    Upgrade CI from Clang-11 to Clang 15 for Ubuntu 24. zlib-ng#1840

commit 212563d
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Sat Jan 4 21:19:42 2025 +0100

    Improve image/container rebuild script to work properly under cron.

commit 9064a25
Author: Dmitry Kurtaev <dmitry.kurtaev@gmail.com>
Date:   Wed Jan 15 20:28:44 2025 +0300

    Workaround error G6E97C40B

    Warning as an error with GCC from Ubuntu 24.04:
    ```
    /home/runner/work/dotnet_riscv/dotnet_riscv/runtime/src/native/external/zlib-ng/arch/riscv/riscv_features.c(25,33): error G6E97C40B: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses] [/home/runner/work/dotnet_riscv/dotnet_riscv/runtime/src/native/libs/build-native.proj]
    ```

commit 6d24fb8
Author: Sam James <sam@gentoo.org>
Date:   Thu Jan 9 11:36:40 2025 +0000

    cmake: disable LTO for some configure checks

    Some of zlib-ng's configure tests define a function expecting it to be compiled but
    don't call that function, or don't use its return value. This is risky with
    LTO where the whole thing may be optimised out, which has happened before:
    * zlib-ng#1616
    * zlib-ng#1622
    * https://gitlab.kitware.com/cmake/cmake/-/issues/26103

    Closes: zlib-ng#1841

commit 787c7f6
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Wed Jan 1 13:53:16 2025 +0500

    Force use of latest Windows SDK with 32-bit ARM support for release workflows

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit cbb6ec1
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Sun Dec 29 19:01:35 2024 +0100

    2.2.3 Release

commit bf05e88
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Fri Dec 20 23:31:37 2024 +0100

    Continued cleanup of old UNALIGNED_OK checks
    - Remove obsolete checks
    - Fix checks that are inconsistent
    - Stop compiling compare256/longest_match variants that never get called
    - Improve how the generic compare256 functions are handled.
    - Allow overriding OPTIMAL_CMP

    This simplifies the code and avoids having a lot of code in the compiled library that can never get executed.

commit 1aeb291
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Sun Dec 22 13:25:27 2024 +0100

    Rename functions to get rid of old and now misleading "unaligned" naming

commit d7e121e
Author: Cameron Cawley <ccawley2011@gmail.com>
Date:   Thu Jul 27 21:07:29 2023 +0100

    Use GCC's may_alias attribute for unaligned memory access

commit fc90e7b
Author: Cameron Cawley <ccawley2011@gmail.com>
Date:   Sun Dec 22 13:43:30 2024 +0000

    Improved setting of OPTIMAL_CMP on ARM

commit 06bba67
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Sat Dec 21 11:04:47 2024 -0500

    Fix unaligned access in ACLE based crc32

    This fixes a rightful complaint from the alignment sanitizer that we
    alias memory in an unaligned fashion. A nice added bonus is that this
    improves performance a tiny bit on the larger buffers, perhaps due to
    loops that idiomatically decrement a count and increment a single buffer
    pointer rather than the maze of conditional pointer reassignments.

    While here, let's write a unit test just for this. Since this is the only
    variant that accesses memory in a potentially unaligned fashion that doesn't
    explicitly go byte by byte or use intrinsics that don't require alignment,
    we'll enable it only for this function for now. Adding more tests later if
    need be should be possible. For everything else not crc, we're relying on
    ubsan to hopefully catch things by chance.

commit 87d8e95
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Mon Sep 16 13:15:46 2024 +0200

    Update s390x actions-runner docker

commit 005c2d3
Author: Cameron Cawley <ccawley2011@gmail.com>
Date:   Sat Dec 21 17:30:18 2024 +0000

    Set OPTIMAL_CMP for 32-bit PowerPC

commit 90913e8
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Sat Dec 21 10:09:58 2024 -0500

    Fix "RLE" compression with big endian architectures

    This was missed in zlib-ng#1831. The RLE methods compare a string of bytes
    directly with itself to directly derive a simple run length encoding.
    They use similar but not identical methods to compare256. This needs
    a similar endianness check at compile time to know which compare bit
    count to use (leading or trailing).

commit 04d1b75
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Fri Dec 20 18:53:51 2024 -0500

    Make big endians first class citizens again

    No longer do the big iron of yore which lack SIMD optimized loads need
    to search strings a byte at a time like primitive machines of the vax
    era. This guard here was mostly due to the fact that the string
    comparison was searched with "count trailing zero", which assumes an
    endianness.  We can just conditionally use leading zeros when on big
    endian and stop using the extremely naive C implementation. This makes
    things a tad bit faster.

commit dbccbd1
Author: Icenowy Zheng <uwu@icenowy.me>
Date:   Sun Dec 15 01:31:48 2024 +0800

    adler32_rvv: Fix some overflow problems

    There are currently some overflow problems in adler32_rvv
    implementation, which can lead to wrong results for some input, and
    these problems could be easily exhibited when running `git fsck` with
    zlib-ng substituting the system zlib on a big git repository.

    These problems and the solutions are the following:

    - When the input data is long enough, the v_buf32_accu can overflow too.
      Add it to the modulo code that happens per ~NMAX bytes.
    - When the vector data is reduced to scalar ones, the resulting scalar
      value (and the proceeded length) may lead to the calculation of sum2
      to overflow. Add mod BASE to all these reductions and initial
      calculation of sum2.
    - When the remaining data is less than vl bytes, the code falls back to a
      scalar implementation; however, the sum2 and alder2 values have just been
      reduced from vectors and could be big enough to make sum2 overflow
      in the scalar code. Modulo them before the scalar code to prevent such
      overflow (because vl is surely much smaller than NMAX).

    Signed-off-by: Icenowy Zheng <uwu@icenowy.me>

commit 509f6b5
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Tue Dec 17 23:02:32 2024 +0100

    Since we long ago make unaligned reads safe (by using memcpy or intrinsics),
    it is time to replace the UNALIGNED_OK checks that have since really only been
    used to select the optimal comparison sizes for the arch instead.

commit 4fa76be
Author: Adeel Mujahid <3840695+am11@users.noreply.github.com>
Date:   Sat Dec 21 00:35:50 2024 +0200

    Fix typos (zlib-ng#1825)

commit c295c28
Author: Eduard Stefes <eduard.stefes@ibm.com>
Date:   Wed Dec 4 09:15:27 2024 +0100

    added in-tree build artifacts to .gitignore

commit 037ab0f
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Tue Dec 17 23:09:31 2024 +0100

    Revert "Since we long ago make unaligned reads safe (by using memcpy or intrinsics),"

    This reverts commit 80fffd7.
    It was mistakenly pushed to develop instead of going through a PR and the appropriate reviews.

commit 80fffd7
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Tue Dec 17 23:02:32 2024 +0100

    Since we long ago make unaligned reads safe (by using memcpy or intrinsics),
    it is time to replace the UNALIGNED_OK checks that have since really only been
    used to select the optimal comparison sizes for the arch instead.

commit 43d74a2
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Sat Nov 30 09:23:28 2024 -0500

    Improve pipelining for AVX512 chunking

    For reasons that aren't quite so clear, using the masked writes here
    did not pipeline very well. Either setting up the mask stalled things
    or masked moves have issues overlapping regular moves. Simply putting
    the masked moves behind a branch that is rarely taken seemed to do the
    trick in improving the ILP. While here, put masked loads behind the same
    branch in case there were ever a hazard for overreading.

commit a4e7c34
Author: Detlef Riekenberg <wine.dev@web.de>
Date:   Fri Nov 29 22:59:52 2024 +0100

    zbuild: Provide a fallback for "ALIGNED_(x)" for other compiler

commit 7020cb3
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Wed Nov 27 19:00:52 2024 -0500

    Enable AVX2 functions to be built with BMI2 instructions

    While these are technically different instructions, no such CPU exists
    that has AVX2 that doesn't have BMI2. Enabling BMI2 allows us to
    eliminate several flag stalls by having flagless versions of shifts, and
    allows us to not clobber and move around GPRs so much in scalar code.
    There's usually a sizeable benefit for enabling it. Since we're building
    with BMI2 for AVX2 functions, let's also just make sure the CPU claims
    to support it (just to cover our bases).

commit 11bef87
Author: Bradley Lowekamp <blowekamp@mail.nih.gov>
Date:   Tue Nov 26 09:12:49 2024 -0500

    Address deprecated cmake version warning.

    Use cmake_minimum_required(VERSION <min>...<policy_max>) syntax to set
    the policy at the same time as the compatible CMake version.
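
    A hedged illustration of the range syntax named above. The version
    numbers and the project name are placeholders, not the values used in
    the commit:

    ```cmake
    # <min>...<policy_max>: require at least CMake 3.10, and set policies
    # introduced up to 3.29 to NEW when the running CMake knows them.
    # CMake older than 3.12 ignores the "...<policy_max>" part.
    cmake_minimum_required(VERSION 3.10...3.29)
    project(policy_range_sketch C)
    ```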

commit 2562fd1
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Sun Dec 1 07:13:42 2024 +0000

    Bump codecov/codecov-action from 4 to 5

    Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 4 to 5.
    - [Release notes](https://github.com/codecov/codecov-action/releases)
    - [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
    - [Commits](codecov/codecov-action@v4...v5)

    ---
    updated-dependencies:
    - dependency-name: codecov/codecov-action
      dependency-type: direct:production
      update-type: version-update:semver-major
    ...

    Signed-off-by: dependabot[bot] <support@github.com>

commit 785444d
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Thu Nov 28 14:05:32 2024 -0500

    Fix native detection of CRC instruction

    It's unclear if raspberry pi OS's shipped GCC doesn't properly detect
    ACLE or not (/proc/cpuinfo claims to support AES), but in any case, the
    preprocessor macro for that flag is not defined with -march=native on a
    raspberry pi 5. Unfortunately that means when built "WITH_NATIVE", we do
    not get a fast CRC function.  The CRC32 preprocessor macro _IS_ defined,
    and the auto detection when built without NATIVE support does properly
    get dispatched to. Since we only need the scalar CRC32 and not the polynomial
    stuff anyhow, let's make it be an || condition and not a && one.

commit 3c11f65
Author: Pavel P <pavlov.pavel@gmail.com>
Date:   Thu Nov 28 01:18:20 2024 +0200

    Remove unused HAVE_CHUNKMEMSET_1 define

commit 7fdc3aa
Author: Pavel P <pavlov.pavel@gmail.com>
Date:   Wed Nov 27 23:13:34 2024 +0200

    Fix casting warning/error in test_compress_bound.cc

    Fixes the following error when building with msvc compiler
    ```
    test_compress_bound.cc
    D:\zlib-ng\test\test_compress_bound.cc(41,50): error C2220: the following warning is treated as an error
    D:\zlib-ng\test\test_compress_bound.cc(41,50): warning C4267: 'argument': conversion from 'size_t' to 'unsigned long', possible loss of data
    D:\zlib-ng\test\test_compress_bound.cc(43,68): warning C4267: 'argument': conversion from 'size_t' to 'unsigned long', possible loss of data
    ```

commit 5456966
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Sun Nov 24 18:34:40 2024 +0500

    Force use of latest Windows SDK with 32-bit ARM support

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit 0ed5ac8
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Wed Sep 25 17:56:36 2024 -0400

    Make an AVX512 inflate fast with low cost masked writes

    This takes advantage of the fact that on AVX512 architectures, masked
    moves are incredibly cheap. There are many places where we have to
    fallback to the safe C implementation of chunkcopy_safe because of the
    assumed overwriting that occurs. We're to sidestep most of the branching
    needed here by simply controlling the bounds of our writes with a mask.

commit 94aacd8
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Mon Sep 23 18:26:04 2024 -0400

    Try to simplify the inflate loop by collapsing most cases to chunksets

commit e874b34
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Thu Sep 12 17:47:30 2024 -0400

    Make chunkset_avx2 half chunk aware

    This gives us appreciable gains on a number of fronts.  The first being
    we're inlining a pretty hot function that was getting dispatched to
    regularly. Another is that we're able to do a safe lagged copy of a
    distance that is smaller, so CHUNKCOPY gets its teeth back here for
    smaller sizes, without having to do another dispatch to a function.

    We're also now doing two overlapping writes at once and letting the CPU
    do its store forwarding. This was an enhancement @dougallj had suggested
    a while back.

    Additionally, the "half chunk mag" here is fundamentally less
    complicated because it doesn't require synthesizing cross lane permutes
    with a blend operation, so we can optimistically do that first if the
    len is small enough that a full 32 byte chunk doesn't make any sense.

commit b52e703
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Wed Sep 11 18:34:54 2024 -0400

    Simplify avx2 chunkset a bit

    Put length 16 in the length checking ladder and take care of it there
    since it's also a simple case to handle. We kind of went out of our way
    to pretend 128 bit vectors didn't exist when using avx2 but this can be
    handled in a single instruction. Strangely the intrinsic uses vector
    register operands but the instruction itself assumes a memory operand
    for the source. This also means we don't have to handle this case in our
    "GET_CHUNK_MAG" function.

commit dae668d
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Oct 9 16:27:43 2024 +0200

    Reorder variables in inflate functions to reduce padding holes
    due to variable alignment requirements.

commit 1ec47b7
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Sat Sep 28 08:09:17 2024 +0300

    configure: add --mandir to override $mandir on command line.

commit 22a4cbb
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Fri Sep 27 17:09:22 2024 +0300

    configure: Fix linker flags for Haiku.

commit 18af700
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Sep 25 17:25:19 2024 +0200

    Reorder 'inflate_state' struct to improve cache-locality of variables
    needed by inffast (from 6 cachelines to 1).
    Also fill in some unnecessary holes.

commit a5c20ed
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Sep 25 17:21:28 2024 +0200

    Add variable 'wbufsize' to track window buffer including padding, to allow
    the chunkset code to spill garbage data into the padding area if available.

commit 39e9c86
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Sep 25 17:18:49 2024 +0200

    Don't use 'dmax' and 'sane' variables unless their checks have been compiled in.

commit 3297953
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Thu Oct 3 17:17:44 2024 -0400

    Compute the "safe" distance properly

    The safe pointer that is computed is an exclusive, not inclusive bounds.
    While we were probably rarely ever bitten by this, if ever, it still makes
    sense to apply the limit, properly.

commit 8d10c30
Author: FantasqueX <fantasquex@gmail.com>
Date:   Fri Sep 20 00:53:18 2024 +0800

    Explicitly set CMake policy 0169 to silence warning

    The recommended `FetchContent_MakeAvailable()` is introduced in CMake
    3.14 which is greater than `cmake_minimum_required()`.

    CMake policy settings also affect subdirectories.

    The `cmake_minimum_required(VERSION)` command implicitly calls
    `cmake_policy(VERSION)`.

    Closes zlib-ng#1788
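
    A minimal sketch of explicitly setting this policy. Choosing OLD here
    is an assumption about what silences the warning, not a quote of the
    commit's diff; the project name is a placeholder:

    ```cmake
    cmake_minimum_required(VERSION 3.10)
    project(cmp0169_sketch C)

    # CMP0169 only exists in newer CMake releases, so guard on its presence.
    if(POLICY CMP0169)
        # Assumption: OLD keeps the single-argument FetchContent_Populate()
        # call working without the deprecation warning.
        cmake_policy(SET CMP0169 OLD)
    endif()
    ```

    Because policy settings are inherited by subdirectories, setting it once
    near the top-level `cmake_minimum_required()` call should be enough.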

commit b80eb4c
Author: Adam Stylinski <kungfujesus06@gmail.com>
Date:   Sun Sep 15 12:23:50 2024 -0400

    Simplify chunking in the copy ladder here

    As it turns out, trying to peel off the remainder with so many branches
    caused the code size to inflate a bit too much that this function
    wouldn't inline without some fairly aggressive optimization flags. Only
    catching vector sized chunks here makes the loop body small enough and
    having the byte by byte copy idiom at the bottom gives the compiler some
    flexibility that it is likely to do something there.

commit 8a1205f
Author: Hans Kristian Rosbach <hk-git@circlestorm.org>
Date:   Wed Sep 25 20:52:26 2024 +0200

    Disable MSVC warning 4324 (struct padded due to alignment)

commit 13d0a89
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Wed Sep 18 21:55:40 2024 +0300

    Force Visual C++ to treat source files as UTF-8.

commit a689e10
Author: FantasqueX <fantasquex@gmail.com>
Date:   Fri Sep 20 00:05:26 2024 +0800

    Replace non-ascii characters to fix MSVC warning

commit 8e19f15
Author: Mika Lindqvist <postmaster@raasu.org>
Date:   Fri Feb 23 13:21:28 2024 +0200

    [CI] Don't try to use macOS 11 as it's no longer supported.

commit 09f8404
Author: Letu Ren <fantasquex@gmail.com>
Date:   Tue Sep 17 21:49:27 2024 +0800

    Use target include instead of raw include

commit efca012
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Tue Sep 17 20:10:34 2024 +0500

    Fix override of CMAKE_C_STANDARD, CMAKE_C_STANDARD_REQUIRED, CMAKE_C_EXTENSIONS. A false value is allowed for CMAKE_C_STANDARD_REQUIRED and CMAKE_C_EXTENSIONS.

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit ce93943
Author: Vladislav Shchapov <vladislav@shchapov.ru>
Date:   Tue Sep 17 20:08:41 2024 +0500

    Allow overriding the CMAKE_CXX_STANDARD, CMAKE_CXX_STANDARD_REQUIRED, CMAKE_CXX_EXTENSIONS variables for tests and benchmarks.

    Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>

commit 68e31fa
Author: Bartosz Taudul <wolf@nereid.pl>
Date:   Tue Sep 17 12:46:11 2024 +0200

    Fix build on aarch64 android.

    When building with CMake toolchain provided by NDK, the ARCH variable is
    not "aarch64", but "aarch64-none-linux-android26" (or similar). The
    strict string match check causes the WITH_ARMV6 option to be enabled in
    such a case. In result, arch/arm/slide_hash_armv6.c is compiled, which
    is not intended to be used on aarch64, and fails.

    Relax the check and assume aarch64 if the ARCH variable contains aarch64.
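
    The difference between the strict and the relaxed check can be sketched
    as below. The `ARCH` value is the example string from the message above;
    the project name and the status messages are illustrative only:

    ```cmake
    cmake_minimum_required(VERSION 3.10)
    project(arch_match_sketch NONE)

    set(ARCH "aarch64-none-linux-android26")

    # Strict comparison: fails for the NDK-style triple.
    if("${ARCH}" STREQUAL "aarch64")
        message(STATUS "strict match: aarch64")
    else()
        message(STATUS "strict match: not aarch64")
    endif()

    # Relaxed comparison: a regex match succeeds if ARCH contains "aarch64".
    if("${ARCH}" MATCHES "aarch64")
        message(STATUS "relaxed match: aarch64")
    endif()
    ```

    With the value above, the strict branch reports "not aarch64" while the
    relaxed branch reports a match.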