Don't build C-fallback functions that never get used on x86_64#1984
Don't build C-fallback functions that never get used on x86_64#1984
Conversation
WalkthroughAdds a build-time option Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Config as configure / CMake
participant Build as Build system
participant Runtime as init_functable()
note over Config,Build: Configure/CMake decide which sources/flags to compile
Config->>Build: Provide flags: WITH_ALL_FALLBACKS, WITH_OPTIM, arch, HAVE_BUILTIN_CTZ
alt WITH_ALL_FALLBACKS = ON or WITH_OPTIM = OFF
Build-->>Build: Include full fallback source set (ZLIB_ALL_FALLBACK_SRCS)
else WITH_OPTIM = ON and !WITH_ALL_FALLBACKS
Build-->>Build: Include conditional subset of fallback sources (arch/HAVE_BUILTIN_CTZ dependent)
end
note over Build,Runtime: Runtime pointer selection depends on compile-time macros
Runtime->>Runtime: Check __x86_64__, WITH_ALL_FALLBACKS, WITH_OPTIM, HAVE_BUILTIN_CTZ
alt Broad fallbacks enabled (WITH_ALL_FALLBACKS)
Runtime-->>Runtime: Assign full generic C fallbacks (slide_hash, inflate_fast, chunkmemset_safe, longest_match*, compare256, crc/adler)
else Reduced x86_64 fallbacks
Runtime-->>Runtime: Assign limited C fallbacks (select longest_match variants, crc/adler)
opt if !HAVE_BUILTIN_CTZ
Runtime-->>Runtime: Also enable compare256 variant(s)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
4e24a5f to
18206d9
Compare
Develop 10 OctPR #19843.7% smaller compiled zlib-ng library on x86_64 |
e34656f to
75420bc
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
CMakeLists.txt(4 hunks)README.md(2 hunks)functable.c(2 hunks)test/benchmarks/benchmark_slidehash.cc(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
test/benchmarks/benchmark_slidehash.cc (2)
arch/x86/slide_hash_sse2.c (1)
void(54-63)test/benchmarks/benchmark_main.cc (1)
cpu_features(16-16)
functable.c (1)
functable.h (3)
adler(26-41)struct crc32_fold_s(36-36)struct crc32_fold_s(33-33)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
- GitHub Check: macOS GCC UBSAN (ARM64)
🔇 Additional comments (7)
CMakeLists.txt (4)
57-57: LGTM! Clear documentation and appropriate default.The option is correctly scoped to benchmarks and has clear documentation explaining its purpose.
1172-1175: LGTM! Correct fallback behavior when optimizations disabled.When
WITH_OPTIMis OFF, forcingWITH_ALL_FALLBACKSto ON ensures all generic fallbacks are available, since architecture-specific optimizations won't be built.
1177-1179: LGTM!Standard CMake pattern for propagating build options to compile definitions.
1361-1361: LGTM!Correctly uses the conditionally-populated
ZLIB_GENERIC_SRCSvariable.README.md (1)
207-207: LGTM!Cosmetic formatting change with no semantic impact.
test/benchmarks/benchmark_slidehash.cc (1)
71-73: LGTM! Correct conditional registration.The benchmark is only registered when
slide_hash_cis actually built, which matches the CMakeLists.txt logic that excludesslide_hash_c.cfrom x86_64 builds unlessWITH_ALL_FALLBACKSis enabled.functable.c (1)
51-80: LGTM! Correct fallback strategy for x86_64.The refactored logic correctly leverages the fact that x86_64 always has SSE2:
- On x86_64, SSE2 implementations (
slide_hash_sse2,chunkmemset_safe_sse2,inflate_fast_sse2) serve as the base fallbacks instead of the generic C implementations- C fallbacks are still needed for functions that SSE2 doesn't optimize (adler32, crc32) until higher SIMD levels (AVX2, PCLMULQDQ) are detected
- When
HAVE_BUILTIN_CTZis not available, C fallbacks forlongest_matchandcompare256are also needed since the SSE2 versions require this builtinThis aligns perfectly with the PR objective and the build changes in CMakeLists.txt.
75420bc to
cb959b2
Compare
|
It looks like some CI is failing or needs |
Package checks are expected to fail, since configure does not have this feature, and I am leaning towards not implementing it there either. Using Not quite sure what fails with |
b3d0b04 to
723fac9
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
.github/workflows/pkgcheck.yml(2 hunks)CMakeLists.txt(4 hunks)README.md(2 hunks)functable.c(3 hunks)test/benchmarks/benchmark_slidehash.cc(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
test/benchmarks/benchmark_slidehash.cc (2)
arch/x86/slide_hash_sse2.c (1)
void(54-63)arch/x86/slide_hash_avx2.c (1)
void(33-40)
functable.c (1)
functable.h (4)
adler(26-41)struct crc32_fold_s(36-36)struct crc32_fold_s(33-33)struct crc32_fold_s(34-34)
🔇 Additional comments (10)
.github/workflows/pkgcheck.yml (2)
140-140: LGTM! CI configuration correctly enables all fallbacks.The addition of
-DWITH_ALL_FALLBACKS=ONensures package checks exercise the new fallback mechanism across all architectures.
150-150: LGTM! Compat builds also enable all fallbacks.Consistent with the non-compat build configuration, ensuring thorough testing.
README.md (2)
207-207: LGTM! Formatting correction improves table alignment.The spacing adjustment ensures consistent table formatting with other rows.
220-220: LGTM! Documentation accurately reflects the new option.The WITH_ALL_FALLBACKS option is correctly documented with:
- Clear description of purpose (useful for Gbench comparisons)
- Accurate default value (OFF, matching CMakeLists.txt line 57)
- Appropriate placement in the Advanced Build Options table
CMakeLists.txt (4)
57-57: LGTM! Option definition is well-structured.The
cmake_dependent_optioncorrectly:
- Defaults to OFF (appropriate for most builds)
- Makes the option available only when WITH_BENCHMARKS is ON (since it's primarily useful for benchmark comparisons)
- Provides a clear description
1172-1179: LGTM! Fallback forcing logic is correct.When WITH_OPTIM is disabled, forcing WITH_ALL_FALLBACKS to ON ensures all generic implementations are available, which is necessary since no architecture-specific optimizations are built. The subsequent preprocessor definition propagates this to source files.
1358-1358: LGTM! Source list composition correctly includes generic sources.The updated
ZLIB_ALL_SRCScomposition now includesZLIB_GENERIC_SRCS, which contains the conditionally-selected fallback sources. This ensures the right fallback implementations are built based on architecture and build options.
1309-1338: All C fallback functions are implemented
Verified that every function referenced in functable.c has a corresponding definition in the source.functable.c (2)
52-82: LGTM! Fallback initialization logic is well-structured.The new architecture-specific fallback initialization correctly:
- On x86_64 (when NOT WITH_ALL_FALLBACKS): Sets only the C fallbacks that are actually used, since SSE2 implementations serve as fallbacks for chunkmemset_safe, inflate_fast, and slide_hash
- When WITH_ALL_FALLBACKS: Sets all C fallback implementations, enabling comprehensive benchmark comparisons
- Conditionally sets compare256/longest_match fallbacks based on HAVE_BUILTIN_CTZ availability
This aligns with the PR objective to avoid building unused C-fallback functions on x86_64 while maintaining flexibility for benchmarking.
85-322: LGTM! Architecture-specific optimizations are properly guarded.The
#ifdef WITH_OPTIMguard ensures that architecture-specific function table assignments are only performed when optimizations are enabled. This is consistent with the CMakeLists.txt logic that forces WITH_ALL_FALLBACKS when WITH_OPTIM is disabled, ensuring fallbacks are always available.
723fac9 to
bdf1473
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
functable.c (1)
52-83: Uninitialized function pointers when WITH_ALL_FALLBACKS=0 (non‑x86_64) and when WITH_OPTIM=OFF (x86_64).
- For non‑x86_64 with WITH_ALL_FALLBACKS unset, no C fallbacks are assigned at all → ft fields remain uninitialized.
- On x86_64 with WITH_ALL_FALLBACKS unset and WITH_OPTIM=OFF, many ft fields remain uninitialized (chunkmemset_safe, inflate_fast, slide_hash, etc.). Calls into stubs will reenter init_functable and can recurse forever.
Initialize full C fallbacks for non‑x86_64 in this block, and fail fast for the invalid x86_64 config (or ensure build system forces WITH_ALL_FALLBACKS when WITH_OPTIM=OFF).
Apply this diff to fix non‑x86_64 and guard the invalid x86_64 config:
// Set up generic C code fallbacks -#ifndef WITH_ALL_FALLBACKS -# ifdef __x86_64__ +#ifndef WITH_ALL_FALLBACKS +# ifdef __x86_64__ +/* On x86_64 we omit most C fallbacks; SSE2 is used instead. */ ft.adler32 = &adler32_c; ft.adler32_fold_copy = &adler32_fold_copy_c; ft.crc32 = &crc32_c; ft.crc32_fold = &crc32_fold_c; ft.crc32_fold_copy = &crc32_fold_copy_c; ft.crc32_fold_final = &crc32_fold_final_c; ft.crc32_fold_reset = &crc32_fold_reset_c; # ifndef HAVE_BUILTIN_CTZ ft.longest_match = &longest_match_c; ft.longest_match_slow = &longest_match_slow_c; ft.compare256 = &compare256_c; # endif -# endif +# ifndef WITH_OPTIM +# error "Building on x86_64 with WITH_OPTIM=OFF requires WITH_ALL_FALLBACKS=ON to provide complete C fallbacks." +# endif +# else + /* Non-x86_64: provide complete C fallbacks */ + ft.adler32 = &adler32_c; + ft.adler32_fold_copy = &adler32_fold_copy_c; + ft.chunkmemset_safe = &chunkmemset_safe_c; + ft.crc32 = &crc32_c; + ft.crc32_fold = &crc32_fold_c; + ft.crc32_fold_copy = &crc32_fold_copy_c; + ft.crc32_fold_final = &crc32_fold_final_c; + ft.crc32_fold_reset = &crc32_fold_reset_c; + ft.inflate_fast = &inflate_fast_c; + ft.slide_hash = &slide_hash_c; + ft.longest_match = &longest_match_c; + ft.longest_match_slow = &longest_match_slow_c; + ft.compare256 = &compare256_c; +# endif #elseAdditionally ensure the build system sets WITH_ALL_FALLBACKS=ON automatically when WITH_OPTIM=OFF on x86_64, to avoid this configuration entirely. Based on PR objectives.
🧹 Nitpick comments (2)
README.md (1)
220-221: Clarify configure behavior for WITH_ALL_FALLBACKS.The configure script now unconditionally defines WITH_ALL_FALLBACKS, so configure builds always include C fallbacks. Please note this here to avoid confusion (CMake default remains OFF).
For example, append a short note below the table: “Note: configure builds always enable all C fallbacks; the CMake default is OFF unless -DWITH_ALL_FALLBACKS=ON.”
configure (1)
1780-1784: Only define WITH_ALL_FALLBACKS when needed (preserve x86_64 size win).Unconditionally adding -DWITH_ALL_FALLBACKS undermines the PR’s x86_64 size reduction for configure builds. Gate it to cases that require C fallbacks (non‑x86_64 or WITH_OPTIM=OFF).
Apply this diff:
-# CMake can exclude building some of the generic fallback functions, -# configure does not have the detection code to do so. -CFLAGS="${CFLAGS} -DWITH_ALL_FALLBACKS" -SFLAGS="${SFLAGS} -DWITH_ALL_FALLBACKS" +# CMake can exclude some generic fallbacks; configure lacks equivalent detection. +# Keep C fallbacks when not on x86_64, or when optimizations are disabled. +if [ "${ARCH}" != "x86_64" ] || [ ${without_optimizations} -ne 0 ]; then + CFLAGS="${CFLAGS} -DWITH_ALL_FALLBACKS" + SFLAGS="${SFLAGS} -DWITH_ALL_FALLBACKS" +fi
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
.github/workflows/pkgcheck.yml(2 hunks)CMakeLists.txt(4 hunks)README.md(2 hunks)configure(1 hunks)functable.c(3 hunks)test/benchmarks/benchmark_slidehash.cc(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/pkgcheck.yml
🔇 Additional comments (2)
README.md (1)
207-207: LGTM on minor formatting.Entry renders correctly.
test/benchmarks/benchmark_slidehash.cc (1)
71-73: Correct macro guard usage.Using defined(WITH_ALL_FALLBACKS) avoids undefined-macro pitfalls. Condition reads correctly.
cd418e3 to
b55be08
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
functable.c (1)
52-82: LGTM: Fallback initialization correctly restructured.The new fallback initialization logic properly handles three scenarios:
- x86_64 without
WITH_ALL_FALLBACKS(lines 53-67): Minimal C fallbacks, relying on SSE2 implementations forchunkmemset_safe,inflate_fast, andslide_hash(line 93, 97-98)- With
WITH_ALL_FALLBACKSdefined (lines 68-82): All C fallback implementationsThe logic is sound: on x86_64, the SSE2 block (lines 88-104) unconditionally executes, providing the missing initializations. When
WITH_OPTIMis OFF,WITH_ALL_FALLBACKSis automatically enabled (CMakeLists.txt line 1176), ensuring all fallbacks are initialized.Consider adding inline comments to clarify the initialization strategy:
ft.force_init = &force_init_empty; - // Set up generic C code fallbacks + // Set up generic C fallback functions + // On x86_64: minimal set (SSE2 provides chunkmemset_safe, inflate_fast, slide_hash) + // Other platforms: full set (required as ultimate fallbacks) #ifndef WITH_ALL_FALLBACKS # if defined(__x86_64__) || defined(_M_X64) + // x86_64 always has SSE2, so only need fallbacks that SSE2 doesn't provide ft.adler32 = &adler32_c;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
.github/workflows/pkgcheck.yml(2 hunks)CMakeLists.txt(7 hunks)README.md(2 hunks)configure(1 hunks)functable.c(3 hunks)test/benchmarks/benchmark_slidehash.cc(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- test/benchmarks/benchmark_slidehash.cc
- configure
- .github/workflows/pkgcheck.yml
- README.md
🧰 Additional context used
🧬 Code graph analysis (1)
functable.c (1)
functable.h (2)
adler(26-41)struct crc32_fold_s(36-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
- GitHub Check: Windows GCC Native Instructions (AVX)
🔇 Additional comments (7)
CMakeLists.txt (6)
93-93: LGTM: New build option properly declared.The new
WITH_ALL_FALLBACKSoption is correctly declared, marked as advanced, and added to the feature summary.Also applies to: 155-155, 1584-1584
718-718: LGTM: Proper gating definition added.This correctly defines
WITH_OPTIMfor the preprocessor, enabling the optimization gate infunctable.c(line 85).
1307-1316: LGTM: Fallback sources well organized.Defining all generic fallback sources in a single list improves maintainability and makes the conditional inclusion logic clearer.
1318-1338: LGTM: Fallback selection logic correctly implemented.The conditional logic properly handles three cases:
- Explicit
WITH_ALL_FALLBACKS=ON: includes all fallbacks- x86_64 with SSE2 (implicit): includes only necessary C fallbacks, excluding
chunkset_c.c- Other architectures: includes all fallbacks and defines
WITH_ALL_FALLBACKSNote: The past review comment about excluding
chunkset_c.cfrom x86_64 builds has been correctly addressed. On x86_64,chunkmemset_safe_sse2is always used (functable.c line 93), sochunkset_c.cis properly excluded (line 1323-1329).
1358-1358: LGTM: Source composition updated correctly.The change from using separate source lists to consolidating with
ZLIB_GENERIC_SRCSensures the conditionally-selected fallback sources are properly included in the build.
1174-1177: Verify build & tests with WITH_OPTIM=OFF
CMakeLists.txt correctly forces WITH_ALL_FALLBACKS=ON and functable.c enables fallback code under that flag; confirm the project builds and all tests pass with -DWITH_OPTIM=OFF.functable.c (1)
85-85: LGTM: Optimization code properly gated.The
WITH_OPTIMguards correctly wrap all architecture-specific optimization code (lines 85-322), ensuring that when optimizations are disabled, only the C fallback implementations are used. This aligns with the automaticWITH_ALL_FALLBACKS=ONsetting in CMakeLists.txt (line 1176) whenWITH_OPTIM=OFF.Also applies to: 322-322
b55be08 to
8804f69
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
CMakeLists.txt (1)
1318-1339: x86_64 fallback pruning is correct; resolves prior feedback.The x86_64 branch excludes chunkset_c.c and only includes compare256_c.c when NOT HAVE_BUILTIN_CTZ. This matches the “SSE2 as fallback” rationale and removes unused C fallbacks on x86_64.
🧹 Nitpick comments (2)
.github/workflows/pkgcheck.yml (1)
150-151: Consistent with pkgcheck; consider documenting why ABI steps differ.Compat job also enables WITH_ALL_FALLBACKS. If ABI steps are meant to differ, a brief comment in this workflow would prevent regressions.
CMakeLists.txt (1)
1174-1177: Force WITH_ALL_FALLBACKS in cache when WITH_OPTIM=OFF.set(WITH_ALL_FALLBACKS ON) overrides the normal variable, but not the cache entry. Force-setting the cache makes GUIs/feature summaries consistent and avoids user cache overrides lingering.
-else() - # If WITH_OPTIM is disabled, we need all the fallbacks. - set(WITH_ALL_FALLBACKS ON) -endif() +else() + # If WITH_OPTIM is disabled, we need all the fallbacks. + set(WITH_ALL_FALLBACKS ON CACHE BOOL "Build all generic fallback functions (Useful for Gbench)" FORCE) +endif()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
.github/workflows/pkgcheck.yml(2 hunks)CMakeLists.txt(7 hunks)README.md(2 hunks)configure(1 hunks)functable.c(4 hunks)test/benchmarks/benchmark_slidehash.cc(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- test/benchmarks/benchmark_slidehash.cc
- configure
- README.md
- functable.c
🔇 Additional comments (3)
.github/workflows/pkgcheck.yml (1)
140-141: Align ABI steps or confirm intentional divergence.Good to enable WITH_ALL_FALLBACKS for pkgcheck. The ABI check steps still use CMAKE_ARGS without this flag. If parity is desired, mirror the flag there to avoid config drift; otherwise, confirm it’s intentional that ABI checks run without fallbacks.
CMakeLists.txt (2)
93-94: New build toggle looks good.WITH_ALL_FALLBACKS is a sensible, narrowly-scoped option. No issues.
1584-1585: Feature summary entry added appropriately.Exposes WITH_ALL_FALLBACKS in FeatureSummary; matches the new option.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #1984 +/- ##
============================================
+ Coverage 39.28% 82.44% +43.16%
============================================
Files 73 161 +88
Lines 7904 13762 +5858
Branches 1280 3131 +1851
============================================
+ Hits 3105 11346 +8241
+ Misses 4575 1382 -3193
- Partials 224 1034 +810 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
x86_64 always has SSE2, therefore we can use the SSE2 functions as fallbacks instead of
Summary by CodeRabbit
New Features
Refactor
Tests
Documentation
Chores