Skip to content

Check ZMM support when detecting VPCLMULQDQ support#1932

Merged
Dead2 merged 1 commit intozlib-ng:developfrom
ccawley2011:vpclmulqdq
Jul 11, 2025
Merged

Check ZMM support when detecting VPCLMULQDQ support#1932
Dead2 merged 1 commit intozlib-ng:developfrom
ccawley2011:vpclmulqdq

Conversation

@ccawley2011
Copy link
Copy Markdown
Contributor

@ccawley2011 ccawley2011 commented Jul 10, 2025

Summary by CodeRabbit

  • Refactor
    • Improved internal handling and comments for CPU feature detection to enhance clarity and maintainability. No changes to visible functionality.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jul 10, 2025

Walkthrough

The code refactors the CPU feature detection logic in the x86_check_features function. Assignments for BMI2 and VPCLMULQDQ feature flags are reorganized for clarity, with comments updated for accuracy. No changes are made to the logic or control flow, and no public interfaces are altered.

Changes

File(s) Change Summary
arch/x86/x86_features.c Refactored feature flag assignments in x86_check_features; updated comments and code ordering.

Sequence Diagram(s)

No sequence diagram generated due to the nature of the changes (internal refactoring and comment updates only).


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f891bd3 and e8f2e37.

📒 Files selected for processing (1)
  • arch/x86/x86_features.c (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-07T21:18:37.806Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1778
File: arch/x86/chunkset_avx2.c:160-171
Timestamp: 2024-10-08T21:51:45.330Z
Learning: In `arch/x86/chunkset_avx2.c`, within the `GET_HALFCHUNK_MAG` function, using a conditional branch to select between `_mm_loadl_epi64` and `_mm_loadu_si128` is not recommended because the branching cost outweighs the savings from the load.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:41:10.063Z
Learning: For SSE2 optimizations, `_mm_cvtsi128_si64` should be used instead of `_mm_extract_epi64` (SSE4.1) for extracting 64-bit values from 128-bit vectors, as it generates more efficient movq instructions.
Learnt from: mtl1979
PR: zlib-ng/zlib-ng#1904
File: arch/riscv/Makefile.in:12-14
Timestamp: 2025-04-15T09:30:10.081Z
Learning: Feature detection modules like riscv_features.c should not be compiled with feature-specific flags (like RVVFLAG) because they need to be compilable on all systems regardless of feature support. These modules perform runtime detection and must initialize feature availability flags to zero on unsupported systems.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In `inffast_tpl.h`, when AVX512 is enabled, the branch involving `chunkcopy_safe` is intentionally eliminated to optimize performance.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1805
File: arch/x86/chunkset_avx512.c:28-30
Timestamp: 2024-10-29T02:18:25.966Z
Learning: In `chunkset_avx512.c`, the `gen_half_mask` function does not require validation for `len` since it will never exceed 16 due to computing the remainder for a 16-byte load.
Learnt from: pps83
PR: zlib-ng/zlib-ng#0
File: :0-0
Timestamp: 2025-01-13T18:28:11.751Z
Learning: In zlib-ng, removing preprocessor guards (like HAVE_*) should be described as removing unused conditional compilation checks rather than removing functionality, as the underlying implementation often remains intact.
arch/x86/x86_features.c (8)
Learnt from: mtl1979
PR: zlib-ng/zlib-ng#1904
File: arch/riscv/Makefile.in:12-14
Timestamp: 2025-04-15T09:30:10.081Z
Learning: Feature detection modules like riscv_features.c should not be compiled with feature-specific flags (like RVVFLAG) because they need to be compilable on all systems regardless of feature support. These modules perform runtime detection and must initialize feature availability flags to zero on unsupported systems.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1778
File: arch/x86/chunkset_avx2.c:160-171
Timestamp: 2024-10-08T21:51:45.330Z
Learning: In `arch/x86/chunkset_avx2.c`, within the `GET_HALFCHUNK_MAG` function, using a conditional branch to select between `_mm_loadl_epi64` and `_mm_loadu_si128` is not recommended because the branching cost outweighs the savings from the load.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-07T21:18:37.806Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:41:10.063Z
Learning: For SSE2 optimizations, `_mm_cvtsi128_si64` should be used instead of `_mm_extract_epi64` (SSE4.1) for extracting 64-bit values from 128-bit vectors, as it generates more efficient movq instructions.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In `inffast_tpl.h`, when AVX512 is enabled, the branch involving `chunkcopy_safe` is intentionally eliminated to optimize performance.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1805
File: arch/x86/chunkset_avx512.c:28-30
Timestamp: 2024-10-29T02:18:25.966Z
Learning: In `chunkset_avx512.c`, the `gen_half_mask` function does not require validation for `len` since it will never exceed 16 due to computing the remainder for a 16-byte load.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
🔇 Additional comments (3)
arch/x86/x86_features.c (3)

100-100: Good addition of reference documentation.

The reference URL provides valuable context for the CPU feature detection method being used.


103-104: Correct comment and logical code organization.

The comment correction from "BMI1 bit" to "BMI2 bit" is accurate, and moving the BMI2 assignment immediately after the cpuidex(7, 0, ...) call improves code organization by grouping related operations together.


124-124: Properly scoped VPCLMULQDQ detection within ZMM support check.

Moving the VPCLMULQDQ feature detection inside the if (features->has_os_save_zmm) block correctly ensures that VPCLMULQDQ support is only checked when ZMM registers are supported by the OS. This aligns with the PR objective to "Check ZMM support when detecting VPCLMULQDQ support" since VPCLMULQDQ is an AVX512 instruction that requires ZMM register support.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai auto-generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@codecov
Copy link
Copy Markdown

codecov bot commented Jul 10, 2025

Codecov Report

Attention: Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 81.67%. Comparing base (f891bd3) to head (e8f2e37).
Report is 3 commits behind head on develop.

Files with missing lines Patch % Lines
arch/x86/x86_features.c 50.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #1932       +/-   ##
============================================
+ Coverage    39.39%   81.67%   +42.28%     
============================================
  Files           74      150       +76     
  Lines         7885    13618     +5733     
  Branches      1303     3052     +1749     
============================================
+ Hits          3106    11122     +8016     
+ Misses        4536     1512     -3024     
- Partials       243      984      +741     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@Dead2 Dead2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Dead2 Dead2 merged commit 7a3e0aa into zlib-ng:develop Jul 11, 2025
146 checks passed
@ccawley2011 ccawley2011 deleted the vpclmulqdq branch July 11, 2025 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants