Skip to content

Simplify inflate fast by dispatching to chunkmemset for all chunkcopy cases#1802

Merged
Dead2 merged 1 commit intozlib-ng:developfrom
KungFuJesus:chunking_experiments
Oct 23, 2024
Merged

Simplify inflate fast by dispatching to chunkmemset for all chunkcopy cases#1802
Dead2 merged 1 commit intozlib-ng:developfrom
KungFuJesus:chunking_experiments

Conversation

@KungFuJesus
Copy link
Copy Markdown
Collaborator

@KungFuJesus KungFuJesus commented Oct 4, 2024

Pretty much all chunkcopy, chunkmemset, and _safe variants are just special optimization cases of while (len--) *to++ = *from++;

We call chunk_memset in scenarios where we are repeating a previous element as we would in a self modifying copy loop (i.e. when there's overlap in the to & from pointers, and from points before to). We call chunkcopy, with or without overlap, as a special case of overlap is such that the distance between the pointers is >= or vector register width. We dispatched mostly to chunkcopy_safe in the case that we could not write more than len bytes (the edge of a buffer). It also was intentionally agnostic to the positions of to and from.

By modifying the code such that we have a an architecture specific CHUNKCOPY_SAFE that we call into, we can make use of the architecture-specific CHUNKCOPY and CHUNKMEMSET routines. This version avoids dispatching to memcpy, which was probably more expensive particularly for short strings. It also is likely to save us several reads, as many chunkcopy_safe calls can now detect overlap and leverage the optimized chunkmemset in doing so.

Summary by CodeRabbit

  • New Features

    • Enhanced memory operations with the introduction of half-sized chunks and new inline functions for handling them.
    • Added CHUNKCOPY_SAFE function for improved data copying with additional safety checks.
  • Bug Fixes

    • Improved safety checks in memory copying operations to prevent data overwrites.
  • Documentation

    • Updated function signatures across multiple files to reflect changes in parameters for better memory handling.
  • Refactor

    • Streamlined function pointer assignments for architecture-specific optimizations based on CPU capabilities.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Oct 4, 2024

Walkthrough

The changes in this pull request involve modifications to several source files related to memory chunk operations across different architectures (ARM, x86, RISC-V, POWER). Key alterations include updates to function signatures, replacing distance parameters with pointers to uint8_t, and the introduction of new functions and types for handling half-sized chunks. The GET_CHUNK_MAG function's signature has been updated across multiple files to accommodate a larger distance type. These changes enhance the flexibility and efficiency of memory operations, particularly in the context of chunk memory manipulation.

Changes

File Change Summary
arch/arm/arm_functions.h Updated chunkmemset_safe_neon function signature to change dist from unsigned to uint8_t *from.
arch/generic/generic_functions.h Updated chunkmemset_safe_c function signature to change dist from unsigned to uint8_t *from.
arch/power/power_functions.h Updated chunkmemset_safe_power8 function signature to change dist from unsigned to uint8_t *from.
arch/riscv/riscv_functions.h Updated chunkmemset_safe_rvv function signature to change dist from unsigned to uint8_t *from.
arch/x86/chunkset_avx2.c Added halfchunk_t type and several functions for handling half-sized chunks; updated GET_CHUNK_MAG function signature; expanded chunkmemset functions.
arch/x86/x86_functions.h Updated function signatures for chunkmemset_safe_sse2, chunkmemset_safe_ssse3, and chunkmemset_safe_avx2 to replace unsigned dist with uint8_t *from.
chunkset_tpl.h Updated CHUNKMEMSET and CHUNKMEMSET_SAFE signatures to accept uint8_t *from instead of unsigned dist; added new inline function CHUNKCOPY_SAFE.
functable.c Updated chunkmemset_safe_stub signature to replace unsigned dist with uint8_t *from.
functable.h Updated chunkmemset_safe function pointer signature to replace unsigned dist with uint8_t *from.
inffast_tpl.h Replaced instances of chunkcopy_safe with CHUNKCOPY_SAFE for enhanced memory safety.
inflate.c Changed the second argument of chunkmemset_safe from state->offset to put - state->offset.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Memory
    participant ChunkSet

    User->>ChunkSet: Request memory operation
    ChunkSet->>Memory: Perform chunk memory set
    Memory-->>ChunkSet: Return success
    ChunkSet-->>User: Confirm operation completed
Loading

Possibly related PRs

  • Improve chunkset_avx2 performance #1778: The changes in this PR involve modifying the function signatures of chunk memory operations, similar to the main PR's updates to chunkmemset_safe_neon, chunkmemset_safe_c, and others, which also change the parameter from unsigned dist to uint8_t *from.
  • Compute the "safe" distance properly #1801: This PR adjusts the parameters passed to the CHUNKMEMSET_SAFE function, which is related to the changes made in the main PR regarding the handling of memory operations and the transition from using a distance parameter to a pointer.
  • Reorder variables in inflate functions to reduce padding holes #1803: While primarily focused on reordering variables to reduce padding, this PR also touches on the inflate function, which is relevant to the overall memory handling context that the main PR addresses, particularly in relation to the inflate process.

Suggested labels

enhancement, cleanup

Suggested reviewers

  • nmoinvaz

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@KungFuJesus KungFuJesus force-pushed the chunking_experiments branch from 8fc5dad to 2e8beb1 Compare October 4, 2024 04:10
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (6)
functable.h (1)

Line range hint 1-30: Overall assessment of changes

The modification to the chunkmemset_safe function pointer signature in functable.h is consistent with the PR's objective of simplifying and optimizing chunk operations in the zlib-ng library. This change allows for more direct memory manipulation, which could lead to performance improvements, especially for short strings and overlapping memory regions as mentioned in the PR description.

However, it's crucial to ensure that this change is reflected consistently throughout the codebase. Any code that calls chunkmemset_safe will need to be updated to match the new signature. Additionally, thorough testing should be conducted to verify that this change doesn't introduce any regressions or unexpected behavior in the library's functionality.

Consider adding a comment near the chunkmemset_safe function pointer to explain the rationale behind this change and any specific usage guidelines for the new signature. This will help future maintainers understand the design decision and use the function correctly.

arch/x86/x86_functions.h (2)

36-36: LGTM. Consistent across all x86 extensions.

The change to chunkmemset_safe_avx2 completes the set of modifications across all supported x86 architecture extensions (SSE2, SSSE3, AVX2). This consistency is crucial for maintaining the codebase and ensuring optimal performance across different CPU capabilities.

Consider adding or updating a comprehensive test suite that covers all these chunkmemset_safe variants to ensure they behave consistently across different x86 extensions.


11-11: Summary: Consistent optimization across x86 extensions

The changes to chunkmemset_safe functions for SSE2, SSSE3, and AVX2 are consistent and align well with the PR objectives of optimizing chunk copying and memory setting operations. By changing the second parameter from unsigned dist to uint8_t *from, the functions can now work directly with memory addresses, potentially leading to more efficient operations.

However, it's important to note that this change might require updates to all code that calls these functions. To ensure a smooth transition:

  1. Update all call sites to these functions throughout the codebase.
  2. Adjust any documentation or comments related to these functions to reflect the new parameter.
  3. Update and expand the test suite to cover these changes thoroughly.
  4. Consider adding a deprecation notice or wrapper functions for the old signature to ease migration if this is a public API.

To verify the impact and ensure all necessary updates have been made, please run the following script:

#!/bin/bash
# Search for all calls to chunkmemset_safe functions and verify they've been updated
rg --type c --type cpp "chunkmemset_safe_(sse2|ssse3|avx2)\s*\(" -A 3

# Search for any remaining uses of the old 'dist' parameter name in relation to these functions
rg --type c --type cpp "chunkmemset_safe_(sse2|ssse3|avx2).*dist"

# Check if there are any comments or documentation that need updating
rg --type c --type cpp "chunkmemset_safe_(sse2|ssse3|avx2)" -C 5 | rg "(/\*|//|@param|@brief)"

This will help identify any areas that might need further attention during this refactoring process.

Also applies to: 24-24, 36-36

arch/x86/chunkset_avx2.c (2)

56-58: Add explanatory comments for half_rem_vals array

The half_rem_vals array contains magic numbers whose purpose may not be immediately clear to future maintainers. Adding comments to explain how these values are derived and how they are used in the code would enhance readability and maintainability.


133-149: Consider adding unit tests for halfchunkmemset_* functions

The newly introduced functions halfchunkmemset_2, halfchunkmemset_4, and halfchunkmemset_8 are critical for handling half-sized chunks. Adding unit tests for these functions would help ensure their correctness and prevent future regressions.

🧰 Tools
🪛 GitHub Check: CodeQL

[notice] 133-133: Unused static function
Static function halfchunkmemset_2 is unreachable


[notice] 139-139: Unused static function
Static function halfchunkmemset_4 is unreachable


[notice] 145-145: Unused static function
Static function halfchunkmemset_8 is unreachable

chunkset_tpl.h (1)

7-7: Remove unnecessary inclusion of <stdio.h>

The <stdio.h> header is included but not used in this file. Removing it can reduce compilation time and prevent potential namespace pollution.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 8d10c30 and 2e8beb1.

📒 Files selected for processing (11)
  • arch/arm/chunkset_neon.c (0 hunks)
  • arch/generic/generic_functions.h (1 hunks)
  • arch/x86/chunkset_avx2.c (5 hunks)
  • arch/x86/chunkset_sse2.c (0 hunks)
  • arch/x86/chunkset_ssse3.c (0 hunks)
  • arch/x86/x86_functions.h (3 hunks)
  • chunkset_tpl.h (5 hunks)
  • functable.c (1 hunks)
  • functable.h (1 hunks)
  • inffast_tpl.h (3 hunks)
  • inflate.c (1 hunks)
💤 Files with no reviewable changes (3)
  • arch/arm/chunkset_neon.c
  • arch/x86/chunkset_sse2.c
  • arch/x86/chunkset_ssse3.c
🧰 Additional context used
🪛 GitHub Check: CodeQL
arch/x86/chunkset_avx2.c

[notice] 60-60: Unused static function
Static function chunkmemset_1 is unreachable


[notice] 133-133: Unused static function
Static function halfchunkmemset_2 is unreachable


[notice] 139-139: Unused static function
Static function halfchunkmemset_4 is unreachable


[notice] 145-145: Unused static function
Static function halfchunkmemset_8 is unreachable

🪛 ast-grep
chunkset_tpl.h

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/

🔇 Additional comments (13)
functable.h (1)

30-30: Approve change with verification needed

The modification to the chunkmemset_safe function pointer signature aligns with the PR objectives of simplifying and optimizing chunk operations. Changing from unsigned dist to uint8_t *from allows for direct pointer manipulation, which can potentially improve performance.

However, this change may impact other parts of the codebase that call this function.

Please run the following script to verify all calls to chunkmemset_safe have been updated to match the new signature:

If any mismatched calls are found, please update them to use the new signature.

✅ Verification successful

Change Verified Successfully

All calls to chunkmemset_safe have been updated to match the new signature.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all calls to chunkmemset_safe match the new signature

# Test: Search for function calls to chunkmemset_safe
# Expect: All calls should use a pointer as the second argument, not an unsigned int
rg --type c --type cpp -e 'chunkmemset_safe\s*\([^)]*\)' -g '!functable.h'

Length of output: 149

arch/generic/generic_functions.h (1)

25-25: Approved: Function signature change aligns with PR objectives.

The modification of chunkmemset_safe_c function signature from (uint8_t *out, unsigned dist, unsigned len, unsigned left) to (uint8_t *out, uint8_t *from, unsigned len, unsigned left) aligns well with the PR's goal of optimizing chunk copying and memory setting operations. This change could potentially improve performance by directly using a pointer to the source memory instead of a distance value.

To ensure this change doesn't introduce any issues, please run the following script to check for any remaining uses of the old function signature:

This script will help identify any places in the codebase where the function might still be called with the old signature, allowing you to update them accordingly.

✅ Verification successful

Verified: No remaining uses of the old chunkmemset_safe_c function signature found.

All references have been updated to use the new signature successfully.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining uses of the old chunkmemset_safe_c function signature

# Test: Search for the old function signature
rg --type c --type cpp "chunkmemset_safe_c\s*\(\s*\w+\s*,\s*unsigned\s+dist\s*," -g '!arch/generic/generic_functions.h'

# Test: Search for all uses of chunkmemset_safe_c to manually verify
rg --type c --type cpp "chunkmemset_safe_c" -g '!arch/generic/generic_functions.h'

Length of output: 333

arch/x86/x86_functions.h (2)

24-24: LGTM. Consistent with SSE2 changes.

The modification to chunkmemset_safe_ssse3 is consistent with the changes made to the SSE2 version, maintaining uniformity across different x86 architecture extensions.


11-11: LGTM. Verify impact on existing code.

The change from unsigned dist to uint8_t *from aligns with the PR objectives of optimizing chunk copying and memory setting operations. This modification allows for more direct memory access, potentially improving performance.

Please run the following script to check for any unintended side effects:

✅ Verification successful

Verified: No existing calls to chunkmemset_safe_sse2 found. The change does not impact current codebase functionality.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for all calls to chunkmemset_safe_sse2 and verify they've been updated
rg --type c --type cpp "chunkmemset_safe_sse2\s*\(" -A 3

Length of output: 347

inflate.c (2)

1084-1084: Improved handling of overlapping memory regions

The change from state->offset to put - state->offset as the second argument in the chunkmemset_safe function call is a good improvement. This modification calculates a relative address based on the current put position, which should enhance the handling of overlapping memory regions during decompression. This aligns well with the PR objective of optimizing the handling of short strings and overlapping cases.

The new implementation is likely to be more robust in scenarios where the source and destination memory areas overlap, potentially reducing the number of memory reads and improving overall efficiency.


Line range hint 1-1984: Overall assessment of changes in inflate.c

The modification made to the inflate function in this file is focused and aligns well with the PR objectives. The change improves the handling of overlapping memory regions during decompression, which should enhance performance for short strings and overlapping cases.

No other significant changes or issues were identified in the file. The implementation appears to be sound and maintains the overall structure and logic of the zlib decompression process.

chunkset_tpl.h (2)

254-272: ⚠️ Potential issue

Ensure valid pointer arithmetic in CHUNKCOPY_SAFE

In CHUNKCOPY_SAFE, pointer arithmetic is performed with safe, out, and from. Subtracting pointers that do not refer to the same array or object can lead to undefined behavior. Confirm that these pointers are within the same array or valid memory region before performing arithmetic.

Run the following script to identify all calls to CHUNKCOPY_SAFE and check the relationships between the pointers:

#!/bin/bash
# Description: Verify that 'safe', 'out', and 'from' pointers in CHUNKCOPY_SAFE are valid for arithmetic operations.

rg --type c 'CHUNKCOPY_SAFE\(([^,]+),\s*([^,]+),\s*([^,]+),\s*([^)]+)\)' -o -r 'out=\1 from=\2 len=\3 safe=\4' | while read -r call; do
    echo "Call: $call"
    # Further static analysis can be performed to ensure pointers are in the same object
done

128-133: Verify the correctness of sdist handling in CHUNKMEMSET

The condition if (sdist < 0 && dist < len) handles cases where from is ahead of out, and memmove is used to accommodate overlapping regions. Ensure that this logic correctly covers all potential overlap scenarios and does not introduce unintended side effects.

Run the following script to examine all usages of CHUNKMEMSET and check for overlapping regions:

✅ Verification successful

CHUNKMEMSET sdist Handling Verified Successfully

All usages of CHUNKMEMSET correctly handle overlapping memory regions with the current sdist condition.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all calls to CHUNKMEMSET and analyze the distance between 'out' and 'from'

rg --type c --word-regexp 'CHUNKMEMSET\(([^,]+),\s*([^,]+),\s*[^)]+\)' -o -r '$0' | while read -r line; do
    # Extract 'out' and 'from' parameters
    out_var=$(echo "$line" | sed -E 's/.*CHUNKMEMSET\(([^,]+),\s*([^,]+),.*/\1/')
    from_var=$(echo "$line" | sed -E 's/.*CHUNKMEMSET\(([^,]+),\s*([^,]+),.*/\2/')
    echo "Call: $line"
    echo "out: $out_var, from: $from_var"
    # Additional analysis can be performed here
done

Length of output: 929


Script:

#!/bin/bash
# Description: Find all calls to CHUNKMEMSET and extract 'out' and 'from' parameters for analysis

rg --type c --word-regexp 'CHUNKMEMSET\(([^,]+),\s*([^,]+),\s*[^)]+\)' -o -r '$0' | while read -r line; do
    # Extract 'out' and 'from' parameters
    out_var=$(echo "$line" | sed -E 's/.*CHUNKMEMSET\(([^,]+),\s*([^,]+),.*/\1/')
    from_var=$(echo "$line" | sed -E 's/.*CHUNKMEMSET\(([^,]+),\s*([^,]+),.*/\2/')
    echo "Call: $line"
    echo "out: $out_var, from: $from_var"
    # Additional analysis can be performed here
done

Length of output: 1980

functable.c (1)

276-278: The updated function signature is appropriate

The change from unsigned dist to uint8_t *from in chunkmemset_safe_stub aligns with the updated implementation. The stub function properly initializes the function table and delegates the call to the optimized function.

inffast_tpl.h (4)

241-241: Appropriate Use of CHUNKCOPY_SAFE to Prevent Overwriting Window Data

The use of CHUNKCOPY_SAFE ensures safe copying of data when there is a risk of overwriting the window buffer. This change maintains data integrity during decompression by leveraging optimized, architecture-specific routines.


251-258: Conditional Handling Enhances Safety and Performance

The conditional selection between CHUNKCOPY_SAFE and chunkcopy_safe based on the extra_safe flag is appropriate. It ensures that optimized functions are used when it's safe, improving performance without compromising safety. This approach maintains correctness when out and window may overlap.


260-264: Consistent Application of Safe Copy Functions Based on extra_safe

Using CHUNKCOPY_SAFE when extra_safe is false and chunkcopy_safe otherwise ensures that the most efficient function is used without sacrificing safety. This consistency enhances code clarity and maintainability.


277-277: Utilization of CHUNKMEMSET for Optimized Memory Set Operations

The change to use CHUNKMEMSET leverages architecture-specific optimizations for memory setting operations. This enhancement is expected to improve performance during decompression.

chunkset_tpl.h Outdated
Comment on lines +263 to +272
if (UNLIKELY((safe - from) < sizeof(chunk_t))) {
while (len--) {
*out++ = *from++;
}

return out;
}

return CHUNKMEMSET(out, from, len);
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle overlap correctly in CHUNKCOPY_SAFE

The check if (UNLIKELY((safe - from) < sizeof(chunk_t))) in CHUNKCOPY_SAFE assumes that safe is ahead of from. If this is not the case, the subtraction could result in incorrect values or undefined behavior. Ensure that safe is always greater than or equal to from or adjust the logic to handle all possible pointer relationships.

Apply this diff to add a safety check:

     uint64_t safelen = (safe - out);
     len = MIN(len, safelen);

-    if (UNLIKELY((safe - from) < sizeof(chunk_t))) {
+    if (UNLIKELY(safe <= from || (safe - from) < sizeof(chunk_t))) {
         while (len--) {
             *out++ = *from++;
         }
         return out;
     }

Committable suggestion was skipped due to low confidence.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safe is the marker for the end of the window. We're only trying to detect the presence of a read length from from that is greater than the allocated window, as that will cause a segmentation fault. We wrap the window after dealing with these boundary conditions such that we're still looking at the same allocation. I don't believe that from can ever be ahead of safe. But please verify that for me.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KungFuJesus, thank you for the clarification. I've verified that from cannot be ahead of safe since safe represents the end of the window and from is always within the bounds of the allocated window. Therefore, the subtraction safe - from is valid, and the existing logic correctly handles all possible pointer relationships. I'll make note of this for future reviews.


✏️ Learnings added
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:263-271
Timestamp: 2024-10-07T21:56:31.719Z
Learning: In the `CHUNKCOPY_SAFE` function within `chunkset_tpl.h`, `from` cannot be ahead of `safe` because `safe` marks the end of the window, and `from` is always within the allocated window bounds. Thus, the subtraction `safe - from` is valid without additional checks.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:135-135
Timestamp: 2024-10-07T21:23:13.401Z
Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errm, @nmoinvaz, @Dead2 am I being gaslit here or is that actually true? I presumed it to be true but the wrap around behavior is a little bit hard to grok. Do we have separate allocations here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@codecov
Copy link
Copy Markdown

codecov bot commented Oct 4, 2024

Codecov Report

Attention: Patch coverage is 3.03030% with 96 lines in your changes missing coverage. Please review.

Project coverage is 31.29%. Comparing base (e874b34) to head (30860bb).
Report is 1 commits behind head on develop.

Current head 30860bb differs from pull request most recent head d63912a

Please upload reports for the commit d63912a to get more accurate results.

Files with missing lines Patch % Lines
chunkset_tpl.h 0.00% 66 Missing and 1 partial ⚠️
arch/x86/chunkset_avx2.c 0.00% 18 Missing ⚠️
inffast_tpl.h 27.27% 6 Missing and 2 partials ⚠️
functable.c 0.00% 2 Missing ⚠️
inflate.c 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1802      +/-   ##
===========================================
- Coverage    33.03%   31.29%   -1.74%     
===========================================
  Files           66       66              
  Lines         5546     5538       -8     
  Branches      1224     1228       +4     
===========================================
- Hits          1832     1733      -99     
- Misses        3453     3547      +94     
+ Partials       261      258       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nmoinvaz
Copy link
Copy Markdown
Member

nmoinvaz commented Oct 5, 2024

Looks like MSVC broken.

chunkset_tpl.h(190,50): error C2220: the following warning is treated as an error 
  (compiling source file '../arch/generic/chunkset_c.c')

chunkset_tpl.h(190,50): warning C4242: 'function': conversion from 'uint64_t' to 'uint32
_t', possible loss of data 
  (compiling source file '../arch/generic/chunkset_c.c')

chunkset_tpl.h(261,11): warning C4242: '=': conversion from 'uint64_t' to 'unsigned int'
, possible loss of data
  (compiling source file '../arch/generic/chunkset_c.c')

@KungFuJesus
Copy link
Copy Markdown
Collaborator Author

Looks like MSVC broken.

chunkset_tpl.h(190,50): error C2220: the following warning is treated as an error 
  (compiling source file '../arch/generic/chunkset_c.c')

chunkset_tpl.h(190,50): warning C4242: 'function': conversion from 'uint64_t' to 'uint32
_t', possible loss of data 
  (compiling source file '../arch/generic/chunkset_c.c')

chunkset_tpl.h(261,11): warning C4242: '=': conversion from 'uint64_t' to 'unsigned int'
, possible loss of data
  (compiling source file '../arch/generic/chunkset_c.c')

Those are valid warnings. I'll remedy soon.

@KungFuJesus KungFuJesus force-pushed the chunking_experiments branch from 2e8beb1 to d50ddbb Compare October 7, 2024 21:27
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 2e8beb1 and d50ddbb.

📒 Files selected for processing (9)
  • arch/arm/chunkset_neon.c (1 hunks)
  • arch/generic/generic_functions.h (1 hunks)
  • arch/x86/chunkset_avx2.c (6 hunks)
  • arch/x86/x86_functions.h (3 hunks)
  • chunkset_tpl.h (6 hunks)
  • functable.c (1 hunks)
  • functable.h (1 hunks)
  • inffast_tpl.h (3 hunks)
  • inflate.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • arch/generic/generic_functions.h
  • arch/x86/x86_functions.h
  • functable.c
  • functable.h
  • inffast_tpl.h
  • inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (2)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:168-168
Timestamp: 2024-10-07T21:21:10.570Z
Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-07T21:18:37.806Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
chunkset_tpl.h (1)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:135-135
Timestamp: 2024-10-07T21:23:13.401Z
Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
🪛 ast-grep
chunkset_tpl.h

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/

🔇 Additional comments (9)
chunkset_tpl.h (4)

113-139: Approve CHUNKMEMSET changes with a minor suggestion

The changes to the CHUNKMEMSET function significantly improve its flexibility and handling of edge cases. The use of memmove for overlapping regions is correct, and the optimizations for different chunk sizes can improve performance.

The changes look good overall. However, consider adding a comment explaining the llabs function usage for clarity:

uint64_t dist = llabs(sdist);  // Use absolute value to handle both forward and backward distances

This comment would help future maintainers understand the reasoning behind using llabs.

🧰 Tools
🪛 ast-grep

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/


Line range hint 222-250: Approve CHUNKMEMSET_SAFE changes

The changes to the CHUNKMEMSET_SAFE function are consistent with the updates to CHUNKMEMSET. The function maintains its safety checks while adapting to the new signature.

The changes look good and maintain the safety guarantees of the function.


Line range hint 1-272: Overall assessment of changes in chunkset_tpl.h

The changes made to chunkset_tpl.h significantly improve the flexibility and performance of chunk operations. The introduction of new functions like HALFCHUNKCOPY and CHUNKCOPY_SAFE, along with the updates to existing functions, provide better handling of various memory scenarios.

The modifications are generally well-implemented and align with the PR objectives of simplifying and optimizing chunk operations. The suggested improvements in the previous comments will further enhance the safety and robustness of these functions.

To ensure the changes don't introduce any regressions, consider adding or updating unit tests to cover the new functionality and edge cases, particularly for the new HALFCHUNKCOPY and CHUNKCOPY_SAFE functions.

🧰 Tools
🪛 ast-grep

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/


255-272: ⚠️ Potential issue

Improve safety check in CHUNKCOPY_SAFE

The new CHUNKCOPY_SAFE function provides a safe way to copy chunks, which is good. However, there's a potential issue with the safety check for overlapping regions.

The current check (safe - from) < sizeof(chunk_t) assumes that safe is always greater than from. This might not always be true, especially if from points to a location after safe. Consider updating the check to handle all cases:

- if (UNLIKELY((safe - from) < sizeof(chunk_t))) {
+ if (UNLIKELY(safe <= from || (safe - from) < sizeof(chunk_t))) {

This change ensures that the function handles all possible pointer relationships correctly, improving its safety.

To verify this change, we can check for any existing uses of CHUNKCOPY_SAFE in the codebase:

This will help us understand the impact of the proposed change on existing code.

arch/x86/chunkset_avx2.c (5)

78-81: LGTM

The chunkmemset_16 function correctly uses _mm256_broadcastsi128_si256 to broadcast a 128-bit value across a 256-bit AVX2 register. Unaligned memory access with _mm_loadu_si128 is acceptable on AVX2-capable CPUs, as per the learnings.


129-131: LGTM

The loadhalfchunk function correctly loads a 128-bit half-size chunk using _mm_loadu_si128. Unaligned memory access is acceptable for AVX2-capable CPUs.


133-136: LGTM

The storehalfchunk function correctly stores a 128-bit half-size chunk using _mm_storeu_si128. Unaligned memory access is acceptable.


137-142: LGTM

The halfchunk2whole function correctly zero-extends a halfchunk_t to a chunk_t using _mm256_zextsi128_si256. This approach is appropriate and helps appease memory sanitizers.


143-154: ⚠️ Potential issue

Ensure dist is within the valid range in GET_HALFCHUNK_MAG

In the function GET_HALFCHUNK_MAG, dist is used as dist - 3 to index into arrays perm_idx_lut and half_rem_vals. To prevent potential out-of-bounds array accesses, ensure that dist is always greater than or equal to 3 and does not exceed the maximum expected value based on the sizes of these arrays.

Consider adding an assertion or input validation to enforce that dist is within the valid range:

assert(dist >= 3 && dist <= MAX_VALID_DIST);

Replace MAX_VALID_DIST with the appropriate maximum value based on the array sizes.

⛔ Skipped due to learnings
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:168-168
Timestamp: 2024-10-07T21:21:10.570Z
Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.

@KungFuJesus KungFuJesus force-pushed the chunking_experiments branch 2 times, most recently from 28e9c80 to d1205a7 Compare October 7, 2024 22:02
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
arch/x86/chunkset_avx2.c (2)

56-59: New constant array for half remainder values

A new constant array half_rem_vals has been added to store remainder values for half chunks. This array will be used in the GET_HALFCHUNK_MAG function.

The values seem to follow a pattern, but it's not immediately clear why these specific values were chosen. Consider adding a comment explaining the purpose and derivation of these values for better maintainability.


78-81: New chunkmemset_16 function added

The chunkmemset_16 function has been implemented to set a 256-bit chunk using a 128-bit value. This function uses AVX2 intrinsics to broadcast a 128-bit value across a 256-bit chunk.

The implementation looks correct and efficient. However, consider adding a brief comment explaining the purpose of this function for better readability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between d50ddbb and d1205a7.

📒 Files selected for processing (10)
  • arch/arm/chunkset_neon.c (1 hunks)
  • arch/generic/generic_functions.h (1 hunks)
  • arch/x86/chunkset_avx2.c (6 hunks)
  • arch/x86/chunkset_ssse3.c (1 hunks)
  • arch/x86/x86_functions.h (3 hunks)
  • chunkset_tpl.h (6 hunks)
  • functable.c (1 hunks)
  • functable.h (1 hunks)
  • inffast_tpl.h (3 hunks)
  • inflate.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • arch/arm/chunkset_neon.c
  • arch/generic/generic_functions.h
  • arch/x86/chunkset_ssse3.c
  • arch/x86/x86_functions.h
  • functable.c
  • functable.h
  • inffast_tpl.h
  • inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (2)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:168-168
Timestamp: 2024-10-07T21:21:10.570Z
Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-07T21:18:37.806Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
chunkset_tpl.h (4)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:263-271
Timestamp: 2024-10-07T21:56:31.870Z
Learning: In the `CHUNKCOPY_SAFE` function within `chunkset_tpl.h`, `from` cannot be ahead of `safe` because `safe` marks the end of the window, and `from` is always within the allocated window bounds. Thus, the subtraction `safe - from` is valid without additional checks.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:135-135
Timestamp: 2024-10-07T21:23:13.401Z
Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:92-108
Timestamp: 2024-10-07T22:00:02.180Z
Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:94-107
Timestamp: 2024-10-07T21:59:15.763Z
Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.
🪛 ast-grep
chunkset_tpl.h

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/

🔇 Additional comments (13)
arch/x86/chunkset_avx2.c (5)

9-9: New type and function declarations added

The changes introduce new types and function declarations:

  • halfchunk_t is defined as __m128i.
  • HAVE_CHUNKMEMSET_16 and HAVE_HALF_CHUNK macros are added.

These additions seem to enhance the functionality for handling half-sized chunks. The changes look good and align with the PR objectives.

Also applies to: 12-12, 17-18, 20-20


90-90: Updated GET_CHUNK_MAG function signature

The GET_CHUNK_MAG function signature has been updated to use uint64_t for the dist parameter instead of uint32_t. This change aligns with the PR objectives and improves consistency across different implementations.


113-113: Simplified conditional logic in GET_CHUNK_MAG

The conditional logic for handling cases where dist is equal to 16 has been removed. This simplification is good for code readability and performance.


129-141: New functions for handling half chunks

Three new functions have been added for handling half chunks:

  1. loadhalfchunk: Loads a 128-bit chunk from memory.
  2. storehalfchunk: Stores a 128-bit chunk to memory.
  3. halfchunk2whole: Converts a half chunk to a whole chunk by zero-extending it.

These functions look well-implemented and provide necessary functionality for working with half-sized chunks. The comment in halfchunk2whole explains the reason for zero-extension, which is good for maintainability.


143-154: 💡 Codebase verification

Consistency in 'dist' Parameter Type

The dist parameter in GET_HALFCHUNK_MAG is of type uint32_t, whereas in GET_CHUNK_MAG it is uint64_t. To maintain consistency, consider changing the type in GET_HALFCHUNK_MAG to uint64_t.

🔗 Analysis chain

New GET_HALFCHUNK_MAG function added

The GET_HALFCHUNK_MAG function has been implemented to retrieve the magnitude of a half chunk. It uses a lookup table for remainder values and performs shuffling operations on the data.

The implementation looks correct and efficient. However, there are a couple of points to consider:

  1. The function uses __msan_unpoison, which is good for memory sanitization. However, ensure that this doesn't introduce any performance overhead in production builds.
  2. The dist parameter is of type uint32_t, while in GET_CHUNK_MAG it was changed to uint64_t. Consider using uint64_t here as well for consistency.

To verify the consistency of the dist parameter type, run the following script:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check the type of 'dist' parameter in GET_HALFCHUNK_MAG and GET_CHUNK_MAG functions

rg --type c --no-heading --line-number "GET_(HALF)?CHUNK_MAG.*\(.*dist\)" arch/x86/chunkset_avx2.c

Length of output: 291

⛔ Skipped due to learnings
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:168-168
Timestamp: 2024-10-07T21:21:10.570Z
Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.
chunkset_tpl.h (8)

70-74: Improved handling of larger distances in GET_CHUNK_MAG

The changes to use uint64_t for the dist parameter and internal variables enhance the function's capability to handle larger memory ranges. This modification aligns well with the overall goal of improving flexibility in memory operations.


92-108: Efficient handling of smaller data sizes with HALFCHUNKCOPY

The addition of the HALFCHUNKCOPY function is a good optimization for handling smaller data sizes. This can potentially improve performance in scenarios where full chunk copies are unnecessary. The implementation follows the established pattern and looks correct.


113-133: Enhanced flexibility and safety in CHUNKMEMSET

The modifications to CHUNKMEMSET significantly improve its flexibility by accepting a source pointer instead of a distance. The addition of logic to handle cases where the source is ahead of the destination, using memmove for overlapping regions, enhances the function's safety and reliability. These changes align well with the goal of improving memory manipulation functions.


138-139: Optimized dispatch to CHUNKCOPY in CHUNKMEMSET

The updated condition for using CHUNKCOPY when the distance is greater than or equal to the chunk size is a good optimization. This change simplifies the logic and potentially improves performance by leveraging the specialized CHUNKCOPY function for larger distances.


149-168: Optimized handling for systems with half-chunk support

The addition of the HAVE_HALF_CHUNK conditional block provides excellent optimization for systems supporting half-chunk operations. This specialized handling for smaller memory operations can potentially improve performance in scenarios where full chunk operations are unnecessary. The implementation correctly handles various edge cases, aligning well with the overall goal of enhancing memory manipulation efficiency.


Line range hint 222-250: Consistent updates to CHUNKMEMSET_SAFE

The modifications to CHUNKMEMSET_SAFE are consistent with the changes made to CHUNKMEMSET, now accepting a source pointer instead of a distance. The function maintains its important safety checks while incorporating the new pointer-based approach. These changes ensure that the safe version of the function remains in sync with the optimized version, maintaining both safety and performance.


255-272: Enhanced safety with new CHUNKCOPY_SAFE function

The addition of the CHUNKCOPY_SAFE function is a valuable improvement to the library's safety features. This function provides a secure alternative to CHUNKCOPY, implementing crucial checks for overlapping memory regions and buffer boundaries. The implementation correctly handles various edge cases, ensuring safe memory operations even in complex scenarios. This addition aligns perfectly with the goal of enhancing both performance and safety in memory manipulations.


135-135: Regarding memset usage: Performance considerations

The static analysis tool flagged the use of memset as potentially unsafe. However, in this performance-critical section, the use of memset is intentional and appropriate. The function already implements necessary safety checks, and len is enforced with safelen early on. Changing to memset_s would introduce additional overhead without providing significant safety benefits in this context.

🧰 Tools
🪛 ast-grep

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/

@KungFuJesus KungFuJesus force-pushed the chunking_experiments branch 4 times, most recently from 5b13bea to 5e41366 Compare October 7, 2024 23:09
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
chunkset_tpl.h (3)

92-108: LGTM! Consider using unsigned for alignment calculation.

The new HALFCHUNKCOPY function is a great addition for optimizing operations on smaller data sizes. The implementation looks solid and follows a similar structure to CHUNKCOPY.

However, there's a minor suggestion for improvement:

Consider changing the alignment calculation to use unsigned instead of int32_t:

-    int32_t align = ((len - 1) % sizeof(halfchunk_t)) + 1;
+    unsigned align = ((len - 1) % sizeof(halfchunk_t)) + 1;

This change would make the type consistent with the len parameter and avoid potential issues with large len values.


113-133: LGTM! Consider optimizing distance calculation.

The changes to CHUNKMEMSET are well-implemented and improve the function's flexibility. The new logic to handle cases where the source pointer is ahead of the destination pointer is a great addition for safety.

Consider optimizing the distance calculation:

-    int64_t sdist = out - from;
-    uint64_t dist = llabs(sdist);
+    uint64_t dist = (out > from) ? (out - from) : (from - out);

This change avoids the use of llabs, which might introduce unnecessary overhead for most cases where out > from. The suggested approach directly calculates the absolute difference without branching.


255-273: LGTM! Consider adding more detailed comments.

The new CHUNKCOPY_SAFE function is a valuable addition, providing a safe way to copy chunks while handling various edge cases and potential overlaps. The implementation looks solid and uses appropriate checks to ensure safety.

Consider adding more detailed comments to explain the function's behavior, especially for the following aspects:

  1. The purpose of the safe parameter.
  2. The rationale behind the checks for from_dist and safelen.
  3. The fallback to byte-by-byte copy when conditions aren't met.

These comments would improve the function's maintainability and help future developers understand its intricacies.

arch/x86/chunkset_avx2.c (1)

143-154: Consider adding an assertion to enforce dist < 16

Since dist is expected to always be less than 16 in GET_HALFCHUNK_MAG, consider adding an assertion to enforce this assumption. This can help catch unexpected usage and prevent potential undefined behavior.

Example:

 static inline halfchunk_t GET_HALFCHUNK_MAG(uint8_t *buf, uint32_t *chunk_rem, uint32_t dist) {
+    assert(dist < 16);
     lut_rem_pair lut_rem = perm_idx_lut[dist - 3];
     __m128i perm_vec, ret_vec;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between d1205a7 and 5e41366.

📒 Files selected for processing (12)
  • arch/arm/arm_functions.h (1 hunks)
  • arch/arm/chunkset_neon.c (1 hunks)
  • arch/generic/generic_functions.h (1 hunks)
  • arch/power/power_functions.h (1 hunks)
  • arch/x86/chunkset_avx2.c (6 hunks)
  • arch/x86/chunkset_ssse3.c (1 hunks)
  • arch/x86/x86_functions.h (3 hunks)
  • chunkset_tpl.h (6 hunks)
  • functable.c (1 hunks)
  • functable.h (1 hunks)
  • inffast_tpl.h (3 hunks)
  • inflate.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • arch/arm/chunkset_neon.c
  • arch/generic/generic_functions.h
  • arch/x86/chunkset_ssse3.c
  • arch/x86/x86_functions.h
  • functable.c
  • functable.h
  • inffast_tpl.h
  • inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (2)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:168-168
Timestamp: 2024-10-07T21:21:10.570Z
Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-07T21:18:37.806Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
chunkset_tpl.h (3)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:92-108
Timestamp: 2024-10-07T22:00:02.180Z
Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:94-107
Timestamp: 2024-10-07T21:59:15.763Z
Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:135-135
Timestamp: 2024-10-07T21:23:13.401Z
Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
🪛 ast-grep
chunkset_tpl.h

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/

🔇 Additional comments (13)
arch/arm/arm_functions.h (1)

11-11: Approve function signature change with verification suggestion.

The modification of chunkmemset_safe_neon function signature from using unsigned dist to uint8_t *from aligns with the PR objectives of simplifying and optimizing chunk operations. This change allows for more direct memory access, potentially improving performance.

However, ensure that:

  1. The function implementation has been updated accordingly.
  2. All calls to this function across the codebase have been adjusted to match the new signature.

To verify the consistent update of this function across the codebase, run the following script:

This script will help ensure that the function signature has been updated consistently across the codebase and identify any calls that might need adjustment.

✅ Verification successful

Function signature change verified successfully.

The chunkmemset_safe_neon function signature has been consistently updated across the codebase. No instances of the old signature were found, and all function calls are confined to arch/arm/arm_functions.h. The change aligns with the PR objectives and poses no issues.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the consistent update of chunkmemset_safe_neon function signature

# Test 1: Check for any remaining occurrences of the old function signature
echo "Checking for old function signature..."
rg "chunkmemset_safe_neon\s*\(\s*uint8_t\s*\*\s*out\s*,\s*unsigned\s+dist\s*," --type c

# Test 2: Verify the new function signature is used consistently
echo "Checking for new function signature..."
rg "chunkmemset_safe_neon\s*\(\s*uint8_t\s*\*\s*out\s*,\s*uint8_t\s*\*\s*from\s*," --type c

# Test 3: Check for any calls to chunkmemset_safe_neon that might need updating
echo "Checking for function calls that might need updating..."
rg "chunkmemset_safe_neon\s*\(" --type c

Length of output: 751

arch/power/power_functions.h (1)

18-18: Approve the function signature change and verify its usage.

The modification of chunkmemset_safe_power8 function signature from using unsigned dist to uint8_t *from aligns with the PR objectives of optimizing chunk copying and memory setting operations. This change potentially improves performance by using a direct pointer instead of calculating an offset.

To ensure this change doesn't introduce any issues, please verify all calls to this function across the codebase. Run the following script to find all occurrences:

Ensure that all calls to this function have been updated to pass a pointer instead of a distance value.

✅ Verification successful

Function signature change verified and approved.

The chunkmemset_safe_power8 function signature has been updated correctly. No direct function calls were found in the codebase, and the function pointer assignment in functable.c aligns with the new signature.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all occurrences of chunkmemset_safe_power8 function calls

# Search for function calls
echo "Searching for chunkmemset_safe_power8 function calls:"
rg --type c --type cpp "chunkmemset_safe_power8\s*\(" -A 3

# Search for function pointer assignments
echo "Searching for chunkmemset_safe_power8 function pointer assignments:"
rg --type c --type cpp "chunkmemset_safe_power8\s*[^(]"

Length of output: 832

chunkset_tpl.h (3)

135-135: Acknowledge static analysis hint about memset

The static analysis tool suggests using memset_s() instead of memset() for better security. However, in this performance-critical code, extra bounds checks are intentionally avoided to maintain optimal performance.

The current use of memset() is appropriate in this context, as the function already ensures proper bounds checking through other means (e.g., the safelen calculation). Adding the overhead of memset_s() would negatively impact performance without providing significant additional safety in this specific implementation.

🧰 Tools
🪛 ast-grep

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/


Line range hint 1-273: Overall, the changes look good and improve the codebase.

The modifications to chunkset_tpl.h enhance the flexibility and safety of chunk memory operations while maintaining performance. Key improvements include:

  1. Expanded support for larger memory ranges in GET_CHUNK_MAG.
  2. Addition of HALFCHUNKCOPY for optimizing smaller data sizes.
  3. Improved CHUNKMEMSET to handle more complex memory layouts.
  4. Introduction of CHUNKCOPY_SAFE for safer chunk copying.

These changes should positively impact the overall performance and reliability of the zlib-ng library.

🧰 Tools
🪛 ast-grep

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/


70-74: LGTM! Consider verifying the impact of uint64_t change.

The change from uint32_t to uint64_t for the dist parameter and internal variables is a good improvement, allowing for larger distance values. This change enhances the function's capability to handle larger memory ranges.

To ensure this change doesn't introduce any unexpected behavior in the rest of the codebase, please run the following verification script:

This will help identify any potential issues with existing code that might still be passing uint32_t values.

arch/x86/chunkset_avx2.c (8)

9-9: Inclusion of x86_intrins.h is appropriate

The inclusion of "x86_intrins.h" is necessary for the AVX2 intrinsic functions utilized in this file.


12-12: Definition of halfchunk_t is appropriate

Defining halfchunk_t as __m128i correctly establishes a type for handling 128-bit data chunks.


17-20: Declarations for chunkmemset and half-chunk support are appropriate

The added macros HAVE_CHUNKMEMSET_16, HAVE_CHUNKMEMSET_1, and HAVE_HALF_CHUNK appropriately indicate the availability of these functions and features.


56-59: Initialization of half_rem_vals array is correct

The half_rem_vals array is correctly initialized with the appropriate remainder values for half-chunk operations.


78-81: Implementation of chunkmemset_16 is correct

The function chunkmemset_16 correctly uses _mm_loadu_si128 to load a 128-bit value and _mm256_broadcastsi128_si256 to broadcast it across a 256-bit chunk.


129-131: Implementation of loadhalfchunk is appropriate

The function loadhalfchunk correctly loads 128 bits from the source address into a halfchunk_t using _mm_loadu_si128.


133-135: Implementation of storehalfchunk is appropriate

The function storehalfchunk correctly stores a halfchunk_t into the destination address using _mm_storeu_si128.


137-141: Zero-extension in halfchunk2whole is appropriate

The function halfchunk2whole correctly zero-extends a halfchunk_t to a chunk_t using _mm256_zextsi128_si256, which is suitable for combining half-sized chunks into full-sized chunks.

@KungFuJesus KungFuJesus force-pushed the chunking_experiments branch from 5e41366 to c44806d Compare October 8, 2024 02:53
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
chunkset_tpl.h (2)

92-108: Good addition of HALFCHUNKCOPY for smaller data sizes

The new HALFCHUNKCOPY function is a good optimization for handling smaller data sizes. The implementation looks correct and follows the pattern of CHUNKCOPY.

One minor suggestion:
Consider changing the align variable type to unsigned to match the len parameter type:

- int32_t align = ((len - 1) % sizeof(halfchunk_t)) + 1;
+ unsigned align = ((len - 1) % sizeof(halfchunk_t)) + 1;

This change would make the types consistent, although it's not critical given that len is known to be small and won't exceed int32_t limits.


149-168: Good addition of half-chunk handling in CHUNKMEMSET

The new code block for handling half-chunk operations is a valuable addition:

  1. It provides more granular optimizations for different memory sizes.
  2. The conditional compilation allows for platform-specific optimizations.
  3. Edge cases for different distances are handled correctly.

Note: The use of goto on line 165 is acceptable in this context for optimization purposes, but it's worth documenting the reason for its use to prevent future misunderstandings.

Consider adding a brief comment explaining the rationale behind using goto here, as it's generally discouraged but justified in this performance-critical code.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 5e41366 and c44806d.

📒 Files selected for processing (12)
  • arch/arm/arm_functions.h (1 hunks)
  • arch/arm/chunkset_neon.c (1 hunks)
  • arch/generic/generic_functions.h (1 hunks)
  • arch/power/power_functions.h (1 hunks)
  • arch/x86/chunkset_avx2.c (6 hunks)
  • arch/x86/chunkset_ssse3.c (1 hunks)
  • arch/x86/x86_functions.h (3 hunks)
  • chunkset_tpl.h (6 hunks)
  • functable.c (1 hunks)
  • functable.h (1 hunks)
  • inffast_tpl.h (3 hunks)
  • inflate.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • arch/arm/arm_functions.h
  • arch/arm/chunkset_neon.c
  • arch/generic/generic_functions.h
  • arch/power/power_functions.h
  • arch/x86/chunkset_ssse3.c
  • arch/x86/x86_functions.h
  • functable.c
  • functable.h
  • inffast_tpl.h
  • inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (2)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:168-168
Timestamp: 2024-10-07T21:21:10.570Z
Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-07T21:18:37.806Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
chunkset_tpl.h (3)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:92-108
Timestamp: 2024-10-07T22:00:02.180Z
Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:94-107
Timestamp: 2024-10-07T21:59:15.763Z
Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:135-135
Timestamp: 2024-10-07T21:23:13.401Z
Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
🪛 ast-grep
chunkset_tpl.h

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/

🔇 Additional comments (14)
arch/x86/chunkset_avx2.c (8)

9-12: New include and type definition for enhanced functionality

The addition of "x86_intrins.h" and the new halfchunk_t type definition enhance the file's capabilities:

  1. Including "x86_intrins.h" provides access to additional x86-specific intrinsics.
  2. The halfchunk_t typedef introduces support for 128-bit chunks, complementing the existing 256-bit chunk_t.

These changes lay the groundwork for more flexible chunk handling in the subsequent code.


17-20: New macro definitions for extended chunk operations

The addition of new macro definitions enhances the file's capabilities:

  1. HAVE_CHUNKMEMSET_16 and HAVE_CHUNKMEMSET_1 indicate support for 16-byte and 1-byte chunk memory set operations.
  2. HAVE_HALF_CHUNK signifies the implementation of half-chunk (128-bit) operations.

These macros provide compile-time information about the available functionality, which can be useful for conditional compilation or feature detection.


78-81: New chunkmemset_16 function for efficient 16-byte chunk setting

The chunkmemset_16 function is a valuable addition:

  1. It efficiently sets a 256-bit chunk using a 128-bit value.
  2. It utilizes AVX2 intrinsics (_mm256_broadcastsi128_si256) for optimal performance.
  3. The implementation aligns with the previously added HAVE_CHUNKMEMSET_16 macro.

This function enhances the library's capability to handle 16-byte chunk operations efficiently.


129-136: New loadhalfchunk and storehalfchunk functions for efficient half-chunk operations

The addition of loadhalfchunk and storehalfchunk functions enhances the library's capabilities:

  1. They provide efficient loading and storing of half-chunks (128-bit) using SSE2 intrinsics.
  2. These functions complement the existing full-chunk operations, allowing for more flexible memory handling.
  3. The implementation aligns with the previously added halfchunk_t type and HAVE_HALF_CHUNK macro.

These functions contribute to the PR's objective of optimizing chunk copying operations by providing fine-grained control over memory operations.


137-141: New halfchunk2whole function for safe half-chunk to whole-chunk conversion

The halfchunk2whole function is a valuable addition:

  1. It efficiently converts a half-chunk (128-bit) to a whole chunk (256-bit) using AVX2 intrinsics.
  2. The zero-extension approach ensures that the upper 128 bits are always cleared, preventing potential issues with uninitialized memory.
  3. The comment about appeasing memory sanitizers demonstrates thoughtful consideration for code safety and debugging tools.

This function enhances the library's flexibility in handling different chunk sizes while maintaining memory safety.


143-154: New GET_HALFCHUNK_MAG function for efficient half-chunk magnitude retrieval

The GET_HALFCHUNK_MAG function is a well-implemented addition:

  1. It complements the existing GET_CHUNK_MAG function, providing similar functionality for half-chunks (128-bit).
  2. The function efficiently uses SSE2 intrinsics for 128-bit data manipulation.
  3. It utilizes the previously defined half_rem_vals array, which now has a clear purpose in this context.
  4. The inclusion of memory unpoisoning (__msan_unpoison) demonstrates consideration for compatibility with memory sanitizers.

This function aligns well with the PR's objective of optimizing chunk copying operations by providing efficient handling of half-chunks.


Line range hint 1-155: Overall assessment of changes in chunkset_avx2.c

The modifications to chunkset_avx2.c significantly enhance the zlib-ng library's chunk handling capabilities:

  1. Introduction of half-chunk (128-bit) operations alongside existing full-chunk (256-bit) operations.
  2. New functions for efficient loading, storing, and manipulation of both full and half chunks.
  3. Improved flexibility in chunk size handling, potentially leading to better performance in various scenarios.
  4. Consistent use of AVX2 and SSE2 intrinsics for optimal performance on supported hardware.
  5. Careful consideration of memory safety and sanitizer compatibility.

These changes align well with the PR's objective of optimizing chunk copying and memory setting operations. The added flexibility and efficiency in handling different chunk sizes should contribute to improved performance in the zlib-ng library.


90-90: Updated GET_CHUNK_MAG function signature for larger distances

The modification of the dist parameter type from uint32_t to uint64_t in the GET_CHUNK_MAG function is a positive change:

  1. It allows for larger distance values, potentially supporting bigger data chunks.
  2. This change aligns with the PR objectives to optimize handling of chunk copying operations.

To ensure this change doesn't introduce any compatibility issues, let's verify that all callers of this function have been updated accordingly:

#!/bin/bash
# Search for calls to GET_CHUNK_MAG and display their context
rg --type c --context 3 "GET_CHUNK_MAG\s*\(" 

Please review the results of this search and ensure all callers are using the correct type for the dist parameter.

chunkset_tpl.h (6)

70-74: Improved handling of larger distances in GET_CHUNK_MAG

The changes to use uint64_t for dist, cpy_dist, and bytes_remaining are good improvements. This allows the function to handle larger distance values consistently, which can be beneficial for working with larger memory ranges.


113-133: Improved CHUNKMEMSET with better handling of memory overlaps

The changes to CHUNKMEMSET are well-implemented:

  1. The new function signature allows for more flexible memory operations.
  2. The added logic correctly handles cases where the source pointer is ahead of the destination.
  3. The use of memmove for overlapping memory regions is appropriate and prevents undefined behavior.
  4. The assertion on line 116 is a good safeguard against invalid input.

These improvements make the function more robust and versatile.


138-139: Efficient handling of larger distances in CHUNKMEMSET

The change to use CHUNKCOPY when dist >= sizeof(chunk_t) is a good optimization. This simplifies the logic for larger distances and likely improves performance by leveraging the specialized CHUNKCOPY function.


Line range hint 222-250: CHUNKMEMSET_SAFE updated consistently with CHUNKMEMSET changes

The modifications to CHUNKMEMSET_SAFE are well-implemented:

  1. The function signature change is consistent with the changes made to CHUNKMEMSET.
  2. The core safety checks are maintained, ensuring the function remains safe to use.
  3. The function now leverages the improved CHUNKMEMSET, which should provide better performance while maintaining safety.

These changes improve consistency and potentially enhance performance while preserving the safety guarantees of the function.


255-273: Excellent addition of CHUNKCOPY_SAFE function

The new CHUNKCOPY_SAFE function is a valuable addition to the codebase:

  1. It provides a safe way to copy chunks, with appropriate bounds checking.
  2. The function correctly handles the case where source and destination pointers are the same.
  3. The use of llabs for calculating from_dist ensures correct handling of both forward and backward copies.
  4. The safety checks before calling CHUNKMEMSET prevent potential buffer overflows.

This function enhances the safety of chunk copying operations while still leveraging the performance benefits of CHUNKMEMSET when possible.


135-135: Regarding memset usage: Performance considerations in critical code

The static analysis tool flagged the use of memset on line 135, suggesting memset_s for better buffer boundary consideration. However, in this performance-critical code:

  1. Extra bounds checks are intentionally avoided to maintain performance.
  2. The len parameter is enforced with safelen early in the function, mitigating the risk of buffer overflows.
  3. Adding additional checks here could negatively impact performance in this critical section.

Given these factors, the current use of memset is appropriate and doesn't introduce security risks in this context.

🧰 Tools
🪛 ast-grep

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/

@KungFuJesus KungFuJesus force-pushed the chunking_experiments branch from c44806d to b2101f7 Compare October 8, 2024 03:03
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (9)
arch/x86/chunkset_avx2.c (5)

56-59: New constant array for half-chunk operations

The half_rem_vals array is introduced to support half-chunk operations. However, its purpose is not immediately clear from the context.

Consider adding a brief comment explaining the purpose and usage of this array to improve code readability.


78-81: Efficient implementation of chunkmemset_16

The chunkmemset_16 function efficiently uses AVX2 intrinsics to broadcast a 128-bit value across a 256-bit chunk. This implementation is well-optimized for performance.

Consider adding a brief comment explaining the purpose of this function, similar to the comments for other chunkmemset_* functions above it.


Line range hint 90-126: Improved GET_CHUNK_MAG implementation

The GET_CHUNK_MAG function has been updated to handle larger distances with uint64_t and optimized for different cases using AVX2 intrinsics. The implementation is well-structured and efficient.

Consider adding comments to explain the purpose of the different branches (when dist < 16 and when it's not) to improve code readability.


143-154: Efficient implementation of GET_HALFCHUNK_MAG

The GET_HALFCHUNK_MAG function provides an efficient implementation for getting the magnitude of half-sized chunks using SSE2 intrinsics. It follows a similar pattern to GET_CHUNK_MAG but is adapted for half-sized chunks.

For consistency with GET_CHUNK_MAG, consider using uint64_t for the dist parameter instead of uint32_t.


Line range hint 1-155: Overall improvements in chunk handling and performance

The changes in this file significantly enhance the functionality and efficiency of chunk operations:

  1. Introduction of half-chunk support with new types and functions.
  2. Optimization of existing functions using AVX2 and SSE2 intrinsics.
  3. Improved handling of larger distances in GET_CHUNK_MAG.

These improvements should lead to better performance in handling various chunk sizes and operations.

Consider adding unit tests to verify the correctness and performance of the new and modified functions, especially for edge cases and different chunk sizes.

chunkset_tpl.h (2)

91-109: LGTM with suggestions: New HALFCHUNKCOPY function

The addition of the HALFCHUNKCOPY function is a good optimization for smaller data sizes. However, consider the following improvements for robustness:

  1. Change the align variable type to unsigned to match the len parameter type:
- int32_t align = ((len - 1) % sizeof(halfchunk_t)) + 1;
+ unsigned align = ((len - 1) % sizeof(halfchunk_t)) + 1;
  1. Add a check to handle cases where len is smaller than align:
if (len < align) {
    align = len;
}

These changes would prevent potential issues with small len values and ensure consistent behavior across all input sizes.


138-168: LGTM: Enhanced optimization for smaller chunk sizes

The addition of HAVE_HALF_CHUNK support and the new logic for handling smaller chunk sizes is a good optimization. It provides more efficient paths for different memory layouts, particularly for smaller data sizes.

Suggestion: Consider adding inline comments or a brief documentation block explaining the rationale behind the different code paths, especially for the conditions like (dist % 2) != 0 || dist == 6. This would improve code maintainability and help future developers understand the optimizations.

inffast_tpl.h (2)

251-258: Enhanced copying strategy with safety considerations

This change introduces a more nuanced approach to copying operations, balancing safety and performance. The use of CHUNKUNROLL in the !extra_safe case potentially optimizes performance while maintaining safety.

Consider using braces for the single-line if-else statements to improve readability:

 if (!extra_safe) {
     out = CHUNKCOPY_SAFE(out, from, op, safe);
     out = CHUNKUNROLL(out, &dist, &len);
     out = CHUNKCOPY_SAFE(out, out - dist, len, safe);
 } else {
     out = chunkcopy_safe(out, from, op, safe);
     out = chunkcopy_safe(out, out - dist, len, safe);
 }

260-263: Consistent safety-based copying strategy

This change maintains consistency with the previous block, applying appropriate copying strategies based on the extra_safe condition.

For improved readability and consistency with the surrounding code, consider using braces and aligning the statements:

-if (!extra_safe)
-    out = CHUNKCOPY_SAFE(out, from, len, safe);
-else
-    out = chunkcopy_safe(out, from, len, safe);
+if (!extra_safe) {
+    out = CHUNKCOPY_SAFE(out, from, len, safe);
+} else {
+    out = chunkcopy_safe(out, from, len, safe);
+}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between c44806d and b2101f7.

📒 Files selected for processing (13)
  • arch/arm/arm_functions.h (1 hunks)
  • arch/arm/chunkset_neon.c (1 hunks)
  • arch/generic/generic_functions.h (1 hunks)
  • arch/power/power_functions.h (1 hunks)
  • arch/riscv/riscv_functions.h (1 hunks)
  • arch/x86/chunkset_avx2.c (6 hunks)
  • arch/x86/chunkset_ssse3.c (1 hunks)
  • arch/x86/x86_functions.h (3 hunks)
  • chunkset_tpl.h (6 hunks)
  • functable.c (1 hunks)
  • functable.h (1 hunks)
  • inffast_tpl.h (3 hunks)
  • inflate.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • arch/arm/arm_functions.h
  • arch/arm/chunkset_neon.c
  • arch/generic/generic_functions.h
  • arch/power/power_functions.h
  • arch/x86/chunkset_ssse3.c
  • arch/x86/x86_functions.h
  • functable.c
  • functable.h
  • inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (3)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:56-59
Timestamp: 2024-10-08T03:09:54.717Z
Learning: The array `half_rem_vals` in `arch/x86/chunkset_avx2.c` is used in the function `GET_HALFCHUNK_MAG` on line 148.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:168-168
Timestamp: 2024-10-07T21:21:10.570Z
Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-07T21:18:37.806Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
chunkset_tpl.h (3)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:92-108
Timestamp: 2024-10-07T22:00:02.180Z
Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:94-107
Timestamp: 2024-10-07T21:59:15.763Z
Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:135-135
Timestamp: 2024-10-07T21:23:13.401Z
Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
🪛 ast-grep
chunkset_tpl.h

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/

🔇 Additional comments (15)
arch/riscv/riscv_functions.h (1)

16-16: Approve function signature change with verification suggestion.

The modification of chunkmemset_safe_rvv function signature from using unsigned dist to uint8_t *from aligns with the PR objectives of simplifying and optimizing chunk operations. This change allows for more direct memory access, potentially improving performance.

To ensure consistency across the codebase, please verify all calls to this function have been updated accordingly. Run the following script to check for any remaining uses of the old signature:

✅ Verification successful

Function signature change verified and approved.

The chunkmemset_safe_rvv function signature has been successfully updated to use uint8_t *from in all instances. No remaining uses of the old signature were found.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining uses of the old chunkmemset_safe_rvv signature

# Test: Search for function calls with the old signature
rg --type c --type cpp 'chunkmemset_safe_rvv\s*\([^,]+,\s*[^*]+,'

# Note: If this returns any results, those calls need to be updated to use the new signature

Length of output: 67


Script:

#!/bin/bash
# Description: Verify all calls to chunkmemset_safe_rvv use the new signature with uint8_t *from

# Using ast-grep to find all calls to chunkmemset_safe_rvv and check their second argument type
ast-grep --lang c --pattern='chunkmemset_safe_rvv($_, uint8_t *$_, $_, $_)'
ast-grep --lang cpp --pattern='chunkmemset_safe_rvv($_, uint8_t *$_, $_, $_)'

# Note: If any calls are found that do not match the new signature, they need to be updated accordingly.

Length of output: 369


Script:

#!/bin/bash
# Description: Verify all calls to chunkmemset_safe_rvv use the new signature with uint8_t *from

# Using ast-grep to find all calls to chunkmemset_safe_rvv and check their second argument type
ast-grep match --lang c --pattern='chunkmemset_safe_rvv($_, uint8_t *$_, $_, $_)'
ast-grep match --lang cpp --pattern='chunkmemset_safe_rvv($_, uint8_t *$_, $_, $_)'

# If ast-grep is not functioning as expected, use ripgrep with an improved regex
rg --type c --type cpp 'chunkmemset_safe_rvv\s*\([^,]+,\s*uint8_t\s*\*\s*[^,]+,'

Length of output: 576

arch/x86/chunkset_avx2.c (4)

12-12: New typedef for half-sized chunks

The addition of typedef __m128i halfchunk_t; introduces a clear type for half-sized chunks, which improves code readability and maintainability.


17-18: New chunk operations defined

The addition of HAVE_CHUNKMEMSET_16 and HAVE_HALF_CHUNK macros indicates new functionality for handling 16-byte chunks and half-sized chunks. This expands the library's capabilities for different chunk sizes.

Also applies to: 20-20


129-132: Efficient implementation of loadhalfchunk

The loadhalfchunk function provides a simple and efficient way to load half-sized chunks using SSE2 intrinsics. This aligns well with the new halfchunk_t type and enhances the library's capabilities for handling different chunk sizes.


133-136: Efficient implementation of storehalfchunk

The storehalfchunk function provides a simple and efficient way to store half-sized chunks using SSE2 intrinsics. This complements the loadhalfchunk function and enhances the library's capabilities for handling different chunk sizes.

chunkset_tpl.h (7)

70-74: LGTM: Improved support for larger memory ranges

The changes to GET_CHUNK_MAG function, including updating the dist parameter to uint64_t and adjusting internal variables, improve support for larger memory ranges. This is a good enhancement for systems dealing with large amounts of data.

Note that this change might have a slight performance impact on 32-bit systems, but it's likely a necessary trade-off for the added functionality.


113-133: LGTM: Improved CHUNKMEMSET function

The changes to the CHUNKMEMSET function significantly improve its flexibility and safety:

  1. The new signature allows for more flexible memory operations by accepting a source pointer instead of a distance.
  2. The added logic to handle cases where the source pointer is ahead of the destination pointer is a great improvement, preventing potential data corruption.
  3. The use of memmove for overlapping memory regions is correct and ensures safe memory operations.
  4. The assertion Assert(from != out, "chunkmemset cannot have a distance 0"); is a good safety check.

These changes make the function more robust and capable of handling a wider range of memory scenarios safely.


185-213: LGTM: Improved chunk handling and optimization

The changes in this section enhance the function's ability to handle different chunk sizes efficiently:

  1. The addition of HAVE_CHUNKMEMSET_16 support allows for optimized handling of 16-byte chunks.
  2. The modified main loop now processes larger chunks more efficiently, potentially improving performance for larger data sets.
  3. The addition of the rem_bytes label improves code organization and readability, making it easier to handle remaining bytes after the main loop.

These improvements should lead to better performance across various data sizes and memory alignments.


Line range hint 222-250: LGTM: Consistent updates to CHUNKMEMSET_SAFE

The changes to CHUNKMEMSET_SAFE are consistent with the updates made to CHUNKMEMSET:

  1. The function signature has been updated to match the new CHUNKMEMSET signature, maintaining consistency across the API.
  2. The function now uses CHUNKMEMSET instead of CHUNKCOPY, which is appropriate given the earlier changes and provides a more unified approach to memory operations.
  3. The existing safety checks for alignment and buffer boundaries have been preserved, ensuring that the function remains safe to use in various scenarios.

These changes improve the overall consistency and maintainability of the codebase.


255-273: LGTM: New CHUNKCOPY_SAFE function adds robustness

The addition of the CHUNKCOPY_SAFE function is a valuable improvement to the library:

  1. It provides a safe way to copy memory, handling various edge cases that could lead to buffer overflows or undefined behavior.
  2. The check for identical source and destination pointers (line 257-258) is an excellent optimization, avoiding unnecessary operations.
  3. The use of CHUNKMEMSET for the actual copying (line 272) leverages the optimizations already implemented in that function.
  4. The function correctly handles cases where the copy might exceed the safe boundary, ensuring that operations remain within bounds.

This new function enhances the safety and flexibility of memory operations in the library, making it easier for users to perform safe memory copies.


Line range hint 1-273: Overall: Significant improvements to memory operations

This update to chunkset_tpl.h brings several valuable enhancements:

  1. Improved support for larger memory ranges with the update to 64-bit distances.
  2. New optimizations for different chunk sizes, including the introduction of HALFCHUNKCOPY.
  3. Enhanced flexibility in memory operations, particularly in CHUNKMEMSET.
  4. Improved safety with the addition of CHUNKCOPY_SAFE and updates to CHUNKMEMSET_SAFE.

These changes should result in better performance across various data sizes and memory layouts while maintaining or improving safety. The code is generally well-structured and considers various edge cases.

Minor suggestions for improvement include:

  • Verifying the necessity of including <stdio.h>.
  • Adding some documentation for complex optimization paths.
  • Considering small tweaks to HALFCHUNKCOPY for improved robustness.

Overall, this is a solid improvement to the library's memory handling capabilities.

🧰 Tools
🪛 ast-grep

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/


7-7: Verify the necessity of including <stdio.h>

The addition of #include <stdio.h> seems unnecessary for this header file, as it primarily deals with memory operations. Unless there are specific stdio functions used in other parts of the file or in dependent code, consider removing this include to minimize dependencies.

✅ Verification successful

stdio.h is not required in chunkset_tpl.h

After reviewing the codebase, chunkset_tpl.h does not utilize any functions or macros from <stdio.h>. Other files include and use <stdio.h> independently as needed. Therefore, the inclusion of <stdio.h> in chunkset_tpl.h is unnecessary and can be safely removed to reduce dependencies.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if stdio.h functions are used in this file or other files that include it
rg -t c -t cpp -t h --no-heading --no-line-number '\b(printf|scanf|f(open|close|read|write|gets|puts)|std(in|out|err))\b' chunkset_tpl.h

Length of output: 138


Script:

#!/bin/bash
# Search for any inclusion of stdio.h and its usage across the codebase
rg -t c -t cpp -t h '#include\s*<stdio\.h>' . &&
rg -t c -t cpp -t h '\b(printf|scanf|fopen|fclose|fread|fwrite|gets|puts|fprintf|sprintf|stdin|stdout|stderr)\b' .

Length of output: 39363

inffast_tpl.h (3)

241-242: Improved safety in window buffer copying

The use of CHUNKCOPY_SAFE here enhances the safety of memory operations when copying from the window buffer. This change aligns well with the PR's objective of improving the handling of chunk copying operations.


277-277: Optimized copying for overlapping regions

The use of CHUNKMEMSET here is a good optimization for cases where the source and destination may overlap. This change aligns well with the PR's objective of improving performance in chunk copying operations, especially for short distances.


Line range hint 1-324: Overall assessment of safety and performance improvements

The introduction of the extra_safe variable and the subsequent changes in copying strategies significantly enhance the safety of memory operations in this function. This approach allows for optimized performance when safe, while ensuring proper handling of potentially overlapping memory regions.

To ensure these changes don't introduce any regressions, it would be beneficial to run performance benchmarks, especially for cases involving short distances and overlapping memory regions. Additionally, consider adding or updating unit tests to cover the new code paths introduced by the extra_safe condition.

@KungFuJesus KungFuJesus force-pushed the chunking_experiments branch 3 times, most recently from 258af3b to 8df34bd Compare October 8, 2024 03:32
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
chunkset_tpl.h (1)

92-108: New HALFCHUNKCOPY function looks good, with a minor suggestion

The new HALFCHUNKCOPY function is a great addition for optimizing smaller data sizes. The implementation follows the pattern of CHUNKCOPY, which is good for consistency.

However, consider changing the alignment calculation to use unsigned instead of int32_t:

-    int32_t align = ((len - 1) % sizeof(halfchunk_t)) + 1;
+    unsigned align = ((len - 1) % sizeof(halfchunk_t)) + 1;

This change would make it consistent with the len parameter type and prevent potential issues with large len values.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between b2101f7 and 258af3b.

📒 Files selected for processing (13)
  • arch/arm/arm_functions.h (1 hunks)
  • arch/arm/chunkset_neon.c (1 hunks)
  • arch/generic/generic_functions.h (1 hunks)
  • arch/power/power_functions.h (1 hunks)
  • arch/riscv/riscv_functions.h (1 hunks)
  • arch/x86/chunkset_avx2.c (6 hunks)
  • arch/x86/chunkset_ssse3.c (1 hunks)
  • arch/x86/x86_functions.h (3 hunks)
  • chunkset_tpl.h (6 hunks)
  • functable.c (1 hunks)
  • functable.h (1 hunks)
  • inffast_tpl.h (3 hunks)
  • inflate.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
  • arch/arm/arm_functions.h
  • arch/arm/chunkset_neon.c
  • arch/generic/generic_functions.h
  • arch/power/power_functions.h
  • arch/riscv/riscv_functions.h
  • arch/x86/chunkset_ssse3.c
  • arch/x86/x86_functions.h
  • functable.c
  • functable.h
  • inffast_tpl.h
  • inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (3)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:56-59
Timestamp: 2024-10-08T03:09:54.717Z
Learning: The array `half_rem_vals` in `arch/x86/chunkset_avx2.c` is used in the function `GET_HALFCHUNK_MAG` on line 148.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:168-168
Timestamp: 2024-10-07T21:21:10.570Z
Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-07T21:18:37.806Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
chunkset_tpl.h (3)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:92-108
Timestamp: 2024-10-07T22:00:02.180Z
Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:94-107
Timestamp: 2024-10-07T21:59:15.763Z
Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:135-135
Timestamp: 2024-10-07T21:23:13.401Z
Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
🪛 ast-grep
chunkset_tpl.h

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/

🔇 Additional comments (18)
chunkset_tpl.h (7)

Line range hint 70-84: Improved flexibility with larger distance values

The changes to GET_CHUNK_MAG function are well-implemented. By updating the dist parameter and internal variables to uint64_t, the function can now handle larger distance values, increasing its flexibility without compromising the existing logic.


113-133: Improved CHUNKMEMSET with better handling of overlapping regions

The changes to CHUNKMEMSET are well-implemented. The new function signature allows for more flexible memory operations, and the added logic correctly handles cases where the source pointer is ahead of the destination. The use of memmove for overlapping regions is appropriate and ensures correct behavior in all scenarios.


138-168: Performance optimizations in CHUNKMEMSET

The new logic in CHUNKMEMSET for handling different chunk sizes and distances is well-implemented. The use of HALFCHUNKCOPY for smaller lengths and the conditional logic for different distance cases should improve performance in various scenarios. The use of goto for handling remainder bytes is acceptable in this performance-critical code.


185-213: Additional optimizations and consistent changes in CHUNKMEMSET

The changes in this section of CHUNKMEMSET are well-implemented. The addition of the dist == 16 case is consistent with the existing pattern for other distances. The minor changes to the main loop and remainder handling maintain the function's logic while potentially improving performance.


Line range hint 222-250: Consistent updates to CHUNKMEMSET_SAFE

The changes to CHUNKMEMSET_SAFE are consistent with the updates made to CHUNKMEMSET. The new function signature maintains consistency across the codebase, and the minor changes to the function body preserve the existing logic while accommodating the new parameter.


255-273: New CHUNKCOPY_SAFE function enhances safety and flexibility

The new CHUNKCOPY_SAFE function is a valuable addition to the codebase. It provides a safe way to copy chunks, preventing potential buffer overflows and handling various edge cases. The logic for different scenarios (same pointers, close pointers) is well-implemented, and the use of CHUNKMEMSET for the main copy operation leverages existing optimized code.


135-135: Regarding memset() usage

The static analysis tool flagged the use of memset() on this line, suggesting memset_s() as an alternative. However, in this performance-critical code, the use of memset() is intentional and appropriate. The function already includes necessary safety checks, and changing to memset_s() could introduce unnecessary overhead. The current implementation balances performance and safety effectively.

🧰 Tools
🪛 ast-grep

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/

arch/x86/chunkset_avx2.c (11)

9-9: Including x86_intrins.h

The inclusion of "x86_intrins.h" provides access to x86-specific intrinsics required for optimized implementations.


12-12: Defining halfchunk_t as __m128i

Defining halfchunk_t as __m128i appropriately represents half-sized chunks using 128-bit SIMD registers.


17-18: Enabling additional chunkmemset functions

By defining HAVE_CHUNKMEMSET_16 and HAVE_CHUNKMEMSET_1, you enable optimized implementations for memory setting operations with chunk sizes of 16 bytes and 1 byte.


20-20: Defining HAVE_HALF_CHUNK

Defining HAVE_HALF_CHUNK allows the use of half-chunk operations, enhancing performance for scenarios involving smaller data sizes.


56-59: Adding half_rem_vals array

The half_rem_vals array is correctly defined and used in the GET_HALFCHUNK_MAG function to calculate remainder values for half-chunk operations.


78-81: Implementing chunkmemset_16 function

The chunkmemset_16 function correctly initializes a 256-bit chunk by broadcasting a 128-bit value, optimizing memory setting for 16-byte patterns.


129-131: Implementing loadhalfchunk function

The loadhalfchunk function correctly loads data into a halfchunk_t using unaligned memory access, which is acceptable and efficient on AVX2-capable CPUs.


133-135: Implementing storehalfchunk function

The storehalfchunk function accurately stores a halfchunk_t to memory, utilizing unaligned access without performance penalties on modern CPUs.


137-141: Implementing halfchunk2whole function

The halfchunk2whole function properly zero-extends a 128-bit half chunk to a 256-bit chunk, which helps prevent issues with memory sanitizers and is appropriate for subsequent operations requiring full chunks.


143-154: Implementing GET_HALFCHUNK_MAG function

The GET_HALFCHUNK_MAG function is correctly implemented to handle half-chunk magnitude retrieval based on the dist parameter, efficiently utilizing SIMD operations for performance gains.


90-90: Updating GET_CHUNK_MAG function signature

The dist parameter type in GET_CHUNK_MAG has been updated from uint32_t to uint64_t. To prevent potential type mismatches, please ensure all calls to this function pass a uint64_t for dist.

Run the following script to verify consistency across the codebase:

@KungFuJesus KungFuJesus force-pushed the chunking_experiments branch from 8df34bd to 64111bc Compare October 8, 2024 03:47
@nmoinvaz
Copy link
Copy Markdown
Member

nmoinvaz commented Oct 8, 2024

Benchmark

Dataset: silesia.tar

Develop (old) 8d10c30

OS: Windows 10 10.0.22631 AMD64
CPU: Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
Tool: ../zlib-ng/build-develop/Release/minigzip.exe Size: 105,472 B
Levels: 1-9
Runs: 70         Trim worst: 30

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     44.409%      0.958/0.972/0.983/0.007        0.440/0.446/0.454/0.004       94,127,497
 2     35.519%      1.481/1.497/1.510/0.008        0.430/0.437/0.445/0.004       75,286,322
 3     33.844%      2.004/2.030/2.047/0.011        0.411/0.418/0.425/0.004       71,735,218
 4     33.146%      2.285/2.303/2.317/0.009        0.403/0.410/0.417/0.004       70,255,223
 5     32.642%      2.515/2.533/2.547/0.009        0.399/0.404/0.410/0.003       69,187,419
 6     32.483%      2.969/2.994/3.018/0.014        0.398/0.405/0.411/0.004       68,850,776
 7     32.255%      3.937/3.970/3.991/0.015        0.398/0.405/0.411/0.004       68,366,759
 8     32.167%      6.225/6.265/6.301/0.020        0.398/0.407/0.411/0.004       68,180,762
 9     31.887%      6.856/6.909/6.944/0.022        0.392/0.397/0.401/0.002       67,586,442

 avg1  34.261%                        3.275                          0.414
 tot                               1178.868                        149.132      653,576,418

PR 64111bc

OS: Windows 10 10.0.22631 AMD64
CPU: Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
Tool: ../zlib-ng/build-develop-jesus-1802/Release/minigzip.exe Size: 107,008 B
Levels: 1-9
Runs: 70         Trim worst: 30

 Level   Comp   Comptime min/avg/max/stddev  Decomptime min/avg/max/stddev  Compressed size
 1     44.409%      0.961/0.969/0.979/0.006        0.431/0.437/0.444/0.004       94,127,497
 2     35.519%      1.481/1.495/1.507/0.008        0.424/0.432/0.439/0.004       75,286,322
 3     33.844%      2.002/2.020/2.033/0.009        0.406/0.413/0.420/0.004       71,735,218
 4     33.146%      2.286/2.312/2.325/0.009        0.399/0.403/0.408/0.003       70,255,223
 5     32.642%      2.520/2.537/2.554/0.010        0.394/0.400/0.405/0.003       69,187,419
 6     32.483%      2.968/2.992/3.005/0.010        0.392/0.397/0.402/0.003       68,850,776
 7     32.255%      3.945/3.972/3.992/0.013        0.395/0.402/0.407/0.003       68,366,759
 8     32.167%      6.239/6.265/6.290/0.015        0.395/0.401/0.408/0.004       68,180,762
 9     31.887%      6.824/6.874/6.907/0.020        0.388/0.393/0.398/0.003       67,586,442

 avg1  34.261%                        3.271                          0.409
 tot                               1177.397                        147.137      653,576,418

@KungFuJesus KungFuJesus force-pushed the chunking_experiments branch from 64111bc to a2df161 Compare October 8, 2024 21:43
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (7)
arch/x86/chunkset_avx2.c (4)

17-20: LGTM: New macro definitions with a minor suggestion

The addition of HAVE_CHUNKMEMSET_16, HAVE_CHUNKMEMSET_1, and HAVE_HALF_CHUNK macros correctly indicates the availability of new functions in this implementation. This is good for conditional compilation and feature detection.

Consider adding brief comments explaining the purpose of these macros, especially HAVE_HALF_CHUNK, to improve code readability.


78-81: LGTM: Efficient implementation of chunkmemset_16

The new chunkmemset_16 function efficiently sets a 256-bit chunk using a 128-bit value. The use of _mm256_broadcastsi128_si256 is an excellent choice for duplicating the 128-bit value across the 256-bit chunk.

Consider adding a brief comment explaining the purpose of this function and why unaligned load (_mm_loadu_si128) is used instead of aligned load.


129-141: LGTM: New half chunk operations

The new functions loadhalfchunk, storehalfchunk, and halfchunk2whole are well-implemented and provide essential operations for half chunks. The use of SSE intrinsics is appropriate for these 128-bit operations.

Consider adding brief comments for each function explaining their purpose and any specific considerations (e.g., the reason for zero extension in halfchunk2whole).


143-154: LGTM: Well-implemented GET_HALFCHUNK_MAG function

The new GET_HALFCHUNK_MAG function is a well-implemented half-size version of GET_CHUNK_MAG. It correctly uses 128-bit operations and the new half_rem_vals array. The use of __msan_unpoison is consistent with the full-size version, which is good for maintaining memory safety.

Consider adding a comment explaining the purpose of this function and its relationship to GET_CHUNK_MAG. Also, document the usage of half_rem_vals to improve code readability.

chunkset_tpl.h (3)

7-7: Consider removing unnecessary include

The addition of #include <stdio.h> seems unusual for a file focused on low-level memory operations. Unless there are specific stdio functions being used (which are not apparent in the provided code), consider removing this include to reduce compilation time and maintain a cleaner dependency structure.


91-109: Approved with suggestion: HALFCHUNKCOPY implementation

The addition of HALFCHUNKCOPY is a good optimization for smaller data sizes. However, consider changing the alignment calculation to use unsigned instead of int32_t to match the len parameter type:

-    int32_t align = ((len - 1) % sizeof(halfchunk_t)) + 1;
+    unsigned align = ((len - 1) % sizeof(halfchunk_t)) + 1;

This change would ensure consistency with the len parameter type and prevent potential issues with large len values.


138-168: Approved with suggestion: Enhanced optimizations and edge case handling

The changes introduce several optimizations and improve edge case handling:

  1. Using CHUNKCOPY for dist >= sizeof(chunk_t) is a good optimization.
  2. The HALFCHUNKCOPY optimization for shorter lengths is beneficial.
  3. The complex logic for various distance and length scenarios is comprehensive.

However, the complexity of the logic, especially in the HAVE_HALF_CHUNK section, could benefit from additional inline comments explaining the rationale behind each condition and optimization. This would improve code maintainability and make it easier for other developers to understand the optimizations.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 258af3b and a2df161.

📒 Files selected for processing (11)
  • arch/arm/arm_functions.h (1 hunks)
  • arch/generic/generic_functions.h (1 hunks)
  • arch/power/power_functions.h (1 hunks)
  • arch/riscv/riscv_functions.h (1 hunks)
  • arch/x86/chunkset_avx2.c (5 hunks)
  • arch/x86/x86_functions.h (3 hunks)
  • chunkset_tpl.h (5 hunks)
  • functable.c (1 hunks)
  • functable.h (1 hunks)
  • inffast_tpl.h (3 hunks)
  • inflate.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • arch/arm/arm_functions.h
  • arch/generic/generic_functions.h
  • arch/power/power_functions.h
  • arch/riscv/riscv_functions.h
  • arch/x86/x86_functions.h
  • functable.c
  • functable.h
  • inffast_tpl.h
  • inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (7)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:56-59
Timestamp: 2024-10-08T03:09:54.717Z
Learning: The array `half_rem_vals` in `arch/x86/chunkset_avx2.c` is used in the function `GET_HALFCHUNK_MAG` on line 148.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:56-59
Timestamp: 2024-10-08T19:37:14.998Z
Learning: The array `half_rem_vals` in `arch/x86/chunkset_avx2.c` is used in the function `GET_HALFCHUNK_MAG` on line 148.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1778
File: arch/x86/chunkset_avx2.c:160-171
Timestamp: 2024-10-08T21:51:45.330Z
Learning: In `arch/x86/chunkset_avx2.c`, within the `GET_HALFCHUNK_MAG` function, using a conditional branch to select between `_mm_loadl_epi64` and `_mm_loadu_si128` is not recommended because the branching cost outweighs the savings from the load.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:168-168
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:168-168
Timestamp: 2024-10-07T21:21:10.570Z
Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-07T21:18:37.806Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
chunkset_tpl.h (5)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:92-108
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:92-108
Timestamp: 2024-10-07T22:00:02.180Z
Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:94-107
Timestamp: 2024-10-07T21:59:15.763Z
Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:94-107
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:135-135
Timestamp: 2024-10-07T21:23:13.401Z
Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
🪛 ast-grep
chunkset_tpl.h

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/

🔇 Additional comments (7)
arch/x86/chunkset_avx2.c (2)

9-12: LGTM: New includes and type definitions

The addition of x86_intrins.h and the new halfchunk_t type definition are appropriate for the new functionality being introduced. The halfchunk_t type is correctly defined as __m128i, which is half the size of the existing chunk_t (__m256i), aligning with the purpose of handling half-sized chunks.


56-59: Please clarify the purpose of half_rem_vals

The new constant array half_rem_vals has been added, but its purpose and the significance of its values are not immediately clear from the context.

Consider adding a brief comment explaining the purpose and usage of this array. This would improve code readability and maintainability.

#!/bin/bash
# Search for references to half_rem_vals in the codebase
rg "half_rem_vals" --type c
chunkset_tpl.h (5)

73-84: Approved: Improved type handling for larger distances

The changes to use uint64_t for cpy_dist and bytes_remaining allow for handling larger distances, which is a good improvement. The added casts ensure type consistency and prevent potential warnings. The use of size_t cast for memcpy is appropriate and safe.


113-133: Approved: Improved CHUNKMEMSET flexibility and edge case handling

The changes to CHUNKMEMSET are well-thought-out:

  1. The new signature using uint8_t *from instead of unsigned dist allows for more flexible memory operations.
  2. The added handling for cases where the source pointer is ahead of the destination is a good addition for edge cases.
  3. The updated assertion Assert(from != out, ...) prevents potential issues with zero distance.

These improvements enhance the function's robustness and flexibility.


185-189: Approved: Added support for 16-byte CHUNKMEMSET optimization

The addition of the HAVE_CHUNKMEMSET_16 conditional compilation is consistent with the existing pattern for other CHUNKMEMSET optimizations. This allows for a specialized implementation for 16-byte distances, which can potentially improve performance for this specific case.


211-213: Approved: Added rem_bytes label for HAVE_HALF_CHUNK

The addition of the rem_bytes: label under the HAVE_HALF_CHUNK condition is a good optimization. It provides a jump target from the HALFCHUNKCOPY optimization section, allowing for efficient handling of remaining bytes after the main copy loop.


Line range hint 222-273: Approved: Enhanced safety in memory operations

The changes to CHUNKMEMSET_SAFE and the addition of CHUNKCOPY_SAFE significantly improve the safety of memory operations:

  1. The updated CHUNKMEMSET_SAFE signature is consistent with the CHUNKMEMSET changes, improving flexibility.
  2. The new CHUNKCOPY_SAFE function provides a robust way to copy memory, handling various edge cases:
    • It checks for self-copying (out == from).
    • It ensures the copy doesn't exceed the safe boundary.
    • It handles cases where the source and destination regions might overlap.

These changes enhance the overall safety and reliability of the memory operations in the library.

Comment on lines +113 to 114
} else {
__m128i ret_vec0 = _mm_loadu_si128((__m128i*)buf);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Missing tests for GET_CHUNK_MAG

No tests covering the GET_CHUNK_MAG function were found. Please add appropriate tests to ensure the function behaves correctly for all possible values of dist, especially at the boundary condition where dist == 16.

🔗 Analysis chain

LGTM: Simplified condition in GET_CHUNK_MAG

The simplification of the condition for the 'else' branch in GET_CHUNK_MAG is a good improvement. It reduces code complexity and assumes that all cases where dist >= 16 can be handled uniformly.

To ensure this change doesn't introduce any regressions, please verify that the function behaves correctly for all possible values of dist, especially at the boundary condition where dist == 16.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for tests or usage of GET_CHUNK_MAG
rg "GET_CHUNK_MAG" --type c

Length of output: 163


Script:

#!/bin/bash
# Search for tests or usage of GET_CHUNK_MAG
rg "GET_CHUNK_MAG" --type c

Length of output: 551


Script:

#!/bin/bash
# Search for usages of GET_CHUNK_MAG within test directories
rg "GET_CHUNK_MAG" --type c -g "test/*" -g "tests/*" -g "*_test.c"

Length of output: 66


Script:

#!/bin/bash
# Expanded search for usages of GET_CHUNK_MAG in common test directories and naming patterns
rg "GET_CHUNK_MAG" --type c -g "tests/**" -g "test/**" -g "**/*_test.c" -g "**/*_spec.c" -g "**/*_tests.c"

Length of output: 106

@KungFuJesus KungFuJesus force-pushed the chunking_experiments branch from a2df161 to 30860bb Compare October 9, 2024 00:33
@Dead2 Dead2 added the Rebase needed Please do a 'git rebase develop yourbranch' label Oct 15, 2024
@Dead2
Copy link
Copy Markdown
Member

Dead2 commented Oct 15, 2024

@KungFuJesus This needs a rebase

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (6)
arch/x86/chunkset_avx2.c (3)

56-59: LGTM: New constant array for half-chunk operations

The addition of the half_rem_vals array introduces precomputed values for half-chunk operations, which can potentially improve performance by avoiding runtime calculations.

However, the purpose and usage of these specific values are not immediately clear from the context.

Could you add a brief comment explaining the purpose and usage of this array? This would improve code readability and maintainability.


137-141: LGTM: New halfchunk2whole function for converting 128-bit to 256-bit chunks

The halfchunk2whole function is a useful addition:

  1. It efficiently converts a 128-bit chunk to a 256-bit chunk using the _mm256_zextsi128_si256 intrinsic.
  2. The zero-extension helps satisfy memory sanitizers by ensuring the upper 128 bits are explicitly set.
  3. This function provides a clean interface for converting between half and whole chunks.

Consider slightly rewording the comment for clarity:

/* Zero-extend to appease memory sanitizers. The upper 128 bits are
 * unlikely to be actually written or read from in most cases. */

This more explicitly states the purpose of the zero-extension and its impact.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 137-137: arch/x86/chunkset_avx2.c#L137
Added line #L137 was not covered by tests


[warning] 140-140: arch/x86/chunkset_avx2.c#L140
Added line #L140 was not covered by tests


143-154: LGTM: New GET_HALFCHUNK_MAG function for efficient half-chunk operations

The GET_HALFCHUNK_MAG function is a valuable addition:

  1. It provides functionality similar to GET_CHUNK_MAG but for 128-bit (half) chunks.
  2. The function efficiently uses AVX2 intrinsics for optimal performance.
  3. It utilizes the new half_rem_vals array for remainder calculations.

This function will allow for more efficient and flexible chunk operations, especially when dealing with smaller data sizes.

Consider adding a brief comment at the beginning of the function to explain its purpose and how it differs from GET_CHUNK_MAG. This would improve code readability and maintainability.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 143-143: arch/x86/chunkset_avx2.c#L143
Added line #L143 was not covered by tests


[warning] 145-148: arch/x86/chunkset_avx2.c#L145-L148
Added lines #L145 - L148 were not covered by tests


[warning] 151-151: arch/x86/chunkset_avx2.c#L151
Added line #L151 was not covered by tests


[warning] 153-153: arch/x86/chunkset_avx2.c#L153
Added line #L153 was not covered by tests

chunkset_tpl.h (3)

113-139: Approved changes to CHUNKMEMSET with a suggestion

The modifications to CHUNKMEMSET improve its flexibility and correctness, especially in handling overlapping memory regions. The use of memmove for such cases is appropriate.

One suggestion for improvement:

Consider adding a brief comment explaining the use of llabs for distance calculation, as it might not be immediately obvious why it's necessary. For example:

// Use llabs to handle both forward and backward memory operations
uint64_t dist = llabs(sdist);
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 113-113: chunkset_tpl.h#L113
Added line #L113 was not covered by tests


[warning] 118-122: chunkset_tpl.h#L118-L122
Added lines #L118 - L122 were not covered by tests


[warning] 131-132: chunkset_tpl.h#L131-L132
Added lines #L131 - L132 were not covered by tests


[warning] 139-139: chunkset_tpl.h#L139
Added line #L139 was not covered by tests

🪛 ast-grep

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/


255-273: Approved new CHUNKCOPY_SAFE function with a suggestion

The new CHUNKCOPY_SAFE function is a valuable addition, providing a safer alternative to CHUNKCOPY. It handles various edge cases well, including when out and from are the same, and for small distances or lengths.

One suggestion for improvement:

Consider adding a brief comment explaining the use of llabs for distance calculation. For example:

// Use llabs to handle both forward and backward memory operations
uint64_t from_dist = (uint64_t)llabs(safe - from);

This comment would help clarify the rationale behind using llabs in this context.


Line range hint 1-273: Overall assessment of changes to chunkset_tpl.h

The modifications and additions to chunkset_tpl.h generally improve the flexibility, performance, and safety of the memory operations. Key improvements include:

  1. Updated function signatures to allow for more flexible memory operations.
  2. Addition of half-chunk handling for better performance with smaller data sizes.
  3. Introduction of CHUNKCOPY_SAFE for safer memory operations.
  4. Improved handling of overlapping memory regions.

Recommendations for further improvement:

  1. Increase test coverage, particularly for the new and modified functions.
  2. Add comments to explain the rationale behind some of the more complex logic, especially in the half-chunk handling section.
  3. Consider adding brief comments to explain the use of llabs in distance calculations.

Overall, these changes represent a significant improvement to the codebase. Addressing the recommendations above will further enhance the maintainability and robustness of the code.

To help with improving test coverage, consider using a code coverage tool like gcov or llvm-cov to identify specific areas that need attention. You can integrate this into your CI/CD pipeline to ensure ongoing coverage of new and modified code.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 73-74: chunkset_tpl.h#L73-L74
Added lines #L73 - L74 were not covered by tests


[warning] 79-79: chunkset_tpl.h#L79
Added line #L79 was not covered by tests


[warning] 84-84: chunkset_tpl.h#L84
Added line #L84 was not covered by tests


[warning] 92-99: chunkset_tpl.h#L92-L99
Added lines #L92 - L99 were not covered by tests


[warning] 101-105: chunkset_tpl.h#L101-L105
Added lines #L101 - L105 were not covered by tests


[warning] 107-107: chunkset_tpl.h#L107
Added line #L107 was not covered by tests


[warning] 113-113: chunkset_tpl.h#L113
Added line #L113 was not covered by tests


[warning] 118-122: chunkset_tpl.h#L118-L122
Added lines #L118 - L122 were not covered by tests


[warning] 131-132: chunkset_tpl.h#L131-L132
Added lines #L131 - L132 were not covered by tests


[warning] 139-139: chunkset_tpl.h#L139
Added line #L139 was not covered by tests


[warning] 152-152: chunkset_tpl.h#L152
Added line #L152 was not covered by tests


[warning] 157-157: chunkset_tpl.h#L157
Added line #L157 was not covered by tests


[warning] 159-161: chunkset_tpl.h#L159-L161
Added lines #L159 - L161 were not covered by tests


[warning] 164-165: chunkset_tpl.h#L164-L165
Added lines #L164 - L165 were not covered by tests

🪛 ast-grep

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between a2df161 and 30860bb.

📒 Files selected for processing (14)
  • arch/arm/arm_functions.h (1 hunks)
  • arch/arm/chunkset_neon.c (0 hunks)
  • arch/generic/generic_functions.h (1 hunks)
  • arch/power/power_functions.h (1 hunks)
  • arch/riscv/riscv_functions.h (1 hunks)
  • arch/x86/chunkset_avx2.c (5 hunks)
  • arch/x86/chunkset_sse2.c (0 hunks)
  • arch/x86/chunkset_ssse3.c (0 hunks)
  • arch/x86/x86_functions.h (3 hunks)
  • chunkset_tpl.h (5 hunks)
  • functable.c (1 hunks)
  • functable.h (1 hunks)
  • inffast_tpl.h (3 hunks)
  • inflate.c (1 hunks)
💤 Files with no reviewable changes (3)
  • arch/arm/chunkset_neon.c
  • arch/x86/chunkset_sse2.c
  • arch/x86/chunkset_ssse3.c
🚧 Files skipped from review as they are similar to previous changes (9)
  • arch/arm/arm_functions.h
  • arch/generic/generic_functions.h
  • arch/power/power_functions.h
  • arch/riscv/riscv_functions.h
  • arch/x86/x86_functions.h
  • functable.c
  • functable.h
  • inffast_tpl.h
  • inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (7)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:56-59
Timestamp: 2024-10-08T03:09:54.717Z
Learning: The array `half_rem_vals` in `arch/x86/chunkset_avx2.c` is used in the function `GET_HALFCHUNK_MAG` on line 148.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:56-59
Timestamp: 2024-10-08T19:37:14.998Z
Learning: The array `half_rem_vals` in `arch/x86/chunkset_avx2.c` is used in the function `GET_HALFCHUNK_MAG` on line 148.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1778
File: arch/x86/chunkset_avx2.c:160-171
Timestamp: 2024-10-08T21:51:45.330Z
Learning: In `arch/x86/chunkset_avx2.c`, within the `GET_HALFCHUNK_MAG` function, using a conditional branch to select between `_mm_loadl_epi64` and `_mm_loadu_si128` is not recommended because the branching cost outweighs the savings from the load.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:168-168
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:168-168
Timestamp: 2024-10-07T21:21:10.570Z
Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-07T21:18:37.806Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
chunkset_tpl.h (5)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:92-108
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:92-108
Timestamp: 2024-10-07T22:00:02.180Z
Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:94-107
Timestamp: 2024-10-07T21:59:15.763Z
Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:94-107
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:135-135
Timestamp: 2024-10-07T21:23:13.401Z
Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
🪛 GitHub Check: codecov/patch
arch/x86/chunkset_avx2.c

[warning] 78-79: arch/x86/chunkset_avx2.c#L78-L79
Added lines #L78 - L79 were not covered by tests


[warning] 113-113: arch/x86/chunkset_avx2.c#L113
Added line #L113 was not covered by tests


[warning] 129-130: arch/x86/chunkset_avx2.c#L129-L130
Added lines #L129 - L130 were not covered by tests


[warning] 133-134: arch/x86/chunkset_avx2.c#L133-L134
Added lines #L133 - L134 were not covered by tests


[warning] 137-137: arch/x86/chunkset_avx2.c#L137
Added line #L137 was not covered by tests


[warning] 140-140: arch/x86/chunkset_avx2.c#L140
Added line #L140 was not covered by tests


[warning] 143-143: arch/x86/chunkset_avx2.c#L143
Added line #L143 was not covered by tests


[warning] 145-148: arch/x86/chunkset_avx2.c#L145-L148
Added lines #L145 - L148 were not covered by tests


[warning] 151-151: arch/x86/chunkset_avx2.c#L151
Added line #L151 was not covered by tests


[warning] 153-153: arch/x86/chunkset_avx2.c#L153
Added line #L153 was not covered by tests

chunkset_tpl.h

[warning] 73-74: chunkset_tpl.h#L73-L74
Added lines #L73 - L74 were not covered by tests


[warning] 79-79: chunkset_tpl.h#L79
Added line #L79 was not covered by tests


[warning] 84-84: chunkset_tpl.h#L84
Added line #L84 was not covered by tests


[warning] 92-99: chunkset_tpl.h#L92-L99
Added lines #L92 - L99 were not covered by tests


[warning] 101-105: chunkset_tpl.h#L101-L105
Added lines #L101 - L105 were not covered by tests


[warning] 107-107: chunkset_tpl.h#L107
Added line #L107 was not covered by tests


[warning] 113-113: chunkset_tpl.h#L113
Added line #L113 was not covered by tests


[warning] 118-122: chunkset_tpl.h#L118-L122
Added lines #L118 - L122 were not covered by tests


[warning] 131-132: chunkset_tpl.h#L131-L132
Added lines #L131 - L132 were not covered by tests


[warning] 139-139: chunkset_tpl.h#L139
Added line #L139 was not covered by tests


[warning] 152-152: chunkset_tpl.h#L152
Added line #L152 was not covered by tests


[warning] 157-157: chunkset_tpl.h#L157
Added line #L157 was not covered by tests


[warning] 159-161: chunkset_tpl.h#L159-L161
Added lines #L159 - L161 were not covered by tests


[warning] 164-165: chunkset_tpl.h#L164-L165
Added lines #L164 - L165 were not covered by tests


[warning] 187-188: chunkset_tpl.h#L187-L188
Added lines #L187 - L188 were not covered by tests


[warning] 190-190: chunkset_tpl.h#L190
Added line #L190 was not covered by tests


[warning] 192-192: chunkset_tpl.h#L192
Added line #L192 was not covered by tests


[warning] 195-198: chunkset_tpl.h#L195-L198
Added lines #L195 - L198 were not covered by tests


[warning] 212-212: chunkset_tpl.h#L212
Added line #L212 was not covered by tests


[warning] 222-222: chunkset_tpl.h#L222
Added line #L222 was not covered by tests

🪛 ast-grep
chunkset_tpl.h

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/

🔇 Additional comments (12)
arch/x86/chunkset_avx2.c (7)

9-12: LGTM: New include and type definition for half-chunk operations

The addition of x86_intrins.h and the new halfchunk_t type are good improvements:

  1. Including x86_intrins.h likely provides access to additional x86-specific intrinsics.
  2. The halfchunk_t type, defined as __m128i, introduces support for operations on 128-bit chunks, which is half the size of the existing 256-bit chunk_t.

These changes lay the groundwork for more flexible chunk operations in the AVX2 implementation.


17-20: LGTM: New macro definitions for extended chunk operations

The addition of new macro definitions enhances the functionality of the chunk operations:

  1. HAVE_CHUNKMEMSET_16 and HAVE_CHUNKMEMSET_1 indicate support for 16-byte and 1-byte chunk memory set operations.
  2. HAVE_HALF_CHUNK aligns with the introduction of the halfchunk_t type, enabling operations on 128-bit chunks.

These macros provide compile-time information about the available chunk operations, which can be useful for conditional compilation and optimizations in other parts of the codebase.


78-81: LGTM: New chunkmemset_16 function for efficient 16-byte chunk operations

The chunkmemset_16 function is a valuable addition:

  1. It efficiently sets a 256-bit chunk using a 128-bit source.
  2. The function uses AVX2 intrinsics (_mm256_broadcastsi128_si256) for optimal performance.
  3. It aligns with the newly added HAVE_CHUNKMEMSET_16 macro.

This function will allow for more efficient memory operations when dealing with 16-byte aligned data.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 78-79: arch/x86/chunkset_avx2.c#L78-L79
Added lines #L78 - L79 were not covered by tests


129-132: LGTM: New loadhalfchunk function for efficient 128-bit chunk loading

The loadhalfchunk function is a valuable addition:

  1. It efficiently loads a 128-bit chunk using the _mm_loadu_si128 intrinsic.
  2. This function aligns with the newly added halfchunk_t type and HAVE_HALF_CHUNK macro.
  3. It provides a consistent interface for loading half-chunks, improving code readability and maintainability.

This function will allow for more efficient memory operations when dealing with 128-bit (16-byte) data chunks.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 129-130: arch/x86/chunkset_avx2.c#L129-L130
Added lines #L129 - L130 were not covered by tests


133-136: LGTM: New storehalfchunk function for efficient 128-bit chunk storing

The storehalfchunk function is a well-designed complement to loadhalfchunk:

  1. It efficiently stores a 128-bit chunk using the _mm_storeu_si128 intrinsic.
  2. This function aligns with the halfchunk_t type and completes the set of operations for half-chunks.
  3. It provides a consistent interface for storing half-chunks, improving code readability and maintainability.

This function will allow for more efficient memory operations when dealing with 128-bit (16-byte) data chunks.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 133-134: arch/x86/chunkset_avx2.c#L133-L134
Added lines #L133 - L134 were not covered by tests


Line range hint 78-153: Improve test coverage for new functions

While the new functions and modifications appear to be well-designed and valuable additions to the AVX2 chunk operations, the static analysis tool has flagged several new functions and code paths as not covered by tests. To ensure the reliability and correctness of these new features, it's important to add appropriate test cases.

Please add test cases for the following new functions and code paths:

  1. chunkmemset_16
  2. loadhalfchunk
  3. storehalfchunk
  4. halfchunk2whole
  5. GET_HALFCHUNK_MAG
  6. The new else branch in GET_CHUNK_MAG

Adding these tests will help verify the correctness of the new functionality and prevent potential regressions in future changes.

#!/bin/bash
# Search for existing tests related to chunk operations
rg --type c "test.*chunk" tests/
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 129-130: arch/x86/chunkset_avx2.c#L129-L130
Added lines #L129 - L130 were not covered by tests


[warning] 133-134: arch/x86/chunkset_avx2.c#L133-L134
Added lines #L133 - L134 were not covered by tests


[warning] 137-137: arch/x86/chunkset_avx2.c#L137
Added line #L137 was not covered by tests


[warning] 140-140: arch/x86/chunkset_avx2.c#L140
Added line #L140 was not covered by tests


[warning] 143-143: arch/x86/chunkset_avx2.c#L143
Added line #L143 was not covered by tests


[warning] 145-148: arch/x86/chunkset_avx2.c#L145-L148
Added lines #L145 - L148 were not covered by tests


[warning] 151-151: arch/x86/chunkset_avx2.c#L151
Added line #L151 was not covered by tests


[warning] 153-153: arch/x86/chunkset_avx2.c#L153
Added line #L153 was not covered by tests


113-114: LGTM: Simplified condition in GET_CHUNK_MAG

The simplification of the condition for the 'else' branch in GET_CHUNK_MAG is a good improvement. It reduces code complexity and assumes that all cases where dist >= 16 can be handled uniformly.

To ensure this change doesn't introduce any regressions, please verify that the function behaves correctly for all possible values of dist, especially at the boundary condition where dist == 16.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 113-113: arch/x86/chunkset_avx2.c#L113
Added line #L113 was not covered by tests

chunkset_tpl.h (5)

Line range hint 222-232: Approved changes to CHUNKMEMSET_SAFE

The modifications to CHUNKMEMSET_SAFE are consistent with the changes made to CHUNKMEMSET and improve the overall safety of the function. The use of MIN to ensure len doesn't exceed left is a good practice that prevents potential buffer overflows.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 222-222: chunkset_tpl.h#L222
Added line #L222 was not covered by tests


185-198: Approved changes with suggestion for improved test coverage

The addition of CHUNKMEMSET_16 and the modification to the main loop to handle larger chunks are good improvements that can enhance performance.

However, the static analysis tool indicates that some of these lines are not covered by tests. To ensure the correctness and performance of these changes, consider adding test cases that specifically target:

  1. The new CHUNKMEMSET_16 functionality.
  2. The modified main loop that handles 2 * sizeof(chunk_t) at a time.

You can use the following script to help identify relevant test files or suggest creating new ones:

#!/bin/bash
# Find test files related to CHUNKMEMSET operations
test_files=$(rg -l "CHUNKMEMSET" test/)

if [ -n "$test_files" ]; then
    echo "Existing test files that might be relevant for CHUNKMEMSET operations:"
    echo "$test_files"
    echo "Ensure these tests cover the new CHUNKMEMSET_16 and modified loop."
else
    echo "No existing test files found for CHUNKMEMSET operations. Consider creating new tests."
fi
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 187-188: chunkset_tpl.h#L187-L188
Added lines #L187 - L188 were not covered by tests


[warning] 190-190: chunkset_tpl.h#L190
Added line #L190 was not covered by tests


[warning] 192-192: chunkset_tpl.h#L192
Added line #L192 was not covered by tests


[warning] 195-198: chunkset_tpl.h#L195-L198
Added lines #L195 - L198 were not covered by tests


149-168: Approved half-chunk handling with suggestions for clarity and testing

The addition of half-chunk handling is a good optimization for smaller data sizes. However, there are a few points to consider:

  1. The rationale behind some conditions (e.g., dist == 6) is not immediately clear. Consider adding comments to explain these special cases.

  2. The static analysis tool indicates that parts of this code are not covered by tests. It's important to ensure that these new code paths are thoroughly tested.

To improve clarity and test coverage:

  1. Add comments explaining the rationale for special cases, e.g.:
// Special handling for odd distances and distance of 6
if ((dist % 2) != 0 || dist == 6) {
    // ... (add explanation here)
}
  1. Use the following script to help identify areas needing test coverage:
#!/bin/bash
# Find test files related to CHUNKMEMSET and half-chunk operations
test_files=$(rg -l "CHUNKMEMSET|HALFCHUNKCOPY|GET_HALFCHUNK_MAG" test/)

if [ -n "$test_files" ]; then
    echo "Existing test files that might be relevant:"
    echo "$test_files"
    echo "Ensure these tests cover the new half-chunk handling code paths."
else
    echo "No existing test files found for these operations. Consider creating new tests for half-chunk handling."
fi
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 152-152: chunkset_tpl.h#L152
Added line #L152 was not covered by tests


[warning] 157-157: chunkset_tpl.h#L157
Added line #L157 was not covered by tests


[warning] 159-161: chunkset_tpl.h#L159-L161
Added lines #L159 - L161 were not covered by tests


[warning] 164-165: chunkset_tpl.h#L164-L165
Added lines #L164 - L165 were not covered by tests


92-108: Approved new HALFCHUNKCOPY function with suggestions

The new HALFCHUNKCOPY function is a good addition for optimizing smaller data copies. However, there are a couple of points to consider:

  1. The alignment calculation uses int32_t, which could potentially cause issues with very large len values. Consider changing it to unsigned to match the len parameter type:
- int32_t align = ((len - 1) % sizeof(halfchunk_t)) + 1;
+ unsigned align = ((len - 1) % sizeof(halfchunk_t)) + 1;
  1. The static analysis tool indicates that this function is not covered by tests. Consider adding test cases to ensure its correctness and performance.

To help with adding tests, you can use the following script to find relevant test files or suggest creating a new one:

#!/bin/bash
# Find test files related to chunk copying functions
test_files=$(rg -l "CHUNKCOPY|HALFCHUNKCOPY" test/)

if [ -n "$test_files" ]; then
    echo "Existing test files that might be relevant for HALFCHUNKCOPY:"
    echo "$test_files"
else
    echo "No existing test files found for chunk copying functions. Consider creating a new test file for HALFCHUNKCOPY."
fi
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 92-99: chunkset_tpl.h#L92-L99
Added lines #L92 - L99 were not covered by tests


[warning] 101-105: chunkset_tpl.h#L101-L105
Added lines #L101 - L105 were not covered by tests


[warning] 107-107: chunkset_tpl.h#L107
Added line #L107 was not covered by tests


135-135: Acknowledge memset usage, but change not recommended

The static analysis tool flagged the use of memset on this line, suggesting memset_s as an alternative. While memset_s does provide additional safety through bounds checking, there are reasons to maintain the current implementation:

  1. memset_s is part of the C11 standard but is not widely supported across all platforms.
  2. The current usage of memset is in a controlled context where the length is known and checked.
  3. Changing to memset_s could introduce compatibility issues with older compilers or platforms.

Given these considerations, the current use of memset is acceptable. However, if broad platform support for memset_s becomes available in the future, it might be worth revisiting this decision.

To ensure the safety of the current memset usage, you can run the following script to check for any potential buffer overflow scenarios:

This script will help identify the context around memset calls and any related buffer size declarations.

🧰 Tools
🪛 ast-grep

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/

Comment on lines +73 to +74
uint64_t cpy_dist;
uint64_t bytes_remaining = sizeof(chunk_t);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Add test cases for CHUNKCOPY and HALFCHUNKCOPY functions

No existing test files found for CHUNKCOPY and HALFCHUNKCOPY. To ensure these critical memory operations are reliable and to prevent potential bugs, please create comprehensive test cases covering various scenarios for these functions.

🔗 Analysis chain

Approved changes with a suggestion for improved test coverage

The updates to use uint64_t for distance-related variables and the switch to memcpy are good improvements. They allow for handling larger distances and potentially improve performance and safety.

However, the static analysis tool indicates that these lines are not covered by tests. Consider adding test cases to cover these scenarios:

Also applies to: 79-84


[Additional test coverage for chunk copying functions required]

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Find test files related to GET_CHUNK_MAG function
test_files=$(rg -l "GET_CHUNK_MAG" test/)

if [ -n "$test_files" ]; then
    echo "Existing test files that might be relevant:"
    echo "$test_files"
else
    echo "No existing test files found for GET_CHUNK_MAG. Consider creating a new test file."
fi

Length of output: 600


Script:

# Find test files related to CHUNKCOPY and HALFCHUNKCOPY functions
test_files=$(rg -l "CHUNKCOPY|HALFCHUNKCOPY" test/)

if [ -n "$test_files" ]; then
    echo "Existing test files that might be relevant:"
    echo "$test_files"
else
    echo "No existing test files found for chunk copying functions. Consider creating new test cases for CHUNKCOPY and HALFCHUNKCOPY."
fi

Length of output: 327

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 73-74: chunkset_tpl.h#L73-L74
Added lines #L73 - L74 were not covered by tests

@KungFuJesus
Copy link
Copy Markdown
Collaborator Author

Will do but probably won't get to it for a little bit.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
chunkset_tpl.h (1)

7-7: Consider removing unnecessary include

The addition of #include <stdio.h> seems unnecessary for a header file focused on memory operations. Unless there are specific stdio functions used in this file, consider removing this include to reduce dependencies and potential compilation overhead.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 30860bb and d63912a.

📒 Files selected for processing (12)
  • arch/arm/arm_functions.h (1 hunks)
  • arch/generic/generic_functions.h (1 hunks)
  • arch/power/power_functions.h (1 hunks)
  • arch/riscv/riscv_functions.h (1 hunks)
  • arch/x86/chunkset_avx2.c (2 hunks)
  • arch/x86/x86_functions.h (3 hunks)
  • chunkset_tpl.h (7 hunks)
  • functable.c (1 hunks)
  • functable.h (1 hunks)
  • inffast_tpl.h (3 hunks)
  • inflate.c (1 hunks)
  • inflate_p.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • arch/arm/arm_functions.h
  • arch/generic/generic_functions.h
  • arch/power/power_functions.h
  • arch/riscv/riscv_functions.h
  • arch/x86/x86_functions.h
  • functable.c
  • functable.h
  • inffast_tpl.h
  • inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (3)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:56-59
Timestamp: 2024-10-08T03:09:54.717Z
Learning: The array `half_rem_vals` in `arch/x86/chunkset_avx2.c` is used in the function `GET_HALFCHUNK_MAG` on line 148.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: arch/x86/chunkset_avx2.c:56-59
Timestamp: 2024-10-08T19:37:14.998Z
Learning: The array `half_rem_vals` in `arch/x86/chunkset_avx2.c` is used in the function `GET_HALFCHUNK_MAG` on line 148.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1778
File: arch/x86/chunkset_avx2.c:160-171
Timestamp: 2024-10-08T21:51:45.330Z
Learning: In `arch/x86/chunkset_avx2.c`, within the `GET_HALFCHUNK_MAG` function, using a conditional branch to select between `_mm_loadl_epi64` and `_mm_loadu_si128` is not recommended because the branching cost outweighs the savings from the load.
chunkset_tpl.h (1)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:135-135
Timestamp: 2024-10-07T21:23:13.401Z
Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
🪛 ast-grep
chunkset_tpl.h

[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/

🔇 Additional comments (8)
inflate_p.h (1)

153-153: Verify the intended behavior of the safelen calculation.

The calculation of safelen has been modified to remove the + 1 at the end. This change makes the safe length calculation more precise by exactly matching the difference between the safe and out pointers.

Please verify that this change aligns with the intended behavior of the chunkcopy_safe function. If it does, consider adding a comment explaining why the exact difference is used (e.g., "Calculate the exact number of bytes that can be safely written").

To ensure this change doesn't introduce any regressions, please run the following verification script:

This script will help identify any tests, benchmarks, or documentation that might need updating due to this change.

chunkset_tpl.h (5)

73-74: LGTM: Improved type consistency for larger values

The change from uint32_t to uint64_t for cpy_dist and bytes_remaining is a good improvement. It allows for handling larger distances and sizes, consistent with the function signature change. This modification enhances the function's capability to work with bigger memory chunks.


113-133: LGTM: Improved flexibility and correct handling of overlapping memory

The changes to CHUNKMEMSET are well-implemented:

  1. The new function signature using a from pointer instead of a dist parameter allows for more flexible memory operations.
  2. The added logic correctly handles cases where from is ahead of out, which is a significant improvement in robustness.
  3. The use of memmove for overlapping memory regions is appropriate and ensures correct behavior in all cases.

These changes make the function more versatile and safer to use in various scenarios.


Line range hint 138-166: LGTM: Excellent optimization for various memory operation sizes

The changes in this section demonstrate thoughtful optimization:

  1. The updated condition dist >= sizeof(chunk_t) for using CHUNKCOPY is logical and consistent with the function's purpose.
  2. The introduction of HALF_CHUNK logic provides optimization for smaller memory operations, which can significantly improve performance in certain scenarios.
  3. The code is well-commented, clearly explaining the rationale behind these optimizations.

These changes show a commendable effort to fine-tune the function for different memory operation sizes, potentially leading to improved overall performance.


Line range hint 190-250: LGTM: Consistent updates

The changes in this section primarily focus on maintaining consistency with earlier modifications. Variable names and types have been updated appropriately, while preserving the existing optimizations and logical structure. This attention to consistency is commendable and helps maintain the overall integrity of the code.


255-273: LGTM: Excellent addition of a safe copy function

The new CHUNKCOPY_SAFE function is a valuable addition:

  1. It provides crucial safety checks for memory boundaries and overlapping regions.
  2. The function handles edge cases well, including when out == from or when the safe distance is less than sizeof(chunk_t).
  3. It efficiently uses CHUNKMEMSET for the actual copying when it's safe to do so, leveraging existing optimized code.

This function enhances the overall safety of memory operations in the library while maintaining performance where possible. Great job on improving the robustness of the codebase!

arch/x86/chunkset_avx2.c (2)

18-18: Addition of HAVE_CHUNKMEMSET_1 macro is appropriate

The introduction of #define HAVE_CHUNKMEMSET_1 adds support for 1-byte chunk memory setting. This enhancement increases the flexibility of memory operations and aligns with the existing macros for other chunk sizes.


137-140: Ensure all calls to halfchunk2whole are updated to match the new signature

The function halfchunk2whole now accepts a pointer to halfchunk_t instead of passing the value directly. Please verify that all invocations of this function have been updated accordingly to prevent potential compilation errors or unexpected behavior.

Run the following script to locate all calls to halfchunk2whole and check their usage:

✅ Verification successful

Verification Successful: All calls to halfchunk2whole have been updated to match the new signature.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all calls to `halfchunk2whole` and display their usages.

rg --type c --no-heading --line-number "halfchunk2whole"

Length of output: 223

@Dead2 Dead2 removed the Rebase needed Please do a 'git rebase develop yourbranch' label Oct 21, 2024
Copy link
Copy Markdown
Member

@Dead2 Dead2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants