Simplify inflate fast by dispatching to chunkmemset for all chunkcopy cases#1802
Conversation
WalkthroughThe changes in this pull request involve modifications to several source files related to memory chunk operations across different architectures (ARM, x86, RISC-V, POWER). Key alterations include updates to function signatures, replacing distance parameters with pointers to Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Memory
participant ChunkSet
User->>ChunkSet: Request memory operation
ChunkSet->>Memory: Perform chunk memory set
Memory-->>ChunkSet: Return success
ChunkSet-->>User: Confirm operation completed
Possibly related PRs
Suggested labels
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
8fc5dad to
2e8beb1
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (6)
functable.h (1)
Line range hint
1-30: Overall assessment of changesThe modification to the
chunkmemset_safefunction pointer signature infunctable.his consistent with the PR's objective of simplifying and optimizing chunk operations in the zlib-ng library. This change allows for more direct memory manipulation, which could lead to performance improvements, especially for short strings and overlapping memory regions as mentioned in the PR description.However, it's crucial to ensure that this change is reflected consistently throughout the codebase. Any code that calls
chunkmemset_safewill need to be updated to match the new signature. Additionally, thorough testing should be conducted to verify that this change doesn't introduce any regressions or unexpected behavior in the library's functionality.Consider adding a comment near the
chunkmemset_safefunction pointer to explain the rationale behind this change and any specific usage guidelines for the new signature. This will help future maintainers understand the design decision and use the function correctly.arch/x86/x86_functions.h (2)
36-36: LGTM. Consistent across all x86 extensions.The change to
chunkmemset_safe_avx2completes the set of modifications across all supported x86 architecture extensions (SSE2, SSSE3, AVX2). This consistency is crucial for maintaining the codebase and ensuring optimal performance across different CPU capabilities.Consider adding or updating a comprehensive test suite that covers all these
chunkmemset_safevariants to ensure they behave consistently across different x86 extensions.
11-11: Summary: Consistent optimization across x86 extensionsThe changes to
chunkmemset_safefunctions for SSE2, SSSE3, and AVX2 are consistent and align well with the PR objectives of optimizing chunk copying and memory setting operations. By changing the second parameter fromunsigned disttouint8_t *from, the functions can now work directly with memory addresses, potentially leading to more efficient operations.However, it's important to note that this change might require updates to all code that calls these functions. To ensure a smooth transition:
- Update all call sites to these functions throughout the codebase.
- Adjust any documentation or comments related to these functions to reflect the new parameter.
- Update and expand the test suite to cover these changes thoroughly.
- Consider adding a deprecation notice or wrapper functions for the old signature to ease migration if this is a public API.
To verify the impact and ensure all necessary updates have been made, please run the following script:
#!/bin/bash # Search for all calls to chunkmemset_safe functions and verify they've been updated rg --type c --type cpp "chunkmemset_safe_(sse2|ssse3|avx2)\s*\(" -A 3 # Search for any remaining uses of the old 'dist' parameter name in relation to these functions rg --type c --type cpp "chunkmemset_safe_(sse2|ssse3|avx2).*dist" # Check if there are any comments or documentation that need updating rg --type c --type cpp "chunkmemset_safe_(sse2|ssse3|avx2)" -C 5 | rg "(/\*|//|@param|@brief)"This will help identify any areas that might need further attention during this refactoring process.
Also applies to: 24-24, 36-36
arch/x86/chunkset_avx2.c (2)
56-58: Add explanatory comments forhalf_rem_valsarrayThe
half_rem_valsarray contains magic numbers whose purpose may not be immediately clear to future maintainers. Adding comments to explain how these values are derived and how they are used in the code would enhance readability and maintainability.
133-149: Consider adding unit tests forhalfchunkmemset_*functionsThe newly introduced functions
halfchunkmemset_2,halfchunkmemset_4, andhalfchunkmemset_8are critical for handling half-sized chunks. Adding unit tests for these functions would help ensure their correctness and prevent future regressions.🧰 Tools
🪛 GitHub Check: CodeQL
[notice] 133-133: Unused static function
Static function halfchunkmemset_2 is unreachable
[notice] 139-139: Unused static function
Static function halfchunkmemset_4 is unreachable
[notice] 145-145: Unused static function
Static function halfchunkmemset_8 is unreachablechunkset_tpl.h (1)
7-7: Remove unnecessary inclusion of<stdio.h>The
<stdio.h>header is included but not used in this file. Removing it can reduce compilation time and prevent potential namespace pollution.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (11)
- arch/arm/chunkset_neon.c (0 hunks)
- arch/generic/generic_functions.h (1 hunks)
- arch/x86/chunkset_avx2.c (5 hunks)
- arch/x86/chunkset_sse2.c (0 hunks)
- arch/x86/chunkset_ssse3.c (0 hunks)
- arch/x86/x86_functions.h (3 hunks)
- chunkset_tpl.h (5 hunks)
- functable.c (1 hunks)
- functable.h (1 hunks)
- inffast_tpl.h (3 hunks)
- inflate.c (1 hunks)
💤 Files with no reviewable changes (3)
- arch/arm/chunkset_neon.c
- arch/x86/chunkset_sse2.c
- arch/x86/chunkset_ssse3.c
🧰 Additional context used
🪛 GitHub Check: CodeQL
arch/x86/chunkset_avx2.c
[notice] 60-60: Unused static function
Static function chunkmemset_1 is unreachable
[notice] 133-133: Unused static function
Static function halfchunkmemset_2 is unreachable
[notice] 139-139: Unused static function
Static function halfchunkmemset_4 is unreachable
[notice] 145-145: Unused static function
Static function halfchunkmemset_8 is unreachable
🪛 ast-grep
chunkset_tpl.h
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
🔇 Additional comments (13)
functable.h (1)
30-30: Approve change with verification neededThe modification to the
chunkmemset_safefunction pointer signature aligns with the PR objectives of simplifying and optimizing chunk operations. Changing fromunsigned disttouint8_t *fromallows for direct pointer manipulation, which can potentially improve performance.However, this change may impact other parts of the codebase that call this function.
Please run the following script to verify all calls to
chunkmemset_safehave been updated to match the new signature:If any mismatched calls are found, please update them to use the new signature.
✅ Verification successful
Change Verified Successfully
All calls to
chunkmemset_safehave been updated to match the new signature.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify all calls to chunkmemset_safe match the new signature # Test: Search for function calls to chunkmemset_safe # Expect: All calls should use a pointer as the second argument, not an unsigned int rg --type c --type cpp -e 'chunkmemset_safe\s*\([^)]*\)' -g '!functable.h'Length of output: 149
arch/generic/generic_functions.h (1)
25-25: Approved: Function signature change aligns with PR objectives.The modification of
chunkmemset_safe_cfunction signature from(uint8_t *out, unsigned dist, unsigned len, unsigned left)to(uint8_t *out, uint8_t *from, unsigned len, unsigned left)aligns well with the PR's goal of optimizing chunk copying and memory setting operations. This change could potentially improve performance by directly using a pointer to the source memory instead of a distance value.To ensure this change doesn't introduce any issues, please run the following script to check for any remaining uses of the old function signature:
This script will help identify any places in the codebase where the function might still be called with the old signature, allowing you to update them accordingly.
✅ Verification successful
Verified: No remaining uses of the old
chunkmemset_safe_cfunction signature found.All references have been updated to use the new signature successfully.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for any remaining uses of the old chunkmemset_safe_c function signature # Test: Search for the old function signature rg --type c --type cpp "chunkmemset_safe_c\s*\(\s*\w+\s*,\s*unsigned\s+dist\s*," -g '!arch/generic/generic_functions.h' # Test: Search for all uses of chunkmemset_safe_c to manually verify rg --type c --type cpp "chunkmemset_safe_c" -g '!arch/generic/generic_functions.h'Length of output: 333
arch/x86/x86_functions.h (2)
24-24: LGTM. Consistent with SSE2 changes.The modification to
chunkmemset_safe_ssse3is consistent with the changes made to the SSE2 version, maintaining uniformity across different x86 architecture extensions.
11-11: LGTM. Verify impact on existing code.The change from
unsigned disttouint8_t *fromaligns with the PR objectives of optimizing chunk copying and memory setting operations. This modification allows for more direct memory access, potentially improving performance.Please run the following script to check for any unintended side effects:
✅ Verification successful
Verified: No existing calls to
chunkmemset_safe_sse2found. The change does not impact current codebase functionality.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for all calls to chunkmemset_safe_sse2 and verify they've been updated rg --type c --type cpp "chunkmemset_safe_sse2\s*\(" -A 3Length of output: 347
inflate.c (2)
1084-1084: Improved handling of overlapping memory regionsThe change from
state->offsettoput - state->offsetas the second argument in thechunkmemset_safefunction call is a good improvement. This modification calculates a relative address based on the currentputposition, which should enhance the handling of overlapping memory regions during decompression. This aligns well with the PR objective of optimizing the handling of short strings and overlapping cases.The new implementation is likely to be more robust in scenarios where the source and destination memory areas overlap, potentially reducing the number of memory reads and improving overall efficiency.
Line range hint
1-1984: Overall assessment of changes in inflate.cThe modification made to the
inflatefunction in this file is focused and aligns well with the PR objectives. The change improves the handling of overlapping memory regions during decompression, which should enhance performance for short strings and overlapping cases.No other significant changes or issues were identified in the file. The implementation appears to be sound and maintains the overall structure and logic of the zlib decompression process.
chunkset_tpl.h (2)
254-272:⚠️ Potential issueEnsure valid pointer arithmetic in
CHUNKCOPY_SAFEIn
CHUNKCOPY_SAFE, pointer arithmetic is performed withsafe,out, andfrom. Subtracting pointers that do not refer to the same array or object can lead to undefined behavior. Confirm that these pointers are within the same array or valid memory region before performing arithmetic.Run the following script to identify all calls to
CHUNKCOPY_SAFEand check the relationships between the pointers:#!/bin/bash # Description: Verify that 'safe', 'out', and 'from' pointers in CHUNKCOPY_SAFE are valid for arithmetic operations. rg --type c 'CHUNKCOPY_SAFE\(([^,]+),\s*([^,]+),\s*([^,]+),\s*([^)]+)\)' -o -r 'out=\1 from=\2 len=\3 safe=\4' | while read -r call; do echo "Call: $call" # Further static analysis can be performed to ensure pointers are in the same object done
128-133: Verify the correctness ofsdisthandling inCHUNKMEMSETThe condition
if (sdist < 0 && dist < len)handles cases wherefromis ahead ofout, andmemmoveis used to accommodate overlapping regions. Ensure that this logic correctly covers all potential overlap scenarios and does not introduce unintended side effects.Run the following script to examine all usages of
CHUNKMEMSETand check for overlapping regions:✅ Verification successful
CHUNKMEMSET
sdistHandling Verified SuccessfullyAll usages of
CHUNKMEMSETcorrectly handle overlapping memory regions with the currentsdistcondition.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Find all calls to CHUNKMEMSET and analyze the distance between 'out' and 'from' rg --type c --word-regexp 'CHUNKMEMSET\(([^,]+),\s*([^,]+),\s*[^)]+\)' -o -r '$0' | while read -r line; do # Extract 'out' and 'from' parameters out_var=$(echo "$line" | sed -E 's/.*CHUNKMEMSET\(([^,]+),\s*([^,]+),.*/\1/') from_var=$(echo "$line" | sed -E 's/.*CHUNKMEMSET\(([^,]+),\s*([^,]+),.*/\2/') echo "Call: $line" echo "out: $out_var, from: $from_var" # Additional analysis can be performed here doneLength of output: 929
Script:
#!/bin/bash # Description: Find all calls to CHUNKMEMSET and extract 'out' and 'from' parameters for analysis rg --type c --word-regexp 'CHUNKMEMSET\(([^,]+),\s*([^,]+),\s*[^)]+\)' -o -r '$0' | while read -r line; do # Extract 'out' and 'from' parameters out_var=$(echo "$line" | sed -E 's/.*CHUNKMEMSET\(([^,]+),\s*([^,]+),.*/\1/') from_var=$(echo "$line" | sed -E 's/.*CHUNKMEMSET\(([^,]+),\s*([^,]+),.*/\2/') echo "Call: $line" echo "out: $out_var, from: $from_var" # Additional analysis can be performed here doneLength of output: 1980
functable.c (1)
276-278: The updated function signature is appropriateThe change from
unsigned disttouint8_t *frominchunkmemset_safe_stubaligns with the updated implementation. The stub function properly initializes the function table and delegates the call to the optimized function.inffast_tpl.h (4)
241-241: Appropriate Use ofCHUNKCOPY_SAFEto Prevent Overwriting Window DataThe use of
CHUNKCOPY_SAFEensures safe copying of data when there is a risk of overwriting the window buffer. This change maintains data integrity during decompression by leveraging optimized, architecture-specific routines.
251-258: Conditional Handling Enhances Safety and PerformanceThe conditional selection between
CHUNKCOPY_SAFEandchunkcopy_safebased on theextra_safeflag is appropriate. It ensures that optimized functions are used when it's safe, improving performance without compromising safety. This approach maintains correctness whenoutandwindowmay overlap.
260-264: Consistent Application of Safe Copy Functions Based onextra_safeUsing
CHUNKCOPY_SAFEwhenextra_safeis false andchunkcopy_safeotherwise ensures that the most efficient function is used without sacrificing safety. This consistency enhances code clarity and maintainability.
277-277: Utilization ofCHUNKMEMSETfor Optimized Memory Set OperationsThe change to use
CHUNKMEMSETleverages architecture-specific optimizations for memory setting operations. This enhancement is expected to improve performance during decompression.
chunkset_tpl.h
Outdated
| if (UNLIKELY((safe - from) < sizeof(chunk_t))) { | ||
| while (len--) { | ||
| *out++ = *from++; | ||
| } | ||
|
|
||
| return out; | ||
| } | ||
|
|
||
| return CHUNKMEMSET(out, from, len); |
There was a problem hiding this comment.
Handle overlap correctly in CHUNKCOPY_SAFE
The check if (UNLIKELY((safe - from) < sizeof(chunk_t))) in CHUNKCOPY_SAFE assumes that safe is ahead of from. If this is not the case, the subtraction could result in incorrect values or undefined behavior. Ensure that safe is always greater than or equal to from or adjust the logic to handle all possible pointer relationships.
Apply this diff to add a safety check:
uint64_t safelen = (safe - out);
len = MIN(len, safelen);
- if (UNLIKELY((safe - from) < sizeof(chunk_t))) {
+ if (UNLIKELY(safe <= from || (safe - from) < sizeof(chunk_t))) {
while (len--) {
*out++ = *from++;
}
return out;
}Committable suggestion was skipped due to low confidence.
There was a problem hiding this comment.
Safe is the marker for the end of the window. We're only trying to detect the presence of a read length from from that is greater than the allocated window, as that will cause a segmentation fault. We wrap the window after dealing with these boundary conditions such that we're still looking at the same allocation. I don't believe that from can ever be ahead of safe. But please verify that for me.
There was a problem hiding this comment.
@KungFuJesus, thank you for the clarification. I've verified that from cannot be ahead of safe since safe represents the end of the window and from is always within the bounds of the allocated window. Therefore, the subtraction safe - from is valid, and the existing logic correctly handles all possible pointer relationships. I'll make note of this for future reviews.
✏️ Learnings added
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:263-271
Timestamp: 2024-10-07T21:56:31.719Z
Learning: In the `CHUNKCOPY_SAFE` function within `chunkset_tpl.h`, `from` cannot be ahead of `safe` because `safe` marks the end of the window, and `from` is always within the allocated window bounds. Thus, the subtraction `safe - from` is valid without additional checks.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1802
File: chunkset_tpl.h:135-135
Timestamp: 2024-10-07T21:23:13.401Z
Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #1802 +/- ##
===========================================
- Coverage 33.03% 31.29% -1.74%
===========================================
Files 66 66
Lines 5546 5538 -8
Branches 1224 1228 +4
===========================================
- Hits 1832 1733 -99
- Misses 3453 3547 +94
+ Partials 261 258 -3 ☔ View full report in Codecov by Sentry. |
|
Looks like MSVC broken. |
Those are valid warnings. I'll remedy soon. |
2e8beb1 to
d50ddbb
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (9)
- arch/arm/chunkset_neon.c (1 hunks)
- arch/generic/generic_functions.h (1 hunks)
- arch/x86/chunkset_avx2.c (6 hunks)
- arch/x86/x86_functions.h (3 hunks)
- chunkset_tpl.h (6 hunks)
- functable.c (1 hunks)
- functable.h (1 hunks)
- inffast_tpl.h (3 hunks)
- inflate.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- arch/generic/generic_functions.h
- arch/x86/x86_functions.h
- functable.c
- functable.h
- inffast_tpl.h
- inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (2)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:168-168 Timestamp: 2024-10-07T21:21:10.570Z Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:82-85 Timestamp: 2024-10-07T21:18:37.806Z Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.chunkset_tpl.h (1)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:135-135 Timestamp: 2024-10-07T21:23:13.401Z Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
🪛 ast-grep
chunkset_tpl.h
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
🔇 Additional comments (9)
chunkset_tpl.h (4)
113-139: Approve CHUNKMEMSET changes with a minor suggestionThe changes to the
CHUNKMEMSETfunction significantly improve its flexibility and handling of edge cases. The use ofmemmovefor overlapping regions is correct, and the optimizations for different chunk sizes can improve performance.The changes look good overall. However, consider adding a comment explaining the
llabsfunction usage for clarity:uint64_t dist = llabs(sdist); // Use absolute value to handle both forward and backward distancesThis comment would help future maintainers understand the reasoning behind using
llabs.🧰 Tools
🪛 ast-grep
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
Line range hint
222-250: Approve CHUNKMEMSET_SAFE changesThe changes to the
CHUNKMEMSET_SAFEfunction are consistent with the updates toCHUNKMEMSET. The function maintains its safety checks while adapting to the new signature.The changes look good and maintain the safety guarantees of the function.
Line range hint
1-272: Overall assessment of changes in chunkset_tpl.hThe changes made to
chunkset_tpl.hsignificantly improve the flexibility and performance of chunk operations. The introduction of new functions likeHALFCHUNKCOPYandCHUNKCOPY_SAFE, along with the updates to existing functions, provide better handling of various memory scenarios.The modifications are generally well-implemented and align with the PR objectives of simplifying and optimizing chunk operations. The suggested improvements in the previous comments will further enhance the safety and robustness of these functions.
To ensure the changes don't introduce any regressions, consider adding or updating unit tests to cover the new functionality and edge cases, particularly for the new
HALFCHUNKCOPYandCHUNKCOPY_SAFEfunctions.🧰 Tools
🪛 ast-grep
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
255-272:⚠️ Potential issueImprove safety check in CHUNKCOPY_SAFE
The new
CHUNKCOPY_SAFEfunction provides a safe way to copy chunks, which is good. However, there's a potential issue with the safety check for overlapping regions.The current check
(safe - from) < sizeof(chunk_t)assumes thatsafeis always greater thanfrom. This might not always be true, especially iffrompoints to a location aftersafe. Consider updating the check to handle all cases:- if (UNLIKELY((safe - from) < sizeof(chunk_t))) { + if (UNLIKELY(safe <= from || (safe - from) < sizeof(chunk_t))) {This change ensures that the function handles all possible pointer relationships correctly, improving its safety.
To verify this change, we can check for any existing uses of
CHUNKCOPY_SAFEin the codebase:This will help us understand the impact of the proposed change on existing code.
arch/x86/chunkset_avx2.c (5)
78-81: LGTMThe
chunkmemset_16function correctly uses_mm256_broadcastsi128_si256to broadcast a 128-bit value across a 256-bit AVX2 register. Unaligned memory access with_mm_loadu_si128is acceptable on AVX2-capable CPUs, as per the learnings.
129-131: LGTMThe
loadhalfchunkfunction correctly loads a 128-bit half-size chunk using_mm_loadu_si128. Unaligned memory access is acceptable for AVX2-capable CPUs.
133-136: LGTMThe
storehalfchunkfunction correctly stores a 128-bit half-size chunk using_mm_storeu_si128. Unaligned memory access is acceptable.
137-142: LGTMThe
halfchunk2wholefunction correctly zero-extends ahalfchunk_tto achunk_tusing_mm256_zextsi128_si256. This approach is appropriate and helps appease memory sanitizers.
143-154:⚠️ Potential issueEnsure
distis within the valid range inGET_HALFCHUNK_MAGIn the function
GET_HALFCHUNK_MAG,distis used asdist - 3to index into arraysperm_idx_lutandhalf_rem_vals. To prevent potential out-of-bounds array accesses, ensure thatdistis always greater than or equal to 3 and does not exceed the maximum expected value based on the sizes of these arrays.Consider adding an assertion or input validation to enforce that
distis within the valid range:assert(dist >= 3 && dist <= MAX_VALID_DIST);Replace
MAX_VALID_DISTwith the appropriate maximum value based on the array sizes.⛔ Skipped due to learnings
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:168-168 Timestamp: 2024-10-07T21:21:10.570Z Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.
28e9c80 to
d1205a7
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
arch/x86/chunkset_avx2.c (2)
56-59: New constant array for half remainder valuesA new constant array
half_rem_valshas been added to store remainder values for half chunks. This array will be used in theGET_HALFCHUNK_MAGfunction.The values seem to follow a pattern, but it's not immediately clear why these specific values were chosen. Consider adding a comment explaining the purpose and derivation of these values for better maintainability.
78-81: New chunkmemset_16 function addedThe
chunkmemset_16function has been implemented to set a 256-bit chunk using a 128-bit value. This function uses AVX2 intrinsics to broadcast a 128-bit value across a 256-bit chunk.The implementation looks correct and efficient. However, consider adding a brief comment explaining the purpose of this function for better readability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (10)
- arch/arm/chunkset_neon.c (1 hunks)
- arch/generic/generic_functions.h (1 hunks)
- arch/x86/chunkset_avx2.c (6 hunks)
- arch/x86/chunkset_ssse3.c (1 hunks)
- arch/x86/x86_functions.h (3 hunks)
- chunkset_tpl.h (6 hunks)
- functable.c (1 hunks)
- functable.h (1 hunks)
- inffast_tpl.h (3 hunks)
- inflate.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- arch/arm/chunkset_neon.c
- arch/generic/generic_functions.h
- arch/x86/chunkset_ssse3.c
- arch/x86/x86_functions.h
- functable.c
- functable.h
- inffast_tpl.h
- inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (2)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:168-168 Timestamp: 2024-10-07T21:21:10.570Z Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:82-85 Timestamp: 2024-10-07T21:18:37.806Z Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.chunkset_tpl.h (4)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:263-271 Timestamp: 2024-10-07T21:56:31.870Z Learning: In the `CHUNKCOPY_SAFE` function within `chunkset_tpl.h`, `from` cannot be ahead of `safe` because `safe` marks the end of the window, and `from` is always within the allocated window bounds. Thus, the subtraction `safe - from` is valid without additional checks.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:135-135 Timestamp: 2024-10-07T21:23:13.401Z Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:92-108 Timestamp: 2024-10-07T22:00:02.180Z Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:94-107 Timestamp: 2024-10-07T21:59:15.763Z Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.
🪛 ast-grep
chunkset_tpl.h
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
🔇 Additional comments (13)
arch/x86/chunkset_avx2.c (5)
9-9: New type and function declarations addedThe changes introduce new types and function declarations:
halfchunk_tis defined as__m128i.HAVE_CHUNKMEMSET_16andHAVE_HALF_CHUNKmacros are added.These additions seem to enhance the functionality for handling half-sized chunks. The changes look good and align with the PR objectives.
Also applies to: 12-12, 17-18, 20-20
90-90: Updated GET_CHUNK_MAG function signatureThe
GET_CHUNK_MAGfunction signature has been updated to useuint64_tfor thedistparameter instead ofuint32_t. This change aligns with the PR objectives and improves consistency across different implementations.
113-113: Simplified conditional logic in GET_CHUNK_MAGThe conditional logic for handling cases where
distis equal to 16 has been removed. This simplification is good for code readability and performance.
129-141: New functions for handling half chunksThree new functions have been added for handling half chunks:
loadhalfchunk: Loads a 128-bit chunk from memory.storehalfchunk: Stores a 128-bit chunk to memory.halfchunk2whole: Converts a half chunk to a whole chunk by zero-extending it.These functions look well-implemented and provide necessary functionality for working with half-sized chunks. The comment in
halfchunk2wholeexplains the reason for zero-extension, which is good for maintainability.
143-154: 💡 Codebase verificationConsistency in 'dist' Parameter Type
The
distparameter inGET_HALFCHUNK_MAGis of typeuint32_t, whereas inGET_CHUNK_MAGit isuint64_t. To maintain consistency, consider changing the type inGET_HALFCHUNK_MAGtouint64_t.🔗 Analysis chain
New GET_HALFCHUNK_MAG function added
The
GET_HALFCHUNK_MAGfunction has been implemented to retrieve the magnitude of a half chunk. It uses a lookup table for remainder values and performs shuffling operations on the data.The implementation looks correct and efficient. However, there are a couple of points to consider:
- The function uses
__msan_unpoison, which is good for memory sanitization. However, ensure that this doesn't introduce any performance overhead in production builds.- The
distparameter is of typeuint32_t, while inGET_CHUNK_MAGit was changed touint64_t. Consider usinguint64_there as well for consistency.To verify the consistency of the
distparameter type, run the following script:🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check the type of 'dist' parameter in GET_HALFCHUNK_MAG and GET_CHUNK_MAG functions rg --type c --no-heading --line-number "GET_(HALF)?CHUNK_MAG.*\(.*dist\)" arch/x86/chunkset_avx2.cLength of output: 291
⛔ Skipped due to learnings
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:168-168 Timestamp: 2024-10-07T21:21:10.570Z Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.chunkset_tpl.h (8)
70-74: Improved handling of larger distances in GET_CHUNK_MAGThe changes to use
uint64_tfor thedistparameter and internal variables enhance the function's capability to handle larger memory ranges. This modification aligns well with the overall goal of improving flexibility in memory operations.
92-108: Efficient handling of smaller data sizes with HALFCHUNKCOPYThe addition of the
HALFCHUNKCOPYfunction is a good optimization for handling smaller data sizes. This can potentially improve performance in scenarios where full chunk copies are unnecessary. The implementation follows the established pattern and looks correct.
113-133: Enhanced flexibility and safety in CHUNKMEMSETThe modifications to
CHUNKMEMSETsignificantly improve its flexibility by accepting a source pointer instead of a distance. The addition of logic to handle cases where the source is ahead of the destination, usingmemmovefor overlapping regions, enhances the function's safety and reliability. These changes align well with the goal of improving memory manipulation functions.
138-139: Optimized dispatch to CHUNKCOPY in CHUNKMEMSETThe updated condition for using
CHUNKCOPYwhen the distance is greater than or equal to the chunk size is a good optimization. This change simplifies the logic and potentially improves performance by leveraging the specializedCHUNKCOPYfunction for larger distances.
149-168: Optimized handling for systems with half-chunk supportThe addition of the
HAVE_HALF_CHUNKconditional block provides excellent optimization for systems supporting half-chunk operations. This specialized handling for smaller memory operations can potentially improve performance in scenarios where full chunk operations are unnecessary. The implementation correctly handles various edge cases, aligning well with the overall goal of enhancing memory manipulation efficiency.
Line range hint
222-250: Consistent updates to CHUNKMEMSET_SAFEThe modifications to
CHUNKMEMSET_SAFEare consistent with the changes made toCHUNKMEMSET, now accepting a source pointer instead of a distance. The function maintains its important safety checks while incorporating the new pointer-based approach. These changes ensure that the safe version of the function remains in sync with the optimized version, maintaining both safety and performance.
255-272: Enhanced safety with new CHUNKCOPY_SAFE functionThe addition of the
CHUNKCOPY_SAFEfunction is a valuable improvement to the library's safety features. This function provides a secure alternative toCHUNKCOPY, implementing crucial checks for overlapping memory regions and buffer boundaries. The implementation correctly handles various edge cases, ensuring safe memory operations even in complex scenarios. This addition aligns perfectly with the goal of enhancing both performance and safety in memory manipulations.
135-135: Regarding memset usage: Performance considerationsThe static analysis tool flagged the use of
memsetas potentially unsafe. However, in this performance-critical section, the use ofmemsetis intentional and appropriate. The function already implements necessary safety checks, andlenis enforced withsafelenearly on. Changing tomemset_swould introduce additional overhead without providing significant safety benefits in this context.🧰 Tools
🪛 ast-grep
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
5b13bea to
5e41366
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
chunkset_tpl.h (3)
92-108: LGTM! Consider using unsigned for alignment calculation.The new
HALFCHUNKCOPYfunction is a great addition for optimizing operations on smaller data sizes. The implementation looks solid and follows a similar structure toCHUNKCOPY.However, there's a minor suggestion for improvement:
Consider changing the alignment calculation to use
unsignedinstead ofint32_t:- int32_t align = ((len - 1) % sizeof(halfchunk_t)) + 1; + unsigned align = ((len - 1) % sizeof(halfchunk_t)) + 1;This change would make the type consistent with the
lenparameter and avoid potential issues with largelenvalues.
113-133: LGTM! Consider optimizing distance calculation.The changes to
CHUNKMEMSETare well-implemented and improve the function's flexibility. The new logic to handle cases where the source pointer is ahead of the destination pointer is a great addition for safety.Consider optimizing the distance calculation:
- int64_t sdist = out - from; - uint64_t dist = llabs(sdist); + uint64_t dist = (out > from) ? (out - from) : (from - out);This change avoids the use of
llabs, which might introduce unnecessary overhead for most cases whereout > from. The suggested approach directly calculates the absolute difference without branching.
255-273: LGTM! Consider adding more detailed comments.The new
CHUNKCOPY_SAFEfunction is a valuable addition, providing a safe way to copy chunks while handling various edge cases and potential overlaps. The implementation looks solid and uses appropriate checks to ensure safety.Consider adding more detailed comments to explain the function's behavior, especially for the following aspects:
- The purpose of the
safeparameter.- The rationale behind the checks for
from_distandsafelen.- The fallback to byte-by-byte copy when conditions aren't met.
These comments would improve the function's maintainability and help future developers understand its intricacies.
arch/x86/chunkset_avx2.c (1)
143-154: Consider adding an assertion to enforcedist < 16Since
distis expected to always be less than 16 inGET_HALFCHUNK_MAG, consider adding an assertion to enforce this assumption. This can help catch unexpected usage and prevent potential undefined behavior.Example:
static inline halfchunk_t GET_HALFCHUNK_MAG(uint8_t *buf, uint32_t *chunk_rem, uint32_t dist) { + assert(dist < 16); lut_rem_pair lut_rem = perm_idx_lut[dist - 3]; __m128i perm_vec, ret_vec;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (12)
- arch/arm/arm_functions.h (1 hunks)
- arch/arm/chunkset_neon.c (1 hunks)
- arch/generic/generic_functions.h (1 hunks)
- arch/power/power_functions.h (1 hunks)
- arch/x86/chunkset_avx2.c (6 hunks)
- arch/x86/chunkset_ssse3.c (1 hunks)
- arch/x86/x86_functions.h (3 hunks)
- chunkset_tpl.h (6 hunks)
- functable.c (1 hunks)
- functable.h (1 hunks)
- inffast_tpl.h (3 hunks)
- inflate.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- arch/arm/chunkset_neon.c
- arch/generic/generic_functions.h
- arch/x86/chunkset_ssse3.c
- arch/x86/x86_functions.h
- functable.c
- functable.h
- inffast_tpl.h
- inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (2)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:168-168 Timestamp: 2024-10-07T21:21:10.570Z Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:82-85 Timestamp: 2024-10-07T21:18:37.806Z Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.chunkset_tpl.h (3)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:92-108 Timestamp: 2024-10-07T22:00:02.180Z Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:94-107 Timestamp: 2024-10-07T21:59:15.763Z Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:135-135 Timestamp: 2024-10-07T21:23:13.401Z Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
🪛 ast-grep
chunkset_tpl.h
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
🔇 Additional comments (13)
arch/arm/arm_functions.h (1)
11-11: Approve function signature change with verification suggestion.The modification of
chunkmemset_safe_neonfunction signature from usingunsigned disttouint8_t *fromaligns with the PR objectives of simplifying and optimizing chunk operations. This change allows for more direct memory access, potentially improving performance.However, ensure that:
- The function implementation has been updated accordingly.
- All calls to this function across the codebase have been adjusted to match the new signature.
To verify the consistent update of this function across the codebase, run the following script:
This script will help ensure that the function signature has been updated consistently across the codebase and identify any calls that might need adjustment.
✅ Verification successful
Function signature change verified successfully.
The
chunkmemset_safe_neonfunction signature has been consistently updated across the codebase. No instances of the old signature were found, and all function calls are confined toarch/arm/arm_functions.h. The change aligns with the PR objectives and poses no issues.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the consistent update of chunkmemset_safe_neon function signature # Test 1: Check for any remaining occurrences of the old function signature echo "Checking for old function signature..." rg "chunkmemset_safe_neon\s*\(\s*uint8_t\s*\*\s*out\s*,\s*unsigned\s+dist\s*," --type c # Test 2: Verify the new function signature is used consistently echo "Checking for new function signature..." rg "chunkmemset_safe_neon\s*\(\s*uint8_t\s*\*\s*out\s*,\s*uint8_t\s*\*\s*from\s*," --type c # Test 3: Check for any calls to chunkmemset_safe_neon that might need updating echo "Checking for function calls that might need updating..." rg "chunkmemset_safe_neon\s*\(" --type cLength of output: 751
arch/power/power_functions.h (1)
18-18: Approve the function signature change and verify its usage.The modification of
chunkmemset_safe_power8function signature from usingunsigned disttouint8_t *fromaligns with the PR objectives of optimizing chunk copying and memory setting operations. This change potentially improves performance by using a direct pointer instead of calculating an offset.To ensure this change doesn't introduce any issues, please verify all calls to this function across the codebase. Run the following script to find all occurrences:
Ensure that all calls to this function have been updated to pass a pointer instead of a distance value.
✅ Verification successful
Function signature change verified and approved.
The
chunkmemset_safe_power8function signature has been updated correctly. No direct function calls were found in the codebase, and the function pointer assignment infunctable.caligns with the new signature.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Find all occurrences of chunkmemset_safe_power8 function calls # Search for function calls echo "Searching for chunkmemset_safe_power8 function calls:" rg --type c --type cpp "chunkmemset_safe_power8\s*\(" -A 3 # Search for function pointer assignments echo "Searching for chunkmemset_safe_power8 function pointer assignments:" rg --type c --type cpp "chunkmemset_safe_power8\s*[^(]"Length of output: 832
chunkset_tpl.h (3)
135-135: Acknowledge static analysis hint about memsetThe static analysis tool suggests using
memset_s()instead ofmemset()for better security. However, in this performance-critical code, extra bounds checks are intentionally avoided to maintain optimal performance.The current use of
memset()is appropriate in this context, as the function already ensures proper bounds checking through other means (e.g., thesafelencalculation). Adding the overhead ofmemset_s()would negatively impact performance without providing significant additional safety in this specific implementation.🧰 Tools
🪛 ast-grep
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
Line range hint
1-273: Overall, the changes look good and improve the codebase.The modifications to
chunkset_tpl.henhance the flexibility and safety of chunk memory operations while maintaining performance. Key improvements include:
- Expanded support for larger memory ranges in
GET_CHUNK_MAG.- Addition of
HALFCHUNKCOPYfor optimizing smaller data sizes.- Improved
CHUNKMEMSETto handle more complex memory layouts.- Introduction of
CHUNKCOPY_SAFEfor safer chunk copying.These changes should positively impact the overall performance and reliability of the zlib-ng library.
🧰 Tools
🪛 ast-grep
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
70-74: LGTM! Consider verifying the impact of uint64_t change.The change from
uint32_ttouint64_tfor thedistparameter and internal variables is a good improvement, allowing for larger distance values. This change enhances the function's capability to handle larger memory ranges.To ensure this change doesn't introduce any unexpected behavior in the rest of the codebase, please run the following verification script:
This will help identify any potential issues with existing code that might still be passing uint32_t values.
arch/x86/chunkset_avx2.c (8)
9-9: Inclusion ofx86_intrins.his appropriateThe inclusion of
"x86_intrins.h"is necessary for the AVX2 intrinsic functions utilized in this file.
12-12: Definition ofhalfchunk_tis appropriateDefining
halfchunk_tas__m128icorrectly establishes a type for handling 128-bit data chunks.
17-20: Declarations for chunkmemset and half-chunk support are appropriateThe added macros
HAVE_CHUNKMEMSET_16,HAVE_CHUNKMEMSET_1, andHAVE_HALF_CHUNKappropriately indicate the availability of these functions and features.
56-59: Initialization ofhalf_rem_valsarray is correctThe
half_rem_valsarray is correctly initialized with the appropriate remainder values for half-chunk operations.
78-81: Implementation ofchunkmemset_16is correctThe function
chunkmemset_16correctly uses_mm_loadu_si128to load a 128-bit value and_mm256_broadcastsi128_si256to broadcast it across a 256-bit chunk.
129-131: Implementation ofloadhalfchunkis appropriateThe function
loadhalfchunkcorrectly loads 128 bits from the source address into ahalfchunk_tusing_mm_loadu_si128.
133-135: Implementation ofstorehalfchunkis appropriateThe function
storehalfchunkcorrectly stores ahalfchunk_tinto the destination address using_mm_storeu_si128.
137-141: Zero-extension inhalfchunk2wholeis appropriateThe function
halfchunk2wholecorrectly zero-extends ahalfchunk_tto achunk_tusing_mm256_zextsi128_si256, which is suitable for combining half-sized chunks into full-sized chunks.
5e41366 to
c44806d
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
chunkset_tpl.h (2)
92-108: Good addition of HALFCHUNKCOPY for smaller data sizesThe new
HALFCHUNKCOPYfunction is a good optimization for handling smaller data sizes. The implementation looks correct and follows the pattern ofCHUNKCOPY.One minor suggestion:
Consider changing thealignvariable type tounsignedto match thelenparameter type:- int32_t align = ((len - 1) % sizeof(halfchunk_t)) + 1; + unsigned align = ((len - 1) % sizeof(halfchunk_t)) + 1;This change would make the types consistent, although it's not critical given that
lenis known to be small and won't exceedint32_tlimits.
149-168: Good addition of half-chunk handling in CHUNKMEMSETThe new code block for handling half-chunk operations is a valuable addition:
- It provides more granular optimizations for different memory sizes.
- The conditional compilation allows for platform-specific optimizations.
- Edge cases for different distances are handled correctly.
Note: The use of
gotoon line 165 is acceptable in this context for optimization purposes, but it's worth documenting the reason for its use to prevent future misunderstandings.Consider adding a brief comment explaining the rationale behind using
gotohere, as it's generally discouraged but justified in this performance-critical code.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (12)
- arch/arm/arm_functions.h (1 hunks)
- arch/arm/chunkset_neon.c (1 hunks)
- arch/generic/generic_functions.h (1 hunks)
- arch/power/power_functions.h (1 hunks)
- arch/x86/chunkset_avx2.c (6 hunks)
- arch/x86/chunkset_ssse3.c (1 hunks)
- arch/x86/x86_functions.h (3 hunks)
- chunkset_tpl.h (6 hunks)
- functable.c (1 hunks)
- functable.h (1 hunks)
- inffast_tpl.h (3 hunks)
- inflate.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
- arch/arm/arm_functions.h
- arch/arm/chunkset_neon.c
- arch/generic/generic_functions.h
- arch/power/power_functions.h
- arch/x86/chunkset_ssse3.c
- arch/x86/x86_functions.h
- functable.c
- functable.h
- inffast_tpl.h
- inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (2)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:168-168 Timestamp: 2024-10-07T21:21:10.570Z Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:82-85 Timestamp: 2024-10-07T21:18:37.806Z Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.chunkset_tpl.h (3)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:92-108 Timestamp: 2024-10-07T22:00:02.180Z Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:94-107 Timestamp: 2024-10-07T21:59:15.763Z Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:135-135 Timestamp: 2024-10-07T21:23:13.401Z Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
🪛 ast-grep
chunkset_tpl.h
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
🔇 Additional comments (14)
arch/x86/chunkset_avx2.c (8)
9-12: New include and type definition for enhanced functionalityThe addition of "x86_intrins.h" and the new
halfchunk_ttype definition enhance the file's capabilities:
- Including "x86_intrins.h" provides access to additional x86-specific intrinsics.
- The
halfchunk_ttypedef introduces support for 128-bit chunks, complementing the existing 256-bitchunk_t.These changes lay the groundwork for more flexible chunk handling in the subsequent code.
17-20: New macro definitions for extended chunk operationsThe addition of new macro definitions enhances the file's capabilities:
HAVE_CHUNKMEMSET_16andHAVE_CHUNKMEMSET_1indicate support for 16-byte and 1-byte chunk memory set operations.HAVE_HALF_CHUNKsignifies the implementation of half-chunk (128-bit) operations.These macros provide compile-time information about the available functionality, which can be useful for conditional compilation or feature detection.
78-81: New chunkmemset_16 function for efficient 16-byte chunk settingThe
chunkmemset_16function is a valuable addition:
- It efficiently sets a 256-bit chunk using a 128-bit value.
- It utilizes AVX2 intrinsics (
_mm256_broadcastsi128_si256) for optimal performance.- The implementation aligns with the previously added
HAVE_CHUNKMEMSET_16macro.This function enhances the library's capability to handle 16-byte chunk operations efficiently.
129-136: New loadhalfchunk and storehalfchunk functions for efficient half-chunk operationsThe addition of
loadhalfchunkandstorehalfchunkfunctions enhances the library's capabilities:
- They provide efficient loading and storing of half-chunks (128-bit) using SSE2 intrinsics.
- These functions complement the existing full-chunk operations, allowing for more flexible memory handling.
- The implementation aligns with the previously added
halfchunk_ttype andHAVE_HALF_CHUNKmacro.These functions contribute to the PR's objective of optimizing chunk copying operations by providing fine-grained control over memory operations.
137-141: New halfchunk2whole function for safe half-chunk to whole-chunk conversionThe
halfchunk2wholefunction is a valuable addition:
- It efficiently converts a half-chunk (128-bit) to a whole chunk (256-bit) using AVX2 intrinsics.
- The zero-extension approach ensures that the upper 128 bits are always cleared, preventing potential issues with uninitialized memory.
- The comment about appeasing memory sanitizers demonstrates thoughtful consideration for code safety and debugging tools.
This function enhances the library's flexibility in handling different chunk sizes while maintaining memory safety.
143-154: New GET_HALFCHUNK_MAG function for efficient half-chunk magnitude retrievalThe
GET_HALFCHUNK_MAGfunction is a well-implemented addition:
- It complements the existing
GET_CHUNK_MAGfunction, providing similar functionality for half-chunks (128-bit).- The function efficiently uses SSE2 intrinsics for 128-bit data manipulation.
- It utilizes the previously defined
half_rem_valsarray, which now has a clear purpose in this context.- The inclusion of memory unpoisoning (
__msan_unpoison) demonstrates consideration for compatibility with memory sanitizers.This function aligns well with the PR's objective of optimizing chunk copying operations by providing efficient handling of half-chunks.
Line range hint
1-155: Overall assessment of changes in chunkset_avx2.cThe modifications to
chunkset_avx2.csignificantly enhance the zlib-ng library's chunk handling capabilities:
- Introduction of half-chunk (128-bit) operations alongside existing full-chunk (256-bit) operations.
- New functions for efficient loading, storing, and manipulation of both full and half chunks.
- Improved flexibility in chunk size handling, potentially leading to better performance in various scenarios.
- Consistent use of AVX2 and SSE2 intrinsics for optimal performance on supported hardware.
- Careful consideration of memory safety and sanitizer compatibility.
These changes align well with the PR's objective of optimizing chunk copying and memory setting operations. The added flexibility and efficiency in handling different chunk sizes should contribute to improved performance in the zlib-ng library.
90-90: Updated GET_CHUNK_MAG function signature for larger distancesThe modification of the
distparameter type fromuint32_ttouint64_tin theGET_CHUNK_MAGfunction is a positive change:
- It allows for larger distance values, potentially supporting bigger data chunks.
- This change aligns with the PR objectives to optimize handling of chunk copying operations.
To ensure this change doesn't introduce any compatibility issues, let's verify that all callers of this function have been updated accordingly:
#!/bin/bash # Search for calls to GET_CHUNK_MAG and display their context rg --type c --context 3 "GET_CHUNK_MAG\s*\("Please review the results of this search and ensure all callers are using the correct type for the
distparameter.chunkset_tpl.h (6)
70-74: Improved handling of larger distances in GET_CHUNK_MAGThe changes to use
uint64_tfordist,cpy_dist, andbytes_remainingare good improvements. This allows the function to handle larger distance values consistently, which can be beneficial for working with larger memory ranges.
113-133: Improved CHUNKMEMSET with better handling of memory overlapsThe changes to
CHUNKMEMSETare well-implemented:
- The new function signature allows for more flexible memory operations.
- The added logic correctly handles cases where the source pointer is ahead of the destination.
- The use of
memmovefor overlapping memory regions is appropriate and prevents undefined behavior.- The assertion on line 116 is a good safeguard against invalid input.
These improvements make the function more robust and versatile.
138-139: Efficient handling of larger distances in CHUNKMEMSETThe change to use
CHUNKCOPYwhendist >= sizeof(chunk_t)is a good optimization. This simplifies the logic for larger distances and likely improves performance by leveraging the specializedCHUNKCOPYfunction.
Line range hint
222-250: CHUNKMEMSET_SAFE updated consistently with CHUNKMEMSET changesThe modifications to
CHUNKMEMSET_SAFEare well-implemented:
- The function signature change is consistent with the changes made to
CHUNKMEMSET.- The core safety checks are maintained, ensuring the function remains safe to use.
- The function now leverages the improved
CHUNKMEMSET, which should provide better performance while maintaining safety.These changes improve consistency and potentially enhance performance while preserving the safety guarantees of the function.
255-273: Excellent addition of CHUNKCOPY_SAFE functionThe new
CHUNKCOPY_SAFEfunction is a valuable addition to the codebase:
- It provides a safe way to copy chunks, with appropriate bounds checking.
- The function correctly handles the case where source and destination pointers are the same.
- The use of
llabsfor calculatingfrom_distensures correct handling of both forward and backward copies.- The safety checks before calling
CHUNKMEMSETprevent potential buffer overflows.This function enhances the safety of chunk copying operations while still leveraging the performance benefits of
CHUNKMEMSETwhen possible.
135-135: Regarding memset usage: Performance considerations in critical codeThe static analysis tool flagged the use of
memseton line 135, suggestingmemset_sfor better buffer boundary consideration. However, in this performance-critical code:
- Extra bounds checks are intentionally avoided to maintain performance.
- The
lenparameter is enforced withsafelenearly in the function, mitigating the risk of buffer overflows.- Adding additional checks here could negatively impact performance in this critical section.
Given these factors, the current use of
memsetis appropriate and doesn't introduce security risks in this context.🧰 Tools
🪛 ast-grep
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
c44806d to
b2101f7
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (9)
arch/x86/chunkset_avx2.c (5)
56-59: New constant array for half-chunk operationsThe
half_rem_valsarray is introduced to support half-chunk operations. However, its purpose is not immediately clear from the context.Consider adding a brief comment explaining the purpose and usage of this array to improve code readability.
78-81: Efficient implementation ofchunkmemset_16The
chunkmemset_16function efficiently uses AVX2 intrinsics to broadcast a 128-bit value across a 256-bit chunk. This implementation is well-optimized for performance.Consider adding a brief comment explaining the purpose of this function, similar to the comments for other
chunkmemset_*functions above it.
Line range hint
90-126: ImprovedGET_CHUNK_MAGimplementationThe
GET_CHUNK_MAGfunction has been updated to handle larger distances withuint64_tand optimized for different cases using AVX2 intrinsics. The implementation is well-structured and efficient.Consider adding comments to explain the purpose of the different branches (when
dist < 16and when it's not) to improve code readability.
143-154: Efficient implementation ofGET_HALFCHUNK_MAGThe
GET_HALFCHUNK_MAGfunction provides an efficient implementation for getting the magnitude of half-sized chunks using SSE2 intrinsics. It follows a similar pattern toGET_CHUNK_MAGbut is adapted for half-sized chunks.For consistency with
GET_CHUNK_MAG, consider usinguint64_tfor thedistparameter instead ofuint32_t.
Line range hint
1-155: Overall improvements in chunk handling and performanceThe changes in this file significantly enhance the functionality and efficiency of chunk operations:
- Introduction of half-chunk support with new types and functions.
- Optimization of existing functions using AVX2 and SSE2 intrinsics.
- Improved handling of larger distances in
GET_CHUNK_MAG.These improvements should lead to better performance in handling various chunk sizes and operations.
Consider adding unit tests to verify the correctness and performance of the new and modified functions, especially for edge cases and different chunk sizes.
chunkset_tpl.h (2)
91-109: LGTM with suggestions: New HALFCHUNKCOPY functionThe addition of the
HALFCHUNKCOPYfunction is a good optimization for smaller data sizes. However, consider the following improvements for robustness:
- Change the
alignvariable type tounsignedto match thelenparameter type:- int32_t align = ((len - 1) % sizeof(halfchunk_t)) + 1; + unsigned align = ((len - 1) % sizeof(halfchunk_t)) + 1;
- Add a check to handle cases where
lenis smaller thanalign:if (len < align) { align = len; }These changes would prevent potential issues with small
lenvalues and ensure consistent behavior across all input sizes.
138-168: LGTM: Enhanced optimization for smaller chunk sizesThe addition of
HAVE_HALF_CHUNKsupport and the new logic for handling smaller chunk sizes is a good optimization. It provides more efficient paths for different memory layouts, particularly for smaller data sizes.Suggestion: Consider adding inline comments or a brief documentation block explaining the rationale behind the different code paths, especially for the conditions like
(dist % 2) != 0 || dist == 6. This would improve code maintainability and help future developers understand the optimizations.inffast_tpl.h (2)
251-258: Enhanced copying strategy with safety considerationsThis change introduces a more nuanced approach to copying operations, balancing safety and performance. The use of
CHUNKUNROLLin the!extra_safecase potentially optimizes performance while maintaining safety.Consider using braces for the single-line if-else statements to improve readability:
if (!extra_safe) { out = CHUNKCOPY_SAFE(out, from, op, safe); out = CHUNKUNROLL(out, &dist, &len); out = CHUNKCOPY_SAFE(out, out - dist, len, safe); } else { out = chunkcopy_safe(out, from, op, safe); out = chunkcopy_safe(out, out - dist, len, safe); }
260-263: Consistent safety-based copying strategyThis change maintains consistency with the previous block, applying appropriate copying strategies based on the
extra_safecondition.For improved readability and consistency with the surrounding code, consider using braces and aligning the statements:
-if (!extra_safe) - out = CHUNKCOPY_SAFE(out, from, len, safe); -else - out = chunkcopy_safe(out, from, len, safe); +if (!extra_safe) { + out = CHUNKCOPY_SAFE(out, from, len, safe); +} else { + out = chunkcopy_safe(out, from, len, safe); +}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (13)
- arch/arm/arm_functions.h (1 hunks)
- arch/arm/chunkset_neon.c (1 hunks)
- arch/generic/generic_functions.h (1 hunks)
- arch/power/power_functions.h (1 hunks)
- arch/riscv/riscv_functions.h (1 hunks)
- arch/x86/chunkset_avx2.c (6 hunks)
- arch/x86/chunkset_ssse3.c (1 hunks)
- arch/x86/x86_functions.h (3 hunks)
- chunkset_tpl.h (6 hunks)
- functable.c (1 hunks)
- functable.h (1 hunks)
- inffast_tpl.h (3 hunks)
- inflate.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- arch/arm/arm_functions.h
- arch/arm/chunkset_neon.c
- arch/generic/generic_functions.h
- arch/power/power_functions.h
- arch/x86/chunkset_ssse3.c
- arch/x86/x86_functions.h
- functable.c
- functable.h
- inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (3)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:56-59 Timestamp: 2024-10-08T03:09:54.717Z Learning: The array `half_rem_vals` in `arch/x86/chunkset_avx2.c` is used in the function `GET_HALFCHUNK_MAG` on line 148.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:168-168 Timestamp: 2024-10-07T21:21:10.570Z Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:82-85 Timestamp: 2024-10-07T21:18:37.806Z Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.chunkset_tpl.h (3)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:92-108 Timestamp: 2024-10-07T22:00:02.180Z Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:94-107 Timestamp: 2024-10-07T21:59:15.763Z Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:135-135 Timestamp: 2024-10-07T21:23:13.401Z Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
🪛 ast-grep
chunkset_tpl.h
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
🔇 Additional comments (15)
arch/riscv/riscv_functions.h (1)
16-16: Approve function signature change with verification suggestion.The modification of
chunkmemset_safe_rvvfunction signature from usingunsigned disttouint8_t *fromaligns with the PR objectives of simplifying and optimizing chunk operations. This change allows for more direct memory access, potentially improving performance.To ensure consistency across the codebase, please verify all calls to this function have been updated accordingly. Run the following script to check for any remaining uses of the old signature:
✅ Verification successful
Function signature change verified and approved.
The
chunkmemset_safe_rvvfunction signature has been successfully updated to useuint8_t *fromin all instances. No remaining uses of the old signature were found.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for any remaining uses of the old chunkmemset_safe_rvv signature # Test: Search for function calls with the old signature rg --type c --type cpp 'chunkmemset_safe_rvv\s*\([^,]+,\s*[^*]+,' # Note: If this returns any results, those calls need to be updated to use the new signatureLength of output: 67
Script:
#!/bin/bash # Description: Verify all calls to chunkmemset_safe_rvv use the new signature with uint8_t *from # Using ast-grep to find all calls to chunkmemset_safe_rvv and check their second argument type ast-grep --lang c --pattern='chunkmemset_safe_rvv($_, uint8_t *$_, $_, $_)' ast-grep --lang cpp --pattern='chunkmemset_safe_rvv($_, uint8_t *$_, $_, $_)' # Note: If any calls are found that do not match the new signature, they need to be updated accordingly.Length of output: 369
Script:
#!/bin/bash # Description: Verify all calls to chunkmemset_safe_rvv use the new signature with uint8_t *from # Using ast-grep to find all calls to chunkmemset_safe_rvv and check their second argument type ast-grep match --lang c --pattern='chunkmemset_safe_rvv($_, uint8_t *$_, $_, $_)' ast-grep match --lang cpp --pattern='chunkmemset_safe_rvv($_, uint8_t *$_, $_, $_)' # If ast-grep is not functioning as expected, use ripgrep with an improved regex rg --type c --type cpp 'chunkmemset_safe_rvv\s*\([^,]+,\s*uint8_t\s*\*\s*[^,]+,'Length of output: 576
arch/x86/chunkset_avx2.c (4)
12-12: New typedef for half-sized chunksThe addition of
typedef __m128i halfchunk_t;introduces a clear type for half-sized chunks, which improves code readability and maintainability.
17-18: New chunk operations definedThe addition of
HAVE_CHUNKMEMSET_16andHAVE_HALF_CHUNKmacros indicates new functionality for handling 16-byte chunks and half-sized chunks. This expands the library's capabilities for different chunk sizes.Also applies to: 20-20
129-132: Efficient implementation ofloadhalfchunkThe
loadhalfchunkfunction provides a simple and efficient way to load half-sized chunks using SSE2 intrinsics. This aligns well with the newhalfchunk_ttype and enhances the library's capabilities for handling different chunk sizes.
133-136: Efficient implementation ofstorehalfchunkThe
storehalfchunkfunction provides a simple and efficient way to store half-sized chunks using SSE2 intrinsics. This complements theloadhalfchunkfunction and enhances the library's capabilities for handling different chunk sizes.chunkset_tpl.h (7)
70-74: LGTM: Improved support for larger memory rangesThe changes to
GET_CHUNK_MAGfunction, including updating thedistparameter touint64_tand adjusting internal variables, improve support for larger memory ranges. This is a good enhancement for systems dealing with large amounts of data.Note that this change might have a slight performance impact on 32-bit systems, but it's likely a necessary trade-off for the added functionality.
113-133: LGTM: Improved CHUNKMEMSET functionThe changes to the
CHUNKMEMSETfunction significantly improve its flexibility and safety:
- The new signature allows for more flexible memory operations by accepting a source pointer instead of a distance.
- The added logic to handle cases where the source pointer is ahead of the destination pointer is a great improvement, preventing potential data corruption.
- The use of
memmovefor overlapping memory regions is correct and ensures safe memory operations.- The assertion
Assert(from != out, "chunkmemset cannot have a distance 0");is a good safety check.These changes make the function more robust and capable of handling a wider range of memory scenarios safely.
185-213: LGTM: Improved chunk handling and optimizationThe changes in this section enhance the function's ability to handle different chunk sizes efficiently:
- The addition of
HAVE_CHUNKMEMSET_16support allows for optimized handling of 16-byte chunks.- The modified main loop now processes larger chunks more efficiently, potentially improving performance for larger data sets.
- The addition of the
rem_byteslabel improves code organization and readability, making it easier to handle remaining bytes after the main loop.These improvements should lead to better performance across various data sizes and memory alignments.
Line range hint
222-250: LGTM: Consistent updates to CHUNKMEMSET_SAFEThe changes to
CHUNKMEMSET_SAFEare consistent with the updates made toCHUNKMEMSET:
- The function signature has been updated to match the new
CHUNKMEMSETsignature, maintaining consistency across the API.- The function now uses
CHUNKMEMSETinstead ofCHUNKCOPY, which is appropriate given the earlier changes and provides a more unified approach to memory operations.- The existing safety checks for alignment and buffer boundaries have been preserved, ensuring that the function remains safe to use in various scenarios.
These changes improve the overall consistency and maintainability of the codebase.
255-273: LGTM: New CHUNKCOPY_SAFE function adds robustnessThe addition of the
CHUNKCOPY_SAFEfunction is a valuable improvement to the library:
- It provides a safe way to copy memory, handling various edge cases that could lead to buffer overflows or undefined behavior.
- The check for identical source and destination pointers (line 257-258) is an excellent optimization, avoiding unnecessary operations.
- The use of
CHUNKMEMSETfor the actual copying (line 272) leverages the optimizations already implemented in that function.- The function correctly handles cases where the copy might exceed the safe boundary, ensuring that operations remain within bounds.
This new function enhances the safety and flexibility of memory operations in the library, making it easier for users to perform safe memory copies.
Line range hint
1-273: Overall: Significant improvements to memory operationsThis update to
chunkset_tpl.hbrings several valuable enhancements:
- Improved support for larger memory ranges with the update to 64-bit distances.
- New optimizations for different chunk sizes, including the introduction of
HALFCHUNKCOPY.- Enhanced flexibility in memory operations, particularly in
CHUNKMEMSET.- Improved safety with the addition of
CHUNKCOPY_SAFEand updates toCHUNKMEMSET_SAFE.These changes should result in better performance across various data sizes and memory layouts while maintaining or improving safety. The code is generally well-structured and considers various edge cases.
Minor suggestions for improvement include:
- Verifying the necessity of including
<stdio.h>.- Adding some documentation for complex optimization paths.
- Considering small tweaks to
HALFCHUNKCOPYfor improved robustness.Overall, this is a solid improvement to the library's memory handling capabilities.
🧰 Tools
🪛 ast-grep
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
7-7: Verify the necessity of including <stdio.h>The addition of
#include <stdio.h>seems unnecessary for this header file, as it primarily deals with memory operations. Unless there are specific stdio functions used in other parts of the file or in dependent code, consider removing this include to minimize dependencies.✅ Verification successful
stdio.h is not required in chunkset_tpl.h
After reviewing the codebase,
chunkset_tpl.hdoes not utilize any functions or macros from<stdio.h>. Other files include and use<stdio.h>independently as needed. Therefore, the inclusion of<stdio.h>inchunkset_tpl.his unnecessary and can be safely removed to reduce dependencies.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check if stdio.h functions are used in this file or other files that include it rg -t c -t cpp -t h --no-heading --no-line-number '\b(printf|scanf|f(open|close|read|write|gets|puts)|std(in|out|err))\b' chunkset_tpl.hLength of output: 138
Script:
#!/bin/bash # Search for any inclusion of stdio.h and its usage across the codebase rg -t c -t cpp -t h '#include\s*<stdio\.h>' . && rg -t c -t cpp -t h '\b(printf|scanf|fopen|fclose|fread|fwrite|gets|puts|fprintf|sprintf|stdin|stdout|stderr)\b' .Length of output: 39363
inffast_tpl.h (3)
241-242: Improved safety in window buffer copyingThe use of
CHUNKCOPY_SAFEhere enhances the safety of memory operations when copying from the window buffer. This change aligns well with the PR's objective of improving the handling of chunk copying operations.
277-277: Optimized copying for overlapping regionsThe use of
CHUNKMEMSEThere is a good optimization for cases where the source and destination may overlap. This change aligns well with the PR's objective of improving performance in chunk copying operations, especially for short distances.
Line range hint
1-324: Overall assessment of safety and performance improvementsThe introduction of the
extra_safevariable and the subsequent changes in copying strategies significantly enhance the safety of memory operations in this function. This approach allows for optimized performance when safe, while ensuring proper handling of potentially overlapping memory regions.To ensure these changes don't introduce any regressions, it would be beneficial to run performance benchmarks, especially for cases involving short distances and overlapping memory regions. Additionally, consider adding or updating unit tests to cover the new code paths introduced by the
extra_safecondition.
258af3b to
8df34bd
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
chunkset_tpl.h (1)
92-108: New HALFCHUNKCOPY function looks good, with a minor suggestionThe new
HALFCHUNKCOPYfunction is a great addition for optimizing smaller data sizes. The implementation follows the pattern ofCHUNKCOPY, which is good for consistency.However, consider changing the alignment calculation to use
unsignedinstead ofint32_t:- int32_t align = ((len - 1) % sizeof(halfchunk_t)) + 1; + unsigned align = ((len - 1) % sizeof(halfchunk_t)) + 1;This change would make it consistent with the
lenparameter type and prevent potential issues with largelenvalues.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (13)
- arch/arm/arm_functions.h (1 hunks)
- arch/arm/chunkset_neon.c (1 hunks)
- arch/generic/generic_functions.h (1 hunks)
- arch/power/power_functions.h (1 hunks)
- arch/riscv/riscv_functions.h (1 hunks)
- arch/x86/chunkset_avx2.c (6 hunks)
- arch/x86/chunkset_ssse3.c (1 hunks)
- arch/x86/x86_functions.h (3 hunks)
- chunkset_tpl.h (6 hunks)
- functable.c (1 hunks)
- functable.h (1 hunks)
- inffast_tpl.h (3 hunks)
- inflate.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (11)
- arch/arm/arm_functions.h
- arch/arm/chunkset_neon.c
- arch/generic/generic_functions.h
- arch/power/power_functions.h
- arch/riscv/riscv_functions.h
- arch/x86/chunkset_ssse3.c
- arch/x86/x86_functions.h
- functable.c
- functable.h
- inffast_tpl.h
- inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (3)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:56-59 Timestamp: 2024-10-08T03:09:54.717Z Learning: The array `half_rem_vals` in `arch/x86/chunkset_avx2.c` is used in the function `GET_HALFCHUNK_MAG` on line 148.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:168-168 Timestamp: 2024-10-07T21:21:10.570Z Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:82-85 Timestamp: 2024-10-07T21:18:37.806Z Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.chunkset_tpl.h (3)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:92-108 Timestamp: 2024-10-07T22:00:02.180Z Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:94-107 Timestamp: 2024-10-07T21:59:15.763Z Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:135-135 Timestamp: 2024-10-07T21:23:13.401Z Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
🪛 ast-grep
chunkset_tpl.h
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
🔇 Additional comments (18)
chunkset_tpl.h (7)
Line range hint
70-84: Improved flexibility with larger distance valuesThe changes to
GET_CHUNK_MAGfunction are well-implemented. By updating thedistparameter and internal variables touint64_t, the function can now handle larger distance values, increasing its flexibility without compromising the existing logic.
113-133: Improved CHUNKMEMSET with better handling of overlapping regionsThe changes to
CHUNKMEMSETare well-implemented. The new function signature allows for more flexible memory operations, and the added logic correctly handles cases where the source pointer is ahead of the destination. The use ofmemmovefor overlapping regions is appropriate and ensures correct behavior in all scenarios.
138-168: Performance optimizations in CHUNKMEMSETThe new logic in
CHUNKMEMSETfor handling different chunk sizes and distances is well-implemented. The use ofHALFCHUNKCOPYfor smaller lengths and the conditional logic for different distance cases should improve performance in various scenarios. The use ofgotofor handling remainder bytes is acceptable in this performance-critical code.
185-213: Additional optimizations and consistent changes in CHUNKMEMSETThe changes in this section of
CHUNKMEMSETare well-implemented. The addition of thedist == 16case is consistent with the existing pattern for other distances. The minor changes to the main loop and remainder handling maintain the function's logic while potentially improving performance.
Line range hint
222-250: Consistent updates to CHUNKMEMSET_SAFEThe changes to
CHUNKMEMSET_SAFEare consistent with the updates made toCHUNKMEMSET. The new function signature maintains consistency across the codebase, and the minor changes to the function body preserve the existing logic while accommodating the new parameter.
255-273: New CHUNKCOPY_SAFE function enhances safety and flexibilityThe new
CHUNKCOPY_SAFEfunction is a valuable addition to the codebase. It provides a safe way to copy chunks, preventing potential buffer overflows and handling various edge cases. The logic for different scenarios (same pointers, close pointers) is well-implemented, and the use ofCHUNKMEMSETfor the main copy operation leverages existing optimized code.
135-135: Regarding memset() usageThe static analysis tool flagged the use of
memset()on this line, suggestingmemset_s()as an alternative. However, in this performance-critical code, the use ofmemset()is intentional and appropriate. The function already includes necessary safety checks, and changing tomemset_s()could introduce unnecessary overhead. The current implementation balances performance and safety effectively.🧰 Tools
🪛 ast-grep
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/arch/x86/chunkset_avx2.c (11)
9-9: Includingx86_intrins.hThe inclusion of
"x86_intrins.h"provides access to x86-specific intrinsics required for optimized implementations.
12-12: Defininghalfchunk_tas__m128iDefining
halfchunk_tas__m128iappropriately represents half-sized chunks using 128-bit SIMD registers.
17-18: Enabling additionalchunkmemsetfunctionsBy defining
HAVE_CHUNKMEMSET_16andHAVE_CHUNKMEMSET_1, you enable optimized implementations for memory setting operations with chunk sizes of 16 bytes and 1 byte.
20-20: DefiningHAVE_HALF_CHUNKDefining
HAVE_HALF_CHUNKallows the use of half-chunk operations, enhancing performance for scenarios involving smaller data sizes.
56-59: Addinghalf_rem_valsarrayThe
half_rem_valsarray is correctly defined and used in theGET_HALFCHUNK_MAGfunction to calculate remainder values for half-chunk operations.
78-81: Implementingchunkmemset_16functionThe
chunkmemset_16function correctly initializes a 256-bit chunk by broadcasting a 128-bit value, optimizing memory setting for 16-byte patterns.
129-131: ImplementingloadhalfchunkfunctionThe
loadhalfchunkfunction correctly loads data into ahalfchunk_tusing unaligned memory access, which is acceptable and efficient on AVX2-capable CPUs.
133-135: ImplementingstorehalfchunkfunctionThe
storehalfchunkfunction accurately stores ahalfchunk_tto memory, utilizing unaligned access without performance penalties on modern CPUs.
137-141: Implementinghalfchunk2wholefunctionThe
halfchunk2wholefunction properly zero-extends a 128-bit half chunk to a 256-bit chunk, which helps prevent issues with memory sanitizers and is appropriate for subsequent operations requiring full chunks.
143-154: ImplementingGET_HALFCHUNK_MAGfunctionThe
GET_HALFCHUNK_MAGfunction is correctly implemented to handle half-chunk magnitude retrieval based on thedistparameter, efficiently utilizing SIMD operations for performance gains.
90-90: UpdatingGET_CHUNK_MAGfunction signatureThe
distparameter type inGET_CHUNK_MAGhas been updated fromuint32_ttouint64_t. To prevent potential type mismatches, please ensure all calls to this function pass auint64_tfordist.Run the following script to verify consistency across the codebase:
8df34bd to
64111bc
Compare
BenchmarkDataset: silesia.tar Develop (old) 8d10c30PR 64111bc |
64111bc to
a2df161
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (7)
arch/x86/chunkset_avx2.c (4)
17-20: LGTM: New macro definitions with a minor suggestionThe addition of
HAVE_CHUNKMEMSET_16,HAVE_CHUNKMEMSET_1, andHAVE_HALF_CHUNKmacros correctly indicates the availability of new functions in this implementation. This is good for conditional compilation and feature detection.Consider adding brief comments explaining the purpose of these macros, especially
HAVE_HALF_CHUNK, to improve code readability.
78-81: LGTM: Efficient implementation ofchunkmemset_16The new
chunkmemset_16function efficiently sets a 256-bit chunk using a 128-bit value. The use of_mm256_broadcastsi128_si256is an excellent choice for duplicating the 128-bit value across the 256-bit chunk.Consider adding a brief comment explaining the purpose of this function and why unaligned load (
_mm_loadu_si128) is used instead of aligned load.
129-141: LGTM: New half chunk operationsThe new functions
loadhalfchunk,storehalfchunk, andhalfchunk2wholeare well-implemented and provide essential operations for half chunks. The use of SSE intrinsics is appropriate for these 128-bit operations.Consider adding brief comments for each function explaining their purpose and any specific considerations (e.g., the reason for zero extension in
halfchunk2whole).
143-154: LGTM: Well-implementedGET_HALFCHUNK_MAGfunctionThe new
GET_HALFCHUNK_MAGfunction is a well-implemented half-size version ofGET_CHUNK_MAG. It correctly uses 128-bit operations and the newhalf_rem_valsarray. The use of__msan_unpoisonis consistent with the full-size version, which is good for maintaining memory safety.Consider adding a comment explaining the purpose of this function and its relationship to
GET_CHUNK_MAG. Also, document the usage ofhalf_rem_valsto improve code readability.chunkset_tpl.h (3)
7-7: Consider removing unnecessary includeThe addition of
#include <stdio.h>seems unusual for a file focused on low-level memory operations. Unless there are specific stdio functions being used (which are not apparent in the provided code), consider removing this include to reduce compilation time and maintain a cleaner dependency structure.
91-109: Approved with suggestion: HALFCHUNKCOPY implementationThe addition of
HALFCHUNKCOPYis a good optimization for smaller data sizes. However, consider changing the alignment calculation to useunsignedinstead ofint32_tto match thelenparameter type:- int32_t align = ((len - 1) % sizeof(halfchunk_t)) + 1; + unsigned align = ((len - 1) % sizeof(halfchunk_t)) + 1;This change would ensure consistency with the
lenparameter type and prevent potential issues with largelenvalues.
138-168: Approved with suggestion: Enhanced optimizations and edge case handlingThe changes introduce several optimizations and improve edge case handling:
- Using
CHUNKCOPYfordist >= sizeof(chunk_t)is a good optimization.- The
HALFCHUNKCOPYoptimization for shorter lengths is beneficial.- The complex logic for various distance and length scenarios is comprehensive.
However, the complexity of the logic, especially in the
HAVE_HALF_CHUNKsection, could benefit from additional inline comments explaining the rationale behind each condition and optimization. This would improve code maintainability and make it easier for other developers to understand the optimizations.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (11)
- arch/arm/arm_functions.h (1 hunks)
- arch/generic/generic_functions.h (1 hunks)
- arch/power/power_functions.h (1 hunks)
- arch/riscv/riscv_functions.h (1 hunks)
- arch/x86/chunkset_avx2.c (5 hunks)
- arch/x86/x86_functions.h (3 hunks)
- chunkset_tpl.h (5 hunks)
- functable.c (1 hunks)
- functable.h (1 hunks)
- inffast_tpl.h (3 hunks)
- inflate.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- arch/arm/arm_functions.h
- arch/generic/generic_functions.h
- arch/power/power_functions.h
- arch/riscv/riscv_functions.h
- arch/x86/x86_functions.h
- functable.c
- functable.h
- inffast_tpl.h
- inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (7)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:56-59 Timestamp: 2024-10-08T03:09:54.717Z Learning: The array `half_rem_vals` in `arch/x86/chunkset_avx2.c` is used in the function `GET_HALFCHUNK_MAG` on line 148.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:56-59 Timestamp: 2024-10-08T19:37:14.998Z Learning: The array `half_rem_vals` in `arch/x86/chunkset_avx2.c` is used in the function `GET_HALFCHUNK_MAG` on line 148.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1778 File: arch/x86/chunkset_avx2.c:160-171 Timestamp: 2024-10-08T21:51:45.330Z Learning: In `arch/x86/chunkset_avx2.c`, within the `GET_HALFCHUNK_MAG` function, using a conditional branch to select between `_mm_loadl_epi64` and `_mm_loadu_si128` is not recommended because the branching cost outweighs the savings from the load.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:168-168 Timestamp: 2024-10-08T19:37:14.998Z Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:168-168 Timestamp: 2024-10-07T21:21:10.570Z Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:82-85 Timestamp: 2024-10-07T21:18:37.806Z Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:82-85 Timestamp: 2024-10-08T19:37:14.998Z Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.chunkset_tpl.h (5)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:92-108 Timestamp: 2024-10-08T19:37:14.998Z Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:92-108 Timestamp: 2024-10-07T22:00:02.180Z Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:94-107 Timestamp: 2024-10-07T21:59:15.763Z Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:94-107 Timestamp: 2024-10-08T19:37:14.998Z Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:135-135 Timestamp: 2024-10-07T21:23:13.401Z Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
🪛 ast-grep
chunkset_tpl.h
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
🔇 Additional comments (7)
arch/x86/chunkset_avx2.c (2)
9-12: LGTM: New includes and type definitionsThe addition of
x86_intrins.hand the newhalfchunk_ttype definition are appropriate for the new functionality being introduced. Thehalfchunk_ttype is correctly defined as__m128i, which is half the size of the existingchunk_t(__m256i), aligning with the purpose of handling half-sized chunks.
56-59: Please clarify the purpose ofhalf_rem_valsThe new constant array
half_rem_valshas been added, but its purpose and the significance of its values are not immediately clear from the context.Consider adding a brief comment explaining the purpose and usage of this array. This would improve code readability and maintainability.
#!/bin/bash # Search for references to half_rem_vals in the codebase rg "half_rem_vals" --type cchunkset_tpl.h (5)
73-84: Approved: Improved type handling for larger distancesThe changes to use
uint64_tforcpy_distandbytes_remainingallow for handling larger distances, which is a good improvement. The added casts ensure type consistency and prevent potential warnings. The use ofsize_tcast formemcpyis appropriate and safe.
113-133: Approved: Improved CHUNKMEMSET flexibility and edge case handlingThe changes to
CHUNKMEMSETare well-thought-out:
- The new signature using
uint8_t *frominstead ofunsigned distallows for more flexible memory operations.- The added handling for cases where the source pointer is ahead of the destination is a good addition for edge cases.
- The updated assertion
Assert(from != out, ...)prevents potential issues with zero distance.These improvements enhance the function's robustness and flexibility.
185-189: Approved: Added support for 16-byte CHUNKMEMSET optimizationThe addition of the
HAVE_CHUNKMEMSET_16conditional compilation is consistent with the existing pattern for otherCHUNKMEMSEToptimizations. This allows for a specialized implementation for 16-byte distances, which can potentially improve performance for this specific case.
211-213: Approved: Added rem_bytes label for HAVE_HALF_CHUNKThe addition of the
rem_bytes:label under theHAVE_HALF_CHUNKcondition is a good optimization. It provides a jump target from theHALFCHUNKCOPYoptimization section, allowing for efficient handling of remaining bytes after the main copy loop.
Line range hint
222-273: Approved: Enhanced safety in memory operationsThe changes to
CHUNKMEMSET_SAFEand the addition ofCHUNKCOPY_SAFEsignificantly improve the safety of memory operations:
- The updated
CHUNKMEMSET_SAFEsignature is consistent with theCHUNKMEMSETchanges, improving flexibility.- The new
CHUNKCOPY_SAFEfunction provides a robust way to copy memory, handling various edge cases:
- It checks for self-copying (
out == from).- It ensures the copy doesn't exceed the safe boundary.
- It handles cases where the source and destination regions might overlap.
These changes enhance the overall safety and reliability of the memory operations in the library.
| } else { | ||
| __m128i ret_vec0 = _mm_loadu_si128((__m128i*)buf); |
There was a problem hiding this comment.
💡 Codebase verification
Missing tests for GET_CHUNK_MAG
No tests covering the GET_CHUNK_MAG function were found. Please add appropriate tests to ensure the function behaves correctly for all possible values of dist, especially at the boundary condition where dist == 16.
🔗 Analysis chain
LGTM: Simplified condition in GET_CHUNK_MAG
The simplification of the condition for the 'else' branch in GET_CHUNK_MAG is a good improvement. It reduces code complexity and assumes that all cases where dist >= 16 can be handled uniformly.
To ensure this change doesn't introduce any regressions, please verify that the function behaves correctly for all possible values of dist, especially at the boundary condition where dist == 16.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for tests or usage of GET_CHUNK_MAG
rg "GET_CHUNK_MAG" --type c
Length of output: 163
Script:
#!/bin/bash
# Search for tests or usage of GET_CHUNK_MAG
rg "GET_CHUNK_MAG" --type c
Length of output: 551
Script:
#!/bin/bash
# Search for usages of GET_CHUNK_MAG within test directories
rg "GET_CHUNK_MAG" --type c -g "test/*" -g "tests/*" -g "*_test.c"
Length of output: 66
Script:
#!/bin/bash
# Expanded search for usages of GET_CHUNK_MAG in common test directories and naming patterns
rg "GET_CHUNK_MAG" --type c -g "tests/**" -g "test/**" -g "**/*_test.c" -g "**/*_spec.c" -g "**/*_tests.c"
Length of output: 106
a2df161 to
30860bb
Compare
|
@KungFuJesus This needs a rebase |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (6)
arch/x86/chunkset_avx2.c (3)
56-59: LGTM: New constant array for half-chunk operationsThe addition of the
half_rem_valsarray introduces precomputed values for half-chunk operations, which can potentially improve performance by avoiding runtime calculations.However, the purpose and usage of these specific values are not immediately clear from the context.
Could you add a brief comment explaining the purpose and usage of this array? This would improve code readability and maintainability.
137-141: LGTM: New halfchunk2whole function for converting 128-bit to 256-bit chunksThe
halfchunk2wholefunction is a useful addition:
- It efficiently converts a 128-bit chunk to a 256-bit chunk using the
_mm256_zextsi128_si256intrinsic.- The zero-extension helps satisfy memory sanitizers by ensuring the upper 128 bits are explicitly set.
- This function provides a clean interface for converting between half and whole chunks.
Consider slightly rewording the comment for clarity:
/* Zero-extend to appease memory sanitizers. The upper 128 bits are * unlikely to be actually written or read from in most cases. */This more explicitly states the purpose of the zero-extension and its impact.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 137-137: arch/x86/chunkset_avx2.c#L137
Added line #L137 was not covered by tests
[warning] 140-140: arch/x86/chunkset_avx2.c#L140
Added line #L140 was not covered by tests
143-154: LGTM: New GET_HALFCHUNK_MAG function for efficient half-chunk operationsThe
GET_HALFCHUNK_MAGfunction is a valuable addition:
- It provides functionality similar to
GET_CHUNK_MAGbut for 128-bit (half) chunks.- The function efficiently uses AVX2 intrinsics for optimal performance.
- It utilizes the new
half_rem_valsarray for remainder calculations.This function will allow for more efficient and flexible chunk operations, especially when dealing with smaller data sizes.
Consider adding a brief comment at the beginning of the function to explain its purpose and how it differs from
GET_CHUNK_MAG. This would improve code readability and maintainability.🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 143-143: arch/x86/chunkset_avx2.c#L143
Added line #L143 was not covered by tests
[warning] 145-148: arch/x86/chunkset_avx2.c#L145-L148
Added lines #L145 - L148 were not covered by tests
[warning] 151-151: arch/x86/chunkset_avx2.c#L151
Added line #L151 was not covered by tests
[warning] 153-153: arch/x86/chunkset_avx2.c#L153
Added line #L153 was not covered by testschunkset_tpl.h (3)
113-139: Approved changes to CHUNKMEMSET with a suggestionThe modifications to
CHUNKMEMSETimprove its flexibility and correctness, especially in handling overlapping memory regions. The use ofmemmovefor such cases is appropriate.One suggestion for improvement:
Consider adding a brief comment explaining the use of
llabsfor distance calculation, as it might not be immediately obvious why it's necessary. For example:// Use llabs to handle both forward and backward memory operations uint64_t dist = llabs(sdist);🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 113-113: chunkset_tpl.h#L113
Added line #L113 was not covered by tests
[warning] 118-122: chunkset_tpl.h#L118-L122
Added lines #L118 - L122 were not covered by tests
[warning] 131-132: chunkset_tpl.h#L131-L132
Added lines #L131 - L132 were not covered by tests
[warning] 139-139: chunkset_tpl.h#L139
Added line #L139 was not covered by tests🪛 ast-grep
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
255-273: Approved new CHUNKCOPY_SAFE function with a suggestionThe new
CHUNKCOPY_SAFEfunction is a valuable addition, providing a safer alternative toCHUNKCOPY. It handles various edge cases well, including whenoutandfromare the same, and for small distances or lengths.One suggestion for improvement:
Consider adding a brief comment explaining the use of
llabsfor distance calculation. For example:// Use llabs to handle both forward and backward memory operations uint64_t from_dist = (uint64_t)llabs(safe - from);This comment would help clarify the rationale behind using
llabsin this context.
Line range hint
1-273: Overall assessment of changes to chunkset_tpl.hThe modifications and additions to
chunkset_tpl.hgenerally improve the flexibility, performance, and safety of the memory operations. Key improvements include:
- Updated function signatures to allow for more flexible memory operations.
- Addition of half-chunk handling for better performance with smaller data sizes.
- Introduction of
CHUNKCOPY_SAFEfor safer memory operations.- Improved handling of overlapping memory regions.
Recommendations for further improvement:
- Increase test coverage, particularly for the new and modified functions.
- Add comments to explain the rationale behind some of the more complex logic, especially in the half-chunk handling section.
- Consider adding brief comments to explain the use of
llabsin distance calculations.Overall, these changes represent a significant improvement to the codebase. Addressing the recommendations above will further enhance the maintainability and robustness of the code.
To help with improving test coverage, consider using a code coverage tool like gcov or llvm-cov to identify specific areas that need attention. You can integrate this into your CI/CD pipeline to ensure ongoing coverage of new and modified code.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 73-74: chunkset_tpl.h#L73-L74
Added lines #L73 - L74 were not covered by tests
[warning] 79-79: chunkset_tpl.h#L79
Added line #L79 was not covered by tests
[warning] 84-84: chunkset_tpl.h#L84
Added line #L84 was not covered by tests
[warning] 92-99: chunkset_tpl.h#L92-L99
Added lines #L92 - L99 were not covered by tests
[warning] 101-105: chunkset_tpl.h#L101-L105
Added lines #L101 - L105 were not covered by tests
[warning] 107-107: chunkset_tpl.h#L107
Added line #L107 was not covered by tests
[warning] 113-113: chunkset_tpl.h#L113
Added line #L113 was not covered by tests
[warning] 118-122: chunkset_tpl.h#L118-L122
Added lines #L118 - L122 were not covered by tests
[warning] 131-132: chunkset_tpl.h#L131-L132
Added lines #L131 - L132 were not covered by tests
[warning] 139-139: chunkset_tpl.h#L139
Added line #L139 was not covered by tests
[warning] 152-152: chunkset_tpl.h#L152
Added line #L152 was not covered by tests
[warning] 157-157: chunkset_tpl.h#L157
Added line #L157 was not covered by tests
[warning] 159-161: chunkset_tpl.h#L159-L161
Added lines #L159 - L161 were not covered by tests
[warning] 164-165: chunkset_tpl.h#L164-L165
Added lines #L164 - L165 were not covered by tests🪛 ast-grep
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (14)
- arch/arm/arm_functions.h (1 hunks)
- arch/arm/chunkset_neon.c (0 hunks)
- arch/generic/generic_functions.h (1 hunks)
- arch/power/power_functions.h (1 hunks)
- arch/riscv/riscv_functions.h (1 hunks)
- arch/x86/chunkset_avx2.c (5 hunks)
- arch/x86/chunkset_sse2.c (0 hunks)
- arch/x86/chunkset_ssse3.c (0 hunks)
- arch/x86/x86_functions.h (3 hunks)
- chunkset_tpl.h (5 hunks)
- functable.c (1 hunks)
- functable.h (1 hunks)
- inffast_tpl.h (3 hunks)
- inflate.c (1 hunks)
💤 Files with no reviewable changes (3)
- arch/arm/chunkset_neon.c
- arch/x86/chunkset_sse2.c
- arch/x86/chunkset_ssse3.c
🚧 Files skipped from review as they are similar to previous changes (9)
- arch/arm/arm_functions.h
- arch/generic/generic_functions.h
- arch/power/power_functions.h
- arch/riscv/riscv_functions.h
- arch/x86/x86_functions.h
- functable.c
- functable.h
- inffast_tpl.h
- inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (7)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:56-59 Timestamp: 2024-10-08T03:09:54.717Z Learning: The array `half_rem_vals` in `arch/x86/chunkset_avx2.c` is used in the function `GET_HALFCHUNK_MAG` on line 148.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:56-59 Timestamp: 2024-10-08T19:37:14.998Z Learning: The array `half_rem_vals` in `arch/x86/chunkset_avx2.c` is used in the function `GET_HALFCHUNK_MAG` on line 148.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1778 File: arch/x86/chunkset_avx2.c:160-171 Timestamp: 2024-10-08T21:51:45.330Z Learning: In `arch/x86/chunkset_avx2.c`, within the `GET_HALFCHUNK_MAG` function, using a conditional branch to select between `_mm_loadl_epi64` and `_mm_loadu_si128` is not recommended because the branching cost outweighs the savings from the load.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:168-168 Timestamp: 2024-10-08T19:37:14.998Z Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:168-168 Timestamp: 2024-10-07T21:21:10.570Z Learning: In the function `GET_HALFCHUNK_MAG` in `arch/x86/chunkset_avx2.c`, `dist` is always less than 16, so `16 - dist` in `__msan_unpoison` is always positive.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:82-85 Timestamp: 2024-10-07T21:18:37.806Z Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:82-85 Timestamp: 2024-10-08T19:37:14.998Z Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.chunkset_tpl.h (5)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:92-108 Timestamp: 2024-10-08T19:37:14.998Z Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:92-108 Timestamp: 2024-10-07T22:00:02.180Z Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:94-107 Timestamp: 2024-10-07T21:59:15.763Z Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:94-107 Timestamp: 2024-10-08T19:37:14.998Z Learning: In the function `HALFCHUNKCOPY` in `chunkset_tpl.h`, the parameter `len` is guaranteed to never be zero.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:135-135 Timestamp: 2024-10-07T21:23:13.401Z Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
🪛 GitHub Check: codecov/patch
arch/x86/chunkset_avx2.c
[warning] 78-79: arch/x86/chunkset_avx2.c#L78-L79
Added lines #L78 - L79 were not covered by tests
[warning] 113-113: arch/x86/chunkset_avx2.c#L113
Added line #L113 was not covered by tests
[warning] 129-130: arch/x86/chunkset_avx2.c#L129-L130
Added lines #L129 - L130 were not covered by tests
[warning] 133-134: arch/x86/chunkset_avx2.c#L133-L134
Added lines #L133 - L134 were not covered by tests
[warning] 137-137: arch/x86/chunkset_avx2.c#L137
Added line #L137 was not covered by tests
[warning] 140-140: arch/x86/chunkset_avx2.c#L140
Added line #L140 was not covered by tests
[warning] 143-143: arch/x86/chunkset_avx2.c#L143
Added line #L143 was not covered by tests
[warning] 145-148: arch/x86/chunkset_avx2.c#L145-L148
Added lines #L145 - L148 were not covered by tests
[warning] 151-151: arch/x86/chunkset_avx2.c#L151
Added line #L151 was not covered by tests
[warning] 153-153: arch/x86/chunkset_avx2.c#L153
Added line #L153 was not covered by testschunkset_tpl.h
[warning] 73-74: chunkset_tpl.h#L73-L74
Added lines #L73 - L74 were not covered by tests
[warning] 79-79: chunkset_tpl.h#L79
Added line #L79 was not covered by tests
[warning] 84-84: chunkset_tpl.h#L84
Added line #L84 was not covered by tests
[warning] 92-99: chunkset_tpl.h#L92-L99
Added lines #L92 - L99 were not covered by tests
[warning] 101-105: chunkset_tpl.h#L101-L105
Added lines #L101 - L105 were not covered by tests
[warning] 107-107: chunkset_tpl.h#L107
Added line #L107 was not covered by tests
[warning] 113-113: chunkset_tpl.h#L113
Added line #L113 was not covered by tests
[warning] 118-122: chunkset_tpl.h#L118-L122
Added lines #L118 - L122 were not covered by tests
[warning] 131-132: chunkset_tpl.h#L131-L132
Added lines #L131 - L132 were not covered by tests
[warning] 139-139: chunkset_tpl.h#L139
Added line #L139 was not covered by tests
[warning] 152-152: chunkset_tpl.h#L152
Added line #L152 was not covered by tests
[warning] 157-157: chunkset_tpl.h#L157
Added line #L157 was not covered by tests
[warning] 159-161: chunkset_tpl.h#L159-L161
Added lines #L159 - L161 were not covered by tests
[warning] 164-165: chunkset_tpl.h#L164-L165
Added lines #L164 - L165 were not covered by tests
[warning] 187-188: chunkset_tpl.h#L187-L188
Added lines #L187 - L188 were not covered by tests
[warning] 190-190: chunkset_tpl.h#L190
Added line #L190 was not covered by tests
[warning] 192-192: chunkset_tpl.h#L192
Added line #L192 was not covered by tests
[warning] 195-198: chunkset_tpl.h#L195-L198
Added lines #L195 - L198 were not covered by tests
[warning] 212-212: chunkset_tpl.h#L212
Added line #L212 was not covered by tests
[warning] 222-222: chunkset_tpl.h#L222
Added line #L222 was not covered by tests
🪛 ast-grep
chunkset_tpl.h
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
🔇 Additional comments (12)
arch/x86/chunkset_avx2.c (7)
9-12: LGTM: New include and type definition for half-chunk operationsThe addition of
x86_intrins.hand the newhalfchunk_ttype are good improvements:
- Including
x86_intrins.hlikely provides access to additional x86-specific intrinsics.- The
halfchunk_ttype, defined as__m128i, introduces support for operations on 128-bit chunks, which is half the size of the existing 256-bitchunk_t.These changes lay the groundwork for more flexible chunk operations in the AVX2 implementation.
17-20: LGTM: New macro definitions for extended chunk operationsThe addition of new macro definitions enhances the functionality of the chunk operations:
HAVE_CHUNKMEMSET_16andHAVE_CHUNKMEMSET_1indicate support for 16-byte and 1-byte chunk memory set operations.HAVE_HALF_CHUNKaligns with the introduction of thehalfchunk_ttype, enabling operations on 128-bit chunks.These macros provide compile-time information about the available chunk operations, which can be useful for conditional compilation and optimizations in other parts of the codebase.
78-81: LGTM: New chunkmemset_16 function for efficient 16-byte chunk operationsThe
chunkmemset_16function is a valuable addition:
- It efficiently sets a 256-bit chunk using a 128-bit source.
- The function uses AVX2 intrinsics (_mm256_broadcastsi128_si256) for optimal performance.
- It aligns with the newly added HAVE_CHUNKMEMSET_16 macro.
This function will allow for more efficient memory operations when dealing with 16-byte aligned data.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 78-79: arch/x86/chunkset_avx2.c#L78-L79
Added lines #L78 - L79 were not covered by tests
129-132: LGTM: New loadhalfchunk function for efficient 128-bit chunk loadingThe
loadhalfchunkfunction is a valuable addition:
- It efficiently loads a 128-bit chunk using the
_mm_loadu_si128intrinsic.- This function aligns with the newly added
halfchunk_ttype andHAVE_HALF_CHUNKmacro.- It provides a consistent interface for loading half-chunks, improving code readability and maintainability.
This function will allow for more efficient memory operations when dealing with 128-bit (16-byte) data chunks.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 129-130: arch/x86/chunkset_avx2.c#L129-L130
Added lines #L129 - L130 were not covered by tests
133-136: LGTM: New storehalfchunk function for efficient 128-bit chunk storingThe
storehalfchunkfunction is a well-designed complement toloadhalfchunk:
- It efficiently stores a 128-bit chunk using the
_mm_storeu_si128intrinsic.- This function aligns with the
halfchunk_ttype and completes the set of operations for half-chunks.- It provides a consistent interface for storing half-chunks, improving code readability and maintainability.
This function will allow for more efficient memory operations when dealing with 128-bit (16-byte) data chunks.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 133-134: arch/x86/chunkset_avx2.c#L133-L134
Added lines #L133 - L134 were not covered by tests
Line range hint
78-153: Improve test coverage for new functionsWhile the new functions and modifications appear to be well-designed and valuable additions to the AVX2 chunk operations, the static analysis tool has flagged several new functions and code paths as not covered by tests. To ensure the reliability and correctness of these new features, it's important to add appropriate test cases.
Please add test cases for the following new functions and code paths:
chunkmemset_16loadhalfchunkstorehalfchunkhalfchunk2wholeGET_HALFCHUNK_MAG- The new
elsebranch inGET_CHUNK_MAGAdding these tests will help verify the correctness of the new functionality and prevent potential regressions in future changes.
#!/bin/bash # Search for existing tests related to chunk operations rg --type c "test.*chunk" tests/🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 129-130: arch/x86/chunkset_avx2.c#L129-L130
Added lines #L129 - L130 were not covered by tests
[warning] 133-134: arch/x86/chunkset_avx2.c#L133-L134
Added lines #L133 - L134 were not covered by tests
[warning] 137-137: arch/x86/chunkset_avx2.c#L137
Added line #L137 was not covered by tests
[warning] 140-140: arch/x86/chunkset_avx2.c#L140
Added line #L140 was not covered by tests
[warning] 143-143: arch/x86/chunkset_avx2.c#L143
Added line #L143 was not covered by tests
[warning] 145-148: arch/x86/chunkset_avx2.c#L145-L148
Added lines #L145 - L148 were not covered by tests
[warning] 151-151: arch/x86/chunkset_avx2.c#L151
Added line #L151 was not covered by tests
[warning] 153-153: arch/x86/chunkset_avx2.c#L153
Added line #L153 was not covered by tests
113-114: LGTM: Simplified condition in GET_CHUNK_MAGThe simplification of the condition for the 'else' branch in
GET_CHUNK_MAGis a good improvement. It reduces code complexity and assumes that all cases wheredist >= 16can be handled uniformly.To ensure this change doesn't introduce any regressions, please verify that the function behaves correctly for all possible values of
dist, especially at the boundary condition wheredist == 16.🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 113-113: arch/x86/chunkset_avx2.c#L113
Added line #L113 was not covered by testschunkset_tpl.h (5)
Line range hint
222-232: Approved changes to CHUNKMEMSET_SAFEThe modifications to
CHUNKMEMSET_SAFEare consistent with the changes made toCHUNKMEMSETand improve the overall safety of the function. The use ofMINto ensurelendoesn't exceedleftis a good practice that prevents potential buffer overflows.🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 222-222: chunkset_tpl.h#L222
Added line #L222 was not covered by tests
185-198: Approved changes with suggestion for improved test coverageThe addition of
CHUNKMEMSET_16and the modification to the main loop to handle larger chunks are good improvements that can enhance performance.However, the static analysis tool indicates that some of these lines are not covered by tests. To ensure the correctness and performance of these changes, consider adding test cases that specifically target:
- The new
CHUNKMEMSET_16functionality.- The modified main loop that handles
2 * sizeof(chunk_t)at a time.You can use the following script to help identify relevant test files or suggest creating new ones:
#!/bin/bash # Find test files related to CHUNKMEMSET operations test_files=$(rg -l "CHUNKMEMSET" test/) if [ -n "$test_files" ]; then echo "Existing test files that might be relevant for CHUNKMEMSET operations:" echo "$test_files" echo "Ensure these tests cover the new CHUNKMEMSET_16 and modified loop." else echo "No existing test files found for CHUNKMEMSET operations. Consider creating new tests." fi🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 187-188: chunkset_tpl.h#L187-L188
Added lines #L187 - L188 were not covered by tests
[warning] 190-190: chunkset_tpl.h#L190
Added line #L190 was not covered by tests
[warning] 192-192: chunkset_tpl.h#L192
Added line #L192 was not covered by tests
[warning] 195-198: chunkset_tpl.h#L195-L198
Added lines #L195 - L198 were not covered by tests
149-168: Approved half-chunk handling with suggestions for clarity and testingThe addition of half-chunk handling is a good optimization for smaller data sizes. However, there are a few points to consider:
The rationale behind some conditions (e.g.,
dist == 6) is not immediately clear. Consider adding comments to explain these special cases.The static analysis tool indicates that parts of this code are not covered by tests. It's important to ensure that these new code paths are thoroughly tested.
To improve clarity and test coverage:
- Add comments explaining the rationale for special cases, e.g.:
// Special handling for odd distances and distance of 6 if ((dist % 2) != 0 || dist == 6) { // ... (add explanation here) }
- Use the following script to help identify areas needing test coverage:
#!/bin/bash # Find test files related to CHUNKMEMSET and half-chunk operations test_files=$(rg -l "CHUNKMEMSET|HALFCHUNKCOPY|GET_HALFCHUNK_MAG" test/) if [ -n "$test_files" ]; then echo "Existing test files that might be relevant:" echo "$test_files" echo "Ensure these tests cover the new half-chunk handling code paths." else echo "No existing test files found for these operations. Consider creating new tests for half-chunk handling." fi🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 152-152: chunkset_tpl.h#L152
Added line #L152 was not covered by tests
[warning] 157-157: chunkset_tpl.h#L157
Added line #L157 was not covered by tests
[warning] 159-161: chunkset_tpl.h#L159-L161
Added lines #L159 - L161 were not covered by tests
[warning] 164-165: chunkset_tpl.h#L164-L165
Added lines #L164 - L165 were not covered by tests
92-108: Approved new HALFCHUNKCOPY function with suggestionsThe new
HALFCHUNKCOPYfunction is a good addition for optimizing smaller data copies. However, there are a couple of points to consider:
- The alignment calculation uses
int32_t, which could potentially cause issues with very largelenvalues. Consider changing it tounsignedto match thelenparameter type:- int32_t align = ((len - 1) % sizeof(halfchunk_t)) + 1; + unsigned align = ((len - 1) % sizeof(halfchunk_t)) + 1;
- The static analysis tool indicates that this function is not covered by tests. Consider adding test cases to ensure its correctness and performance.
To help with adding tests, you can use the following script to find relevant test files or suggest creating a new one:
#!/bin/bash # Find test files related to chunk copying functions test_files=$(rg -l "CHUNKCOPY|HALFCHUNKCOPY" test/) if [ -n "$test_files" ]; then echo "Existing test files that might be relevant for HALFCHUNKCOPY:" echo "$test_files" else echo "No existing test files found for chunk copying functions. Consider creating a new test file for HALFCHUNKCOPY." fi🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 92-99: chunkset_tpl.h#L92-L99
Added lines #L92 - L99 were not covered by tests
[warning] 101-105: chunkset_tpl.h#L101-L105
Added lines #L101 - L105 were not covered by tests
[warning] 107-107: chunkset_tpl.h#L107
Added line #L107 was not covered by tests
135-135: Acknowledge memset usage, but change not recommendedThe static analysis tool flagged the use of
memseton this line, suggestingmemset_sas an alternative. Whilememset_sdoes provide additional safety through bounds checking, there are reasons to maintain the current implementation:
memset_sis part of the C11 standard but is not widely supported across all platforms.- The current usage of
memsetis in a controlled context where the length is known and checked.- Changing to
memset_scould introduce compatibility issues with older compilers or platforms.Given these considerations, the current use of
memsetis acceptable. However, if broad platform support formemset_sbecomes available in the future, it might be worth revisiting this decision.To ensure the safety of the current
memsetusage, you can run the following script to check for any potential buffer overflow scenarios:This script will help identify the context around
memsetcalls and any related buffer size declarations.🧰 Tools
🪛 ast-grep
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
| uint64_t cpy_dist; | ||
| uint64_t bytes_remaining = sizeof(chunk_t); |
There was a problem hiding this comment.
💡 Codebase verification
Add test cases for CHUNKCOPY and HALFCHUNKCOPY functions
No existing test files found for CHUNKCOPY and HALFCHUNKCOPY. To ensure these critical memory operations are reliable and to prevent potential bugs, please create comprehensive test cases covering various scenarios for these functions.
🔗 Analysis chain
Approved changes with a suggestion for improved test coverage
The updates to use uint64_t for distance-related variables and the switch to memcpy are good improvements. They allow for handling larger distances and potentially improve performance and safety.
However, the static analysis tool indicates that these lines are not covered by tests. Consider adding test cases to cover these scenarios:
Also applies to: 79-84
[Additional test coverage for chunk copying functions required]
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Find test files related to GET_CHUNK_MAG function
test_files=$(rg -l "GET_CHUNK_MAG" test/)
if [ -n "$test_files" ]; then
echo "Existing test files that might be relevant:"
echo "$test_files"
else
echo "No existing test files found for GET_CHUNK_MAG. Consider creating a new test file."
fi
Length of output: 600
Script:
# Find test files related to CHUNKCOPY and HALFCHUNKCOPY functions
test_files=$(rg -l "CHUNKCOPY|HALFCHUNKCOPY" test/)
if [ -n "$test_files" ]; then
echo "Existing test files that might be relevant:"
echo "$test_files"
else
echo "No existing test files found for chunk copying functions. Consider creating new test cases for CHUNKCOPY and HALFCHUNKCOPY."
fi
Length of output: 327
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 73-74: chunkset_tpl.h#L73-L74
Added lines #L73 - L74 were not covered by tests
|
Will do but probably won't get to it for a little bit. |
30860bb to
d63912a
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
chunkset_tpl.h (1)
7-7: Consider removing unnecessary includeThe addition of
#include <stdio.h>seems unnecessary for a header file focused on memory operations. Unless there are specific stdio functions used in this file, consider removing this include to reduce dependencies and potential compilation overhead.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (12)
- arch/arm/arm_functions.h (1 hunks)
- arch/generic/generic_functions.h (1 hunks)
- arch/power/power_functions.h (1 hunks)
- arch/riscv/riscv_functions.h (1 hunks)
- arch/x86/chunkset_avx2.c (2 hunks)
- arch/x86/x86_functions.h (3 hunks)
- chunkset_tpl.h (7 hunks)
- functable.c (1 hunks)
- functable.h (1 hunks)
- inffast_tpl.h (3 hunks)
- inflate.c (1 hunks)
- inflate_p.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- arch/arm/arm_functions.h
- arch/generic/generic_functions.h
- arch/power/power_functions.h
- arch/riscv/riscv_functions.h
- arch/x86/x86_functions.h
- functable.c
- functable.h
- inffast_tpl.h
- inflate.c
🧰 Additional context used
📓 Learnings (2)
arch/x86/chunkset_avx2.c (3)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:56-59 Timestamp: 2024-10-08T03:09:54.717Z Learning: The array `half_rem_vals` in `arch/x86/chunkset_avx2.c` is used in the function `GET_HALFCHUNK_MAG` on line 148.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: arch/x86/chunkset_avx2.c:56-59 Timestamp: 2024-10-08T19:37:14.998Z Learning: The array `half_rem_vals` in `arch/x86/chunkset_avx2.c` is used in the function `GET_HALFCHUNK_MAG` on line 148.Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1778 File: arch/x86/chunkset_avx2.c:160-171 Timestamp: 2024-10-08T21:51:45.330Z Learning: In `arch/x86/chunkset_avx2.c`, within the `GET_HALFCHUNK_MAG` function, using a conditional branch to select between `_mm_loadl_epi64` and `_mm_loadu_si128` is not recommended because the branching cost outweighs the savings from the load.chunkset_tpl.h (1)
Learnt from: KungFuJesus PR: zlib-ng/zlib-ng#1802 File: chunkset_tpl.h:135-135 Timestamp: 2024-10-07T21:23:13.401Z Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.
🪛 ast-grep
chunkset_tpl.h
[warning] 135-135: Avoid 'memset()' function, it does not consider buffer boundaries and can lead to buffer overflows. Use 'memset_s()' instead.
Context: memset(out, *from, len);
Note: [CWE-14]: Compiler Removal of Code to Clear Buffers [OWASP A04:2021] Insecure Design [REFERENCES]
- https://cwe.mitre.org/data/definitions/14.html
- https://owasp.org/Top10/A02_2021-Cryptographic_Failures/
🔇 Additional comments (8)
inflate_p.h (1)
153-153: Verify the intended behavior of thesafelencalculation.The calculation of
safelenhas been modified to remove the+ 1at the end. This change makes the safe length calculation more precise by exactly matching the difference between thesafeandoutpointers.Please verify that this change aligns with the intended behavior of the
chunkcopy_safefunction. If it does, consider adding a comment explaining why the exact difference is used (e.g., "Calculate the exact number of bytes that can be safely written").To ensure this change doesn't introduce any regressions, please run the following verification script:
This script will help identify any tests, benchmarks, or documentation that might need updating due to this change.
chunkset_tpl.h (5)
73-74: LGTM: Improved type consistency for larger valuesThe change from
uint32_ttouint64_tforcpy_distandbytes_remainingis a good improvement. It allows for handling larger distances and sizes, consistent with the function signature change. This modification enhances the function's capability to work with bigger memory chunks.
113-133: LGTM: Improved flexibility and correct handling of overlapping memoryThe changes to
CHUNKMEMSETare well-implemented:
- The new function signature using a
frompointer instead of adistparameter allows for more flexible memory operations.- The added logic correctly handles cases where
fromis ahead ofout, which is a significant improvement in robustness.- The use of
memmovefor overlapping memory regions is appropriate and ensures correct behavior in all cases.These changes make the function more versatile and safer to use in various scenarios.
Line range hint
138-166: LGTM: Excellent optimization for various memory operation sizesThe changes in this section demonstrate thoughtful optimization:
- The updated condition
dist >= sizeof(chunk_t)for usingCHUNKCOPYis logical and consistent with the function's purpose.- The introduction of
HALF_CHUNKlogic provides optimization for smaller memory operations, which can significantly improve performance in certain scenarios.- The code is well-commented, clearly explaining the rationale behind these optimizations.
These changes show a commendable effort to fine-tune the function for different memory operation sizes, potentially leading to improved overall performance.
Line range hint
190-250: LGTM: Consistent updatesThe changes in this section primarily focus on maintaining consistency with earlier modifications. Variable names and types have been updated appropriately, while preserving the existing optimizations and logical structure. This attention to consistency is commendable and helps maintain the overall integrity of the code.
255-273: LGTM: Excellent addition of a safe copy functionThe new
CHUNKCOPY_SAFEfunction is a valuable addition:
- It provides crucial safety checks for memory boundaries and overlapping regions.
- The function handles edge cases well, including when
out == fromor when the safe distance is less thansizeof(chunk_t).- It efficiently uses
CHUNKMEMSETfor the actual copying when it's safe to do so, leveraging existing optimized code.This function enhances the overall safety of memory operations in the library while maintaining performance where possible. Great job on improving the robustness of the codebase!
arch/x86/chunkset_avx2.c (2)
18-18: Addition ofHAVE_CHUNKMEMSET_1macro is appropriateThe introduction of
#define HAVE_CHUNKMEMSET_1adds support for 1-byte chunk memory setting. This enhancement increases the flexibility of memory operations and aligns with the existing macros for other chunk sizes.
137-140: Ensure all calls tohalfchunk2wholeare updated to match the new signatureThe function
halfchunk2wholenow accepts a pointer tohalfchunk_tinstead of passing the value directly. Please verify that all invocations of this function have been updated accordingly to prevent potential compilation errors or unexpected behavior.Run the following script to locate all calls to
halfchunk2wholeand check their usage:✅ Verification successful
Verification Successful: All calls to
halfchunk2wholehave been updated to match the new signature.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Find all calls to `halfchunk2whole` and display their usages. rg --type c --no-heading --line-number "halfchunk2whole"Length of output: 223
Pretty much all chunkcopy, chunkmemset, and _safe variants are just special optimization cases of
while (len--) *to++ = *from++;We call chunk_memset in scenarios where we are repeating a previous element as we would in a self modifying copy loop (i.e. when there's overlap in the to & from pointers, and from points before to). We call chunkcopy, with or without overlap, as a special case of overlap is such that the distance between the pointers is >= or vector register width. We dispatched mostly to chunkcopy_safe in the case that we could not write more than len bytes (the edge of a buffer). It also was intentionally agnostic to the positions of to and from.
By modifying the code such that we have a an architecture specific CHUNKCOPY_SAFE that we call into, we can make use of the architecture-specific CHUNKCOPY and CHUNKMEMSET routines. This version avoids dispatching to memcpy, which was probably more expensive particularly for short strings. It also is likely to save us several reads, as many chunkcopy_safe calls can now detect overlap and leverage the optimized chunkmemset in doing so.
Summary by CodeRabbit
New Features
CHUNKCOPY_SAFEfunction for improved data copying with additional safety checks.Bug Fixes
Documentation
Refactor