Skip to content

port: Use memalign or _aligned_malloc only, when available, fallback to malloc.#1863

Merged
Dead2 merged 1 commit intozlib-ng:developfrom
winspool:port
Mar 17, 2025
Merged

port: Use memalign or _aligned_malloc only, when available, fallback to malloc.#1863
Dead2 merged 1 commit intozlib-ng:developfrom
winspool:port

Conversation

@winspool
Copy link
Copy Markdown
Contributor

@winspool winspool commented Feb 3, 2025

This fixes also the usage of "_WIN32" to decide,
if the MSVC extensions _aligned_malloc / _aligned_free are available

That bug breaks other Compiler on Windows. (OpenWatcom as Example).

--
Regards ... Detlef

Summary by CodeRabbit

  • New Features

    • Enhanced build configuration to automatically detect and enable advanced aligned memory allocation methods.
    • Introduced support for optional sanitizers to improve error detection during builds.
  • Refactor

    • Refined the logic for memory allocation and deallocation, ensuring more robust cross-platform behavior.
    • Improved header inclusion logic for aligned memory allocation methods.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Feb 3, 2025

Walkthrough

The pull request enhances the build configuration and memory management code by adding checks for various aligned memory allocation functions (including posix_memalign, aligned_alloc, memalign, and _aligned_malloc) in both the CMake and configure scripts. It also introduces support for sanitizers based on the WITH_SANITIZER option. Additionally, the internal memory allocation (zng_alloc) and deallocation (zng_free) functions in zutil_p.h have been modified to utilize the appropriate allocation methods with fallback mechanisms. No public API declarations have been altered.

Changes

File(s) Change Summary
CMakeLists.txt, configure Enhanced build scripts to check for multiple aligned memory allocation functions and set corresponding preprocessor definitions. Configured sanitizer support based on options.
zutil_p.h Updated header inclusion logic and revised memory allocation (zng_alloc) and deallocation (zng_free) functions to use available aligned allocation methods with fallback logic.

Suggested labels

Architecture

Suggested reviewers

  • Dead2
  • nmoinvaz

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1eccaaa and 3b1b476.

📒 Files selected for processing (3)
  • CMakeLists.txt (3 hunks)
  • configure (3 hunks)
  • zutil_p.h (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • zutil_p.h
  • CMakeLists.txt
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
🔇 Additional comments (5)
configure (5)

727-746: Good addition of POSIX aligned memory allocation check

This code properly checks for the availability of posix_memalign by compiling a test program that uses the function. If successful, it correctly adds the -DHAVE_POSIX_MEMALIGN flag to both CFLAGS and SFLAGS.


749-767: Well-implemented C11 aligned_alloc detection

The code correctly tests for C11's aligned_alloc function using the appropriate _ISOC11_SOURCE define and adds the corresponding compiler flag when available.


770-788: Proper handling of the deprecated GNU memalign extension

Good implementation of the check for GNU's deprecated memalign function. The code comments appropriately mention that this function is deprecated in glibc, which is important information for maintainers.


791-807: Thorough check for MSVC aligned memory functions

This code effectively tests for the MSVC-specific _aligned_malloc and _aligned_free functions. The test correctly includes both allocation and deallocation to ensure the complete functionality is available.


727-807: Overall well-implemented aligned memory detection

The PR successfully implements detection for multiple aligned memory allocation methods (POSIX, C11, GNU, MSVC) with proper fallback mechanisms. This addresses the issue mentioned in the PR description where certain compilers on Windows (like OpenWatcom) were affected by relying solely on the _WIN32 macro to detect availability of MSVC functions.

The implementation follows the same pattern as other detection mechanisms in the script, maintains consistency, and properly sets the necessary preprocessor defines when features are available.


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

endif()
set(CMAKE_REQUIRED_DEFINITIONS)

check_symbol_exists(_aligned_malloc malloc.h HAVE__ALIGNED_MALLOC)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two underscores. Is that intentional?

Copy link
Copy Markdown
Contributor Author

@winspool winspool Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the function starts with an underscore.

Without the second underscore, the difference between the
C11 aligned allocation macro and the
MS aligned allocation macro is only one character
and can easily mixed:
(HAVE_ALIGNED_ALLOC and HAVE_ALLIGNED_MALLOC) compared to
(HAVE_ALIGNED_ALLOC and HAVE__ALLIGNED_MALLOC)

But of course, when the project wants macros with only one underscore, please let me know.

@winspool
Copy link
Copy Markdown
Contributor Author

winspool commented Feb 3, 2025

When a similar check for cpuid is added, OpenWatcom can build "zlib-ng":
0000_check_for_cpuid_support.txt

Some TODO items for OpenWatcom:

  • Make the OW internal atomic support available
  • Make aligned allocation support available
  • Make cpuid support available

@codecov
Copy link
Copy Markdown

codecov bot commented Feb 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.30%. Comparing base (50e9ca0) to head (3b1b476).
Report is 2 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #1863       +/-   ##
============================================
+ Coverage    42.10%   79.30%   +37.20%     
============================================
  Files           71      132       +61     
  Lines         7353    12199     +4846     
  Branches      1265     2684     +1419     
============================================
+ Hits          3096     9675     +6579     
+ Misses        4035     1538     -2497     
- Partials       222      986      +764     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

void *ptr;
return posix_memalign(&ptr, 64, size) ? NULL : ptr;
#elif defined(_WIN32)
#elif defined(HAVE__ALIGNED_MALLOC)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth moving this to the top just in case there's a platform that supports both _aligned_malloc and aligned_alloc/posix_memalign, to make sure that the correct free function is used.

Copy link
Copy Markdown
Contributor Author

@winspool winspool Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah.
I changed the detection strategy
for this Revision to reduce indention
(Posix and c11.
Check gnu ext. only, when c11 ist absent
Check MS ext only, when gnu ext. ist absent)

That's better to read, but i forgot to handle the free function.

I will prepare an updated Patch.

zutil_p.h Outdated
return (void *)memalign(64, size);
#else
/* Fallback to use a normal allocation and hope the best */
/* This works only, when the default C compression code is used */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell, zlib-ng does ensure that allocated memory is aligned at a higher level, so it might be worth clarifying that in the comments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know only about the alignments required for SSE/AVX.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is that zlib-ng currently includes padding when allocating, which means it can make sure that the buffer is aligned even if the allocation callback doesn't guarantee it (since it might be provided by users of the library rather than using the default implementation).

https://github.com/zlib-ng/zlib-ng/blob/develop/deflate.c#L246
https://github.com/zlib-ng/zlib-ng/blob/develop/inflate.c#L181

@mtl1979
Copy link
Copy Markdown
Collaborator

mtl1979 commented Feb 27, 2025

Needs rebase, fail with Static Analysis / Clang should already be fixed.

@@ -10,8 +10,11 @@
#elif defined(__FreeBSD__)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick (non-blocking): Since you removed the APPLE below in block choosing malloc type in favor of using a fallback, perhaps also remove the above check for APPLE as well.

You could also remove the whole first check there, and just add a comment in the fallback that HAVE_POSIX_MEMALIGN and HAVE_ALIGNED_ALLOC as well as standard malloc are all handled by the fallback. Seems redundant that both the first and the last block are the same here.

Copy link
Copy Markdown
Contributor Author

@winspool winspool Mar 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

I will prepare a follow-up patch to clean it up
after this change is committed.

zutil_p.h Outdated
#elif defined(HAVE_MEMALIGN) || defined(HAVE__ALIGNED_MALLOC)
# include <malloc.h>
#else
/* Fallback, when no other aligned allocation was found */
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Maybe "aligned allocation function" would be better...

…alloc

Using "_WIN32" to decide,
if the MSVC extensions _aligned_malloc / _aligned_free are available
is a bug that breaks other Compiler on Windows. (OpenWatcom as Example)

Regards ... Detlef
@winspool
Copy link
Copy Markdown
Contributor Author

I updated the patch to current git
and included the comments from @Dead2 and @mtl1979.

Please let me know, what else i can do to get this patch committed.

Copy link
Copy Markdown
Member

@Dead2 Dead2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Dead2 Dead2 merged commit 89f90d3 into zlib-ng:develop Mar 17, 2025
151 of 152 checks passed
@winspool winspool deleted the port branch March 19, 2025 21:58
@pps83
Copy link
Copy Markdown
Contributor

pps83 commented May 6, 2025

IMO, better not to bring this messy complexity and do it ffmpeg/x264 way: they completely abandoned these and simply align memory themselves. Not sure who of them does, or maybe both. If the code needs aligned memory and there is no routines you need to fallback to manual alignment: could as well keep using it for all to make it simple. cost of manual alignment is zero.

malloc64:
    uint8_t *buf = (uint8_t*)malloc(size + 63 + sizeof(void**));
    if (buf)
    {
        uint8_t *aligned = buf + 63 + sizeof(void**);
        aligned -= (intptr_t)aligned & 63;
        *((void**)(aligned- sizeof(void **))) = buf;
        return aligned;
    }
    

free64:
    if(p)
        free( *(((void**)p) - 1));

something like that

902abe7

@Dead2
Copy link
Copy Markdown
Member

Dead2 commented May 6, 2025

@pps83 Unless I missed something, and I don't think I did, all allocs should already get checked for alignment and if not already aligned it will manually align them. See #1713.

The reason we always check for alignment is that the zlib api allows applications giving it pointers to its preferred alloc/free functions, and we cannot know whether those will supply us with aligned memory.
So, in fact you are right, we could do away with using the aligned alloc functions entirely, if we want to. It would only waste up to 64bytes of ram at the start of the buffer, instead of like today 64 bytes at the end of the buffer. The effective difference to us should be zero.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants