Skip to content

Annotate alloc/free open/close functions with compiler attributes#13537

Merged
shindere merged 1 commit intoocaml:trunkfrom
MisterDA:caml_stat-annotations-v3
Oct 11, 2024
Merged

Annotate alloc/free open/close functions with compiler attributes#13537
shindere merged 1 commit intoocaml:trunkfrom
MisterDA:caml_stat-annotations-v3

Conversation

@MisterDA
Copy link
Copy Markdown
Contributor

@MisterDA MisterDA commented Oct 9, 2024

This helps the compiler optimize code, and static analysis by detecting potential mismatches in alloc/free pairs.

With GCC, restrict the attributes usage to GCC >= 14, as GCC 13 reports false positives on the runtime code.
The only change from #13457 is to restrict this PR to GCC >= 14, or LLVM-based compilers. It was reverted in bce3bb7, as builds using GCC 13 would fail.

  • malloc

The malloc attribute indicates that the function acts like a system
memory allocation function, returning a pointer to allocated storage
disjoint from the storage for any other object accessible to the
caller.

https://clang.llvm.org/docs/AttributeReference.html#malloc

Associating a function with a deallocator helps detect calls to
mismatched allocation and deallocation functions and diagnose them
under the control of options such as -Wmismatched-dealloc. It also
makes it possible to diagnose attempts to deallocate objects that
were not allocated dynamically, by -Wfree-nonheap-object. To
indicate that an allocation function both satisifies the nonaliasing
property and has a deallocator associated with it, both the plain
form of the attribute and the one with the deallocator argument must
be used.

The warnings guarded by -fanalyzer respect allocation and
deallocation pairs marked with the malloc.

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-malloc-function-attribute

Note that the malloc attribute can only be applied to functions returning pointers. The OCaml value type is a typedef to an integer type, and the C compiler will refuse applying the attribute to a function returning an OCaml value.

  • nodiscard / warn_unused_result

Prevent memory leaks by warning if the result of an allocation is ignored.

https://clang.llvm.org/docs/AttributeReference.html#nodiscard-warn-unused-result

  • alloc_align

GCC uses this information to improve pointer alignment analysis.

https://clang.llvm.org/docs/AttributeReference.html#alloc-align https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005falign-function-attribute

  • alloc_size

GCC uses this information to improve the results of
__builtin_object_size.

https://clang.llvm.org/docs/AttributeReference.html#alloc-size https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute

  • returns_nonnull

lets the compiler optimize callers based on the knowledge that the
return value will never be null.

The analyzer considers the possibility that an allocation function
could fail and return null. […] If the allocator always returns
non-null, use attribute ((returns_nonnull)) to suppress these
warnings.

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-returns_005fnonnull-function-attribute

Invoked on the following test program, GCC 14.2.0 reports:

#include <caml/memory.h> /* with this patch */
#include <stdio.h>

void test_alloc_wrong_free(void)
{
  void *data = caml_stat_alloc(42);
  free(data);
}

void test_discard_alloc_result(void)
{
  caml_stat_alloc(42);
}

void test_alloc_noexec_null(void)
{
  void *data = caml_stat_alloc(42);
  printf("hello, world!");
}
opam exec -- gcc $(ocamlopt -config-var native_c_flags) \
                 -I $(ocamlopt -config-var standard_library_default) \
                 $(ocamlopt -config-var native_ldflags) -c \
                 -Wall -fanalyzer ./test.c
./test.c: In function ‘test_alloc_noexec_null’:
./test.c:17:9: warning: unused variable ‘data’ [-Wunused-variable]
   17 |   void *data = caml_stat_alloc(42);
      |         ^~~~
./test.c: In function ‘test_discard_alloc_result’:
./test.c:12:3: warning: ignoring return value of ‘caml_stat_alloc’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
   12 |   caml_stat_alloc(42);
      |   ^~~~~~~~~~~~~~~~~~~
./test.c: In function ‘test_alloc_wrong_free’:
./test.c:7:3: warning: ‘data’ should have been deallocated with ‘caml_stat_free’ but was deallocated with ‘free’ [CWE-762] [-Wanalyzer-mismatching-deallocation]
    7 |   free(data);
      |   ^~~~~~~~~~
  ‘test_alloc_wrong_free’: events 1-2
    |
    |    6 |   void *data = caml_stat_alloc(42);
    |      |                ^~~~~~~~~~~~~~~~~~~
    |      |                |
    |      |                (1) allocated here
    |    7 |   free(data);
    |      |   ~~~~~~~~~~    
    |      |   |
    |      |   (2) deallocated with ‘free’ here
    |
./test.c: In function ‘test_alloc_noexec_null’:
./test.c:19:1: warning: leak of ‘data’ [CWE-401] [-Wanalyzer-malloc-leak]
   19 | }
      | ^
  ‘test_alloc_noexec_null’: events 1-2
    |
    |   17 |   void *data = caml_stat_alloc(42);
    |      |                ^~~~~~~~~~~~~~~~~~~
    |      |                |
    |      |                (1) allocated here
    |   18 |   printf("hello, world!");
    |   19 | }
    |      | ~               
    |      | |
    |      | (2) ‘data’ leaks here; was allocated at (1)
    |
./test.c: In function ‘test_alloc_wrong_free’:
./test.c:7:3: warning: ‘free’ called on pointer returned from a mismatched allocation function [-Wmismatched-dealloc]
    7 |   free(data);
      |   ^~~~~~~~~~
./test.c:6:16: note: returned from ‘caml_stat_alloc’
    6 |   void *data = caml_stat_alloc(42);
      |                ^~~~~~~~~~~~~~~~~~~

Simply using -Wall turns on more checks. The -fanalyzer warning has both false positive and false negatives and I'm not turning it on on the codebase. I believe this code may help users writing C FFI code for OCaml. No warnings have been raised on the current trunk or on Lwt. Let me know if there are packages containing C code that could be tested.

glibc has annotated functions in malloc/malloc.h, but as we're keeping track of C heap allocations, the compiler cannot propagate the annotations to the call sites of caml_stat_* functions.

I don't have insights on whether code gen has improved or worsened.

cc @gasche and @Octachron, who reviewed the previous version of this PR.

This helps the compiler optimize code, and static analysis by
detecting potential mismatches in alloc/free pairs.

- malloc

> The malloc attribute indicates that the function acts like a system
> memory allocation function, returning a pointer to allocated storage
> disjoint from the storage for any other object accessible to the
> caller.

https://clang.llvm.org/docs/AttributeReference.html#malloc

> Associating a function with a deallocator helps detect calls to
> mismatched allocation and deallocation functions and diagnose them
> under the control of options such as -Wmismatched-dealloc. It also
> makes it possible to diagnose attempts to deallocate objects that
> were not allocated dynamically, by -Wfree-nonheap-object. To
> indicate that an allocation function both satisifies the nonaliasing
> property and has a deallocator associated with it, both the plain
> form of the attribute and the one with the deallocator argument must
> be used.

> The warnings guarded by -fanalyzer respect allocation and
> deallocation pairs marked with the malloc.

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-malloc-function-attribute

Note that the malloc attribute can only be applied to functions
returning pointers. The OCaml value type is a typedef to an integer
type, and the C compiler will refuse applying the attribute to a
function returning an OCaml value.

- nodiscard / warn_unused_result

Prevent memory leaks by warning if the result of an allocation is
ignored.

https://clang.llvm.org/docs/AttributeReference.html#nodiscard-warn-unused-result

- alloc_align

> GCC uses this information to improve pointer alignment analysis.

https://clang.llvm.org/docs/AttributeReference.html#alloc-align
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005falign-function-attribute

- alloc_size

> GCC uses this information to improve the results of
> __builtin_object_size.

https://clang.llvm.org/docs/AttributeReference.html#alloc-size
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute

- returns_nonnull

> lets the compiler optimize callers based on the knowledge that the
> return value will never be null.

> The analyzer considers the possibility that an allocation function
> could fail and return null. […] If the allocator always returns
> non-null, use __attribute__ ((returns_nonnull)) to suppress these
> warnings.

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-returns_005fnonnull-function-attribute

With GCC, restrict the attributes usage to GCC >= 14, as GCC 13
reports false positives on the runtime code.
@MisterDA MisterDA force-pushed the caml_stat-annotations-v3 branch from bb3b7fa to 48264c3 Compare October 9, 2024 10:22
@gasche
Copy link
Copy Markdown
Member

gasche commented Oct 9, 2024

I was confused by this PR description as received in my mailbox ("Didn't we already receive this PR? What is going on?"). Maybe a small remark at the top to say that it is a new iteration on PR number .... would help. What is the old PR number?

@gasche
Copy link
Copy Markdown
Member

gasche commented Oct 9, 2024

My opinion doesn't change that I think we should merge this PR, modulo finding the right configuration gimmicks.

@Octachron is running the PR through precheck enough to tell if the CI will be happy with this new iteration?

@MisterDA
Copy link
Copy Markdown
Contributor Author

MisterDA commented Oct 9, 2024

I was confused by this PR description as received in my mailbox ("Didn't we already receive this PR? What is going on?"). Maybe a small remark at the top to say that it is a new iteration on PR number .... would help. What is the old PR number?

I wrote this text in the middle of the PR message, I'll move it on top.

With GCC, restrict the attributes usage to GCC >= 14, as GCC 13 reports false positives on the runtime code.
The only change from #13457 is to restrict this PR to GCC >= 14, or LLVM-based compilers. It was reverted in bce3bb7, as builds using GCC 13 would fail.

I'd like to point out that there are other compiler attributes we may want to leverage, at the cost of some verbosity. I hope to revisit this idea if this PR proves itself useful and not too invasive.

For functions:

  • access allowing to specify read/write permissions on pointer arguments, stronger than const,
  • fd_arg allowing to track the state of file descriptors and the mode they're open with,
  • nonnull allowing to state that pointer args must be nonnull,
  • null_terminated_string_arg indicating that the passed argument must be a C-style null-terminated string.

For variables:

  • nonstring indicating that a character array does not necessarily contain a terminating NUL.

@Octachron
Copy link
Copy Markdown
Member

@gasche , precheck should cover at least a wide range of versions for the C compiler: https://ci.inria.fr/ocaml/job/precheck/986 .

Copy link
Copy Markdown
Member

@Octachron Octachron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Precheck is passing (or failing on unrelated issue), and the restriction to gcc>=14 sounds sensible to me. (I am willing to hope that gcc and llvm will try to be backward compatible on those attributes).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants