[Clang][CodeGen] Emit !alloc_token for new expressions #162099
Merged
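@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-codegen

Author: Marco Elver (melver)

Changes

For new expressions, the allocated type is syntactically known and we can trivially emit the !alloc_token metadata. A subsequent change will wire up the AllocToken pass and introduce appropriate tests.

This change is part of the following series:
1. #160131
2. #156838
3. #162098
4. #162099
5. #156839
6. #156840
7. #156841
8. #156842
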
Full diff: https://github.com/llvm/llvm-project/pull/162099.diff 3 Files Affected:
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index e6e4947882544..4cf0071b4b884 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -1272,6 +1272,23 @@ void CodeGenFunction::EmitBoundsCheckImpl(const Expr *E, llvm::Value *Bound,
   EmitCheck(std::make_pair(Check, CheckKind), CheckHandler, StaticData, Index);
 }
+void CodeGenFunction::EmitAllocToken(llvm::CallBase *CB, QualType AllocType) {
+  assert(SanOpts.has(SanitizerKind::AllocToken) &&
+         "Only needed with -fsanitize=alloc-token");
+
+  PrintingPolicy Policy(CGM.getContext().getLangOpts());
+  Policy.SuppressTagKeyword = true;
+  Policy.FullyQualifiedName = true;
+  SmallString<64> TypeName;
+  llvm::raw_svector_ostream TypeNameOS(TypeName);
+  AllocType.getCanonicalType().print(TypeNameOS, Policy);
+  auto *TypeMDS = llvm::MDString::get(CGM.getLLVMContext(), TypeNameOS.str());
+
+  // Format: !{<type-name>}
+  auto *MDN = llvm::MDNode::get(CGM.getLLVMContext(), {TypeMDS});
+  CB->setMetadata(llvm::LLVMContext::MD_alloc_token, MDN);
+}
+
 CodeGenFunction::ComplexPairTy CodeGenFunction::
 EmitComplexPrePostIncDec(const UnaryOperator *E, LValue LV,
                          bool isInc, bool isPre) {
diff --git a/clang/lib/CodeGen/CGExprCXX.cpp b/clang/lib/CodeGen/CGExprCXX.cpp
index a092b718412be..9877dc1311cd3 100644
--- a/clang/lib/CodeGen/CGExprCXX.cpp
+++ b/clang/lib/CodeGen/CGExprCXX.cpp
@@ -1707,11 +1707,16 @@ llvm::Value *CodeGenFunction::EmitCXXNewExpr(const CXXNewExpr *E) {
     RValue RV =
       EmitNewDeleteCall(*this, allocator, allocatorType, allocatorArgs);
-    // Set !heapallocsite metadata on the call to operator new.
-    if (getDebugInfo())
-      if (auto *newCall = dyn_cast<llvm::CallBase>(RV.getScalarVal()))
-        getDebugInfo()->addHeapAllocSiteMetadata(newCall, allocType,
-                                                 E->getExprLoc());
+    if (auto *newCall = dyn_cast<llvm::CallBase>(RV.getScalarVal())) {
+      if (auto *CGDI = getDebugInfo()) {
+        // Set !heapallocsite metadata on the call to operator new.
+        CGDI->addHeapAllocSiteMetadata(newCall, allocType, E->getExprLoc());
+      }
+      if (SanOpts.has(SanitizerKind::AllocToken)) {
+        // Set !alloc_token metadata.
+        EmitAllocToken(newCall, allocType);
+      }
+    }
     // If this was a call to a global replaceable allocation function that does
     // not take an alignment argument, the allocator is known to produce
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index f0565c1de04c4..caae791b0c25e 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -3348,6 +3348,9 @@ class CodeGenFunction : public CodeGenTypeCache {
   SanitizerAnnotateDebugInfo(ArrayRef<SanitizerKind::SanitizerOrdinal> Ordinals,
                              SanitizerHandler Handler);
+  /// Emit additional metadata used by the AllocToken instrumentation.
+  void EmitAllocToken(llvm::CallBase *CB, QualType AllocType);
+
   llvm::Value *GetCountedByFieldExprGEP(const Expr *Base, const FieldDecl *FD,
                                         const FieldDecl *CountDecl);
fmayer approved these changes on Oct 6, 2025
melver added a commit to melver/llvm-project that referenced this pull request on Oct 7, 2025:

For new expressions, the allocated type is syntactically known and we can trivially emit the !alloc_token metadata. A subsequent change will wire up the AllocToken pass and introduce appropriate tests.

Pull Request: llvm#162099
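
As a rough illustration of what "syntactically known" means in the commit message above, the sketch below shows two new expressions whose allocated type is written directly in the source. The names `Point`, `make_point`, `make_buffer`, and `sum_demo` are hypothetical, and the metadata mentioned in the comments only follows the `!{<type-name>}` format from the diff; the exact string Clang would print is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type used only for illustration.
struct Point {
  int x;
  int y;
};

// The allocated type (Point) is spelled out in the new expression, so the
// front end can name it without any inference. With -fsanitize=alloc-token,
// the call to operator new would presumably carry metadata along the lines
// of !alloc_token !{!"Point"} (assumed spelling).
Point *make_point(int x, int y) { return new Point{x, y}; }

// Array new: the element type (int) is likewise written in the source.
int *make_buffer(std::size_t n) {
  assert(n > 0 && "sketch expects a non-empty buffer");
  return new int[n](); // value-initialized, so every element starts at 0
}

// Exercises both allocation sites and releases the memory again.
int sum_demo() {
  Point *p = make_point(3, 4);
  int *buf = make_buffer(8);
  buf[0] = p->x + p->y;
  int result = buf[0] + buf[7];
  delete p;
  delete[] buf;
  return result;
}
```

By contrast, a bare `malloc(n)` carries no such type in its syntax, which is why the later changes in the series treat untyped allocation calls separately.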
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request on Oct 7, 2025:

…alloc_token metadata (#160131)

In preparation of adding the "AllocToken" pass, add the pre-requisite `sanitize_alloc_token` function attribute and `alloc_token` metadata.
---
This change is part of the following series:
1. llvm/llvm-project#160131
2. llvm/llvm-project#156838
3. llvm/llvm-project#162098
4. llvm/llvm-project#162099
5. llvm/llvm-project#156839
6. llvm/llvm-project#156840
7. llvm/llvm-project#156841
8. llvm/llvm-project#156842
melver added a commit that referenced this pull request on Oct 7, 2025:

Introduce `AllocToken`, an instrumentation pass designed to provide tokens to memory allocators enabling various heap organization strategies, such as heap partitioning.

Initially, the pass instruments functions marked with a new attribute `sanitize_alloc_token` by rewriting allocation calls to include a token ID, appended as a function argument with the default ABI.

The design aims to provide a flexible framework for implementing different token generation schemes. It currently supports the following token modes:
- TypeHash (default): token IDs based on a hash of the allocated type
- Random: statically-assigned pseudo-random token IDs
- Increment: incrementing token IDs per TU

For the `TypeHash` mode introduce support for `!alloc_token` metadata: the metadata can be attached to allocation calls to provide richer semantic information to be consumed by the AllocToken pass. Optimization remarks can be enabled to show where no metadata was available.

An alternative "fast ABI" is provided, where instead of passing the token ID as an argument (e.g., `__alloc_token_malloc(size, id)`), the token ID is directly encoded into the name of the called function (e.g., `__alloc_token_0_malloc(size)`). Where the maximum tokens is small, this offers more efficient instrumentation by avoiding the overhead of passing an additional argument at each allocation site.

Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434 [1]
---
This change is part of the following series:
1. #160131
2. #156838
3. #162098
4. #162099
5. #156839
6. #156840
7. #156841
8. #156842
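
The three token modes and the two ABIs described in the commit message above can be sketched roughly as follows. This is not the pass's implementation: `TokenAssigner`, `callee_name`, the FNV-1a hash, and the fixed-seed generator are all stand-ins chosen for illustration, since the thread does not say which hash or random source the pass uses. Only the callee names `__alloc_token_malloc` and `__alloc_token_0_malloc` are taken from the text.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string>
#include <string_view>

enum class TokenMode { TypeHash, Random, Increment };

// FNV-1a, used here only as a stand-in hash (an assumption, not the hash
// used by the real pass).
constexpr std::uint64_t fnv1a(std::string_view s) {
  std::uint64_t h = 14695981039346656037ull;
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ull;
  }
  return h;
}

// Hypothetical per-translation-unit token assigner.
class TokenAssigner {
public:
  TokenAssigner(TokenMode mode, std::uint64_t max_tokens)
      : mode_(mode), max_tokens_(max_tokens) {
    assert(max_tokens > 0 && "need at least one token");
  }

  // Returns a token ID in [0, max_tokens) for one allocation site.
  std::uint64_t next(std::string_view type_name) {
    switch (mode_) {
    case TokenMode::TypeHash:
      return fnv1a(type_name) % max_tokens_; // same type, same token
    case TokenMode::Random:
      return rng_() % max_tokens_; // pseudo-random, fixed at build time
    case TokenMode::Increment:
      return counter_++ % max_tokens_; // 0, 1, 2, ... per TU
    }
    return 0; // not reached for valid modes
  }

private:
  TokenMode mode_;
  std::uint64_t max_tokens_;
  std::uint64_t counter_ = 0;
  std::mt19937_64 rng_{42}; // fixed seed so the sketch is reproducible
};

// Default ABI: the callee name is fixed and the token travels as an extra
// argument. Fast ABI: the token is encoded in the callee name instead.
std::string callee_name(bool fast_abi, std::uint64_t id, std::string_view fn) {
  std::string name = "__alloc_token_";
  if (fast_abi)
    name += std::to_string(id) + "_";
  name += fn;
  return name;
}
```

With a small maximum token count, the fast ABI trades one extra argument per call for a small, fixed set of entry points in the allocator.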
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/66/builds/20335
thurstond added a commit that referenced this pull request on Oct 8, 2025:

…)"

This reverts commit 631719d.
thurstond added a commit that referenced this pull request on Oct 8, 2025:

)

Reverts #162099

Reason: this commit depends on #162098, which I am reverting due to build breakage (see #162098 (comment)).
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request on Oct 8, 2025:

…ions" (#162412)

Reverts llvm/llvm-project#162099

Reason: this commit depends on #162098, which I am reverting due to build breakage (see llvm/llvm-project#162098 (comment)).
melver added a commit that referenced this pull request on Oct 8, 2025:

[ Reland after 7815df1 ("[Clang] Fix brittle print-header-json.c test") ]

Introduce the "alloc-token" sanitizer kind, in preparation of wiring it up. Currently this is a no-op, and any attempt to enable it will result in failure:

  clang: error: unsupported option '-fsanitize=alloc-token' for target 'x86_64-unknown-linux-gnu'

In this step we can already wire up the `sanitize_alloc_token` IR attribute where the instrumentation is enabled. Subsequent changes will complete wiring up the AllocToken pass.
---
This change is part of the following series:
1. #160131
2. #156838
3. #162098
4. #162099
5. #156839
6. #156840
7. #156841
8. #156842
melver added a commit that referenced this pull request on Oct 8, 2025:

[ Reland after 7815df1 ("[Clang] Fix brittle print-header-json.c test") ]

For new expressions, the allocated type is syntactically known and we can trivially emit the !alloc_token metadata. A subsequent change will wire up the AllocToken pass and introduce appropriate tests.
---
This change is part of the following series:
1. #160131
2. #156838
3. #162098
4. #162099
5. #156839
6. #156840
7. #156841
8. #156842
melver added a commit that referenced this pull request on Oct 8, 2025:

Wire up the `-fsanitize=alloc-token` command-line option, hooking up
the `AllocToken` pass -- it provides allocation tokens to compatible
runtime allocators, enabling different heap organization strategies,
e.g. hardening schemes based on heap partitioning.
The instrumentation rewrites standard allocation calls into variants
that accept an additional `size_t token_id` argument. For example,
calls to `malloc(size)` become `__alloc_token_malloc(size, token_id)`,
and a C++ `new MyType` expression will call
`__alloc_token__Znwm(size, token_id)`.
Currently untyped allocation calls do not yet have `!alloc_token`
metadata, and therefore receive the fallback token only. This will be
fixed in subsequent changes through best-effort type-inference.
One benefit of the instrumentation approach is that it can be applied
transparently to large codebases, and scales in deployment as other
sanitizers.
Similarly to other sanitizers, instrumentation can selectively be
controlled using `__attribute__((no_sanitize("alloc-token")))`. Support
for sanitizer ignorelists to disable instrumentation for specific
functions or source files is implemented.
See clang/docs/AllocToken.rst for more usage instructions.
Link:
https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434
---
This change is part of the following series:
1. #160131
2. #156838
3. #162098
4. #162099
5. #156839
6. #156840
7. #156841
8. #156842
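
To make the rewrite in the commit message above more concrete, here is a minimal sketch of what a compatible runtime entry point might do with the extra `token_id` argument. Everything here is an assumption for illustration: `token_malloc_sketch` stands in for an entry point such as `__alloc_token_malloc(size, token_id)`, `kNumPartitions` and `g_partition_allocs` are hypothetical, and a real allocator would keep separate arenas rather than counters.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical number of heap partitions maintained by the runtime.
constexpr std::size_t kNumPartitions = 4;

// Per-partition allocation counters; a real allocator would keep separate
// arenas here instead of just counting.
std::array<std::size_t, kNumPartitions> g_partition_allocs{};

// Stand-in for a runtime entry point such as
// __alloc_token_malloc(size, token_id). It forwards to malloc and only
// records which partition the token selects.
void *token_malloc_sketch(std::size_t size, std::size_t token_id) {
  g_partition_allocs[token_id % kNumPartitions] += 1;
  return std::malloc(size);
}

// What an instrumented call site might look like after the rewrite of
// malloc(n), with an assumed token ID of 2 for this site.
void *instrumented_site(std::size_t n) {
  assert(n > 0 && "sketch expects a non-zero size");
  return token_malloc_sketch(n, /*token_id=*/2);
}
```

Memory returned this way is still released with the ordinary `free`, since the sketch only forwards to `malloc`.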
melver added a commit that referenced this pull request on Oct 8, 2025:

Implement the TypeHashPointerSplit mode: This mode assigns a token ID based on the hash of the allocated type's name, where the top half ID-space is reserved for types that contain pointers and the bottom half for types that do not contain pointers.

This mode with max tokens of 2 (`-falloc-token-max=2`) may also be valuable for heap hardening strategies that simply separate pointer types from non-pointer types.

Make it the new default mode.

Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434
---
This change is part of the following series:
1. #160131
2. #156838
3. #162098
4. #162099
5. #156839
6. #156840
7. #156841
8. #156842
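
The split described above can be sketched as a small function. This is an illustration under stated assumptions, not the pass's code: `pointer_split_token` is a hypothetical name, FNV-1a is a stand-in for whatever hash the pass uses, and the caller supplies the "contains pointers" bit directly rather than deriving it from a type.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// FNV-1a, used here only as a stand-in hash (an assumption).
constexpr std::uint64_t fnv1a(std::string_view s) {
  std::uint64_t h = 14695981039346656037ull;
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ull;
  }
  return h;
}

// Types without pointers map to the bottom half [0, max/2) of the ID space,
// types containing pointers to the top half [max/2, max).
std::uint64_t pointer_split_token(std::string_view type_name,
                                  bool contains_pointer,
                                  std::uint64_t max_tokens) {
  assert(max_tokens >= 2 && "need at least one token per half");
  const std::uint64_t half = max_tokens / 2;
  const std::uint64_t base = fnv1a(type_name) % half;
  return contains_pointer ? half + base : base;
}
```

With a maximum of 2 tokens each half holds exactly one ID, so the token reduces to a single bit: 0 for pointer-free types and 1 for types that contain pointers.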
melver added a commit that referenced this pull request on Oct 9, 2025:

#156841)

For the AllocToken pass to accurately calculate token ID hints, we need to attach `!alloc_token` metadata for allocation calls. Unlike new expressions, untyped allocation calls (like `malloc`, `calloc`, `::operator new(..)`, `__builtin_operator_new`, etc.) have no syntactic type associated with them.

For -fsanitize=alloc-token, type hints are sufficient, and we can attempt to infer the type based on common idioms.

When encountering allocation calls (with `__attribute__((malloc))` or `__attribute__((alloc_size(..)))`), attach `!alloc_token` by inferring the allocated type from (a) sizeof argument expressions such as `malloc(sizeof(MyType))`, and (b) casts such as `(MyType*)malloc(4096)`.

Note that non-standard allocation functions with these attributes are not instrumented by default. Use `-fsanitize-alloc-token-extended` to instrument them as well.

Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434
---
This change is part of the following series:
1. #160131
2. #156838
3. #162098
4. #162099
5. #156839
6. #156840
7. #156841
8. #156842
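
The two idioms named in the commit message above look like this in source form. The sketch only shows the call shapes; `MyType` and the function names are hypothetical, and whether a given compiler version actually attaches `!alloc_token` to each call is not something this example can demonstrate.

```cpp
#include <cassert>
#include <cstddef>
#include <stdlib.h>

// Hypothetical type used only for illustration.
struct MyType {
  int id;
  double weight;
};

// Idiom (a): the sizeof argument names the type being allocated.
void *alloc_via_sizeof() { return malloc(sizeof(MyType)); }

// Idiom (b): the size is opaque, but the cast names the intended type.
MyType *alloc_via_cast() { return (MyType *)malloc(4096); }

// No hint at all: neither the size expression nor a cast mentions a type,
// so there is nothing to infer from at this call.
void *alloc_untyped(std::size_t n) {
  assert(n > 0 && "sketch expects a non-zero size");
  return malloc(n);
}
```

The first two call sites give the front end something to infer from, while the last one would presumably fall back to the default token.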