Skip to content

[LangRef] Do not make align imply dereferenceability#158062

Merged
nikic merged 1 commit into
llvm:mainfrom
nikic:align-not-deref
Sep 24, 2025
Merged

[LangRef] Do not make align imply dereferenceability#158062
nikic merged 1 commit into
llvm:mainfrom
nikic:align-not-deref

Conversation

@nikic

@nikic nikic commented Sep 11, 2025

Copy link
Copy Markdown
Contributor

We currently specify that something like load i8, align 16384 implies that the object is actually dereferenceable up to 16384 bytes, rather than only the one byte implied by the load type.

We should stop doing that, because it makes it invalid to infer alignments larger than the load/store type, which is something we do (and want to do).

There is some backend code that does make use of this property by widening accesses and extracting part of them. However, I believe we should be justifying that based on target-specific guarantees, rather than a generic IR property. (The reasoning goes something like this: Typically, memory protection has page granularity, so widening a load to the alignment will not trap, as long as the alignment is not larger than the page size, which is true for any practically interesting access size.)

Fixes #90446.

We currently specify that something like `load i8, align 16384`
implies that the object is actually dereferenceable up to 16384
bytes, rather than only the one byte implied by the load type.

We should stop doing that, because it makes it invalid to infer
alignments larger than the load/store type, which is something we
do (and want to do).

There is some backend code that does make use of this property
by widening accesses and extracting part of them. However,
I believe we should be justifying that based on target-specific
guarantees, rather than a generic IR property. (The reasoning
goes something like this: Typically, memory protection has page
granularity, so widening a load to the alignment will not trap,
as long as the alignment is not larger than the page size, which
is true for any practically interesting access size.)

Fixes llvm#90446.
@llvmbot

llvmbot commented Sep 11, 2025

Copy link
Copy Markdown
Member

@llvm/pr-subscribers-llvm-ir

Author: Nikita Popov (nikic)

Changes

We currently specify that something like load i8, align 16384 implies that the object is actually dereferenceable up to 16384 bytes, rather than only the one byte implied by the load type.

We should stop doing that, because it makes it invalid to infer alignments larger than the load/store type, which is something we do (and want to do).

There is some backend code that does make use of this property by widening accesses and extracting part of them. However, I believe we should be justifying that based on target-specific guarantees, rather than a generic IR property. (The reasoning goes something like this: Typically, memory protection has page granularity, so widening a load to the alignment will not trap, as long as the alignment is not larger than the page size, which is true for any practically interesting access size.)

Fixes #90446.


Full diff: https://github.com/llvm/llvm-project/pull/158062.diff

1 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+7-11)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 45ae2327323d6..a98fd351e54cd 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -11239,11 +11239,9 @@ responsibility of the code emitter to ensure that the alignment information is
 correct. Overestimating the alignment results in undefined behavior.
 Underestimating the alignment may produce less efficient code. An alignment of
 1 is always safe. The maximum possible alignment is ``1 << 32``. An alignment
-value higher than the size of the loaded type implies memory up to the
-alignment value bytes can be safely loaded without trapping in the default
-address space. Access of the high bytes can interfere with debugging tools, so
-should not be accessed if the function has the ``sanitize_thread`` or
-``sanitize_address`` attributes.
+value higher than the size of the loaded type does *not* imply (without target
+specific knowledge) that memory up to the alignment value bytes can be safely
+loaded without trapping.
 
 The alignment is only optional when parsing textual IR; for in-memory IR, it is
 always present. An omitted ``align`` argument means that the operation has the
@@ -11379,12 +11377,10 @@ operation (that is, the alignment of the memory address). It is the
 responsibility of the code emitter to ensure that the alignment information is
 correct. Overestimating the alignment results in undefined behavior.
 Underestimating the alignment may produce less efficient code. An alignment of
-1 is always safe. The maximum possible alignment is ``1 << 32``. An alignment
-value higher than the size of the loaded type implies memory up to the
-alignment value bytes can be safely loaded without trapping in the default
-address space. Access of the high bytes can interfere with debugging tools, so
-should not be accessed if the function has the ``sanitize_thread`` or
-``sanitize_address`` attributes.
+1 is always safe. The maximum possible alignment is ``1 << 32``.  An alignment
+value higher than the size of the stored type does *not* imply (without target
+specific knowledge) that memory up to the alignment value bytes can be safely
+loaded without trapping.
 
 The alignment is only optional when parsing textual IR; for in-memory IR, it is
 always present. An omitted ``align`` argument means that the operation has the

@fhahn fhahn left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me and makes things clearer w.r.t. to certain extensions for security (e.g. AArch64's MTE limits the size of memoyr that can be accessed speculatively to 16 byte granulas)

Best to let this sit for a few days, to give others a chance to chime in

@nunoplopes

Copy link
Copy Markdown
Member

My concern with this is that we don't have a way to specify when the IR semantics changes. If someone runs codegenprepare and then takes that IR and reoptimizes it, bad things can happen if we have different semantics for different compiler phases.

While I'm favorable of this particular change because it makes the semantics more tight and doesn't require an artificial increase of object sizes, I do think it's not sufficient to fix the issue. Maybe we can have a module attribute that indicates that accesses and objects have their size round up?

@efriedma-quic

Copy link
Copy Markdown
Contributor

If someone runs codegenprepare and then takes that IR and reoptimizes it, bad things can happen if we have different semantics for different compiler phases.

To clarify, your concern is CodeGenPrepare performs transforms which are illegal under the new rules, and we need some way to represent the transformed load operation? That seems like a legitimate concern... but I don't know if we need to fix it before we fix LangRef. I don't think we're planning to make any specific changes to LLVM in response to this LangRef change.

Do you know exactly which transform we're talking about?

@nunoplopes

Copy link
Copy Markdown
Member

For codegen, @arsenm can shime in. I'm not aware of any test case in trunk that takes advantage of the alignment.

However, LoadStoreVectorizer does something funky:

; Transforms/LoadStoreVectorizer/batch-aa-compile-time.ll
@global_mem = global 1 bytes, align 4

define void @compile-time-test() {
entry:
  %__constexpr_1 = ptrtoint ptr @global_mem to i32
  %__constexpr_0 = inttoptr i32 %__constexpr_1 to ptr
  %global_base_loads = gep ptr %__constexpr_0, 1 x i64 0
  %local_base_stores = alloca i64 512, align 4
  br label %loop

loop:
  %load_0 = load i8, ptr %global_base_loads, align 4
  %ptr_1 = gep ptr %global_base_loads, 1 x i64 1
  %load_1 = load i8, ptr %ptr_1, align 1
  %ptr_2 = gep ptr %global_base_loads, 1 x i64 2
  %load_2 = load i8, ptr %ptr_2, align 2
  %ptr_3 = gep ptr %global_base_loads, 1 x i64 3
  %load_3 = load i8, ptr %ptr_3, align 1
...
=>
define void @compile-time-test() {
entry:
  %__constexpr_1 = ptrtoint ptr @global_mem to i32
  %__constexpr_0 = inttoptr i32 %__constexpr_1 to ptr
  %global_base_loads = gep ptr %__constexpr_0, 1 x i64 0
  %local_base_stores = alloca i64 512, align 4
  br label %loop

loop:
  %#0 = load <4 x i8>, ptr %global_base_loads, align 4

It transforms 4x i8 loads into a <4 x i8> load. If alignment implies dereferenceability, this transformation is correct. Otherwise, it's not. We are working with pointers result from int2ptr, so p+1 may point to a different object. Since each load can only read from a single object at a time, widening the load to a vectorized one is wrong.
So this kind of vectorization is only correct over logical pointers, not physical ones (with the semantics in this PR).

@nikic

nikic commented Sep 12, 2025

Copy link
Copy Markdown
Contributor Author

Otherwise, it's not. We are working with pointers result from int2ptr, so p+1 may point to a different object.

Waaat? Surely int2ptr cannot bypass the provenance model like that?

@nikic

nikic commented Sep 12, 2025

Copy link
Copy Markdown
Contributor Author

Do I understand correctly that you are operating under a model where pointers (specifically int2ptr pointers) may have multiple provenances, but memory accesses can only work on a single object?

My expectation would be that either:

  • A pointer can only have at most one provenance (I have always assumed that this is the case...)
  • A pointer can have multiple provenances, but in that case memory accesses should be defined as requiring the pointer to have provenance for all accessed objects. Not to only access one object.

If it's not one of those options, then even without this LangRef change, you can't do under-aligned load/store widening, which is something we also do.

@nunoplopes

Copy link
Copy Markdown
Member

Well int2ptr pointers are not well defined yet. But they must have multiple provenance as they have to alias with potentially multiple int2ptr that happened before.
I don't think we can restrict these pointers to access a single object. See this example:

... = ptr2int %p
... = ptr2int %q
...

if (...) {
  %a = int2ptr %v
  store i8 0, %a
} else {
  %b = int2ptr %v
  %c = gep %b, 3
  store i8 0, %c
}

=>
%a = int2ptr %v
if (...) {
  store i8 0, %a
} else {
  %c = gep %a, 3
  store i8 0, %c
}

If %p/%q are consecutive objects, the accesses through %a and %c access different objects. But they both access through the same pointer in the optimized program (hoisting common instruction).

@nikic

nikic commented Sep 14, 2025

Copy link
Copy Markdown
Contributor Author

I think that specific example is compatible with single provenance (with angelic non-determinism). But a variant that executes both branches would not be able to CSE the inttoptrs in a single provenance model.

But anyway, I think that my main point still stands: If you allow a pointer to have multiple provenances, you also need to allow multiple provenances during memory accesses. As long as you do that, the LoopVectorize example would not be affected by this change.

See https://llvm.godbolt.org/z/ej7bvoPY8 for an example transform that merges multiple adjacent stores into a wide unaligned store.

@arsenm

arsenm commented Sep 16, 2025

Copy link
Copy Markdown
Contributor

For codegen, @arsenm can shime in. I'm not aware of any test case in trunk that takes advantage of the alignment.

I believe we have some load widening that relies on this. Not sure if it was in AMDGPU or in the generic DAG combiner. We kind of need the alignment there as a hack, as we don't have properly propagated dereferenceable information. i.e., the flag on the MachineMemOperand is a boolean isDereferencable.

@nikic

nikic commented Sep 16, 2025

Copy link
Copy Markdown
Contributor Author

For codegen, @arsenm can shime in. I'm not aware of any test case in trunk that takes advantage of the alignment.

I believe we have some load widening that relies on this. Not sure if it was in AMDGPU or in the generic DAG combiner. We kind of need the alignment there as a hack, as we don't have properly propagated dereferenceable information. i.e., the flag on the MachineMemOperand is a boolean isDereferencable.

Looks like it's an AMDGPU-specific combine:

SDValue SITargetLowering::widenLoad(LoadSDNode *Ld,

As this happens in SDAG rather than IR, it should be fine for the purpose of this change.

@nikic

nikic commented Sep 22, 2025

Copy link
Copy Markdown
Contributor Author

@nunoplopes Do you still have concerns with this change?

@nunoplopes

Copy link
Copy Markdown
Member

What convinced me was that example of merging multiple consecutive stores into one. We need that to be correct, irrespective of the alignment. So this change to langref doesn't make things any more difficult.

But I'm concerned about how prevalent might be the idea that a memory operation only touches a single object. We need to audit alias analysis to make sure there aren't decisions being made based on each operation touching a single object. Anyway, but that's a worry for another time.

Feel free to merge this. Thanks and sorry for keeping you waiting.

@nikic nikic merged commit 755ffa3 into llvm:main Sep 24, 2025
12 checks passed
@nikic nikic deleted the align-not-deref branch September 24, 2025 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Align attribute doesn't imply dereferenceability

6 participants