[LangRef] Do not make align imply dereferenceability by nikic · Pull Request #158062 · llvm/llvm-project

nikic · 2025-09-11T12:55:04Z

We currently specify that something like load i8, align 16384 implies that the object is actually dereferenceable up to 16384 bytes, rather than only the one byte implied by the load type.

We should stop doing that, because it makes it invalid to infer alignments larger than the load/store type, which is something we do (and want to do).

There is some backend code that does make use of this property by widening accesses and extracting part of them. However, I believe we should be justifying that based on target-specific guarantees, rather than a generic IR property. (The reasoning goes something like this: Typically, memory protection has page granularity, so widening a load to the alignment will not trap, as long as the alignment is not larger than the page size, which is true for any practically interesting access size.)

Fixes #90446.

We currently specify that something like `load i8, align 16384` implies that the object is actually dereferenceable up to 16384 bytes, rather than only the one byte implied by the load type. We should stop doing that, because it makes it invalid to infer alignments larger than the load/store type, which is something we do (and want to do). There is some backend code that does make use of this property by widening accesses and extracting part of them. However, I believe we should be justifying that based on target-specific guarantees, rather than a generic IR property. (The reasoning goes something like this: Typically, memory protection has page granularity, so widening a load to the alignment will not trap, as long as the alignment is not larger than the page size, which is true for any practically interesting access size.) Fixes llvm#90446.

llvmbot · 2025-09-11T12:55:37Z

@llvm/pr-subscribers-llvm-ir

Author: Nikita Popov (nikic)

Changes

We currently specify that something like load i8, align 16384 implies that the object is actually dereferenceable up to 16384 bytes, rather than only the one byte implied by the load type.

We should stop doing that, because it makes it invalid to infer alignments larger than the load/store type, which is something we do (and want to do).

There is some backend code that does make use of this property by widening accesses and extracting part of them. However, I believe we should be justifying that based on target-specific guarantees, rather than a generic IR property. (The reasoning goes something like this: Typically, memory protection has page granularity, so widening a load to the alignment will not trap, as long as the alignment is not larger than the page size, which is true for any practically interesting access size.)

Fixes #90446.

Full diff: https://github.com/llvm/llvm-project/pull/158062.diff

1 Files Affected:

(modified) llvm/docs/LangRef.rst (+7-11)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 45ae2327323d6..a98fd351e54cd 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -11239,11 +11239,9 @@ responsibility of the code emitter to ensure that the alignment information is
 correct. Overestimating the alignment results in undefined behavior.
 Underestimating the alignment may produce less efficient code. An alignment of
 1 is always safe. The maximum possible alignment is ``1 << 32``. An alignment
-value higher than the size of the loaded type implies memory up to the
-alignment value bytes can be safely loaded without trapping in the default
-address space. Access of the high bytes can interfere with debugging tools, so
-should not be accessed if the function has the ``sanitize_thread`` or
-``sanitize_address`` attributes.
+value higher than the size of the loaded type does *not* imply (without target
+specific knowledge) that memory up to the alignment value bytes can be safely
+loaded without trapping.
 
 The alignment is only optional when parsing textual IR; for in-memory IR, it is
 always present. An omitted ``align`` argument means that the operation has the
@@ -11379,12 +11377,10 @@ operation (that is, the alignment of the memory address). It is the
 responsibility of the code emitter to ensure that the alignment information is
 correct. Overestimating the alignment results in undefined behavior.
 Underestimating the alignment may produce less efficient code. An alignment of
-1 is always safe. The maximum possible alignment is ``1 << 32``. An alignment
-value higher than the size of the loaded type implies memory up to the
-alignment value bytes can be safely loaded without trapping in the default
-address space. Access of the high bytes can interfere with debugging tools, so
-should not be accessed if the function has the ``sanitize_thread`` or
-``sanitize_address`` attributes.
+1 is always safe. The maximum possible alignment is ``1 << 32``.  An alignment
+value higher than the size of the stored type does *not* imply (without target
+specific knowledge) that memory up to the alignment value bytes can be safely
+loaded without trapping.
 
 The alignment is only optional when parsing textual IR; for in-memory IR, it is
 always present. An omitted ``align`` argument means that the operation has the

fhahn

This makes sense to me and makes things clearer w.r.t. to certain extensions for security (e.g. AArch64's MTE limits the size of memoyr that can be accessed speculatively to 16 byte granulas)

Best to let this sit for a few days, to give others a chance to chime in

nunoplopes · 2025-09-11T18:07:46Z

My concern with this is that we don't have a way to specify when the IR semantics changes. If someone runs codegenprepare and then takes that IR and reoptimizes it, bad things can happen if we have different semantics for different compiler phases.

While I'm favorable of this particular change because it makes the semantics more tight and doesn't require an artificial increase of object sizes, I do think it's not sufficient to fix the issue. Maybe we can have a module attribute that indicates that accesses and objects have their size round up?

efriedma-quic · 2025-09-11T20:48:54Z

If someone runs codegenprepare and then takes that IR and reoptimizes it, bad things can happen if we have different semantics for different compiler phases.

To clarify, your concern is CodeGenPrepare performs transforms which are illegal under the new rules, and we need some way to represent the transformed load operation? That seems like a legitimate concern... but I don't know if we need to fix it before we fix LangRef. I don't think we're planning to make any specific changes to LLVM in response to this LangRef change.

Do you know exactly which transform we're talking about?

nunoplopes · 2025-09-12T14:28:06Z

For codegen, @arsenm can shime in. I'm not aware of any test case in trunk that takes advantage of the alignment.

However, LoadStoreVectorizer does something funky:

; Transforms/LoadStoreVectorizer/batch-aa-compile-time.ll
@global_mem = global 1 bytes, align 4

define void @compile-time-test() {
entry:
  %__constexpr_1 = ptrtoint ptr @global_mem to i32
  %__constexpr_0 = inttoptr i32 %__constexpr_1 to ptr
  %global_base_loads = gep ptr %__constexpr_0, 1 x i64 0
  %local_base_stores = alloca i64 512, align 4
  br label %loop

loop:
  %load_0 = load i8, ptr %global_base_loads, align 4
  %ptr_1 = gep ptr %global_base_loads, 1 x i64 1
  %load_1 = load i8, ptr %ptr_1, align 1
  %ptr_2 = gep ptr %global_base_loads, 1 x i64 2
  %load_2 = load i8, ptr %ptr_2, align 2
  %ptr_3 = gep ptr %global_base_loads, 1 x i64 3
  %load_3 = load i8, ptr %ptr_3, align 1
...
=>
define void @compile-time-test() {
entry:
  %__constexpr_1 = ptrtoint ptr @global_mem to i32
  %__constexpr_0 = inttoptr i32 %__constexpr_1 to ptr
  %global_base_loads = gep ptr %__constexpr_0, 1 x i64 0
  %local_base_stores = alloca i64 512, align 4
  br label %loop

loop:
  %#0 = load <4 x i8>, ptr %global_base_loads, align 4

It transforms 4x i8 loads into a <4 x i8> load. If alignment implies dereferenceability, this transformation is correct. Otherwise, it's not. We are working with pointers result from int2ptr, so p+1 may point to a different object. Since each load can only read from a single object at a time, widening the load to a vectorized one is wrong.
So this kind of vectorization is only correct over logical pointers, not physical ones (with the semantics in this PR).

nikic · 2025-09-12T14:59:33Z

Otherwise, it's not. We are working with pointers result from int2ptr, so p+1 may point to a different object.

Waaat? Surely int2ptr cannot bypass the provenance model like that?

nikic · 2025-09-12T15:17:14Z

Do I understand correctly that you are operating under a model where pointers (specifically int2ptr pointers) may have multiple provenances, but memory accesses can only work on a single object?

My expectation would be that either:

A pointer can only have at most one provenance (I have always assumed that this is the case...)
A pointer can have multiple provenances, but in that case memory accesses should be defined as requiring the pointer to have provenance for all accessed objects. Not to only access one object.

If it's not one of those options, then even without this LangRef change, you can't do under-aligned load/store widening, which is something we also do.

nunoplopes · 2025-09-14T14:04:33Z

Well int2ptr pointers are not well defined yet. But they must have multiple provenance as they have to alias with potentially multiple int2ptr that happened before.
I don't think we can restrict these pointers to access a single object. See this example:

... = ptr2int %p
... = ptr2int %q
...

if (...) {
  %a = int2ptr %v
  store i8 0, %a
} else {
  %b = int2ptr %v
  %c = gep %b, 3
  store i8 0, %c
}

=>
%a = int2ptr %v
if (...) {
  store i8 0, %a
} else {
  %c = gep %a, 3
  store i8 0, %c
}

If %p/%q are consecutive objects, the accesses through %a and %c access different objects. But they both access through the same pointer in the optimized program (hoisting common instruction).

nikic · 2025-09-14T16:53:45Z

I think that specific example is compatible with single provenance (with angelic non-determinism). But a variant that executes both branches would not be able to CSE the inttoptrs in a single provenance model.

But anyway, I think that my main point still stands: If you allow a pointer to have multiple provenances, you also need to allow multiple provenances during memory accesses. As long as you do that, the LoopVectorize example would not be affected by this change.

See https://llvm.godbolt.org/z/ej7bvoPY8 for an example transform that merges multiple adjacent stores into a wide unaligned store.

arsenm · 2025-09-16T01:13:11Z

For codegen, @arsenm can shime in. I'm not aware of any test case in trunk that takes advantage of the alignment.

I believe we have some load widening that relies on this. Not sure if it was in AMDGPU or in the generic DAG combiner. We kind of need the alignment there as a hack, as we don't have properly propagated dereferenceable information. i.e., the flag on the MachineMemOperand is a boolean isDereferencable.

nikic · 2025-09-16T07:39:24Z

For codegen, @arsenm can shime in. I'm not aware of any test case in trunk that takes advantage of the alignment.

I believe we have some load widening that relies on this. Not sure if it was in AMDGPU or in the generic DAG combiner. We kind of need the alignment there as a hack, as we don't have properly propagated dereferenceable information. i.e., the flag on the MachineMemOperand is a boolean isDereferencable.

Looks like it's an AMDGPU-specific combine:

llvm-project/llvm/lib/Target/AMDGPU/SIISelLowering.cpp

Line 11712 in 7af659d

SDValue SITargetLowering::widenLoad(LoadSDNode *Ld,

As this happens in SDAG rather than IR, it should be fine for the purpose of this change.

nikic · 2025-09-22T20:42:06Z

@nunoplopes Do you still have concerns with this change?

nunoplopes · 2025-09-23T18:56:58Z

What convinced me was that example of merging multiple consecutive stores into one. We need that to be correct, irrespective of the alignment. So this change to langref doesn't make things any more difficult.

But I'm concerned about how prevalent might be the idea that a memory operation only touches a single object. We need to audit alias analysis to make sure there aren't decisions being made based on each operation touching a single object. Anyway, but that's a worry for another time.

Feel free to merge this. Thanks and sorry for keeping you waiting.

nikic requested review from arsenm, efriedma-quic, fhahn and nunoplopes September 11, 2025 12:55

llvmbot added the llvm:ir label Sep 11, 2025

fhahn approved these changes Sep 11, 2025

View reviewed changes

nikic merged commit 755ffa3 into llvm:main Sep 24, 2025
12 checks passed

nikic deleted the align-not-deref branch September 24, 2025 10:37

Conversation

nikic commented Sep 11, 2025

Uh oh!

llvmbot commented Sep 11, 2025

Uh oh!

fhahn left a comment

Choose a reason for hiding this comment

Uh oh!

nunoplopes commented Sep 11, 2025

Uh oh!

efriedma-quic commented Sep 11, 2025

Uh oh!

nunoplopes commented Sep 12, 2025

Uh oh!

nikic commented Sep 12, 2025

Uh oh!

nikic commented Sep 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nunoplopes commented Sep 14, 2025

Uh oh!

nikic commented Sep 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arsenm commented Sep 16, 2025

Uh oh!

nikic commented Sep 16, 2025

Uh oh!

nikic commented Sep 22, 2025

Uh oh!

nunoplopes commented Sep 23, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

nikic commented Sep 12, 2025 •

edited

Loading

nikic commented Sep 14, 2025 •

edited

Loading