[LangRef] Do not make align imply dereferenceability#158062
Conversation
We currently specify that something like `load i8, align 16384` implies that the object is actually dereferenceable up to 16384 bytes, rather than only the one byte implied by the load type. We should stop doing that, because it makes it invalid to infer alignments larger than the load/store type, which is something we do (and want to do). There is some backend code that does make use of this property by widening accesses and extracting part of them. However, I believe we should be justifying that based on target-specific guarantees, rather than a generic IR property. (The reasoning goes something like this: Typically, memory protection has page granularity, so widening a load to the alignment will not trap, as long as the alignment is not larger than the page size, which is true for any practically interesting access size.) Fixes llvm#90446.
|
@llvm/pr-subscribers-llvm-ir Author: Nikita Popov (nikic) ChangesWe currently specify that something like We should stop doing that, because it makes it invalid to infer alignments larger than the load/store type, which is something we do (and want to do). There is some backend code that does make use of this property by widening accesses and extracting part of them. However, I believe we should be justifying that based on target-specific guarantees, rather than a generic IR property. (The reasoning goes something like this: Typically, memory protection has page granularity, so widening a load to the alignment will not trap, as long as the alignment is not larger than the page size, which is true for any practically interesting access size.) Fixes #90446. Full diff: https://github.com/llvm/llvm-project/pull/158062.diff 1 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 45ae2327323d6..a98fd351e54cd 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -11239,11 +11239,9 @@ responsibility of the code emitter to ensure that the alignment information is
correct. Overestimating the alignment results in undefined behavior.
Underestimating the alignment may produce less efficient code. An alignment of
1 is always safe. The maximum possible alignment is ``1 << 32``. An alignment
-value higher than the size of the loaded type implies memory up to the
-alignment value bytes can be safely loaded without trapping in the default
-address space. Access of the high bytes can interfere with debugging tools, so
-should not be accessed if the function has the ``sanitize_thread`` or
-``sanitize_address`` attributes.
+value higher than the size of the loaded type does *not* imply (without target
+specific knowledge) that memory up to the alignment value bytes can be safely
+loaded without trapping.
The alignment is only optional when parsing textual IR; for in-memory IR, it is
always present. An omitted ``align`` argument means that the operation has the
@@ -11379,12 +11377,10 @@ operation (that is, the alignment of the memory address). It is the
responsibility of the code emitter to ensure that the alignment information is
correct. Overestimating the alignment results in undefined behavior.
Underestimating the alignment may produce less efficient code. An alignment of
-1 is always safe. The maximum possible alignment is ``1 << 32``. An alignment
-value higher than the size of the loaded type implies memory up to the
-alignment value bytes can be safely loaded without trapping in the default
-address space. Access of the high bytes can interfere with debugging tools, so
-should not be accessed if the function has the ``sanitize_thread`` or
-``sanitize_address`` attributes.
+1 is always safe. The maximum possible alignment is ``1 << 32``. An alignment
+value higher than the size of the stored type does *not* imply (without target
+specific knowledge) that memory up to the alignment value bytes can be safely
+loaded without trapping.
The alignment is only optional when parsing textual IR; for in-memory IR, it is
always present. An omitted ``align`` argument means that the operation has the
|
fhahn
left a comment
There was a problem hiding this comment.
This makes sense to me and makes things clearer w.r.t. to certain extensions for security (e.g. AArch64's MTE limits the size of memoyr that can be accessed speculatively to 16 byte granulas)
Best to let this sit for a few days, to give others a chance to chime in
|
My concern with this is that we don't have a way to specify when the IR semantics changes. If someone runs codegenprepare and then takes that IR and reoptimizes it, bad things can happen if we have different semantics for different compiler phases. While I'm favorable of this particular change because it makes the semantics more tight and doesn't require an artificial increase of object sizes, I do think it's not sufficient to fix the issue. Maybe we can have a module attribute that indicates that accesses and objects have their size round up? |
To clarify, your concern is CodeGenPrepare performs transforms which are illegal under the new rules, and we need some way to represent the transformed load operation? That seems like a legitimate concern... but I don't know if we need to fix it before we fix LangRef. I don't think we're planning to make any specific changes to LLVM in response to this LangRef change. Do you know exactly which transform we're talking about? |
|
For codegen, @arsenm can shime in. I'm not aware of any test case in trunk that takes advantage of the alignment. However, LoadStoreVectorizer does something funky: ; Transforms/LoadStoreVectorizer/batch-aa-compile-time.ll
@global_mem = global 1 bytes, align 4
define void @compile-time-test() {
entry:
%__constexpr_1 = ptrtoint ptr @global_mem to i32
%__constexpr_0 = inttoptr i32 %__constexpr_1 to ptr
%global_base_loads = gep ptr %__constexpr_0, 1 x i64 0
%local_base_stores = alloca i64 512, align 4
br label %loop
loop:
%load_0 = load i8, ptr %global_base_loads, align 4
%ptr_1 = gep ptr %global_base_loads, 1 x i64 1
%load_1 = load i8, ptr %ptr_1, align 1
%ptr_2 = gep ptr %global_base_loads, 1 x i64 2
%load_2 = load i8, ptr %ptr_2, align 2
%ptr_3 = gep ptr %global_base_loads, 1 x i64 3
%load_3 = load i8, ptr %ptr_3, align 1
...
=>
define void @compile-time-test() {
entry:
%__constexpr_1 = ptrtoint ptr @global_mem to i32
%__constexpr_0 = inttoptr i32 %__constexpr_1 to ptr
%global_base_loads = gep ptr %__constexpr_0, 1 x i64 0
%local_base_stores = alloca i64 512, align 4
br label %loop
loop:
%#0 = load <4 x i8>, ptr %global_base_loads, align 4It transforms 4x i8 loads into a <4 x i8> load. If alignment implies dereferenceability, this transformation is correct. Otherwise, it's not. We are working with pointers result from int2ptr, so p+1 may point to a different object. Since each load can only read from a single object at a time, widening the load to a vectorized one is wrong. |
Waaat? Surely int2ptr cannot bypass the provenance model like that? |
|
Do I understand correctly that you are operating under a model where pointers (specifically int2ptr pointers) may have multiple provenances, but memory accesses can only work on a single object? My expectation would be that either:
If it's not one of those options, then even without this LangRef change, you can't do under-aligned load/store widening, which is something we also do. |
|
Well int2ptr pointers are not well defined yet. But they must have multiple provenance as they have to alias with potentially multiple int2ptr that happened before. ... = ptr2int %p
... = ptr2int %q
...
if (...) {
%a = int2ptr %v
store i8 0, %a
} else {
%b = int2ptr %v
%c = gep %b, 3
store i8 0, %c
}
=>
%a = int2ptr %v
if (...) {
store i8 0, %a
} else {
%c = gep %a, 3
store i8 0, %c
}If %p/%q are consecutive objects, the accesses through %a and %c access different objects. But they both access through the same pointer in the optimized program (hoisting common instruction). |
|
I think that specific example is compatible with single provenance (with angelic non-determinism). But a variant that executes both branches would not be able to CSE the inttoptrs in a single provenance model. But anyway, I think that my main point still stands: If you allow a pointer to have multiple provenances, you also need to allow multiple provenances during memory accesses. As long as you do that, the LoopVectorize example would not be affected by this change. See https://llvm.godbolt.org/z/ej7bvoPY8 for an example transform that merges multiple adjacent stores into a wide unaligned store. |
I believe we have some load widening that relies on this. Not sure if it was in AMDGPU or in the generic DAG combiner. We kind of need the alignment there as a hack, as we don't have properly propagated dereferenceable information. i.e., the flag on the MachineMemOperand is a boolean isDereferencable. |
Looks like it's an AMDGPU-specific combine: As this happens in SDAG rather than IR, it should be fine for the purpose of this change. |
|
@nunoplopes Do you still have concerns with this change? |
|
What convinced me was that example of merging multiple consecutive stores into one. We need that to be correct, irrespective of the alignment. So this change to langref doesn't make things any more difficult. But I'm concerned about how prevalent might be the idea that a memory operation only touches a single object. We need to audit alias analysis to make sure there aren't decisions being made based on each operation touching a single object. Anyway, but that's a worry for another time. Feel free to merge this. Thanks and sorry for keeping you waiting. |
We currently specify that something like
load i8, align 16384implies that the object is actually dereferenceable up to 16384 bytes, rather than only the one byte implied by the load type.We should stop doing that, because it makes it invalid to infer alignments larger than the load/store type, which is something we do (and want to do).
There is some backend code that does make use of this property by widening accesses and extracting part of them. However, I believe we should be justifying that based on target-specific guarantees, rather than a generic IR property. (The reasoning goes something like this: Typically, memory protection has page granularity, so widening a load to the alignment will not trap, as long as the alignment is not larger than the page size, which is true for any practically interesting access size.)
Fixes #90446.