SPIR-V robust resource access #33

Posting here for criticism and comments. Once discussions have settled, we could put this doc under an investigation/ directory in this repo.

# SPIR-V robust resource access

Corentin Wallez, David Neto, Google

This document explores how SPIR-V can be made robust to satisfy the constraints of WebGPU. It describes both shader code transformations and API-side validation that together result in complete robust resource access in the SPIR-V shader (vertex attribute fetching is not handled). The SPIR-V logical addressing model, which would have to be required for portability anyway, simplifies the analysis immensely and leaves only a few cases to handle.

# What is robust access

SPIR-V is a candidate shading language for WebGPU. One hard constraint of WebGPU is that it should be secure, and in particular that it should prevent shaders from reading memory that could contain data from other applications. This includes both accessing other applications' GPU memory and reading uninitialized data that could contain leftover values. The same applies to writes and atomic operations: they must not touch memory outside of what the application owns.

The Vulkan API defines two primary types of resources: buffers and images. Buffers include the equivalents of OpenGL SSBOs and UBOs, and views into those. (See [Vulkan section 11. Resource Creation](https://www.khronos.org/registry/vulkan/specs/1.0-wsi_extensions/html/vkspec.html#resources).)

OpenGL and Vulkan have introduced a concept of "robust buffer access" that can be enabled and that uses implementation-specific techniques to make sure reads and writes to buffer resources cannot access memory outside the bounds of the resources passed to the shader. In-bounds accesses aren't modified, but out-of-bounds accesses can do any of the following:

*   Writes can be discarded.
*   Reads and writes can access any location within the resource.
*   Reads can return zero values, or (0, 0, 0, X) where X is 0, 1, -1, or an extremum for integers, or -0.0, +0.0, -1.0, or +1.0 for floating-point values.
*   Atomics can return undefined values.
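
To make the allowed behaviors concrete, here is a minimal Python model of three of them. It assumes a buffer is a plain list of integers indexed by element rather than by byte, and the function names are hypothetical; it only illustrates the semantics, not how any driver implements them.

```python
from typing import List


def load_zero_oob(buf: List[int], index: int) -> int:
    """Allowed behavior: out-of-bounds reads return zero."""
    return buf[index] if 0 <= index < len(buf) else 0


def load_any_in_bounds(buf: List[int], index: int) -> int:
    """Allowed behavior: out-of-bounds reads return some in-bounds element.

    This sketch picks the nearest element and assumes len(buf) >= 1.
    """
    return buf[min(max(index, 0), len(buf) - 1)]


def store_discard_oob(buf: List[int], index: int, value: int) -> bool:
    """Allowed behavior: out-of-bounds writes are discarded.

    Returns True if the value was actually written.
    """
    if 0 <= index < len(buf):
        buf[index] = value
        return True
    return False
```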

Robust buffer access also covers vertex and index fetching, which won't be discussed in this document. However, Vulkan and OpenGL don't cover robust image access.

[OpenGL ES robust buffer access extension](https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_robust_buffer_access_behavior.txt)

[Vulkan robustBufferAccess feature](https://www.khronos.org/registry/vulkan/specs/1.0/html/vkspec.html#features-features) (first item after the big feature table)

Because not all drivers implement robust buffer access (and some implement it incorrectly), we need to be able to add robust buffer access to existing shaders via a code transformation. We make the following assumptions:

*   Resources are cleared on creation, or appear as if they are, so that no uninitialized memory can be read.
*   On the API side we are able to validate at draw-time that the buffer views have a minimum size, and skip GPU work if the validation fails.

# SPIR-V logical addressing mode

SPIR-V modules must declare and use a single addressing model that describes the shape of pointers. The logical addressing model encodes the constraints of "shaders for graphics" in APIs like OpenGL, Vulkan and even D3D12. These environments use a model where no recursion is allowed and where there are no general-purpose pointers, only references within objects allocated before the shader begins. Such a reference is either an object reference (an `OpVariable` definition), or derived from such a reference via sub-object indexing (an "access chain"). Examples of sub-object indexing include going from a struct to one of its members, or from an array to one of its elements. The environments also require that if a shader were completely inlined, the implementation could statically infer which resource each memory access touches.

Note on terminology: SPIR-V calls an object reference a "pointer". Different environments define what you can do with a pointer. We are describing the logical addressing mode, which is designed to tightly constrain pointer semantics; logical addressing mode pointers are therefore not the general-purpose pointers we are used to in C or C++.

The [SPIR-V validation rules](https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html#_universal_validation_rules) add the following constraints when using the logical addressing model:

*   An object referenced by an `OpVariable` cannot contain any pointer. (You can't load or store a pointer.)
*   All constant indices to array accesses must be non-negative (or unsigned).
*   A pointer can only be created from `OpVariable`, `OpAccessChain`, `OpInBoundsAccessChain`, `OpFunctionParameter`, `OpImageTexelPointer`, `OpCopyObject`.
*   A pointer can only be used by `OpLoad`, `OpStore`, `OpAccessChain`, `OpInBoundsAccessChain`, `OpFunctionCall`, `OpImageTexelPointer`, `OpCopyMemory`, `OpCopyObject`, and the `OpAtomic*` instructions.
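
A toy version of the last two rules could look like the following. The `Inst` tuple is a hypothetical, heavily simplified stand-in for a SPIR-V instruction (real checking is done by the SPIR-V validator on the binary module); it only records whether an instruction produces a pointer or takes one as an operand, and checks that against the opcode lists above.

```python
from typing import Iterable, List, NamedTuple

# Opcodes allowed to produce a pointer under the Logical addressing model.
POINTER_PRODUCERS = frozenset({
    "OpVariable", "OpAccessChain", "OpInBoundsAccessChain",
    "OpFunctionParameter", "OpImageTexelPointer", "OpCopyObject",
})

# Opcodes allowed to take a pointer operand (atomics are matched by prefix).
POINTER_CONSUMERS = frozenset({
    "OpLoad", "OpStore", "OpAccessChain", "OpInBoundsAccessChain",
    "OpFunctionCall", "OpImageTexelPointer", "OpCopyMemory", "OpCopyObject",
})


class Inst(NamedTuple):
    """Hypothetical simplified instruction: just the facts this check needs."""
    opcode: str
    result_is_pointer: bool = False
    has_pointer_operand: bool = False


def may_consume_pointer(opcode: str) -> bool:
    # All atomic instructions share the "OpAtomic" prefix.
    return opcode in POINTER_CONSUMERS or opcode.startswith("OpAtomic")


def logical_addressing_violations(insts: Iterable[Inst]) -> List[str]:
    """Return a message for each instruction breaking the pointer rules."""
    errors = []
    for inst in insts:
        if inst.result_is_pointer and inst.opcode not in POINTER_PRODUCERS:
            errors.append(f"{inst.opcode} may not produce a pointer")
        if inst.has_pointer_operand and not may_consume_pointer(inst.opcode):
            errors.append(f"{inst.opcode} may not consume a pointer")
    return errors
```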

In other words, in logical addressing mode, pointers correspond to partial address computations that can only be used to operate on elementary types or to create pointers to parts of the pointee:

*   A pointer is an opaque value:
    *   A pointer has no bit representation, no size, and can't be stored or loaded.
    *   A pointer cannot be reinterpreted as a pointer to a different type.
    *   A pointer cannot be converted to a numeric type.
    *   You can't convert a different type (e.g. unsigned integer) to a pointer.
*   All "pointer arithmetic" is done via the "access chain" instructions `OpAccessChain` and `OpInBoundsAccessChain`.
*   Pointers can only point to a full object (structure, array, matrix, vector, scalar or boolean, or an opaque object without substructure such as an image). Pointers to a range within an array aren't possible.
*   Pointed-to objects cannot contain pointers (`OpVariable` constraint).
*   There are no function pointers.
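
One way to picture such an opaque pointer is as a root variable plus a path of sub-object indices. The `LogicalPointer` class below is a hypothetical model, not part of any SPIR-V tooling: access chains only append indices, and because the root is always known, the resource touched by each access is statically known too.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LogicalPointer:
    """Hypothetical model of a Logical-addressing pointer.

    It is only a root variable and a path of sub-object indices. It offers no
    operation to convert it to or from a number, or to reinterpret it.
    """
    root: str                   # the OpVariable this pointer is derived from
    path: Tuple[int, ...] = ()  # indices applied by access chains so far

    def access_chain(self, *indices: int) -> "LogicalPointer":
        # The only "pointer arithmetic": select a sub-object of the pointee.
        return LogicalPointer(self.root, self.path + indices)
```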

Our strategy to add robust buffer access to existing shaders is to "clamp" pointers that would be out of bounds so that they are in bounds. This way, every load, store, or atomic operation that would have been out of bounds acts on some in-bounds location instead, which is an allowed robust buffer access behavior.

A pointer can be obtained in the following ways:

*   Via `OpFunctionParameter` or `OpCopyObject`, which have the same value as an existing pointer and are therefore in-bounds whenever that pointer is.
*   Via `OpAccessChain` and `OpInBoundsAccessChain`, which are address calculations in the sense that they go from a reference to an object X to a reference to a subobject of X. (An `OpAccessChain` instruction is the equivalent of an LLVM `getelementptr` instruction where the first index element is zero.) *These operations will have to be instrumented to always return pointers that are in-bounds.*
*   Via `OpImageTexelPointer`, which computes the address of a texel inside a swizzled image. Likewise, it will have to be instrumented.
*   Via top-level `OpVariable`s that point to resources. The API side can make sure the resources are big enough to hold the pointed-to type entirely (except for unsized arrays, discussed below).
*   Via function-local `OpVariable`s that point to local variables. These are in-bounds by construction, have a lifetime of the whole function, and cannot be returned by the function.

# Detail of transforms

## Clamping of image accesses {#clamping-of-image-accesses}

Image accesses that don't go through a sampler must be kept in bounds, as SPIR-V gives no guarantee for out-of-bounds accesses. This covers both `OpImageTexelPointer` and other sampler-less image operations like `OpImageRead`, `OpImageWrite`, `OpImageFetch`, etc.

All these operations take an `OpTypeImage` or a pointer to an `OpTypeImage`. The size of an image can be queried from its `OpTypeImage` value via `OpImageQuerySize` and `OpImageQuerySizeLod`. Thus for all sampler-less image operations it is possible to query the image size and clamp the coordinates before they are passed to the operation (the API can validate that all image views are at least 1x1 on the base mip level).

With such a transform, all pointers returned from `OpImageTexelPointer` are guaranteed to be in-bounds.
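
A sketch of the coordinate clamp, assuming integer coordinates and that `size` holds the per-dimension result of `OpImageQuerySize` or `OpImageQuerySizeLod`. In the transformed SPIR-V this would be a size query followed by a clamp (for example with the GLSL.std.450 `SClamp` extended instruction); the Python function name is hypothetical.

```python
from typing import Sequence, Tuple


def clamp_texel_coords(coords: Sequence[int], size: Sequence[int]) -> Tuple[int, ...]:
    """Clamp integer texel coordinates to [0, size - 1] in each dimension.

    `size` stands in for the result of OpImageQuerySize(Lod). API-side
    validation guarantees each dimension is at least 1, so size - 1 >= 0.
    """
    if len(coords) != len(size):
        raise ValueError("coordinates and size must have the same dimensionality")
    return tuple(min(max(c, 0), s - 1) for c, s in zip(coords, size))
```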

## Simple access chain clamping

`OpInBoundsAccessChain` is equivalent to `OpAccessChain` except that the application pinky-promises that the access is in bounds. We don't trust the application, so we treat `OpInBoundsAccessChain` like `OpAccessChain`.

The `OpAccessChain` instruction acts on a reference (pointer) to a composite object. Composite objects are matrices, vectors, or structures or arrays of other objects (composite or scalar). `OpAccessChain` creates a pointer to a sub-object from a sequence of indices, each of which selects the structure member, array element, matrix column, or vector component at one level of nesting. (An `OpAccessChain` instruction is equivalent to an LLVM `getelementptr` instruction that has a first index of zero.)

The transform would insert code to clamp indices to a valid range.  The valid range for each index depends on the pointee type being referenced at that level of indirection.  For structure, matrix, vector, and fixed-size array types:

*   The minimum index is zero. SPIR-V requires each of these types (except structures) to have at least one member.
*   The maximum index is derived from the pointee type definition. For example:
    *   The maximum index into an array is one less than the number of elements in the array. An array length may be a "specialization constant", but its value is fixed at pipeline creation time.
    *   The maximum index into a structure is one less than the number of structure members.
    *   The maximum index into a vector or matrix is one less than its number of components or columns.

An index into a structure must be a compile-time constant, and it will have been validated by a SPIR-V validator before the robust buffer access transform runs. Accesses into structures with no members will also have been rejected by the validator.

Except for a pointee type of `OpTypeRuntimeArray`, the maximum index is known at compile time or pipeline creation time, so the transform adds a clamp of the index against a constant. The only remaining case is `OpTypeRuntimeArray`.
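
The per-level clamping can be sketched over a simplified type tree. The `Scalar`, `Composite` and `Struct` classes are hypothetical models (arrays, vectors and matrices are all treated as a `Composite` with a fixed element count), and `OpTypeRuntimeArray` is deliberately left out since it needs the separate handling described next. Structure indices are only checked, not clamped, because the validator has already proven them in range.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Scalar:
    name: str


@dataclass(frozen=True)
class Composite:
    """Array, vector or matrix: `count` elements of type `element`."""
    element: "Type"
    count: int  # known at compile time or at pipeline creation (spec constant)


@dataclass(frozen=True)
class Struct:
    members: Tuple["Type", ...]


Type = Union[Scalar, Composite, Struct]


def clamp_access_chain(base: Type, indices: List[int]) -> List[int]:
    """Return the access chain indices with every array-like index clamped."""
    clamped = []
    ty = base
    for index in indices:
        if isinstance(ty, Struct):
            # Constant index already checked by the validator; just assert it.
            if not 0 <= index < len(ty.members):
                raise ValueError("invalid structure member index")
            clamped.append(index)
            ty = ty.members[index]
        elif isinstance(ty, Composite):
            # Clamp to [0, count - 1]; count >= 1 is guaranteed by SPIR-V.
            clamped.append(min(max(index, 0), ty.count - 1))
            ty = ty.element
        else:
            raise ValueError("cannot index into a scalar")
    return clamped
```

For example, with a struct whose member 1 is a 3-element array of `vec4`s, the chain `[1, 7, -2]` becomes `[1, 2, 0]`.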

## Fat-pointers for unsized arrays

The top-level structure for a resource can have an `OpTypeRuntimeArray` as its last member that corresponds to an unsized array at the end of an SSBO in OpenGL / Vulkan. This is the only allowed case for unsized arrays.

The size of the array can be queried in the shader from a pointer to its parent struct via `OpArrayLength`. This means that when an access chain starts from the struct and also indexes the runtime array, we can easily add clamping. The only difficult case is when a pointer to an `OpTypeRuntimeArray` is created, as it is not possible to go back to its parent struct to query the size.
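
When the access chain starts at the parent struct, the clamp bound comes from `OpArrayLength`. The sketch below assumes the usual Vulkan-style definition of that length, namely the number of whole array elements that fit between the array member's `Offset` and the end of the bound buffer range; the function names are hypothetical.

```python
def runtime_array_length(range_bytes: int, member_offset: int, array_stride: int) -> int:
    """Assumed OpArrayLength result: whole elements between the runtime
    array's Offset and the end of the bound buffer range."""
    return max(range_bytes - member_offset, 0) // array_stride


def clamp_runtime_index(index: int, length: int) -> int:
    """Clamp an index into the runtime array; requires length >= 1."""
    return min(max(index, 0), length - 1)
```

A length of zero would make `length - 1` negative, which is why the API side has to guarantee at least one element.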

Pointers to unsized arrays must become "fat-pointers" that contain both the size and a pointer to the `OpTypeRuntimeArray`. This transformation isn't hard to make and can be done without force-inlining all the SPIR-V code. Then, when an `OpAccessChain` operates on an `OpTypeRuntimeArray`, it clamps the first index to one less than the size stored in the fat pointer. The API-side validation can ensure there is sufficient space for at least one element of the unsized array, so the resulting pointer is in-bounds.
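
A minimal model of such a fat pointer, assuming its length was captured with `OpArrayLength` while the parent struct was still available. The class and function names are hypothetical. In the transformed SPIR-V, the length would travel alongside the pointer (for example as an extra function parameter), which is why no inlining is needed; `load_from_callee` stands in for such a callee.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FatRuntimeArrayPointer:
    """Hypothetical fat pointer: a runtime array reference plus its length."""
    data: Tuple[int, ...]  # stands in for the pointed-to OpTypeRuntimeArray
    length: int            # value obtained from OpArrayLength on the parent struct

    def clamp_index(self, index: int) -> int:
        # API-side validation guarantees length >= 1, so length - 1 is in bounds.
        return min(max(index, 0), self.length - 1)

    def load(self, index: int) -> int:
        return self.data[self.clamp_index(index)]


def load_from_callee(arr: FatRuntimeArrayPointer, index: int) -> int:
    """A callee receives the length with the pointer, so it can clamp locally."""
    return arr.load(index)
```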

## Validation on the API side

The code transforms assume the following:

1.  The base mip level of image views isn't empty (it is at least 1x1).
2.  Buffer views contain enough space for the "sized" part of the structures, and at least one element of the unsized part if present.

Assumption 1) is easy to validate and is something we want to check for in WebGPU anyway since, for example, creating an empty image view is invalid in Vulkan.

Assumption 2) will require additional checks that wouldn't be needed on backends with correct robust buffer access behavior. Such validation would compute, for each resource of each pipeline, the minimum size it needs; then, on draw commands, it would check the sizes of the bound buffer views (optimized with dirty bits, etc., of course).
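
A sketch of that validation, under a few assumptions: each binding's minimum size is derived from its struct layout (the size of the sized part, or the runtime array's `Offset` plus one array stride), and a dirty flag avoids re-checking when nothing was rebound. `BindingLayout`, `min_binding_size` and `DrawValidator` are hypothetical names, not part of an existing WebGPU implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BindingLayout:
    sized_bytes: int                            # size of the struct's sized part
    runtime_array_offset: Optional[int] = None  # Offset of a trailing runtime array
    runtime_array_stride: int = 0               # ArrayStride of that runtime array


def min_binding_size(layout: BindingLayout) -> int:
    """Minimum buffer view size, computed once per pipeline at creation time."""
    if layout.runtime_array_offset is None:
        return layout.sized_bytes
    # Room for the sized part and at least one element of the unsized array.
    return max(layout.sized_bytes,
               layout.runtime_array_offset + layout.runtime_array_stride)


class DrawValidator:
    """Checks bound buffer view sizes at draw time, re-validating only when dirty."""

    def __init__(self, min_sizes: Dict[int, int]) -> None:
        self.min_sizes = min_sizes             # binding index -> minimum size in bytes
        self.bound_sizes: Dict[int, int] = {}  # binding index -> bound view size
        self.dirty = True
        self.valid = False

    def bind(self, binding: int, size_bytes: int) -> None:
        self.bound_sizes[binding] = size_bytes
        self.dirty = True

    def can_draw(self) -> bool:
        if self.dirty:
            self.valid = all(self.bound_sizes.get(b, 0) >= size
                             for b, size in self.min_sizes.items())
            self.dirty = False
        return self.valid
```

Requiring a full array stride rather than only the element size is a conservative choice: it keeps a floor-based array length at one or more even when the stride includes padding.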

# Conclusion

We've shown a practical way to instrument SPIR-V code to add robust resource access behavior, with help from a little API-side validation. All the statically checkable constraints on SPIR-V code for this behavior are already encoded in the logical addressing mode, so only some instrumentation for runtime checks is needed. Logical addressing is not overly constraining, however: it is sufficient for all Vulkan (and D3D12) games, and WebGPU would require it for portability anyway.

A drawback of using API-side validation to add back robust resource access behavior is that, depending on hardware or backing-API support for robust buffer access, the application might see different results for the same program. A command buffer might run, with only its out-of-bounds accesses affected, when there is hardware support, but have its draws skipped by validation when robust access is emulated, leading to different rendering results. Requiring at least one element in the unsized part of resources is another wart that we couldn't get rid of.
