Floating-point instructions on x86, ARM, and others produce NaN values with differing bit patterns. To support users that require fully deterministic computation, we should add a mode which enables automatic canonicalization of NaN values.
In particular, for WebAssembly, NaN bit patterns are the only non-determinism in the language. So if we had an option to make NaN bit patterns deterministic for users that want that, we could run WebAssembly entirely deterministically (other than resource limitations, and eventually threads and SIMD which are expected to come with their own nondeterminism). This isn't required by the WebAssembly spec, but it's not prohibited either.
Background: NaN bit patterns
Floating-point values are represented as a sign bit, an exponent, and a significand (sometimes called the mantissa field or the fraction field). See here for an illustration of what this looks like for f32; f64 is the same but with bigger fields.
NaN is a special case: when the exponent field is all ones and the significand has at least one non-zero bit, the value is defined to be NaN. This means there are many possible bit patterns which all mean NaN.
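As a concrete sketch, this is what the NaN condition looks like when decoded from the raw bits of an f32 (1 sign bit, 8 exponent bits, 23 significand bits); the helper name here is hypothetical, not part of any library:

```rust
// Hypothetical helper: decide whether a raw f32 bit pattern encodes NaN.
// f32 layout: 1 sign bit | 8 exponent bits | 23 significand bits.
fn is_nan_bits_f32(bits: u32) -> bool {
    let exponent = (bits >> 23) & 0xff;
    let significand = bits & 0x7f_ffff;
    // All-ones exponent plus a non-zero significand means NaN.
    exponent == 0xff && significand != 0
}

fn main() {
    // The quiet NaN produced by 0.0 / 0.0 on most platforms.
    assert!(is_nan_bits_f32((0.0f32 / 0.0f32).to_bits()));
    // A different bit pattern that is also a NaN.
    assert!(is_nan_bits_f32(0x7fc0_0001));
    // Infinity has an all-ones exponent but a zero significand, so it is not NaN.
    assert!(!is_nan_bits_f32(f32::INFINITY.to_bits()));
    println!("ok");
}
```

Note that the sign bit plays no part in the test, so both "positive" and "negative" NaNs qualify.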
IEEE 754 doesn't fully specify what the bits of NaN values should be when the hardware generates them. And, it turns out, popular hardware platforms ended up doing different things: when a NaN is generated on x86, it has a different bit pattern than when a NaN is generated on ARM, and so on.
And the WebAssembly spec chose to expose this nondeterminism, rather than take on extra overhead to hide it, so WebAssembly programs may see different NaN bit patterns depending on the underlying platform they're running on.
Background: Where NaN bit patterns matter
For most arithmetic instructions, all that matters is whether or not a value is some NaN, not which specific bit pattern it is. So most of the time, the bit patterns don't matter. However, they are observable in a few specific circumstances: if a NaN is stored to memory and the bytes of memory are later inspected; if a floating-point value is bitcast to integer; if it is passed to or returned from a call to another function (which we assume might store the value); if it is the operand of a copysign; or if it's the operand of certain target-specific instructions.
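The bitcast case is easy to demonstrate: two values that are indistinguishable as NaNs at the floating-point level become distinguishable the moment their bits are reinterpreted as an integer.

```rust
fn main() {
    // Two values that are both NaN but carry different bit patterns.
    let a = f32::from_bits(0x7fc0_0000); // a common "default" quiet NaN
    let b = f32::from_bits(0xffc0_0000); // a NaN with the sign bit set
    assert!(a.is_nan() && b.is_nan());

    // At the floating-point level they are indistinguishable: NaN never
    // compares equal, not even to itself.
    assert!(a != b && a != a && b != b);

    // But bitcasting to integer exposes the difference.
    assert_ne!(a.to_bits(), b.to_bits());
    println!("{:#010x} vs {:#010x}", a.to_bits(), b.to_bits());
}
```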
Canonicalizing NaNs
Canonicalizing here means checking to see if a value is some NaN, and if so, replacing it with a specific hand-crafted "canonical" or "default" NaN, so that the result has a deterministic bit pattern. This typically adds some overhead, and the benefit is determinism.
To test whether a value is NaN, we can use floating-point equality comparisons. NaN is defined to be not equal to itself, while all other values (including Infinity) are defined to be equal to themselves, so x != x is an efficient way to test whether x is some NaN.
A canonicalization sequence can consist of such a comparison, followed by a select, with an f32const or f64const to provide the canonical alternative.
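Expressed in Rust rather than Cretonne IL, the compare-and-select sequence might look like the sketch below. The canonical NaN bit pattern chosen here is one plausible option (the usual quiet NaN with a zero sign bit), not a value mandated by the source:

```rust
// One possible choice of canonical NaN for f32: quiet NaN, zero sign, zero payload.
const CANON_NAN_F32: u32 = 0x7fc0_0000;

// Hypothetical sketch of the canonicalization sequence:
// fcmp-ne, then select between a constant and the original value.
fn canonicalize_f32(x: f32) -> f32 {
    // `x != x` is true exactly when x is some NaN.
    let is_nan = x != x;
    // select is_nan, f32const CANON_NAN_F32, x
    if is_nan { f32::from_bits(CANON_NAN_F32) } else { x }
}

fn main() {
    // Any NaN input, whatever its bit pattern, maps to the single canonical one.
    assert_eq!(canonicalize_f32(f32::from_bits(0xffc0_0001)).to_bits(), CANON_NAN_F32);
    // Non-NaN values, including infinities, pass through unchanged.
    assert_eq!(canonicalize_f32(1.5).to_bits(), 1.5f32.to_bits());
    assert_eq!(canonicalize_f32(f32::INFINITY).to_bits(), f32::INFINITY.to_bits());
    println!("ok");
}
```

The same shape works for f64 with an f64const and a wider canonical constant.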
Cretonne implementation
The first step can be to insert canonicalizing code after every floating-point arithmetic instruction, to eliminate the nondeterminism immediately whenever it appears. That would be a good way to start, and though it would have a fair amount of overhead, it would likely still be much faster than using software floating-point libraries.
Not all users will want this though, so we should create a new pass for it (see preopt.rs for an example of how to write a pass), which can be easily enabled or disabled. To start with, the pass can just visit all instructions in a function, and replace each floating-point arithmetic instruction with a sequence that does that operation followed by a canonicalization step.
For these purposes, fneg, fabs, and fcopysign are not considered to count as "arithmetic" instructions, because they're defined to just operate in terms of the sign bit and leave all other bits alone.
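To illustrate why sign-bit operations are safe to skip: on mainstream platforms, negation and copysign are pure bit manipulations that leave the NaN payload intact, so they introduce no new nondeterminism of their own (a sketch under that assumption; the payload value below is arbitrary):

```rust
fn main() {
    // A quiet NaN with a distinctive, arbitrary payload.
    let x = f32::from_bits(0x7fc1_2345);
    assert!(x.is_nan());

    // Negation only flips the sign bit; the payload survives.
    assert_eq!((-x).to_bits(), 0xffc1_2345);

    // copysign replaces only the sign bit of x with the sign of its
    // second operand, again leaving the payload alone.
    assert_eq!(x.copysign(-1.0).to_bits(), 0xffc1_2345);
    assert_eq!(x.copysign(1.0).to_bits(), 0x7fc1_2345);
    println!("ok");
}
```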
In follow-up steps, we can make the pass more clever and avoid inserting canonicalization code in places where bit patterns don't "escape" computations.