Skip to content

Change Float.nan from sNaN to qNaN#10899

Merged
xavierleroy merged 13 commits intoocaml:trunkfrom
gretay-js:quiet_nan
Jul 22, 2022
Merged

Change Float.nan from sNaN to qNaN#10899
xavierleroy merged 13 commits intoocaml:trunkfrom
gretay-js:quiet_nan

Conversation

@gretay-js
Copy link
Copy Markdown
Contributor

@gasche
Copy link
Copy Markdown
Member

gasche commented Jan 14, 2022

It could be nice to have a comment to explain the reason for the new choice, and maybe event to document it in the .mli? Are there floating-point people that we want to warn, and get feedback from? (I'm thinking of the Frama-C people in particular.)

@gretay-js
Copy link
Copy Markdown
Contributor Author

It could be nice to have a comment to explain the reason for the new choice, and maybe event to document it in the .mli?

There is already a comment in float.mli saying "Floating-point operations never raise an exception..."
I think it would be better to add a comment about qNaN to stdlib.ml, not to the .mli because the representation is not exposed to the user. How about something like this:
"[nan] representation is qNaN, not sNaN, because floating-point operations in OCaml never raise an exception".

There is also a comment about nan in stdlib.mli that includes the following sentence:
"Any floating-point operation with [nan] as argument returns [nan] as result."
This is not accurate (for example pow nan 0. evaluates to 1.0), should it be rephrased to something like this:
"Any floating-point operation with [nan] as argument returns [nan] as result, unless otherwise specified in IEEE 754 standard."

@xavierleroy
Copy link
Copy Markdown
Contributor

xavierleroy commented Jan 14, 2022

(Adding @silene to the discussion.)

That's the paradigmatic PR where the code change is easy, but agreeing on the change and making sure it doesn't break anything is difficult!

Elaborating on my comment in #10862

I forgot that Float.nan is a signaling NaN. I wonder if this was a conscious decision or just bad luck...

Float.nan was introduced as Pervasives.nan in commit adc10ef back in 2001, with the same bit pattern 0x7F_F0_00_00_00_00_00_01L that it has today.

Suspiciously, this is the smallest unsigned 64-bit value that encodes a FP NaN, so this may explain the choice of this particular value...

Another possible explanation is that it was the default quiet NaN of some processor at the time -- I suspect the Alpha. The convention "top payload bit = 1 for quiet NaNs, = 0 for signaling NaNs" was not standardized until IEEE 754-2008. Some 1990's architectures used the opposite convention, most notably MIPS and probably Alpha too. In 2001 at Inria there was no MIPS machine left, but we still used Alpha servers, so this may be an explanation.

Enough history. Now the question is: if we put forward one specific NaN value, which is most useful, a quiet NaN or a signaling NaN?

It's not easy to cause signaling NaNs to actually raise signals (there is no ISO C standard API to do this), and OCaml doesn't know how to turn these signals into exceptions, so signaling NaNs can be assumed not to signal. However, some C library functions treat signaling NaNs and quiet NaNs differently (an example is in #10862)...

Of course we could add Float.snan and Float.qnan. But the question remains: what should be "the default NaN", i.e. Float.nan?

@silene
Copy link
Copy Markdown
Contributor

silene commented Jan 14, 2022

I don't have much to say on the topic, so I will just rehash some of the points above.

  1. The C standard mandates a quiet NaN for nan and NAN. So, the principle of least surprise would suggest using the same behavior for OCaml's nan.
  2. Generating a quiet NaN is easy (e.g., 0. /. 0.), but generating a signaling NaN is hard. So, having a snan function is a good idea. And it could even be named signaling_nan as in C++, as it will almost never be used, so better be explicit.
  3. Even on recently produced MIPS processors, the meaning of the most significant bit might be non-standard. Moreover, the behavior can be selected at runtime. So, it might be worth changing the patch to let nan = infinity -. infinity.

@sebfuric
Copy link
Copy Markdown

I agree with the comments by silene. I would like to add that keeping a sNaN in Float would be useful: it is typically the value one should choose to fill an uninitialized array of floats (sNaNs have been introduced for that purpose in IEEE 754).

Deciding between a sNaN and a qNan is difficult because even the documentation associated with Float.nan is contradictory (the NaN which is the result of 0.0 /. 0.0 is necessarily a qNaN while a NaN that always yield a NaN result when passed to any function is necessarily a sNaN). In case of a change to a qNaN, there should be a way to spot locations in the code where an old nan is used, maybe by providing two new definitions (Float.{snan, qnan}) and by decorating nan's definition with [@@ocaml.deprecated "Use Float.snan or Float.qnan instead."] ?

@gretay-js
Copy link
Copy Markdown
Contributor Author

Thank you all for the comments.
I added quiet_nan and signaling_nan to Stdlib and Float, defined Stdlib.nan in terms of quiet_nan, and updated comments. (Tests failing only because of the difference in line numbers.)

Would it be better to put quiet_nan and signaling_nan in Float only, and duplicate the definition of quiet_nan in Stdlib, or not expose quiet_nan at all?

@silene I changed quiet_nan to be defined as you suggested using infinity. By the way, it means that the least significant bit of quiet_nan is not set, unlike for the previous definition of nan (when it was signalling). I don't think it matters, because something like Float.of_string "nan" and other operations without nan inputs can return a qNaN with bottom bit 0 anyway.

@sebfuric I am afraid that adding a deprecation warning on nan would cause too many build failures in existing code that is mostly well behaved.

@xavierleroy
Copy link
Copy Markdown
Contributor

Would it be better to put quiet_nan and signaling_nan in Float only, and duplicate the definition of quiet_nan in Stdlib, or not expose quiet_nan at all?

I would be tempted to add Float.quiet_nan and Float.signaling_nan, but have only Stdlib.nan (and not Stdlib.quiet_nan and Stdlib.signaling_nan). The rationale being that definitions in Stdlib are kind of legacy definitions, and new features are best added to submodules like Float.

@gretay-js
Copy link
Copy Markdown
Contributor Author

I would be tempted to add Float.quiet_nan and Float.signaling_nan, but have only Stdlib.nan (and not Stdlib.quiet_nan and Stdlib.signaling_nan).

Fixed.

@silene I had to change the definition Stdlib.nan back to use a bit pattern instead of the expression infinity -. infinity, because the result of this expression has the most-significant bit set and it causes the tests in lib-format and lib-printf to fail ("-nan" is printed out instead of the expected "nan"). The expression Float.(neg (infinity -. infinity) would work too, at least in my setting, would it be better than a bit pattern?

@gretay-js
Copy link
Copy Markdown
Contributor Author

Ping :)

@silene
Copy link
Copy Markdown
Contributor

silene commented Feb 11, 2022

I don't have a strong opinion. I feel that the tests are somehow flawed, since they rely on the sign of NaN. Expression Float.(neg (infinity -. infinity)) is not much better, since it assumes that the default NaN is negative.

Perhaps using Float.(abs (infinity -. infinity)) would be a better choice. At least it would guarantee a positive NaN, whatever that could mean.

@gretay-js gretay-js force-pushed the quiet_nan branch 2 times, most recently from 5152d19 to f81fc85 Compare February 15, 2022 13:51
@gretay-js
Copy link
Copy Markdown
Contributor Author

I don't have a strong opinion. I feel that the tests are somehow flawed, since they rely on the sign of NaN. Expression Float.(neg (infinity -. infinity)) is not much better, since it assumes that the default NaN is negative.

Perhaps using Float.(abs (infinity -. infinity)) would be a better choice. At least it would guarantee a positive NaN, whatever that could mean.

Fixed, thanks! The CI is green.

@gretay-js
Copy link
Copy Markdown
Contributor Author

@xavierleroy I've addressed all the comments. Can this be merged?

@xavierleroy
Copy link
Copy Markdown
Contributor

Sorry, I was occupied with other things lately.

At any rate, I'm not a fan of abs_float (infinity -. infinity). A justification for using infinity -. infinity as "the" quiet NaN was that it's the default NaN on the host processor, so people who like to look at NaN payloads using Int64.bits_of_float would see the same bits for Stdlib.nan and for a default NaN generated by the processor.

This justification is lost if Stdlib.nan is defined as abs_float (infinity -. infinity). On some platforms you get the default NaN and on others (x86) you don't !

Also, for debugging purposes, a user may wish to turn NaN generation into a signal, using non-portable C code (feenableexcept in glibc) or a (hypothetical?) debugger offering this feature. It's not nice if NaNs are generated at OCaml program start-up, during the initialization of the standard library.

So, I'd much prefer to define the quiet NaN as float_of_bits 0x7F_F8_00_00_00_00_00_01L or float_of_bits 0x7F_FF_FF_FF_FF_FF_FF_FFL or any fixed bit pattern that denotes a qNaN and that you like.

@sebfuric
Copy link
Copy Markdown

sebfuric commented Mar 3, 2022

The payload does not include sign bit, so the sign argument should not be relevant as far as payloads only are to be considered (abs only changes the sign bit of a qNaN).
The debugging argument is however a very good reason to justify direct encoding of NaNs as literal values... but then one faces up with a portability problem because of disagreements in qNaN/sNaN encodings?

@gretay-js
Copy link
Copy Markdown
Contributor Author

So, I'd much prefer to define the quiet NaN as float_of_bits 0x7F_F8_00_00_00_00_00_01L or float_of_bits 0x7F_FF_FF_FF_FF_FF_FF_FFL or any fixed bit pattern that denotes a qNaN and that you like.

done!
Clearly I'm not the right person to prepare this PR, but I think the code is ready now. :)
Thank you for the explanations. It'll be helpful to have them here as a reference for the choice.

@xavierleroy xavierleroy merged commit 2d356aa into ocaml:trunk Jul 22, 2022
@xavierleroy
Copy link
Copy Markdown
Contributor

Sorry for leaving this PR hanging for so long. @Octachron and I agreed to not rush it in OCaml 5.0 and schedule it for 5.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants