fix spurious overflow for Float16(::Rational) by stevengj · Pull Request #52395 · JuliaLang/julia

stevengj · 2023-12-05T00:43:24Z

Update: this also fixes `Float32` for `UInt128`, since currently `Float32((typemax(UInt128)-0x01) // typemax(UInt128))` gives `NaN32`.
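Here is a minimal illustration of where the `NaN32` comes from. It assumes, as the review discussion below describes for `Float16`, that the pre-PR method converted the numerator and the denominator to `Float32` separately and then divided. Both 128-bit integers round past `floatmax(Float32)`, so the quotient becomes `Inf32 / Inf32`:

```julia
# Assumed pre-PR path: Float32(num) / Float32(den).
x = (typemax(UInt128) - 0x01) // typemax(UInt128)   # just below 1
num_f = Float32(numerator(x))     # 2^128 - 2 rounds to 2^128, which overflows to Inf32
den_f = Float32(denominator(x))   # 2^128 - 1 likewise becomes Inf32
naive = num_f / den_f             # Inf32 / Inf32 is NaN32
# The quick fix goes through Float64, whose exponent range is large enough:
fixed = Float32(Float64(x))       # 1.0f0
```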

base/rational.jl

test/float16.jl



Constructing a floating-point number from a `Rational` should now be correctly rounded.

Implementation approach:

1. Convert the (numerator, denominator) pair to a (sign bit, integral significand, exponent) triplet using integer arithmetic. The integer type in question must be wide enough.
2. Convert the above triplet into an instance of the chosen FP type. There is special support for IEEE 754 floating-point and for `BigFloat`; otherwise, a fallback using `ldexp` is used.

As a bonus, constructing a `BigFloat` from a `Rational` should now be thread-safe when the rounding mode and precision are provided to the constructor, because there is no access to the global precision or rounding mode settings.

- Updates JuliaLang#45213
- Updates JuliaLang#50940
- Updates JuliaLang#52507
- Fixes JuliaLang#52394
- Closes JuliaLang#52395
- Fixes JuliaLang#52859
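The two-step approach above can be sketched as follows. This is a hypothetical helper, `rational_to_float_sketch`, not the code from the actual change: it uses `BigInt` as the "wide enough" integer type, handles only the IEEE types (`Float16`, `Float32`, `Float64`) with round-to-nearest, ties-to-even, and skips the special `BigFloat` path. Step 1 computes the exact binary exponent and a rounded integral significand; step 2 hands the pair to `ldexp`, which does not round again because the significand already fits the target precision, including in the subnormal range.

```julia
# Hypothetical sketch, not the implementation from the PR: correctly rounded
# Rational -> IEEE float conversion (round to nearest, ties to even).
# BigInt stands in for "an integer type wide enough" in step 1.
function rational_to_float_sketch(::Type{T}, x::Rational) where {T<:Base.IEEEFloat}
    n = big(numerator(x))
    d = big(denominator(x))              # non-negative for a normalized Rational
    iszero(d) && return n > 0 ? T(Inf) : T(-Inf)   # ±1//0
    neg = n < 0                          # sign bit
    n = abs(n)
    iszero(n) && return zero(T)
    p = precision(T)                     # significand bits, implicit bit included
    emin = exponent(floatmin(T))         # exponent of the smallest normal number

    # Step 1a: exact binary exponent e = floor(log2(n/d)).
    e = ndigits(n; base=2) - ndigits(d; base=2)
    if (e >= 0 ? n < (d << e) : (n << -e) < d)
        e -= 1
    end

    # Step 1b: integral significand q with at most p bits. The weight of its
    # last bit is clamped at the subnormal level, so ldexp never rounds later.
    k = p - 1 - max(e, emin)             # q is about (n/d) * 2^k
    num = k >= 0 ? n << k : n
    den = k >= 0 ? d : d << -k
    q, r = divrem(num, den)
    # Round to nearest, ties to even, using the exact integer remainder.
    if 2r > den || (2r == den && isodd(q))
        q += 1
    end

    # Step 2: q <= 2^p is exact in T; ldexp scales exactly or overflows to Inf.
    y = ldexp(T(Int64(q)), -k)
    return neg ? -y : y
end
```

Because the only rounding decision is made on exact integers, neither double rounding nor the intermediate overflow discussed in this thread can occur; the cost of this sketch is `BigInt` allocation, which a fixed-width integer type of sufficient width would avoid.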

vtjnash · 2024-02-07T02:11:26Z

base/rational.jl

```diff
+Float16(x::Rational{<:Union{Int128,UInt128}}) =
+    Float16(Float64(x)) # UInt128 overflows Float32, include Int128 for consistency
+Float32(x::Rational{<:Union{Int128,UInt128}}) =
+    Float32(Float64(x)) # UInt128 overflows Float32, include Int128 for consistency
```


Wouldn't these suffer from double-rounding accuracy loss? @oscardssmith

They would. Doing this properly is fairly hard, though.

Also worth noting: `Float16(num)/Float16(den)` potentially has triple rounding (the numerator conversion, the denominator conversion, and the division can each round).
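Besides the extra roundings, the separate conversions are where the spurious overflow in this PR's title comes from. A small illustration, assuming (as the comment above describes) that the `Float16` method divided `Float16(num)` by `Float16(den)`:

```julia
# Assumed separate-conversion path: Float16(num) / Float16(den).
x = 1//70000
num_f = Float16(numerator(x))   # 1.0
den_f = Float16(denominator(x)) # 70000 is beyond floatmax(Float16) == 65504, so Inf16
naive = num_f / den_f           # 1 / Inf16 == 0, although 1/70000 is a Float16 subnormal
via64 = Float16(Float64(x))     # nonzero: the quick fix's route through Float64
```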

The question is: is this problem worse than the problem this PR fixes? That is, should we merge and improve this later, or should this wait until this PR does it "right" (whatever that means)?

We should merge and improve later.

That was my thought as well, but want to confirm.

Would doing it "properly" be something like:

```julia
a = Float64(x)
b = Float32(a)
# check if we double-rounded in the wrong direction
if x > a && b < a && nextfloat(b) < a
    b = nextfloat(b)
elseif x < a && b > a && prevfloat(b) > a
    b = prevfloat(b)
end
return b
```

An example of a test case where double rounding gives a different result is:

```julia
julia> r = 12928845309018111//18014398509481984
12928845309018111//18014398509481984

julia> Float32(r) == Float32(Float64(r))
false
```

@vtjnash's code doesn't fix this, however: its output matches `Float32(Float64(r))` on this example.

(Example constructed by tweaking a number almost exactly halfway between two Float32 values.)
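One known way to make a "round through `Float64`" fix correct is round-to-odd. This technique is swapped in here, not taken from the thread. If the intermediate `Float64` is rounded to odd instead of to nearest, a second rounding to `Float32` is correctly rounded, because 53 >= 2*24 + 2. The helpers `round_to_odd_f64` and `float32_via_odd` are hypothetical. The sketch assumes `x` is finite and well inside the `Float64` range, and that `Float64(x)` lands within a few ulps of `x`. It then uses exact `Rational{BigInt}` comparisons to nudge that value into place.

```julia
# Hypothetical helpers (not part of Base): correctly rounded Rational -> Float32
# through Float64, using round-to-odd as the intermediate rounding.
# Assumes x is finite and well inside the Float64 range.

# Round x to Float64 with round-to-odd: exact values are returned unchanged;
# inexact ones go to whichever neighbouring Float64 has an odd significand.
function round_to_odd_f64(x::Rational)
    xr = Rational{BigInt}(x)               # exact copy of x for exact comparisons
    exact(f::Float64) = Rational{BigInt}(f) # exact value of a Float64
    a = Float64(x)                         # assumed to be within a few ulps of x
    # Move a to the largest Float64 <= x (i.e. round toward -Inf).
    while exact(a) > xr
        a = prevfloat(a)
    end
    while exact(nextfloat(a)) <= xr
        a = nextfloat(a)
    end
    exact(a) == xr && return a             # x is exactly representable
    # x lies strictly between a and nextfloat(a); exactly one of the two has an
    # odd significand, i.e. an odd low bit in its IEEE encoding.
    return isodd(reinterpret(UInt64, a)) ? a : nextfloat(a)
end

# Since 53 >= 2*24 + 2, rounding the round-to-odd Float64 to Float32
# (round to nearest, ties to even) yields the correctly rounded Float32.
float32_via_odd(x::Rational) = Float32(round_to_odd_f64(x))
```

On the example above, `float32_via_odd(r)` keeps the value on the correct side of the `Float32` midpoint, so it differs from `Float32(Float64(r))`. The exact `BigInt` comparisons make this slow; the point is only that one extra bit of information (the "sticky" odd bit) is enough to avoid the double-rounding error.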

nsajko · 2024-02-07T15:25:46Z

Was there a point to merging this? This PR was just a quick fix compared to the thorough fix of #49749.

oscardssmith · 2024-02-07T15:27:58Z

Primarily that it was a quick fix, compared to the thorough fix of #49749 :). To be slightly more serious, I think we should merge the more thorough fix, but this was an improvement that was ready to merge first.


stevengj added the bugfix (This change fixes an existing bug) and float16 labels Dec 5, 2023

giordano added the rationals (The Rational type and values thereof) label Dec 5, 2023

andrewjradcliffe reviewed Dec 13, 2023

base/rational.jl (outdated, resolved)

stevengj commented Dec 13, 2023

base/rational.jl (outdated, resolved)

stevengj commented Dec 13, 2023

test/float16.jl (resolved)

stevengj commented Dec 13, 2023

base/rational.jl (resolved)

stevengj commented Dec 13, 2023

test/float16.jl (resolved)

stevengj force-pushed the rational_to_float16 branch from 3c40acc to 4270c36 on December 13, 2023 20:58

stevengj added 6 commits December 14, 2023 09:10

fix spurious overflow for Float16(::Rational)

fe92f1d

typo

e4d8979

Update base/rational.jl

b24c5ca

Update test/float16.jl

6aab59b

Update base/rational.jl

ce4b622

Update test/float16.jl

c34a75d

stevengj force-pushed the rational_to_float16 branch from 4270c36 to c34a75d on December 14, 2023 14:10

nsajko mentioned this pull request Jan 13, 2024

Base: correctly rounded floats constructed from rationals #49749 (draft)

fingolfin approved these changes Feb 7, 2024


vtjnash approved these changes Feb 7, 2024


vtjnash added the merge me label (PR is reviewed. Merge when all tests are passing) Feb 7, 2024

vtjnash merged commit bead1d3 into master Feb 7, 2024

vtjnash deleted the rational_to_float16 branch February 7, 2024 14:50

oscardssmith removed the merge me label Feb 7, 2024


7 participants
