The `a + γ*(b-a)` formula introduced by JuliaLang/julia#16572 has the advantage that it increases with `γ` even when `a` and `b` are very close, but it has the drawback that it is not robust to overflow. Overflow is likely to happen in practice with small integer and floating point types.

Conversely, the `(1-γ)*a + γ*b` formula, which is currently used only for non-finite quantities, is robust to overflow but may not always increase with `γ` when `a` and `b` are very close or (more frequently) equal, since precision loss can give a slightly smaller value for a larger `γ`. This can be problematic as it breaks an expected invariant.

So keep using the `a + γ*(b-a)` formula when `a ≈ b`, in which case it's almost like returning either `a` or `b` but less arbitrary.
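The trade-off described above can be illustrated with a minimal sketch. The helper names `lerp_delta`, `lerp_convex` and `lerp_hybrid` are hypothetical and are not taken from the package, and the use of `isapprox` with its default tolerance is an assumption of this sketch, not necessarily the criterion the patch applies.

```julia
# Two ways to interpolate between neighbouring order statistics a and b.
lerp_delta(a, b, γ)  = a + γ * (b - a)      # non-decreasing in γ, but b - a can overflow
lerp_convex(a, b, γ) = (1 - γ) * a + γ * b  # robust to overflow, may lose monotonicity

# Hypothetical combination: use the delta form only when a ≈ b.
# The default `isapprox` tolerance is an assumption of this sketch.
lerp_hybrid(a, b, γ) = isapprox(a, b) ? lerp_delta(a, b, γ) : lerp_convex(a, b, γ)

# Floating point: b - a overflows to Inf, so the delta form is not finite.
lo, hi = -floatmax(Float64), floatmax(Float64)
delta_float  = lerp_delta(lo, hi, 0.5)   # Inf
convex_float = lerp_convex(lo, hi, 0.5)  # 0.0
hybrid_float = lerp_hybrid(lo, hi, 0.5)  # 0.0

# Small integers: Int8(100) - Int8(-100) wraps around to -56.
delta_int  = lerp_delta(Int8(-100), Int8(100), 0.5)   # -128.0, far from the midpoint
convex_int = lerp_convex(Int8(-100), Int8(100), 0.5)  # 0.0

# With equal endpoints the delta form returns `a` exactly for every γ.
equal_ok = all(lerp_delta(0.1, 0.1, γ) == 0.1 for γ in 0:0.01:1)
```

The hybrid falls back to the delta form only in the regime where `b - a` is tiny, so the subtraction cannot overflow there.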
Codecov Report

| | master | #145 | +/- |
|---|---|---|---|
| Coverage | 96.98% | 96.99% | +0.01% |
| Files | 1 | 1 | |
| Lines | 431 | 433 | +2 |
| Hits | 418 | 420 | +2 |
| Misses | 13 | 13 | |
bkamins left a comment:
This also required checking that the function is non-decreasing when we increase `b` and the formula switches, but I tested it and it holds.
It's already covered by the test added a long time ago by JuliaLang/julia#16572. That's how I realized the problem. ;-) EDIT: You mean

In general I mean that it should be monotonic in
OK. So you mean two tests like this are needed?

    @test issorted(quantile([1.0, 1.0+eps(), 1.0+2eps(), 1.0+3eps()], range(0, 1, length=100)))
    @test issorted(quantile([1.0, 1.0+2eps(), 1.0+4eps(), 1.0+6eps()], range(0, 1, length=100)))

Yes - something like this (this is not strictly needed 😄, but I ran such tests and they were OK).
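The kind of monotonicity check discussed here can also be sketched directly on the two interpolation formulas, without going through `quantile`. The helper names and the choice of test values below are hypothetical; only `Base` functions are used.

```julia
# Hypothetical helper names; only Base is used.
lerp_delta(a, b, γ)  = a + γ * (b - a)
lerp_convex(a, b, γ) = (1 - γ) * a + γ * b

γs = range(0, 1, length=100)

# Equal or adjacent floating-point values, where rounding matters most.
cases = [(x, y) for x in (0.1, 1.0, 1/3, 1e10) for y in (x, nextfloat(x))]

# Each floating-point operation in the delta form is non-decreasing in γ
# when a <= b and nothing overflows, so this holds by construction.
delta_ok = all(issorted(lerp_delta.(a, b, γs)) for (a, b) in cases)

# The convex form has no such guarantee; count the cases that break the order.
convex_violations = count(!issorted(lerp_convex.(a, b, γs)) for (a, b) in cases)
```

Whether `convex_violations` is nonzero depends on the values chosen, so the sketch counts violations instead of asserting a particular number.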
Before #145, `Date` and `DateTime` were supported by `quantile` as long as the cut point fell between two equal values. Restore this behavior, as some code may rely on it given that this is the most common situation with large datasets.
Fixes #144.