Add handling of an empty iterator for mean and var#29033
Add handling of an empty iterator for mean and var#29033nalimilan merged 5 commits intoJuliaLang:masterfrom
Conversation
| @test ismissing(mean([missing, NaN])) | ||
| @test isequal(mean([missing 1.0; 2.0 3.0], dims=1), [missing 2.0]) | ||
| @test mean(skipmissing([1, missing, 2])) === 1.5 | ||
| @test isequal(mean(Complex{Float64}[]), NaN+NaN*im) |
There was a problem hiding this comment.
It would also make sense to test that an empty Int input gives a Float64 output, and that an error is thrown when the element type isn't known (e.g. using a generator).
BTW, you could probably use ===.
There was a problem hiding this comment.
This PR is in muddy waters :) You cannot use === as we have different NaN in Julia and here it would fail. Notice that:
julia> NaN === -NaN
false
and this is exactly the case here, that is why I had to use isequal.
I will add the tests.
There was a problem hiding this comment.
I must be missing something: you really expect NaN::Float64 here, right?
There was a problem hiding this comment.
The situation is more complex (and that is why I have used this as a test):
julia> x = mean(Complex{Float64}[])
NaN + NaN*im
julia> bitstring(x.re)
"1111111111111000000000000000000000000000000000000000000000000000"
julia> bitstring(x.im)
"1111111111111000000000000000000000000000000000000000000000000000"
julia> bitstring(NaN) # see that the bitstring is different
"0111111111111000000000000000000000000000000000000000000000000000"
also you have:
julia> x = mean(Int[])
NaN
julia> x === NaN
false
For the same reason that we have different NaNs in Julia (and as you see they actually happen in practice).
stdlib/Statistics/test/runtests.jl
Outdated
| @test f(skipmissing([1, missing, 2]), 0) === f([1, 2], 0) | ||
| end | ||
|
|
||
| @test isequal(mean(Complex{Float64}[]), NaN) |
There was a problem hiding this comment.
I guess you intended to test var?
stdlib/Statistics/src/Statistics.jl
Outdated
| if y === nothing | ||
| throw(ArgumentError("variance of empty collection undefined: $(repr(iterable))")) | ||
| T = eltype(iterable) | ||
| return typeof((abs2(zero(T)) + abs2(zero(T)))/2)(NaN) |
There was a problem hiding this comment.
So this throws an error when calling zero if eltype returns Any. Maybe better call throw a more explicit error similar to the existing one?
Could use oftype.
There was a problem hiding this comment.
I would leave it as is, because this is how var for AbstractArray is implemented and we want to be non-breaking (so it should have 100% the same behavior - the same outputs and the same errors - so using array or iterator is transparent).
For sure in Julia 2.0 approach we could consider cleaning it up in both places.
There was a problem hiding this comment.
I just mean that instead of letting zero throw the error, it would be clearer to check T === Any and throw a more explicit error.
There was a problem hiding this comment.
The problem is that T does not have to be Any. It can be an arbitrary type that is or is not a subtype of Number. And upfront we do not know if this type:
- defines
zero - has
abs2defined - implements a constructor from
NaN
Any of those may or may not be true so the only way to be consistent is to repeat the same code that is used for a version of var defined for arrays (otherwise I will be able to create a situation where array and iterator var differs).
There was a problem hiding this comment.
OK. We could still print a nice error in the most common case, but then we'd lose consistency.
There was a problem hiding this comment.
This is the point for now. As discussed in Julia 2.0 we can probably clean it up in both methods.
There was a problem hiding this comment.
personally, would use oftype as @nalimilan suggested for clarity
b7d48b9 to
79be956
Compare
79be956 to
40b2ea7
Compare
| @test mean(skipmissing([1, missing, 2])) === 1.5 | ||
| @test isequal(mean(Complex{Float64}[]), NaN+NaN*im) | ||
| @test isequal(mean(skipmissing(Complex{Float64}[])), NaN+NaN*im) | ||
| @test mean(Complex{Float64}[]) isa Complex{Float64} |
There was a problem hiding this comment.
Why not also test the value? Same in other places.
There was a problem hiding this comment.
The value is tested above in line 89 (unless you meant something else).
There was a problem hiding this comment.
Sorry, this one is, I originally spotted mean(Int[]). It would probably be clearer to have them side by side.
There was a problem hiding this comment.
All cleaned up now (hopefully 😄, as you have a keener eye for details 👍)
720b43b to
9e0b0ba
Compare
nalimilan
left a comment
There was a problem hiding this comment.
Thanks, looks good to me, but another review wouldn't hurt.
|
One CI error seems unrelated. |
|
Who would be a good person to review? |
TotalVerb
left a comment
There was a problem hiding this comment.
overall lgtm
There's a mean(::AbstractRange{<:Real}) method on line 134. Might be worth changing that behavior too for consistency, even though ranges can't have missings.
stdlib/Statistics/src/Statistics.jl
Outdated
| if y === nothing | ||
| throw(ArgumentError("variance of empty collection undefined: $(repr(iterable))")) | ||
| T = eltype(iterable) | ||
| return typeof((abs2(zero(T)) + abs2(zero(T)))/2)(NaN) |
There was a problem hiding this comment.
personally, would use oftype as @nalimilan suggested for clarity
|
Thank you. Changed to |
0aa0683 to
0487d17
Compare
stdlib/Statistics/src/Statistics.jl
Outdated
|
|
||
| function mean(r::AbstractRange{<:Real}) | ||
| isempty(r) && throw(ArgumentError("mean of an empty range is undefined")) | ||
| function Statistics.mean(r::AbstractRange{<:Real}) |
|
How do we want to proceed with this PR? |
|
Just merge? |
|
I think just go ahead with this. We can always revert it. |
|
I like when everybody approves but nobody actually merges... :-) |
changes between Julia 1.0 and 1.1, including: - Custom .css-style for compat admonitions. - Information about compat annotations to CONTRIBUTING.md. - NEWS.md entry for PRs #30090, #30035, #30022, #29978, #29969, #29858, #29845, #29754, #29638, #29636, #29615, #29600, #29506, #29469, #29316, #29259, #29178, #29153, #29033, #28902, #28761, #28745, #28708, #28696, #29997, #28790, #29092, #29108, #29782 - Compat annotation for PRs #30090, #30013, #29978, #29890, #29858, #29827, #29754, #29679, #29636, #29623, #29600, #29440, #29316, #29259, #29178, #29157, #29153, #29033, #28902, #28878, #28761, #28708, #28156, #29733, #29670, #29997, #28790, #29092, #29108, #29782, #25278 - Documentation for broadcasting CartesianIndices (#30230). - Documentation for Base.julia_cmd(). - Documentation for colon constructor of CartesianIndices (#29440). - Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix). - Run NEWS-update.jl. Co-authored-by: Morten Piibeleht <morten.piibeleht@gmail.com> Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>
changes between Julia 1.0 and 1.1, including: - Custom .css-style for compat admonitions. - Information about compat annotations to CONTRIBUTING.md. - NEWS.md entry for PRs #30090, #30035, #30022, #29978, #29969, #29858, #29845, #29754, #29638, #29636, #29615, #29600, #29506, #29469, #29316, #29259, #29178, #29153, #29033, #28902, #28761, #28745, #28708, #28696, #29997, #28790, #29092, #29108, #29782 - Compat annotation for PRs #30090, #30013, #29978, #29890, #29858, #29827, #29754, #29679, #29636, #29623, #29600, #29440, #29316, #29259, #29178, #29157, #29153, #29033, #28902, #28878, #28761, #28708, #28156, #29733, #29670, #29997, #28790, #29092, #29108, #29782, #25278 - Documentation for broadcasting CartesianIndices (#30230). - Documentation for Base.julia_cmd(). - Documentation for colon constructor of CartesianIndices (#29440). - Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix). - Run NEWS-update.jl. Co-authored-by: Morten Piibeleht <morten.piibeleht@gmail.com> Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>
Addition of NEWS and compat admonitions for important changes between Julia 1.0 and 1.1, including: - Custom .css-style for compat admonitions. - Information about compat annotations to CONTRIBUTING.md. - NEWS.md entry for PRs #30090, #30035, #30022, #29978, #29969, #29858, #29845, #29754, #29638, #29636, #29615, #29600, #29506, #29469, #29316, #29259, #29178, #29153, #29033, #28902, #28761, #28745, #28708, #28696, #29997, #28790, #29092, #29108, #29782 - Compat annotation for PRs #30090, #30013, #29978, #29890, #29858, #29827, #29754, #29679, #29636, #29623, #29600, #29440, #29316, #29259, #29178, #29157, #29153, #29033, #28902, #28878, #28761, #28708, #28156, #29733, #29670, #29997, #28790, #29092, #29108, #29782, #25278 - Documentation for broadcasting CartesianIndices (#30230). - Documentation for Base.julia_cmd(). - Documentation for colon constructor of CartesianIndices (#29440). - Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix). - Run NEWS-update.jl. Co-authored-by: Morten Piibeleht <morten.piibeleht@gmail.com> Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>
This is a follow up of #28777 fixing what is minimally needed to be consistent in Julia 1.0 between empty arrays and empty collections (needed for
skipmissing).Major cleanup of consistency of the outputs for different scenarios of application of functions from Statistics module are left for Julia 2.0.