Add ArrayHashingStyle trait for O(1) range hashing and use it only for Numbers#25822
Add ArrayHashingStyle trait for O(1) range hashing and use it only for Numbers#25822
Conversation
…r Numbers
O(1) range hashing requires widen() to avoid overflows when hashing heterogeneous
arrays. Since this requirement can be problematic, add an ArrayHashingStyle trait
to opt-in for this behavior, and use it only for Numbers by default. This assumes
that isequal(x::Number, y::T) will only return true when T<:Number, or at least
that counter-examples are rare enough that it's OK to requires these types to define
the trait: else, hash(::AbstractArray{T}) will be inconsistent with isequal across types.
Do not use O(1) hashing for Dates types, which fixes hashing heterogeneous arrays,
and add corresponding tests.
|
I don't think I agree that it's cleaner to define a trait equivalent to |
If you prefer I can transform the trait into a purely internal function defined and used only in abstractarray.jl. The problem with hardcoding |
This is a straw-man implementation of a simpler array hashing scheme. It's very basic and has room for further optimizations, but it's intention is to provide a starting point as an alternative to #25822. In short: This hashes the size of the array and then the first three distinct elements and their linear indices and then the last three distinct elements and their linear distance from the end of the array. The exact scheme here is open to bikeshedding -- we could use more or less distinct elements. I use "distinct elements" as a way of ensuring that all relatively empty sparse arrays don't hash similarly, and I hash elements at both the beginning and end of the array because they're the simplest to get to and final elements will be the "most expensive" to discover that they differ if we have to fall back to an equality check.
O(1) range hashing requires
widento avoid overflows when hashing heterogeneous arrays. Since this requirement can be problematic, add anArrayHashingStyletrait to opt-in for this behavior, and use it only forNumbers by default. This assumes thatisequal(x::Number, y::T)will only return true when T<:Number, or at least that counter-examples are rare enough that it's OK to requires these types to define the trait: else,hash(::AbstractArray{T})will be inconsistent with isequal across types.Do not use O(1) hashing for
Datestypes, which fixes hashing heterogeneous arrays, and add corresponding tests.See previous discussion. The new trait will probably not be used very often by external packages, but it sounds cleaner to define it than to special-case
Numberdirectly in the function. Again, it's annoying that this design is so complex, but at least this limits the burden of implementingwidentoNumbertypes, rather than requiring all types which support-to implement it.