[SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan#33993
[SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan#33993AngersZhuuuu wants to merge 16 commits intoapache:masterfrom
Conversation
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
...st/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
Outdated
Show resolved
Hide resolved
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Test build #143262 has finished for PR 33993 at commit
|
| val elem = array.get(i, elementType) | ||
| if (isNaN(elem)) { | ||
| if (!hs.containsNaN) { | ||
| arrayBuffer += elem |
There was a problem hiding this comment.
For this, let's wait for the decision at the first PR.
There was a problem hiding this comment.
For this, let's wait for the decision at the first PR.
Done
|
Test build #143267 has finished for PR 33993 at commit
|
|
ping @cloud-fan @dongjoon-hyun |
| } | ||
|
|
||
| val processArray = withArrayNullAssignment( | ||
| def withNaNCheck(body: String): String = { |
There was a problem hiding this comment.
shall we move these codegen utils to SQLOpenHashSet as well to reduce duplicated code?
There was a problem hiding this comment.
shall we move these codegen utils to
SQLOpenHashSetas well to reduce duplicated code?
How about to do this after all done. Not only this can be refactored.
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #143307 has finished for PR 33993 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
| } | ||
|
|
||
| def valueNaN(dataType: DataType): Any = { | ||
| def isNaNFuncAndValueNaN(dataType: DataType, valueName: String): Option[(String, String)] = { |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #143345 has finished for PR 33993 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #143360 has finished for PR 33993 at commit
|
...catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
Outdated
Show resolved
Hide resolved
|
Kubernetes integration test starting |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Kubernetes integration test status failure |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
...catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
Outdated
Show resolved
Hide resolved
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #143385 has finished for PR 33993 at commit
|
|
Test build #143387 has finished for PR 33993 at commit
|
|
Test build #143391 has finished for PR 33993 at commit
|
|
ping @cloud-fan |
…at.Nan
### What changes were proposed in this pull request?
For query
```
select array_distinct(array(cast('nan' as double), cast('nan' as double)))
```
This returns [NaN, NaN], but it should return [NaN].
This issue is caused by `OpenHashSet` can't handle `Double.NaN` and `Float.NaN` too.
In this pr fix this based on #33955
### Why are the changes needed?
Fix bug
### Does this PR introduce _any_ user-facing change?
ArrayDistinct won't show duplicated `NaN` value
### How was this patch tested?
Added UT
Closes #33993 from AngersZhuuuu/SPARK-36741.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e356f6a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…at.Nan
### What changes were proposed in this pull request?
For query
```
select array_distinct(array(cast('nan' as double), cast('nan' as double)))
```
This returns [NaN, NaN], but it should return [NaN].
This issue is caused by `OpenHashSet` can't handle `Double.NaN` and `Float.NaN` too.
In this pr fix this based on #33955
### Why are the changes needed?
Fix bug
### Does this PR introduce _any_ user-facing change?
ArrayDistinct won't show duplicated `NaN` value
### How was this patch tested?
Added UT
Closes #33993 from AngersZhuuuu/SPARK-36741.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e356f6a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…at.Nan
For query
```
select array_distinct(array(cast('nan' as double), cast('nan' as double)))
```
This returns [NaN, NaN], but it should return [NaN].
This issue is caused by `OpenHashSet` can't handle `Double.NaN` and `Float.NaN` too.
In this pr fix this based on #33955
Fix bug
ArrayDistinct won't show duplicated `NaN` value
Added UT
Closes #33993 from AngersZhuuuu/SPARK-36741.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e356f6a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
|
thanks, merging to master/3.2/3.1/3.0! |
…at.Nan
### What changes were proposed in this pull request?
For query
```
select array_distinct(array(cast('nan' as double), cast('nan' as double)))
```
This returns [NaN, NaN], but it should return [NaN].
This issue is caused by `OpenHashSet` can't handle `Double.NaN` and `Float.NaN` too.
In this pr fix this based on apache#33955
### Why are the changes needed?
Fix bug
### Does this PR introduce _any_ user-facing change?
ArrayDistinct won't show duplicated `NaN` value
### How was this patch tested?
Added UT
Closes apache#33993 from AngersZhuuuu/SPARK-36741.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e356f6a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
For query
This returns [NaN, NaN], but it should return [NaN].
This issue is caused by
OpenHashSetcan't handleDouble.NaNandFloat.NaNtoo.In this pr fix this based on #33955
Why are the changes needed?
Fix bug
Does this PR introduce any user-facing change?
ArrayDistinct won't show duplicated
NaNvalueHow was this patch tested?
Added UT