[SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN#33995
AngersZhuuuu wants to merge 23 commits into apache:master
Kubernetes integration test unable to build dist. exiting with code: 1
hsResult.add(elem)
if (isNaN(elem)) {
  if (hs.containsNaN() && !hsResult.containsNaN()) {
    arrayBuffer += elem
For this, let's wait a little bit for the decision at the first PR.
Test build #143265 has finished for PR 33995 at commit
Kubernetes integration test unable to build dist. exiting with code: 1
Test build #143308 has finished for PR 33995 at commit
ping @cloud-fan
Kubernetes integration test unable to build dist. exiting with code: 1
Test build #143396 has finished for PR 33995 at commit
ping @cloud-fan
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #143412 has finished for PR 33995 at commit
…oat.NaN
### What changes were proposed in this pull request?
For query
```
select array_intersect(array(cast('nan' as double), 1d), array(cast('nan' as double)))
```
This returns [NaN], but it should return [].
This issue arises because `OpenHashSet` cannot handle `Double.NaN` and `Float.NaN` either.
This PR fixes it, building on #33955.
### Why are the changes needed?
Fix bug
### Does this PR introduce _any_ user-facing change?
`ArrayIntersect` no longer returns duplicated `NaN` values.
### How was this patch tested?
Added unit tests.
Closes #33995 from AngersZhuuuu/SPARK-36754.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 2fc7f2f)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
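To illustrate why a hash set can miss `NaN`, here is a minimal sketch. It assumes the set compares primitive keys with `==`, which follows IEEE 754 semantics; the object name `NaNEquality` is hypothetical and is not part of Spark.

```scala
object NaNEquality {
  val nan: Double = Double.NaN

  // Primitive comparison follows IEEE 754: NaN is not equal to anything,
  // including itself. A lookup that relies on == can never find a NaN key.
  val primitiveEq: Boolean = nan == nan

  // java.lang.Double#equals compares bit patterns (doubleToLongBits),
  // so two NaN values are considered equal here.
  val boxedEq: Boolean =
    java.lang.Double.valueOf(nan).equals(java.lang.Double.valueOf(nan))

  // An explicit isNaN check is the reliable way to detect the value.
  val detected: Boolean = nan.isNaN
}
```

Under these assumptions `primitiveEq` is `false` while `boxedEq` and `detected` are `true`, which is why NaN needs separate tracking when the set compares primitives directly.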
thanks, merging to master/3.2/3.1/3.0!
For clarification @AngersZhuuuu: the PR description says:
Is this the right way around? It seems like we now correctly return
Oh, sorry for the mistake. The correct behavior is that we should return [NaN].
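The corrected expectation can be sketched outside Spark as follows. This is not the code from the patch: `NaNAwareIntersect` and its `intersect` helper are hypothetical names, and Scala's standard `mutable.HashSet` stands in for `OpenHashSet`. NaN is tracked with separate flags, loosely mirroring the `containsNaN()` checks in the reviewed diff.

```scala
import scala.collection.mutable

object NaNAwareIntersect {
  // Hypothetical helper: intersection of two double arrays that treats
  // NaN as equal to NaN and emits each matching value at most once.
  def intersect(left: Array[Double], right: Array[Double]): Array[Double] = {
    // Index the right side, recording NaN with a flag instead of in the set.
    val rightSet = mutable.HashSet.empty[Double]
    var rightHasNaN = false
    right.foreach { e =>
      if (e.isNaN) rightHasNaN = true else rightSet += e
    }

    val seen = mutable.HashSet.empty[Double] // non-NaN values already emitted
    var resultHasNaN = false                 // whether NaN was already emitted
    val out = mutable.ArrayBuffer.empty[Double]

    left.foreach { e =>
      if (e.isNaN) {
        // Emit NaN once, and only if the right side also contains NaN.
        if (rightHasNaN && !resultHasNaN) {
          out += e
          resultHasNaN = true
        }
      } else if (rightSet.contains(e) && seen.add(e)) {
        // seen.add returns true only the first time a value is added.
        out += e
      }
    }
    out.toArray
  }
}
```

With this sketch, `NaNAwareIntersect.intersect(Array(Double.NaN, 1d), Array(Double.NaN))` yields an array holding a single NaN, which corresponds to the `[NaN]` result agreed on above.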