Discrepancy in behavior of `ak.any()` for GPU backend

### Version of Awkward Array

2.8.1

### Description and code to reproduce

The evaluation of `ak.any()` for arrays with the `cuda` backend seems to be incorrect in some cases. 

For example, first download these files:
```bash
wget http://uaf-10.t2.ucsd.edu/~kmohrman/public_html_backup/files/parquet_files/100k_from_lindsey_file/test_pq_100k.parquet
wget http://uaf-10.t2.ucsd.edu/~kmohrman/public_html_backup/files/py_files/fromLindsey/ak_from_cudf.py
```

Then the issue can be reproduced with this code:
```python
import pandas as pd
import awkward as ak
import cudf
from ak_from_cudf import cudf_to_awkward

filepath = "test_pq_100k.parquet"

# CPU: sum the pt of every muon pair, then flag events with any pair sum above 30
table_cpu   = pd.read_parquet(filepath, columns=["Muon_pt"])
Muon_pt_cpu = ak.Array(table_cpu["Muon_pt"])
mupair_cpu  = ak.combinations(Muon_pt_cpu, 2, fields=["mu1", "mu2"])
ptsum_cpu   = mupair_cpu.mu1 + mupair_cpu.mu2
mask_cpu    = ak.any(ptsum_cpu > 30, axis=1)

# GPU: the same computation, starting from a cudf table
table_gpu   = cudf.read_parquet(filepath, columns=["Muon_pt"])
Muon_pt_gpu = cudf_to_awkward(table_gpu["Muon_pt"])
mupair_gpu  = ak.combinations(Muon_pt_gpu, 2, fields=["mu1", "mu2"])
ptsum_gpu   = mupair_gpu.mu1 + mupair_gpu.mu2
mask_gpu    = ak.any(ptsum_gpu > 30, axis=1)

# Print every event where the CPU and GPU masks disagree
for i in range(len(mask_cpu)):
    mask_agree = mask_cpu[i] == mask_gpu[i]
    if not mask_agree:
        print(f"\nEvent {i}: mask_cpu={mask_cpu[i]}, mask_gpu={mask_gpu[i]}, mask_agree={mask_agree}")
        print(f"\tptsum_cpu: {ptsum_cpu[i]}")
        print(f"\tptsum_gpu: {ptsum_gpu[i]}")
    # Stop early to keep the output readable
    if i > 54290:
        break
```

If the CPU and GPU results were the same, nothing would be printed. Instead, the result of `ak.any()` differs between CPU and GPU for many events. (The `break` statement is only there to keep the output readable by limiting how many events are printed.) The reproducer above prints the following:
```
Event 54272: mask_cpu=False, mask_gpu=True, mask_agree=False
	ptsum_cpu: []
	ptsum_gpu: []

Event 54273: mask_cpu=False, mask_gpu=True, mask_agree=False
	ptsum_cpu: []
	ptsum_gpu: []

Event 54274: mask_cpu=False, mask_gpu=True, mask_agree=False
	ptsum_cpu: []
	ptsum_gpu: []

Event 54275: mask_cpu=True, mask_gpu=False, mask_agree=False
	ptsum_cpu: [36.2, 36.1, 37.1, 6.95, 7.95, 7.85]
	ptsum_gpu: [36.219772, 36.116978, 37.125095, 6.945659, 7.953775, 7.8509827]

Event 54277: mask_cpu=True, mask_gpu=False, mask_agree=False
	ptsum_cpu: [37.1]
	ptsum_gpu: [37.097]

Event 54280: mask_cpu=False, mask_gpu=True, mask_agree=False
	ptsum_cpu: []
	ptsum_gpu: []

Event 54281: mask_cpu=False, mask_gpu=True, mask_agree=False
	ptsum_cpu: [26.9]
	ptsum_gpu: [26.853153]

Event 54282: mask_cpu=False, mask_gpu=True, mask_agree=False
	ptsum_cpu: []
	ptsum_gpu: []

Event 54284: mask_cpu=True, mask_gpu=False, mask_agree=False
	ptsum_cpu: [34.2]
	ptsum_gpu: [34.161404]

Event 54285: mask_cpu=True, mask_gpu=False, mask_agree=False
	ptsum_cpu: [86.7, 41, 52.5]
	ptsum_gpu: [86.69585, 41.000923, 52.51365]

Event 54288: mask_cpu=False, mask_gpu=True, mask_agree=False
	ptsum_cpu: []
	ptsum_gpu: []

Event 54289: mask_cpu=False, mask_gpu=True, mask_agree=False
	ptsum_cpu: []
	ptsum_gpu: []

Event 54290: mask_cpu=False, mask_gpu=True, mask_agree=False
	ptsum_cpu: []
	ptsum_gpu: []
```

In the cases where the GPU and CPU results differ, the CPU results are correct (based on the `pt` values of the muon pairs in each event) and the GPU results are incorrect.
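
One way to cross-check which backend is right, independently of Awkward's own reducers, is a plain-Python reference for `ak.any(ptsum > 30, axis=1)`. The following is only a minimal sketch: `reference_any` is a hypothetical helper name (not part of Awkward Array), and it assumes the per-event values have already been converted to nested Python lists, for example with `ptsum_cpu.tolist()`.

```python
def reference_any(events, threshold):
    """For each event (a list of numbers), report whether any value exceeds threshold."""
    # Python's any() over an empty iterable is False, so empty events yield False.
    return [any(value > threshold for value in event) for event in events]


# Values copied from the printout above (events 54272, 54275, and 54281).
sample = [[], [36.2, 36.1, 37.1, 6.95, 7.95, 7.85], [26.9]]
print(reference_any(sample, 30))  # prints [False, True, False]
```

For these three events the reference matches the CPU mask, which is the basis for calling the CPU result the correct one.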


In case it's useful, here are some interesting/odd things about this potential bug:
* The `ptsum_gpu>30` part of the mask seems to agree (between CPU and GPU) for all events. The discrepancy arises only after `ak.any()` is applied.
* As can be seen in some of the events printed above, the discrepancy arises even in events where the array of muon pt values is empty.
* No discrepancy arises until event 54272 (more than halfway through the sample). The mask values for all events before this one agree perfectly, but after it the discrepancies are very frequent (as seen in the printed output above).
* Somehow, the bug seems to depend on the length of the array. If you load only a subset of the events around event 54272 (e.g. take a slice like `[54260:54290]` in the `Muon_pt_gpu` and `Muon_pt_cpu` lines), the `ak.any()` mask values for that event and for all the other loaded events are evaluated correctly.
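
To locate where the two masks start to disagree without reading through the per-event printout, a small helper like the one below may be convenient. This is a sketch under stated assumptions: `disagreeing_indices` is a hypothetical name, and it expects both masks as plain Python sequences of booleans.

```python
def disagreeing_indices(mask_a, mask_b):
    """Return the indices at which two equal-length boolean sequences differ."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    # bool() normalizes NumPy-style booleans before comparing.
    return [i for i, (a, b) in enumerate(zip(mask_a, mask_b)) if bool(a) != bool(b)]


# Toy example: the masks differ at positions 1 and 3.
print(disagreeing_indices([False, True, True, False], [False, False, True, True]))
# prints [1, 3]
```

With the arrays from the reproducer, the masks could be passed as `mask_cpu.tolist()` and `ak.to_backend(mask_gpu, "cpu").tolist()`. The first returned index should then be the event where the discrepancy begins, and running the same check on a sliced input should show whether the disagreement disappears for shorter arrays.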
