numpy function very slow on DataArray compared to DataArray.values

First, I create some fake latitude and longitude points, stash them in a dataset, and compute a 2D histogram of them.

```python
#!/usr/bin/env python

import xarray as xr
import numpy as np

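# 50,000 fake points with random latitudes and longitudes (in degrees)
lat = np.random.rand(50000) * 180 - 90
lon = np.random.rand(50000) * 360 - 180
d = xr.Dataset({'latitude':lat, 'longitude':lon})

# 2-degree bin edges; linspace keeps the upper edges (90 and 180),
# which np.r_[-90:90:2.] and np.r_[-180:180:2.] would leave out
latbins = np.linspace(-90, 90, 91)
lonbins = np.linspace(-180, 180, 181)
# Pass the DataArrays straight to NumPy (the slow case)
h, xx, yy = np.histogram2d(d['longitude'], d['latitude'], bins=(lonbins, latbins))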
```

When I run this I get some underwhelming performance:

```
> time ./test_with_xarray.py

real	0m28.152s
user	0m27.201s
sys	0m0.630s
```

If I change the last line to 

```python
h, xx, yy = np.histogram2d(d['longitude'].values, d['latitude'].values, bins=(lonbins, latbins))
```

(i.e. I pass the numpy arrays directly to the histogram2d function), things are very different:

```
> time ./test_with_xarray.py

real	0m0.996s
user	0m0.569s
sys	0m0.253s
```

Calling histogram2d on the DataArrays is ~28 times slower (in wall-clock time for the whole script) than calling it on the underlying numpy arrays. I ran into this issue while histogramming quite large lon/lat vectors from multiple netCDF files. I got tired of waiting for the computation to finish, added `.values` to the call, and it went through very quickly.
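The shell `time` numbers above also include interpreter start-up and `import xarray`, so the ratio for the `histogram2d` call alone may differ from ~28x. A minimal in-process sketch with `timeit` isolates just the two calls and checks that they return the same counts. The reduced sample size (5,000 points), the fixed seed, the shared `obs` dimension and the helper names `hist_dataarray`/`hist_values` are assumptions for illustration; the measured ratio will depend on your NumPy and xarray versions.

```python
import timeit

import numpy as np
import xarray as xr

# Assumption: a smaller sample than in the report, so the slow path finishes quickly
rng = np.random.default_rng(0)
n = 5000
d = xr.Dataset({
    "latitude": ("obs", rng.random(n) * 180 - 90),
    "longitude": ("obs", rng.random(n) * 360 - 180),
})
latbins = np.linspace(-90, 90, 91)
lonbins = np.linspace(-180, 180, 181)


def hist_dataarray():
    # Pass the DataArray objects straight to NumPy
    return np.histogram2d(d["longitude"], d["latitude"], bins=(lonbins, latbins))


def hist_values():
    # Pass the underlying ndarrays via .values
    return np.histogram2d(d["longitude"].values, d["latitude"].values,
                          bins=(lonbins, latbins))


t_da = timeit.timeit(hist_dataarray, number=1)
t_np = timeit.timeit(hist_values, number=1)
print(f"DataArray: {t_da:.3f}s  .values: {t_np:.3f}s  ratio: {t_da / t_np:.1f}x")

# Both paths should produce identical counts; only the speed differs
h_da, _, _ = hist_dataarray()
h_np, _, _ = hist_values()
assert np.array_equal(h_da, h_np)
```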

It seems problematic that using xarray can slow your code down by a factor of 28 with no real way for you to know about it.
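Until something warns about this, one defensive option is to unwrap DataArrays explicitly before they reach a NumPy function, as the `.values` change above does by hand. A minimal sketch, assuming you are fine with dropping coordinates and attributes: the decorator `unwrap_dataarrays` is hypothetical (it is not part of xarray or NumPy) and only unwraps top-level positional and keyword arguments, not DataArrays nested inside lists or tuples.

```python
import functools

import numpy as np
import xarray as xr


def unwrap_dataarrays(func):
    """Hypothetical decorator: replace top-level xr.DataArray arguments
    with their underlying NumPy arrays (via .values) before calling func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args = tuple(a.values if isinstance(a, xr.DataArray) else a for a in args)
        kwargs = {k: (v.values if isinstance(v, xr.DataArray) else v)
                  for k, v in kwargs.items()}
        return func(*args, **kwargs)
    return wrapper


# Wrap NumPy's histogram2d once, then pass DataArrays freely
histogram2d = unwrap_dataarrays(np.histogram2d)

lat = xr.DataArray(np.random.rand(1000) * 180 - 90, dims="obs")
lon = xr.DataArray(np.random.rand(1000) * 360 - 180, dims="obs")
h, xedges, yedges = histogram2d(
    lon, lat, bins=(np.linspace(-180, 180, 181), np.linspace(-90, 90, 91))
)
```

Wrapping at the call site keeps the conversion visible; the trade-off is that any xarray metadata is discarded, which is also what `.values` does.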

numpy function very slow on DataArray compared to DataArray.values #1247

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

numpy function very slow on DataArray compared to DataArray.values #1247

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions