Bug (?) concatenating arrays of strings #1050

Hi,

I've run across what I think is a small bug when concatenating dask arrays of strings whose dtypes differ:

```
In [106]: a = np.array(['CA-0', 'CA-1'])

In [107]: b = np.array(['TX-0', 'TX-10', 'TX-101', 'TX-102'])

In [108]: a = da.from_array(a, chunks=2)

In [109]: b = da.from_array(b, chunks=4)

In [110]: da.concatenate([a, b]).compute()
Out[110]: 
array(['CA-0', 'CA-1', 'TX-0', 'TX-1', 'TX-1', 'TX-1'],
      dtype='|S4')

In [111]: da.concatenate([b, a]).compute()
Out[111]: 
array(['TX-0', 'TX-10', 'TX-101', 'TX-102', 'CA-0', 'CA-1'],
      dtype='|S6')
```

If the array with the "smaller" dtype (in this case, S4) comes first in the sequence to be concatenated, that dtype is used for the end result, and the entries from the array with the "larger" dtype (in this case, S6) are silently truncated. If the order is swapped so that the array with the "larger" dtype comes first, the concatenation works properly.
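To make the truncation concrete, here is a small NumPy-only sketch of what I suspect is happening. It is an illustration under my own assumptions (the output buffer is allocated with the first array's dtype and the remaining pieces are copied in), not dask's actual code. On Python 3 the dtypes show up as `<U4`/`<U6` rather than `|S4`/`|S6`, but the behaviour is the same.

```python
import numpy as np

a = np.array(['CA-0', 'CA-1'])                       # 4-character strings
b = np.array(['TX-0', 'TX-10', 'TX-101', 'TX-102'])  # up to 6 characters

# Assumed mechanism: allocate the result using the FIRST array's dtype ...
out = np.empty(len(a) + len(b), dtype=a.dtype)
out[:len(a)] = a
# ... so copying the wider strings in silently cuts them to 4 characters.
out[len(a):] = b
truncated = out.tolist()

# Plain NumPy concatenation promotes to the wider string dtype instead.
promoted = np.concatenate([a, b])

print(truncated)
print(promoted.dtype, promoted.tolist())
```

The first print reproduces the truncated values from `Out[110]`, while `np.concatenate` keeps every string intact.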

It looks to me like the error occurs in the [dask.array.core.concatenate3](https://github.com/dask/dask/blob/master/dask/array/core.py#L2952) function where the dtype of the result is inferred from the first array in the sequence, rather than using the dtype computed in the [concatenate](https://github.com/dask/dask/blob/master/dask/array/core.py#L1748) function itself.
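In the meantime, a possible workaround is to cast every input to a common dtype before concatenating. The helper below is a minimal sketch: `cast_to_common_dtype` is a hypothetical name of mine, not part of dask or NumPy, and it only relies on each input having `.dtype` and `.astype`. I have exercised it with plain NumPy arrays here; I assume it behaves the same way with dask arrays, since they also expose `.dtype` and `.astype`.

```python
import numpy as np


def cast_to_common_dtype(arrays):
    """Cast all arrays to the dtype NumPy would promote them to.

    Hypothetical helper for illustration; works on any objects that
    provide ``.dtype`` and ``.astype`` (NumPy arrays in this example).
    """
    # For fixed-width strings, promotion picks the widest itemsize.
    common = np.result_type(*[x.dtype for x in arrays])
    return [x.astype(common) for x in arrays]


a = np.array(['CA-0', 'CA-1'])
b = np.array(['TX-0', 'TX-10', 'TX-101', 'TX-102'])

a2, b2 = cast_to_common_dtype([a, b])
result = np.concatenate([a2, b2])
print(a2.dtype == b2.dtype, result.tolist())
```

With both inputs already at the wider dtype, the order in which they are passed to the concatenation should no longer matter.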

Todd

