Use argument dtype to inform coercion#17779
Conversation
|
cc @wesm for the arrow part. The background is that dask has scalar objects that duck-type numpy scalars. We'd like |
jreback
left a comment
There was a problem hiding this comment.
seems reasonable if all passing
maybe want to check some perf
but i doubt it affect things
pandas/core/dtypes/cast.py
Outdated
| return arr.dtype, arr | ||
|
|
||
|
|
||
| def _maybe_infer_dtype_type(element): |
Master:
```python
>>> import dask.dataframe as dd
>>> s = dd.core.Scalar({('s', 0): 10}, 's', 'i8')
>>> pdf = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
... 'b': [7, 6, 5, 4, 3, 2, 1]})
>>> (pdf + s).dtypes
a object
b object
dtype: object
Head:
```
>>> (pdf + s).dtypes
a int64
b int64
dtype: object
```
This is more consistent with 0.20.3, while still most of the changes in
pandas-dev#16821
Closes pandas-dev#17767
a3a2ee2 to
b34c61f
Compare
Codecov Report
@@ Coverage Diff @@
## master #17779 +/- ##
==========================================
- Coverage 91.24% 91.22% -0.02%
==========================================
Files 163 163
Lines 49916 49923 +7
==========================================
- Hits 45544 45542 -2
- Misses 4372 4381 +9
Continue to review full report at Codecov.
|
| (1.0, 'f8'), | ||
| (1j, 'complex128'), | ||
| (True, 'bool'), | ||
| # (np.timedelta64(20, 'ns'), '<m8[ns]'), |
There was a problem hiding this comment.
is this commented for a reason?
There was a problem hiding this comment.
Fixed, forgot to uncomment it while testing.
jreback
left a comment
There was a problem hiding this comment.
lgtm. just a small comment. merge when satisfied.
|
@TomAugspurger no one's put in any effort on making the pyarrow scalar types usable in any kind of analytical setting, they're mostly for display purposes and coercion to Python types for the moment. We will need to address it at some point in the not-too-distant future |
Ok. I don't think the changes here do anything to help with that, but they shouldn't make anything more difficult. |
| (1.0, 'f8'), | ||
| (1j, 'complex128'), | ||
| (True, 'bool'), | ||
| (np.timedelta64(20, 'ns'), '<m8[ns]'), |
There was a problem hiding this comment.
actually ought to add Timedelta/Timestamp, NaT, np.nan, Period
There was a problem hiding this comment.
The missing data ones are already covered in test_nanops. Timedelta / timestamps in handled in tests/scalar/. Period doesn't support binops. So we should be good.
* Use argument dtype to inform coercion
Master:
```python
>>> import dask.dataframe as dd
>>> s = dd.core.Scalar({('s', 0): 10}, 's', 'i8')
>>> pdf = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
... 'b': [7, 6, 5, 4, 3, 2, 1]})
>>> (pdf + s).dtypes
a object
b object
dtype: object
Head:
```
>>> (pdf + s).dtypes
a int64
b int64
dtype: object
```
This is more consistent with 0.20.3, while still most of the changes in
pandas-dev#16821
Closes pandas-dev#17767
* Compat for older numpy where bool(dtype) is False
* Added timedelta
* Use argument dtype to inform coercion
Master:
```python
>>> import dask.dataframe as dd
>>> s = dd.core.Scalar({('s', 0): 10}, 's', 'i8')
>>> pdf = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
... 'b': [7, 6, 5, 4, 3, 2, 1]})
>>> (pdf + s).dtypes
a object
b object
dtype: object
Head:
```
>>> (pdf + s).dtypes
a int64
b int64
dtype: object
```
This is more consistent with 0.20.3, while still most of the changes in
pandas-dev#16821
Closes pandas-dev#17767
* Compat for older numpy where bool(dtype) is False
* Added timedelta
* Use argument dtype to inform coercion
Master:
```python
>>> import dask.dataframe as dd
>>> s = dd.core.Scalar({('s', 0): 10}, 's', 'i8')
>>> pdf = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
... 'b': [7, 6, 5, 4, 3, 2, 1]})
>>> (pdf + s).dtypes
a object
b object
dtype: object
Head:
```
>>> (pdf + s).dtypes
a int64
b int64
dtype: object
```
This is more consistent with 0.20.3, while still most of the changes in
pandas-dev#16821
Closes pandas-dev#17767
* Compat for older numpy where bool(dtype) is False
* Added timedelta
Master:
Head:
This is more consistent with 0.20.3, while keeping the benefits from #16821
Closes #17767
I may play around with this a bit more. I feel like this is a bit too numpy specific, for example, adding a
DataFrameand apa.Scalarstill fails:Do we want that to work?