
Series / DataFrame view method not implemented. #8439

@ocqco

Description


When converting a column from datetime64[ns] to int64 with astype, Pandas emits a FutureWarning (the same warning appears when calling astype through Dask) saying that this cast is deprecated and that view should be used instead.

Pandas suggests replacing astype with view; however, view does not appear to be implemented in dask.dataframe.core.

This is the FutureWarning from Pandas:

FutureWarning: casting datetime64[ns] values to int64 with .astype(...) is deprecated and will raise in a future version. Use .view(...) instead.
df['TIMESTAMP_astype'] = df['TIMESTAMP'].astype('int64')

Setup:

import pandas as pd
import dask.dataframe as dd

data = {
    'TIMESTAMP': [
        '2021-11-27 00:05:02.175274',
        '2021-11-27 00:05:05.205596',
        '2021-11-27 00:05:29.212572',
        '2021-11-27 00:05:25.708343',
        '2021-11-27 00:05:47.714958',
    ]
}

df = pd.DataFrame(data)
df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'])

ddf = dd.from_pandas(df, npartitions=1)
ddf['TIMESTAMP'] = dd.to_datetime(ddf['TIMESTAMP'])

Pandas code:

df['TIMESTAMP_astype'] = df['TIMESTAMP'].astype('int64')  # works, but emits a FutureWarning
df['TIMESTAMP_view'] = df['TIMESTAMP'].view('int64')  # works

assert (df['TIMESTAMP_astype'] == df['TIMESTAMP_view']).all()  # True
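For context, the assertion holds because a datetime64[ns] value is stored as an int64 count of nanoseconds since the Unix epoch: view only reinterprets those 8 bytes as int64, while astype converts each value to the same number. The sketch below illustrates this with NumPy's ndarray.view (not the pandas Series.view method discussed in this issue) on a plain NumPy array; the sample timestamps are chosen for illustration.

```python
import numpy as np

# datetime64[ns] values are stored as 8-byte integers counting nanoseconds
# since the Unix epoch (1970-01-01T00:00:00).
stamps = np.array(
    ["1970-01-01T00:00:00", "1970-01-01T00:00:01", "2021-11-27T00:05:02.175274"],
    dtype="datetime64[ns]",
)

# view(): reinterpret the same bytes as int64; no values are converted or copied.
viewed = stamps.view("int64")

# The same numbers computed arithmetically, as elapsed nanoseconds since the epoch.
epoch = np.datetime64("1970-01-01T00:00:00", "ns")
elapsed_ns = (stamps - epoch) // np.timedelta64(1, "ns")

print(viewed[:2].tolist())                 # [0, 1000000000]
print(np.array_equal(viewed, elapsed_ns))  # True
```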

Dask code:

ddf['TIMESTAMP_astype'] = ddf['TIMESTAMP'].astype('int64')  # works, but emits a FutureWarning
ddf['TIMESTAMP_view'] = ddf['TIMESTAMP'].view('int64')  # raises AttributeError (see below)

Calling view on a Dask Series fails because the method is not implemented in dask.dataframe.core:

AttributeError: 'Series' object has no attribute 'view'
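Until view exists on Dask collections, one possible workaround (a sketch, not a Dask API) is to do the reinterpretation per partition, since every partition of a Dask Series is an ordinary pandas Series. The helper timestamps_as_int64 below is hypothetical; it assumes a tz-naive datetime column and relies only on NumPy's ndarray.view, not on Series.view.

```python
import numpy as np
import pandas as pd


def timestamps_as_int64(part: pd.Series) -> pd.Series:
    """Hypothetical helper: reinterpret a tz-naive datetime Series as int64 nanoseconds.

    Uses NumPy's ndarray.view rather than pandas' Series.view, and rebuilds a
    Series so the partition's index and name are kept.
    """
    # Normalise to nanosecond resolution before reinterpreting the bytes.
    values = part.to_numpy(dtype="datetime64[ns]")
    return pd.Series(values.view("int64"), index=part.index, name=part.name)


# Usage on a single pandas "partition":
part = pd.Series(
    np.array(["1970-01-01T00:00:00", "1970-01-01T00:00:01"], dtype="datetime64[ns]"),
    index=[10, 11],
    name="TIMESTAMP",
)
out = timestamps_as_int64(part)
print(out.tolist())  # [0, 1000000000]
```

Applied to the Dask frame from the setup, this would look something like ddf['TIMESTAMP_view'] = ddf['TIMESTAMP'].map_partitions(timestamps_as_int64, meta=('TIMESTAMP', 'int64')); the meta tuple declares the output name and dtype so Dask does not have to infer them by running the function on dummy data.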

Metadata

Assignees: No one assigned

Labels: dataframe, good second issue (clearly described, educational, but less trivial than "good first issue")
