Regression in index when using fastparquet #6348

@gforsyth

What happened:
On 2.19.0+6.g7138f470f, a round trip through fastparquet returns a Float64Index instead of the original RangeIndex:

import pandas as pd
import dask.dataframe as dd

# Build a tz-aware datetime column; the frame has a default RangeIndex.
df = pd.DataFrame({'t': ['2017-02-03', '2017-03-03', '2017-01-01']})
df['t'] = pd.to_datetime(df.t, utc=True).dt.tz_convert('US/Eastern')
print(df)
print(df.index)

# Write with fastparquet, then read back through dask.
df.to_parquet('test.parquet', engine='fastparquet')
df = dd.read_parquet('test.parquet', engine='fastparquet').compute()
print(df)
print(df.index)

Output:
                          t
0 2017-02-02 19:00:00-05:00
1 2017-03-02 19:00:00-05:00
2 2016-12-31 19:00:00-05:00
RangeIndex(start=0, stop=3, step=1)
                                      t
7.342211e-309 2017-02-02 19:00:00-05:00
7.354163e-309 2017-03-02 19:00:00-05:00
7.328124e-309 2016-12-31 19:00:00-05:00
Float64Index([7.342210749717597e-309, 7.35416318582179e-309,
              7.32812395002337e-309],
             dtype='float64')
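
A note on the odd index values, offered as an observation rather than a confirmed diagnosis: the three floats printed above are subnormal doubles, and reinterpreting their bits as 64-bit integers yields the `t` column's own timestamps in microseconds since the Unix epoch. That would be consistent with the column's raw data being picked up as the index, though the issue itself does not establish the cause. The sketch below uses only the standard library; the helper name `float_bits_as_int64` is made up for illustration.

```python
import struct
from datetime import datetime, timedelta, timezone

# Index values copied from the Float64Index printed in the report.
index_values = [
    7.342210749717597e-309,
    7.35416318582179e-309,
    7.32812395002337e-309,
]

def float_bits_as_int64(x):
    """Reinterpret the 8 bytes of a float64 as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", x))[0]

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

for value in index_values:
    raw = float_bits_as_int64(value)
    # Assumption: the integer is a count of microseconds since the epoch.
    print(raw, (epoch + timedelta(microseconds=raw)).isoformat())

# 1486080000000000 2017-02-03T00:00:00+00:00
# 1488499200000000 2017-03-03T00:00:00+00:00
# 1483228800000000 2017-01-01T00:00:00+00:00
```

Midnight UTC on each date is 19:00 US/Eastern on the previous day, which matches the values shown in the `t` column.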

What you expected to happen:
On 2.19.0, the same script preserves the RangeIndex after the round trip:

import pandas as pd
import dask.dataframe as dd

# Build a tz-aware datetime column; the frame has a default RangeIndex.
df = pd.DataFrame({'t': ['2017-02-03', '2017-03-03', '2017-01-01']})
df['t'] = pd.to_datetime(df.t, utc=True).dt.tz_convert('US/Eastern')
print(df)
print(df.index)

# Write with fastparquet, then read back through dask.
df.to_parquet('test.parquet', engine='fastparquet')
df = dd.read_parquet('test.parquet', engine='fastparquet').compute()
print(df)
print(df.index)

Output:
                          t
0 2017-02-02 19:00:00-05:00
1 2017-03-02 19:00:00-05:00
2 2016-12-31 19:00:00-05:00
RangeIndex(start=0, stop=3, step=1)
                          t
0 2017-02-02 19:00:00-05:00
1 2017-03-02 19:00:00-05:00
2 2016-12-31 19:00:00-05:00
RangeIndex(start=0, stop=3, step=1)

Environment:

  • Dask version: 2.19.0+6.g7138f470f
  • Python version: 3.8.2 conda-forge
  • Operating System: OSX
  • Install method (conda, pip, source): source
