[C++] Unable to parse strings into timestamps #26330

@asfimport

Hi,

I'm working with Parquet files generated by an AWS RDS Postgres snapshot export.

I'm trying to parse a date column stored as a string into a timestamp, but it fails.

I've managed to parse the same date format (the one in the first example below) when reading from a CSV, so I investigated as far as I could on my own. Here are my results:

import pyarrow as pa
import pytz

#################################################################################
## the format I get from the database
us_tz_arr = pa.array([
  "2014-12-07 07:48:59.285332+00",
  "2014-12-07 08:01:49.758975+00",
  "2014-12-07 10:11:35.884304+00"])

us_tz_arr.cast(pa.timestamp('us', tz=pytz.UTC))
# -> ArrowInvalid: Failed to parse string: 2014-12-07 10:11:35.884304+00

#################################################################################
## tried removing the timezone
us_arr = pa.array([
  "2014-12-07 07:48:59.285332",
  "2014-12-07 08:01:49.758975",
  "2014-12-07 10:11:35.884304"])

us_arr.cast(pa.timestamp('us'))
# -> ArrowInvalid: Failed to parse string: 2014-12-07 10:11:35.884304

#################################################################################
## tried removing the microseconds but keeping the timezone
second_tz_arr = pa.array([
  "2014-12-07 07:48:59+00",
  "2014-12-07 08:01:49+00",
  "2014-12-07 10:11:35+00"])

second_tz_arr.cast(pa.timestamp('s', tz=pytz.UTC))
# -> ArrowInvalid: Failed to parse string: 2014-12-07 10:11:35+00

#################################################################################
## removing both the microseconds and the timezone makes it work!
s_arr = pa.array([
  "2014-12-07 07:48:59",
  "2014-12-07 08:01:49",
  "2014-12-07 10:11:35"])

s_arr.cast(pa.timestamp('s'))
# -> <pyarrow.lib.TimestampArray object at 0x7fbdf81ae460>
# [
#   2014-12-07 07:48:59,
#   2014-12-07 08:01:49,
#   2014-12-07 10:11:35
# ]
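
While the cast fails, one possible stopgap is to normalize the strings in plain Python before building the array. This is only a sketch under stated assumptions: `parse_pg_timestamp` is a hypothetical helper (not part of pyarrow), and it assumes every value ends in a numeric UTC offset such as `+00`, with or without fractional seconds.

```python
import re
from datetime import datetime, timezone

# Matches a trailing hour-only offset such as "+00" or "-05".
# Python's %z directive needs at least hours and minutes, so we pad it.
_SHORT_OFFSET = re.compile(r"([+-]\d{2})$")


def parse_pg_timestamp(text):
    """Hypothetical helper: parse 'YYYY-MM-DD HH:MM:SS[.ffffff]+HH' into UTC."""
    # "+00" -> "+0000"; offsets that already include minutes are left alone.
    normalized = _SHORT_OFFSET.sub(r"\g<1>00", text)
    # Pick the format depending on whether fractional seconds are present.
    if "." in normalized:
        fmt = "%Y-%m-%d %H:%M:%S.%f%z"
    else:
        fmt = "%Y-%m-%d %H:%M:%S%z"
    return datetime.strptime(normalized, fmt).astimezone(timezone.utc)


raw = [
    "2014-12-07 07:48:59.285332+00",
    "2014-12-07 08:01:49.758975+00",
    "2014-12-07 10:11:35.884304+00",
]
parsed = [parse_pg_timestamp(s) for s in raw]
```

The resulting list of timezone-aware `datetime` objects could then be handed to `pa.array(parsed, type=pa.timestamp('us', tz='UTC'))`. Parsing row by row in Python will be much slower than a native cast on large columns, so this only sidesteps the parsing failure rather than fixing it.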

PS. This is my first bug report, so apologies if anything important is missing.

Environment: macOS 10.15.7, Python 3.8.2
Reporter: Niclas Roos
Watchers: Rok Mihevc / @rok

Note: This issue was originally created as ARROW-10343. Please see the migration documentation for further details.