Closed
Description
Hi,
I'm working with Parquet files generated by an AWS RDS Postgres snapshot export.
I'm trying to parse a date column stored as a string into a timestamp, but the cast fails.
I've managed to parse the same date format (as in the first example below) when reading from a CSV, so I investigated as far as I could on my own. Here are my results:
```python
import pyarrow as pa
import pytz

#################################################################################
## the format I get from the database
us_tz_arr = pa.array([
    "2014-12-07 07:48:59.285332+00",
    "2014-12-07 08:01:49.758975+00",
    "2014-12-07 10:11:35.884304+00"])
us_tz_arr.cast(pa.timestamp('us', tz=pytz.UTC))
# -> ArrowInvalid: Failed to parse string: 2014-12-07 10:11:35.884304+00

#################################################################################
## tried removing the timezone
us_arr = pa.array([
    "2014-12-07 07:48:59.285332",
    "2014-12-07 08:01:49.758975",
    "2014-12-07 10:11:35.884304"])
us_arr.cast(pa.timestamp('us'))
# -> ArrowInvalid: Failed to parse string: 2014-12-07 10:11:35.884304

#################################################################################
## tried removing the microseconds but keeping the timezone
second_tz_arr = pa.array([
    "2014-12-07 07:48:59+00",
    "2014-12-07 08:01:49+00",
    "2014-12-07 10:11:35+00"])
second_tz_arr.cast(pa.timestamp('s', tz=pytz.UTC))
# -> ArrowInvalid: Failed to parse string: 2014-12-07 10:11:35+00

#################################################################################
## removing both microseconds and timezone makes it work!
s_arr = pa.array([
    "2014-12-07 07:48:59",
    "2014-12-07 08:01:49",
    "2014-12-07 10:11:35"])
s_arr.cast(pa.timestamp('s'))
# -> <pyarrow.lib.TimestampArray object at 0x7fbdf81ae460>
# [
#   2014-12-07 07:48:59,
#   2014-12-07 08:01:49,
#   2014-12-07 10:11:35
# ]
```

PS. This is my first bug report, so apologies if important things are missing.
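Until the cast accepts this format, one possible stopgap is to parse the strings in plain Python before handing them to Arrow. This is a sketch under stated assumptions, not the fix from the linked issues: it assumes every value looks like `YYYY-MM-DD HH:MM:SS`, optionally with fractional seconds, followed by an hour-only offset such as `+00` (as in the export above), and `parse_pg_timestamp` is a hypothetical helper name. Python's `%z` directive is documented for offsets of the form `+HHMM` (or `+HH:MM`), so the helper first expands `+00` to `+0000`.

```python
import re
from datetime import datetime

# Matches a trailing hour-only UTC offset such as "+00" or "-05".
# Offsets that already include minutes ("+0000", "+00:00") do not match.
_HOUR_ONLY_OFFSET = re.compile(r"([+-]\d{2})$")

# Layouts to try, with and without fractional seconds (assumption: no others occur).
_FORMATS = ("%Y-%m-%d %H:%M:%S.%f%z", "%Y-%m-%d %H:%M:%S%z")


def parse_pg_timestamp(value: str) -> datetime:
    """Hypothetical helper: parse 'YYYY-MM-DD HH:MM:SS[.ffffff]+HH' into an aware datetime."""
    # Expand "+00" to "+0000" so that strptime's %z directive accepts it.
    normalized = _HOUR_ONLY_OFFSET.sub(r"\g<1>00", value)
    for fmt in _FORMATS:
        try:
            return datetime.strptime(normalized, fmt)
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"unrecognized timestamp: {value!r}")


raw = [
    "2014-12-07 07:48:59.285332+00",
    "2014-12-07 08:01:49.758975+00",
    "2014-12-07 10:11:35+00",
]
parsed = [parse_pg_timestamp(s) for s in raw]
print(parsed[0].isoformat())  # 2014-12-07T07:48:59.285332+00:00
```

The resulting timezone-aware `datetime` objects can then be passed to `pa.array(parsed, type=pa.timestamp('us', tz='UTC'))`. Parsing element by element in Python is likely to be much slower than a native cast, so this is mainly useful for moderate column sizes or as a check on what the cast should produce.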
Environment: macOS 10.15.7, Python 3.8.2
Reporter: Niclas Roos
Watchers: Rok Mihevc / @rok
Related issues:
- [C++] Strptime ignores timezone information (is fixed by)
- [C++][CSV] Timestamp parsing should accept any valid ISO 8601 without requiring custom parse strings (is related to)
Note: This issue was originally created as ARROW-10343. Please see the migration documentation for further details.