-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-14231: [C++] Support casting timestamp with timezone to string #11328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we keep some indication of that it has a time zone (was an instant) in the case of UTC as well? (otherwise a roundtrip to csv would give a native timestamp without timezone for UTC, but a timestamp with timezone (UTC) for non-UTC timezones)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, you're right. I'll fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually want to append the timezone and localize the timestamp? Pretty-printing currently doesn't.
I don't know what the user expects here (we may actually need to make this customizable at some point :-)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least, Pandas does:
>>> print(pd.DataFrame({"a": [pd.Timestamp("1968-11-30 13:30:45.123456789", tz="America/Phoenix")]}).to_csv())
,a
0,1968-11-30 13:30:45.123456789-07:00
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. I wonder if those files are roundtrippable (is our CSV reader able to read them?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope! Good thing you pointed that out. I'll dig a bit.
/home/lidavidm/Code/upstream/arrow-14231/cpp/src/arrow/csv/converter_test.cc:68: Failure
Failed
'_error_or_value12.status()' failed with Invalid: CSV conversion error to timestamp[ns, tz=UTC]: invalid value '1970-01-01 00:00:00+0000'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty-printing currently doesn't.
Personally, I find this a "bug" in our current pretty printing (I find it rather confusing that it doesn't show at all that you have timezone-aware timestamps)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we have https://issues.apache.org/jira/browse/ARROW-13625 (and the linked issues) about the parsing side not yet supporting timezones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, we can tackle the reading side separately then :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for digging those up!
|
Rebased on top of the recent temporal kernel refactoring. |
|
This and ARROW-12820/#11358 now pass CI and should handle both reading/writing of timestamps with timezones. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, but static const std::string perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be "Z" rather than "+0000" for UTC timestamps?
See Wikipedia: https://en.wikipedia.org/wiki/ISO_8601#Coordinated_Universal_Time_(UTC)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good point. I've fixed this + added a UTC timestamp to the CSV writer test as well.
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thank you @lidavidm !
| for (auto string_type : {utf8(), large_utf8()}) { | ||
| ASSERT_RAISES(NotImplemented, Cast(ArrayFromJSON(timestamp(TimeUnit::SECOND, "UTC"), | ||
| "[-34226955, 1456767743]"), | ||
| CastOptions::Safe(string_type))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UTC could probably be made to work on Windows, can you perhaps create a followup JIRA for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed ARROW-14643.
|
Benchmark runs are scheduled for baseline = ed8c76e and contender = f3f4423. f3f4423 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
No description provided.