Skip to content

Conversation

@lidavidm
Copy link
Member

@lidavidm lidavidm commented Oct 5, 2021

No description provided.

@github-actions
Copy link

github-actions bot commented Oct 5, 2021

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we keep some indication of that it has a time zone (was an instant) in the case of UTC as well? (otherwise a roundtrip to csv would give a native timestamp without timezone for UTC, but a timestamp with timezone (UTC) for non-UTC timezones)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, you're right. I'll fix that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually want to append the timezone and localize the timestamp? Pretty-printing currently doesn't.

I don't know what the user expects here (we may actually need to make this customizable at some point :-)).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least, Pandas does:

>>> print(pd.DataFrame({"a": [pd.Timestamp("1968-11-30 13:30:45.123456789", tz="America/Phoenix")]}).to_csv())
,a
0,1968-11-30 13:30:45.123456789-07:00

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. I wonder if those files are roundtrippable (is our CSV reader able to read them?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope! Good thing you pointed that out. I'll dig a bit.

/home/lidavidm/Code/upstream/arrow-14231/cpp/src/arrow/csv/converter_test.cc:68: Failure
Failed
'_error_or_value12.status()' failed with Invalid: CSV conversion error to timestamp[ns, tz=UTC]: invalid value '1970-01-01 00:00:00+0000'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty-printing currently doesn't.

Personally, I find this a "bug" in our current pretty printing (I find it rather confusing that it doesn't show at all that you have timezone-aware timestamps)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we have https://issues.apache.org/jira/browse/ARROW-13625 (and the linked issues) about the parsing side not yet supporting timezones

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, we can tackle the reading side separately then :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for digging those up!

@lidavidm
Copy link
Member Author

lidavidm commented Oct 7, 2021

Rebased on top of the recent temporal kernel refactoring.

@lidavidm
Copy link
Member Author

lidavidm commented Nov 5, 2021

This and ARROW-12820/#11358 now pass CI and should handle both reading/writing of timestamps with timezones.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but static const std::string perhaps?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be "Z" rather than "+0000" for UTC timestamps?
See Wikipedia: https://en.wikipedia.org/wiki/ISO_8601#Coordinated_Universal_Time_(UTC)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point. I've fixed this + added a UTC timestamp to the CSV writer test as well.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thank you @lidavidm !

for (auto string_type : {utf8(), large_utf8()}) {
ASSERT_RAISES(NotImplemented, Cast(ArrayFromJSON(timestamp(TimeUnit::SECOND, "UTC"),
"[-34226955, 1456767743]"),
CastOptions::Safe(string_type)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UTC could probably be made to work on Windows, can you perhaps create a followup JIRA for that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed ARROW-14643.

@pitrou pitrou closed this in f3f4423 Nov 9, 2021
@ursabot
Copy link

ursabot commented Nov 9, 2021

Benchmark runs are scheduled for baseline = ed8c76e and contender = f3f4423. f3f4423 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.45% ⬆️0.04%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants