Skip to content

Conversation

@rok
Copy link
Member

@rok rok commented Aug 25, 2021

This is to resolve ARROW-13684.

  1. Default strftime string is now %Y-%m-%dT%H:%M:%S. Perhaps %Y-%m-%dT%H:%M:%S%z would be better?
  2. Timestamps without timezone are now strftime-ed as if they were in UTC. Not sure this is the way to go. What if the local time is really invalid but we can't tell?
  3. Document %S behavior. What would be a good location to do that?

@rok
Copy link
Member Author

rok commented Aug 25, 2021

@jorisvandenbossche

@github-actions
Copy link

@jorisvandenbossche
Copy link
Member

Default strftime string is now %Y-%m-%dT%H:%M:%S. Perhaps %Y-%m-%dT%H:%M:%S%z would be better?

Not fully sure about this one. In #10647 (comment) I mentioned the options of showing no timezone information (what you did now), include a numeric offset (so adding %z) or erroring. And for UTC it's also an option to use the literal "Z".

Including a numeric offset of using literal Z (for UTC) both mean that the default format depends on the type (because for timestamp without timezone we shouldn't add an offset).

Timestamps without timezone are now strftime-ed as if they were in UTC. Not sure this is the way to go. What if the local time is really invalid but we can't tell?

Timestamps without timezone should be formatted as is, showing the wall clock time they represent. So I think formatting as if they are in UTC but without any timezone indication is correct. Although we should not allow adding timezone information with %Z or %z in those cases.
I don't think we need to care about invalid local times when printing, that's a user responsibility (and anyway, you only know if a time is invalid once you assume a certain timezone, which we don't do in strftime).

Document %S behavior. What would be a good location to do that?

You could start with the compute.rst table notes, but personally I would also include it in the docstring.

@rok
Copy link
Member Author

rok commented Aug 25, 2021

Default strftime string is now %Y-%m-%dT%H:%M:%S. Perhaps %Y-%m-%dT%H:%M:%S%z would be better?

Not fully sure about this one. In #10647 (comment) I mentioned the options of showing no timezone information (what you did now), include a numeric offset (so adding %z) or erroring. And for UTC it's also an option to use the literal "Z".

Including a numeric offset of using literal Z (for UTC) both mean that the default format depends on the type (because for timestamp without timezone we shouldn't add an offset).

Right. And for the option where we would have UTC timestamps having "Z" at the end would require something like:

if (timezone == "UTC" && self.options.format == "") {
  self.options.format = "%Y-%m-%dT%H:%M:%SZ";
} else {
  self.options.format = "%Y-%m-%dT%H:%M:%S";
}

Where we would have self.options.format = "" by default.
I'm not sure this is really worth it. I'd keep the timezone out of the default format.

Timestamps without timezone are now strftime-ed as if they were in UTC. Not sure this is the way to go. What if the local time is really invalid but we can't tell?

Timestamps without timezone should be formatted as is, showing the wall clock time they represent. So I think formatting as if they are in UTC but without any timezone indication is correct. Although we should not allow adding timezone information with %Z or %z in those cases.
I don't think we need to care about invalid local times when printing, that's a user responsibility (and anyway, you only know if a time is invalid once you assume a certain timezone, which we don't do in strftime).

Ok, let's return invalid when detecting %z/%Z and a timezone-less timestamp then?

Document %S behavior. What would be a good location to do that?

You could start with the compute.rst table notes, but personally I would also include it in the docstring.

Sounds good.

@rok rok force-pushed the ARROW-13684 branch 3 times, most recently from da3581a to de9fa8b Compare August 25, 2021 21:04
@rok
Copy link
Member Author

rok commented Aug 25, 2021

@jorisvandenbossche I've pushed changes for second and third (printing timezoneless timestamps and docs) point. I left the first as was (not timezone data by default).
Please see if this looks ok.

@rok rok marked this pull request as ready for review August 25, 2021 21:12
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor wording nits but overall the change seems good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I don't know if cannot print is quite right. The user might be running strftime to store or display on a GUI or send to some other tool. Perhaps cannot convert to string? Or cannot format?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good point. I went with your first proposal.

Comment on lines 843 to 845
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems to suggest that only microsecond and second are supported. I like the wording down below which makes it more clear that seconds through nanoseconds are supported.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Changed the language. Could you check if it's good now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clear. Thanks.

@rok
Copy link
Member Author

rok commented Aug 26, 2021

Thanks for the comments @westonpace !

@rok
Copy link
Member Author

rok commented Aug 27, 2021

@jorisvandenbossche @westonpace I've made a minor change in the text (decimal points -> decimal places).
Other than that I think this is done.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also add something about "To obtain integer seconds, cast to timestamp with second resolution" to be explicit about the workaround.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer up to date

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems error description is outdated for all extract kernels. Thanks for pointing it out! Fixing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is accurate? It doesn't depend on the number of seconds present, but solely on the unit of the timestamp type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you used some explanation about this from https://howardhinnant.github.io/date/date.html#to_stream_formatting, but I would maybe use the same text as you added above in strftime_doc, as I think that's clearer / more relevant from Arrow's perspective.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense yeah. Done.

@rok
Copy link
Member Author

rok commented Aug 31, 2021

Thanks for the review @jorisvandenbossche @pitrou.
I addressed the comments and pushed the changes.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @rok. I will merge now if CI is green.

@pitrou pitrou closed this in 02343c8 Sep 6, 2021
@rok
Copy link
Member Author

rok commented Sep 6, 2021

Thanks @westonpace @jorisvandenbossche @pitrou !

ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021
This is to resolve [ARROW-13684](https://issues.apache.org/jira/browse/ARROW-13684).

1. Default strftime string is now `%Y-%m-%dT%H:%M:%S`. Perhaps `%Y-%m-%dT%H:%M:%S%z` would be better?
2. Timestamps without timezone are now strftime-ed as if they were in `UTC`. Not sure this is the way to go. What if the local time is really invalid but we can't tell?
3. Document `%S` behavior. What would be a good location to do that?

Closes apache#10998 from rok/ARROW-13684

Lead-authored-by: Rok <rok@mihevc.org>
Co-authored-by: Rok Mihevc <rok@mihevc.org>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants