Skip to content

Conversation

@rok
Copy link
Member

@rok rok commented Apr 27, 2021

This is to resolve ARROW-11759.

@github-actions
Copy link

@rok
Copy link
Member Author

rok commented Apr 27, 2021

Missing features at moment:

  • isocalendar
  • millisecond
  • microsecond
  • nanosecond

Different input / output types should also be handled.
Does this need to be timezone aware?

@jorisvandenbossche
Copy link
Member

Does this need to be timezone aware?

Unfortunately yes .. But since that's a lot more complicated, and I don't think we already have tz-related functionality, for now those kernels could maybe just raise an error if the timestamp type has a tz and is thus not tz-naive?

@rok rok force-pushed the ARROW-11759 branch 4 times, most recently from 2c4968d to 2765cef Compare May 6, 2021 16:20
@rok
Copy link
Member Author

rok commented May 6, 2021

It seems tests on windows are not able to download IANA timezone database or access it post-download (https://howardhinnant.github.io/date/tz.html#Installation). I'll solve it but meanwhile I I'd like to ask for review.

Open questions I have at the moment:

  • Is it possible to match kernels to zoned timestamps when registering or do we have to keep matching at runtime like so.
  • What would be a good array type for isodate?

@rok rok marked this pull request as ready for review May 6, 2021 17:22
@rok
Copy link
Member Author

rok commented May 8, 2021

I've adopted some code from iso_week.h. Would it make more sense to add iso_week.h as a vendored library instead? Should we add a note to our license? (iso_week.h is MIT licensed).

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @rok. There's a bunch of things that look unexpected to me, see below.

@jorisvandenbossche
Copy link
Member

General comment: the PR is quite big, and the timezone-related logic further added complexity. So from a workflow perspective (to ensure reviewing / getting this merged is manageable), could it make sense to keep the timezone-related logic for a separate PR? (didn't check the code to see whether that's actually feasible)

@rok
Copy link
Member Author

rok commented May 12, 2021

General comment: the PR is quite big, and the timezone-related logic further added complexity. So from a workflow perspective (to ensure reviewing / getting this merged is manageable), could it make sense to keep the timezone-related logic for a separate PR? (didn't check the code to see whether that's actually feasible)

Keeping tz-aware logic would be fairly easy to move to a separate PR. I'll move it out.

@rok rok force-pushed the ARROW-11759 branch 8 times, most recently from f8c6665 to 23f51e7 Compare May 17, 2021 02:44
@rok rok force-pushed the ARROW-11759 branch 3 times, most recently from 7bcc910 to 69c5760 Compare May 20, 2021 18:26
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! Did a final pass through the docs.

Also, it would maybe be good to add a test for a timestamp that doesn't fit into the nanosecond range?

{"values"}};

const FunctionDoc millisecond_doc{
"Extract millisecond values",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add here the clarification about it being the number of milliseconds since the last full second, as you did for the C++ functions? (and same for microsecond/nanosecond below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

{"values"}};

const FunctionDoc subsecond_doc{
"Extract subsecond values",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here also best to add the clarification of what subsecond is from the Subsecond docstring

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

}
return {static_cast<int64_t>(static_cast<int32_t>(y)),
static_cast<int64_t>(trunc<weeks>(t - start).count() + 1),
static_cast<int64_t>(weekday(ymd).iso_encoding())};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the DayOfWeek implementation you subtract 1 (to have 0-6), while here not. I assume that's the ISO definition? If so, we should clearly document that difference in the ISOCalendar documentation (since now it seems to imply that the "day_of_week" field is the same as the "day_of_week" kernel. Maybe also prepend it with "iso_" ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to docs and prepended.

@rok rok force-pushed the ARROW-11759 branch 2 times, most recently from 4dc50cb to ee53452 Compare June 9, 2021 12:03
@rok
Copy link
Member Author

rok commented Jun 9, 2021

Thanks for the comments and suggestions @jorisvandenbossche! :)

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@pitrou
Copy link
Member

pitrou commented Jun 9, 2021

Thank you for this PR @rok ! I believe this is ready for merging. I'll just wait for CI.

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@rok
Copy link
Member Author

rok commented Jun 9, 2021

Just noticed I missed this:

@jorisvandenbossche
Also, it would maybe be good to add a test for a timestamp that doesn't fit into the nanosecond range?

Do you mind if I move it to ARROW-12980?

@rok
Copy link
Member Author

rok commented Jun 9, 2021

Actually @jorisvandenbossche what do you mean by "doesn't fit into the nanosecond range"?
Like ` "1970-01-01T00:00:59.1234567890123"?

@pitrou pitrou closed this in 00cc41f Jun 9, 2021
@rok rok deleted the ARROW-11759 branch June 9, 2021 19:04
@jorisvandenbossche
Copy link
Member

Cool, this is a nice start for datetime kernels!
@thisisnic opened a JIRA for expsosing those in R as well (https://issues.apache.org/jira/browse/ARROW-13022)

what do you mean by "doesn't fit into the nanosecond range"?

I meant a date that falls outside the date range of 1677-09-21 - 2262-04-11 (the range that ns resolution can cover).

@rok
Copy link
Member Author

rok commented Jun 10, 2021

Thanks @jorisvandenbossche @pitrou for the help and feedback! I'm very glad this is merged! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants