-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-13174: [C++][Compute] Add strftime kernel #10647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2436df1 to
45acfcd
Compare
652d46d to
78d3ffc
Compare
|
@westonpace strftime kernel is almost ready for review. |
cfa23ee to
9e8b775
Compare
westonpace
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. Looks great.
cpp/src/arrow/compute/api_scalar.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the default format string include the trailing %z so that it is ISO-8601 compliant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Will add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that %z adds a trailing z as in the ISO format. It adds a numeric offset (eg https://www.cplusplus.com/reference/ctime/strftime/):
In [44]: datetime.datetime(2012, 1, 1, tzinfo=datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%S%z")
Out[44]: '2012-01-01T00:00:00+0000'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to wikipedia these are valid formats:
- 2021-07-15T11:24:54+00:00
- 2021-07-15T11:24:54Z
- 20210715T112454Z
%z will give us option 1. I don't have a strong preference and pandas doesn't seem to have a default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is significant effort I agree that +00:00 is good enough but I personally prefer Z if there is a quick option to enable it when the time zone is UTC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to %Y-%m-%dT%H:%M:%SZ (2.) as default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: How precise is this 30? Is it based on the default format string? If I look at 2021-07-12T23:55:41+00:00 I would think you only need ~25, maybe more if subsecond resolution. Could we base this on the resolution? It's not a big deal so if it seems like too much don't worry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about a random date's string size times a fudge factor? :)
int expected_string_size = round(get_timestamp<Duration>(0, &options).size() * 1.1);
RETURN_NOT_OK(string_builder->Reserve(in.length * expected_string_size));
2033695 to
8ce4ab4
Compare
|
Thanks for the review @westonpace! One thing I'm not sure about is if we should be using |
|
Yes, you can. Make it return a |
@pitrou Done. Please review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note this is only testing for C locale because that is all that is available in CI at the moment. Ideally we would cover several.
Co-authored-by: Weston Pace <weston.pace@gmail.com>
ff49e17 to
4b1d68a
Compare
|
Will merge if green. |
|
Nice! Thanks @westonpace @jorisvandenbossche @pitrou |
|
Sorry for the slow response here, but I think there are still a few behavioural aspects to fix/clarify:
|
|
Thanks for the review @jorisvandenbossche !
|
|
Thanks! |
|
Follow up PR #10998 |
This is to resolve ARROW-13174.