-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-13054: [C++] Add option to specify the first day of the week for the "day_of_week" temporal kernel #10598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3647567 to
65c3019
Compare
|
Is this missing an option we will need in the near future? Should we remove something. |
Thanks for this! I've taken a look at the
So, yeah, I'm thinking |
|
Hey @thisisnic! Thanks for the input :).
Is this just an arbitrary month user sets or is there a standard?
I was wondering about locale as well. Would it be a good idea to be able to set an arbitrary locale in some cases? Pandas appears not to offer this option. It would be useful for strftime but could cause misinterpretations for say day_of_week. Added strftime jira. |
The default value in
Again, I'd like input from others on this (I am not experienced in this and may be missing something!), but my initial thoughts are that if we do choose to extract the day of week and month as integers only (and not implement the
Thanks! |
Well my thinking is to enable this |
Pandas actually has a different method (the one you link is an attribute that has been there for a long time, so but pandas added an additional method so it could have arguments) that takes a locale arguments: https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.day_name.html Personally, I would (at least for now) leave all locale-specific handling to the bindings / downstream applications (except for About the actual options in this PR: that seems like a good start, but I don't think all those options would be needed in a single Options struct? For example, the ambiguous/non-existent handling, those can be put in a specific options struct for the kernel that needs it? |
Oh, nice to know!
Yeah, leaving locale out of this does seem like the sane option for now.
I'm ok with single Options struct or multiple. |
Those make sense to me. I think there may be other Options structs we might need; for example, ones for kernels that do maths with dates. However, we don't need to know all of these in advance, right? |
Indeed. Even the ones I listed above should be implemented just-in-time with their respective kernels IMO. So I'll only implement |
e0f2ab7 to
3fb529f
Compare
|
@rok you mention this PR now only has |
@jorisvandenbossche well the beginning of the week could be e.g.:
Alternatively we can have How about |
I'm not sure if this might be a little confusing combining them. Could we perhaps define ourselves which integer maps to which day, and then just have a single parameter, |
|
@thisisnic I like |
|
@rok Sounds good to me. Let me know if you want me to take a look at the code that will need to be added to fix the R builds which are now failing because you've implemented those options. |
|
@rok it's still unclear to me what the There are two aspects that could be controlled:
ISO starts counting at 1 on Monday, C++ starts counting at 0 on Sunday (but lubridate's Taking a step back: for the "start index", would users ever want something else as 0 or 1 ? (I don't think that counting from 2 to 8 would ever make sense). In addition, going from 0-indexed to 1-indexed is a simple operation (addition +1). On the other hand, the "start day" is more difficult to change afterwards (since you need to wrap around, not a simple +1, but eg 7 needs to become 0). So if we want to add some option, "start day" seems the more useful one to add? |
e102fb0 to
0e0a5a1
Compare
|
Thanks for the input @thisisnic & @jorisvandenbossche! I've refactored this to use |
a359103 to
73c0abb
Compare
|
@pitrou ping :) |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ok to me as far as C++ changes are concerned. There are a couple nits below.
I'll let @jorisvandenbossche validate the rest and then merge when desirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can you usually pass options as const&.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This member doesn't seem used anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed! Removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could be const&.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
Thanks for the review @pitrou! I've implemented the suggestions. |
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just 3 tiny last comments
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
|
@jorisvandenbossche Thanks for the review! I've pushed the suggested changes. |
|
Thanks! |
This is to resolve ARROW-13054.
This will be needed for casting timezone-naive timestamps ARROW-13033 and defining starting day of the week.