Skip to content

Conversation

@rok
Copy link
Member

@rok rok commented Feb 15, 2023

Rationale for this change

We have assume_timezone to go from wall time timestamp to timestamp with a timezone. We might want a reverse operation to go from timestamp with a timezone to wall time.

This is not needed for computation within Arrow, but would be needed if an application outsides of Arrow consumes wall time. E.g.: https://stackoverflow.com/questions/73275465/how-to-keep-original-datatime-in-pyarrow-table/73276431

What changes are included in this PR?

This adds a local_time C++ kernel.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes, local_time C++ kernel is added.

@github-actions
Copy link

@rok rok force-pushed the 33143-get-walltime branch from 4a3bded to a5185a2 Compare February 16, 2023 16:39
@rok rok marked this pull request as ready for review February 16, 2023 17:11
@rok rok requested a review from lidavidm February 16, 2023 17:12
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, as always, for your diligence here.

+--------------------+------------+-------------------+---------------+----------------------------+-------+
| is_leap_year | Unary | Timestamp, Date | Boolean | | |
+--------------------+------------+-------------------+---------------+----------------------------+-------+
| local_time | Unary | Timestamp | Timestamp | | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a note? I don't think it will be obvious what this function does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Added some text and moved under the Timezone handling chapter. I'll trigger a docs preview to make sure I've got the markup right.

/// \return the resulting datum
///
/// \since 12.0.0
/// \note API not yet finalized
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to suggest removing this but now I see this is everywhere. Hmm.... :( I'm not sure how true this is these days.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you could see it as an opportunity to sync the API with substrait :D

@rok
Copy link
Member Author

rok commented Feb 17, 2023

@github-actions crossbow submit preview-docs

@github-actions
Copy link

Revision: 115347e

Submitted crossbow builds: ursacomputing/crossbow @ actions-ffd22963ef

Task Status
preview-docs Github Actions

@rok
Copy link
Member Author

rok commented Feb 17, 2023

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the description is clear to me. Two small grammar nits.

Co-authored-by: Weston Pace <weston.pace@gmail.com>
@rok
Copy link
Member Author

rok commented Feb 17, 2023

@github-actions crossbow submit preview-docs

@rok
Copy link
Member Author

rok commented Feb 17, 2023

Thanks @westonpace, merged the grammar changes.

@github-actions
Copy link

Revision: cf2eb81

Submitted crossbow builds: ursacomputing/crossbow @ actions-760f71e893

Task Status
preview-docs Github Actions

@westonpace
Copy link
Member

The test failures seem unrelated (I've opened a new issue for one which appears to be an intermittent segfault).

@westonpace westonpace merged commit daa5e26 into apache:main Feb 17, 2023
@ursabot
Copy link

ursabot commented Feb 18, 2023

Benchmark runs are scheduled for baseline = ae1f98f and contender = daa5e26. daa5e26 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.31% ⬆️0.03%] test-mac-arm
[Finished ⬇️0.77% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.54% ⬆️0.03%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] daa5e267 ec2-t3-xlarge-us-east-2
[Failed] daa5e267 test-mac-arm
[Finished] daa5e267 ursa-i9-9960x
[Finished] daa5e267 ursa-thinkcentre-m75q
[Finished] ae1f98f5 ec2-t3-xlarge-us-east-2
[Finished] ae1f98f5 test-mac-arm
[Finished] ae1f98f5 ursa-i9-9960x
[Finished] ae1f98f5 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@jorisvandenbossche
Copy link
Member

Nice to see this functionality!

One comment on the name of the function: given that "local_time" could also be interpreted as returning just the time (a time32/64() type) and not both date and time (and theoretically, this could also be a kernel we would want to add at some point), we could maybe use "local_timestamp" instead?
(I know that some will day that "local timestamp" is a contradictio in terminis, and that it should be "local datetime" instead, but since we don't use "datetime" for any other kernel name and "timestamp" is the name of the type anyway, that seems the best option).

For reference, in Joda they call this toLocalDateTime (and they also have a toLocalDate and toLocalTime), in pandas the naming is confusing (tz_localize(None)), and in R's lubridate there is a local_time that does somewhat related, except that it returns the difference in seconds instead of new timestamp object.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And a few doc comments/questions

"cannot be found in the timezone database."),
"If timezone is not given then timezone naive timestamp in UTC are\n"
"returned. An error is returned if the values have a defined timezone\n"
"but it cannot be found in the timezone database."),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this change intentional? Or was it meant to be in local_time_doc?
(and if intentional, why only for round and not ceil/floor?)


const FunctionDoc local_time_doc{
"Convert timestamp to a timezone-naive local time timestamp",
("LocalTime converts a timestamp to a local time of timestamps timezone\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is missing in this sentence?

Comment on lines +1813 to +1814
"and removes timezone metadata. If input is in UTC or doesn't have\n"
"timezone metadata, it is returned as is.\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returned "as is" is not entirely correct for UTC, as I assume the "UTC" timezone is still removed?

@rok
Copy link
Member Author

rok commented Feb 20, 2023

Thanks for the review @jorisvandenbossche. I've opened a PR to address it: #34263.

rok added a commit that referenced this pull request Feb 23, 2023
…rnel (#34263)

### Rationale for this change

A better naming for local_time kernel was proposed in post merge review of #34208.

### What changes are included in this PR?

Change `local_time` to `local_timestamp` and related docs/test changes.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Changing `local_time` to `local_timestamp` is a user facing change. But since it was not yet released we can probably treat it as non-breaking.
* Closes: #33143

Authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Rok Mihevc <rok@mihevc.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++] Kernel to convert timestamp with timezone to wall time

4 participants