Skip to content

Conversation

@dragosmg
Copy link
Contributor

@dragosmg dragosmg commented May 18, 2022

This PR enables the tz argument for the strptime binding by using the assume_timezone compute function.

Current behaviour:

library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

test_df <- tibble::tibble(
    timestamp_naive_string = c("2018-10-07 19:04:05", NA)
)

test_df %>% 
    arrow_table() %>% 
    mutate(
        timestamp = 
            strptime(
                timestamp_naive_string, 
                format = "%Y-%m-%d %H:%M:%S", 
                tz = "Pacific/Marquesas"
            )
    ) %>%
    collect()
#> Warning: In strptime(timestamp_naive_string, format = "%Y-%m-%d %H:%M:%S", ...,
#> Time zone argument not supported in Arrow; pulling data into R
#> # A tibble: 2 × 2
#>   timestamp_naive_string timestamp          
#>   <chr>                  <dttm>             
#> 1 2018-10-07 19:04:05    2018-10-07 19:04:05
#> 2 <NA>                   NA

Created on 2022-05-18 by the reprex package (v2.0.1)

Future/desired behaviour:

library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

test_df <- tibble::tibble(
  timestamp_naive_string = c("2018-10-07 19:04:05", NA)
)

a <- test_df %>% 
  arrow_table() %>% 
  mutate(
    timestamp = 
      strptime(
        timestamp_naive_string, 
        format = "%Y-%m-%d %H:%M:%S", 
        tz = "Pacific/Marquesas"
      )
  ) %>%
  collect()

a
#> # A tibble: 2 × 2
#>   timestamp_naive_string timestamp          
#>   <chr>                  <dttm>             
#> 1 2018-10-07 19:04:05    2018-10-07 19:04:05
#> 2 <NA>                   NA
attributes(a$timestamp)
#> $class
#> [1] "POSIXlt" "POSIXt" 
#> 
#> $tzone
#> [1] "Pacific/Marquesas"

Created on 2022-05-19 by the reprex package (v2.0.1)

@github-actions
Copy link

@dragosmg dragosmg changed the title ARROW-16415: [R] Update strptime bindings to use tz ARROW-16415: [R] Update strptime binding signature with the tz argument May 18, 2022
@dragosmg dragosmg marked this pull request as ready for review May 18, 2022 20:53
Copy link
Member

@thisisnic thisisnic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might have some issues where the default argument of tz in strptime() in R is different to the one here. I tried running this test I wrote, but got an error:

> t_string %>%
+       arrow_table() %>%
+       mutate(
+         x = strptime(x, format = "%Y-%m-%d %H:%M:%S", tz = "")
+       ) %>%
+       collect() -> out
Error in `collect()`:
! Invalid: Cannot locate timezone '':  not found in timezone database
/home/nic2/arrow/cpp/src/arrow/compute/kernels/scalar_temporal_unary.cc:128  LocateZone(options.timezone)
/home/nic2/arrow/cpp/src/arrow/compute/exec.cc:701  kernel_->exec(kernel_ctx_, batch, &out)
/home/nic2/arrow/cpp/src/arrow/compute/exec.cc:642  ExecuteBatch(batch, listener)
/home/nic2/arrow/cpp/src/arrow/compute/exec/expression.cc:597  executor->Execute(arguments, &listener)
/home/nic2/arrow/cpp/src/arrow/compute/exec/project_node.cc:90  ExecuteScalarExpression(simplified_expr, target, plan()->exec_context())
/home/nic2/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:467  iterator_.Next()
/home/nic2/arrow/cpp/src/arrow/record_batch.cc:337  ReadNext(&batch)
/home/nic2/arrow/cpp/src/arrow/record_batch.cc:351  ToRecordBatches()

@dragosmg dragosmg requested a review from thisisnic May 25, 2022 12:41
Copy link
Member

@thisisnic thisisnic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the comment it'd be great to see added, I think this is good to go. @jonkeane - mind taking a look when you have a chance, in case I've missed anything?

@thisisnic thisisnic requested a review from jonkeane May 26, 2022 11:11
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good — a comment about an additional test + a few more comments that would be helpful

@dragosmg dragosmg requested a review from jonkeane June 6, 2022 09:42
@dragosmg dragosmg requested a review from jonkeane June 7, 2022 09:02
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for sticking with it

kou pushed a commit that referenced this pull request Feb 20, 2023
…Hub issue numbers (#34260)

Rewrite the Jira issue numbers to the GitHub issue numbers, so that the GitHub issue numbers are automatically linked to the issues by pkgdown's auto-linking feature.

Issue numbers have been rewritten based on the following correspondence.
Also, the pkgdown settings have been changed and updated to link to GitHub.

I generated the Changelog page using the `pkgdown::build_news()` function and verified that the links work correctly.

---
ARROW-6338	#5198
ARROW-6364	#5201
ARROW-6323	#5169
ARROW-6278	#5141
ARROW-6360	#5329
ARROW-6533	#5450
ARROW-6348	#5223
ARROW-6337	#5399
ARROW-10850	#9128
ARROW-10624	#9092
ARROW-10386	#8549
ARROW-6994	#23308
ARROW-12774	#10320
ARROW-12670	#10287
ARROW-16828	#13484
ARROW-14989	#13482
ARROW-16977	#13514
ARROW-13404	#10999
ARROW-16887	#13601
ARROW-15906	#13206
ARROW-15280	#13171
ARROW-16144	#13183
ARROW-16511	#13105
ARROW-16085	#13088
ARROW-16715	#13555
ARROW-16268	#13550
ARROW-16700	#13518
ARROW-16807	#13583
ARROW-16871	#13517
ARROW-16415	#13190
ARROW-14821	#12154
ARROW-16439	#13174
ARROW-16394	#13118
ARROW-16516	#13163
ARROW-16395	#13627
ARROW-14848	#12589
ARROW-16407	#13196
ARROW-16653	#13506
ARROW-14575	#13160
ARROW-15271	#13170
ARROW-16703	#13650
ARROW-16444	#13397
ARROW-15016	#13541
ARROW-16776	#13563
ARROW-15622	#13090
ARROW-18131	#14484
ARROW-18305	#14581
ARROW-18285	#14615
* Closes: #33631

Authored-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants