GH-38828: [R] Ensure that streams can be written to socket connections#38897
GH-38828: [R] Ensure that streams can be written to socket connections#38897paleolimbot merged 4 commits intoapache:mainfrom
Conversation
|
|
adc23cd to
c640eef
Compare
|
Hi @paleolimbot this looks good to me and I don't think an automated test is critical. I'd like to see the documentation updated to reflect the support this adds (happy to do that in a follow-up PR). Is it safe to say |
|
In general, we can use a connection in any place where we read or write anything! Most R functions work like this and I'm not sure we documented it anywhere. I'll find a few places where I can add it in while I've got the tab open. |
amoeba
left a comment
There was a problem hiding this comment.
Hey @paleolimbot, thanks for the most recent changes. This looks good to me outside the two failing CI jobs. The error seems like it could be related, though I can't reproduce locally. Any ideas?
cc5f30f to
0df755b
Compare
|
@amoeba I think those errors were the duckdb failures (unless I'm looking at the wrong ones). I rebased, but now we have an entirely new set of errors ( |
|
Sounds like #40949 which should be resolved soon. |
0df755b to
04e7053
Compare
|
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit e0d73c5. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…ections (apache#38897) ### Rationale for this change Currently we can't write to socket connection from R. This is a very useful way to send Arrow data around and should work! ### What changes are included in this PR? Implements `Tell()` for non-seekable output streams. Apparently some Arrow code calls this to figure out how many bytes have been written. ### Are these changes tested? I'm not quite sure how to test this...all output streams we can easily test are seekable. We could try to spin up a socket server on another thread (like the reprex below) but I'm worried that will be flaky. ### Are there any user-facing changes? Yes (something that should have previously worked now works), although there is no place where we currently document anything about how connections can be used. ``` r tmp <- tempfile() proc <- callr::r_bg(function() { server <- function() { library(arrow) while (TRUE) { writeLines("Listening...") con <- socketConnection(host = "localhost", port = 6011, blocking = TRUE, server = TRUE, open = "r+b") socketTimeout(con, 3600) data <- arrow::read_ipc_stream(con, as_data_frame = FALSE) print(head(as.data.frame(data))) } } server() }, stdout = tmp) Sys.sleep(0.5) library(arrow, warn.conflicts = FALSE) #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information. rb <- arrow::record_batch(iris) socketDriver <- socketConnection(host = "localhost", port = "6011", blocking = TRUE, server = FALSE, open = "w+b") write_ipc_stream(rb, socketDriver) Sys.sleep(0.5) cat(brio::read_file(tmp)) #> Listening... #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.2 setosa #> 2 4.9 3.0 1.4 0.2 setosa #> 3 4.7 3.2 1.3 0.2 setosa #> 4 4.6 3.1 1.5 0.2 setosa #> 5 5.0 3.6 1.4 0.2 setosa #> 6 5.4 3.9 1.7 0.4 setosa #> Listening... # Shutdown server proc$interrupt() #> [1] TRUE Sys.sleep(0.5) proc$is_alive() #> [1] FALSE ``` <sup>Created on 2023-11-27 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup> * Closes: apache#38828 * GitHub Issue: apache#38828 Authored-by: Dewey Dunnington <dewey@voltrondata.com> Signed-off-by: Dewey Dunnington <dewey@voltrondata.com>
Rationale for this change
Currently we can't write to socket connection from R. This is a very useful way to send Arrow data around and should work!
What changes are included in this PR?
Implements
Tell()for non-seekable output streams. Apparently some Arrow code calls this to figure out how many bytes have been written.Are these changes tested?
I'm not quite sure how to test this...all output streams we can easily test are seekable. We could try to spin up a socket server on another thread (like the reprex below) but I'm worried that will be flaky.
Are there any user-facing changes?
Yes (something that should have previously worked now works), although there is no place where we currently document anything about how connections can be used.
Created on 2023-11-27 with reprex v2.0.2