Commit 54ff758

dgreiss, thisisnic, and amoeba authored
GH-35875: [R] Update Readme (#40148)
### Rationale for this change

#35875, #35082, and #32895 make a number of recommendations to update the Readme.

### What changes are included in this PR?

Rewording and reorganizing the Readme and sidebar.

### Are these changes tested?

n/a

### Are there any user-facing changes?

Yes

* Closes: #35875

Lead-authored-by: David Greiss <david.dgreiss@gmail.com>
Co-authored-by: Nic Crane <thisisnic@gmail.com>
Co-authored-by: Bryce Mecum <petridish@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
1 parent d3d96a1 commit 54ff758

2 files changed

Lines changed: 73 additions & 71 deletions

r/README.md

Lines changed: 68 additions & 70 deletions
Original file line number | Diff line number | Diff line change
@@ -1,99 +1,100 @@
11
# arrow <img src="https://arrow.apache.org/img/arrow-logo_hex_black-txt_white-bg.png" align="right" alt="" width="120" />
22

3+
<!-- badges: start -->
4+
35
[![cran](https://www.r-pkg.org/badges/version-last-release/arrow)](https://cran.r-project.org/package=arrow)
46
[![CI](https://github.com/apache/arrow/workflows/R/badge.svg?event=push)](https://github.com/apache/arrow/actions?query=workflow%3AR+branch%3Amain+event%3Apush)
57
[![conda-forge](https://img.shields.io/conda/vn/conda-forge/r-arrow.svg)](https://anaconda.org/conda-forge/r-arrow)
68

7-
[Apache Arrow](https://arrow.apache.org/) is a cross-language
8-
development platform for in-memory and larger-than-memory data. It specifies a standardized
9-
language-independent columnar memory format for flat and hierarchical
10-
data, organized for efficient analytic operations on modern hardware. It
11-
also provides computational libraries and zero-copy streaming, messaging,
12-
and interprocess communication.
13-
14-
The arrow R package exposes an interface to the Arrow C++ library,
15-
enabling access to many of its features in R. It provides low-level
16-
access to the Arrow C++ library API and higher-level access through a
17-
`{dplyr}` backend and familiar R functions.
18-
19-
## What can the arrow package do?
20-
21-
The arrow package provides functionality for a wide range of data analysis
22-
tasks. It allows users to read and write data in a variety formats:
9+
<!-- badges: end -->
2310

24-
- Read and write Parquet files, an efficient and widely used columnar format
25-
- Read and write Arrow (formerly known as Feather) files, a format optimized for speed and
26-
interoperability
27-
- Read and write CSV files with excellent speed and efficiency
28-
- Read and write multi-file and larger-than-memory datasets
29-
- Read JSON files
11+
## Overview
3012

31-
It provides data analysis tools for both in-memory and larger-than-memory data sets
32-
33-
- Analyze and process larger-than-memory datasets
34-
- Manipulate and analyze Arrow data with dplyr verbs
35-
36-
It provides access to remote filesystems and servers
37-
38-
- Read and write files in Amazon S3 and Google Cloud Storage buckets
39-
- Connect to Arrow Flight servers to transport large datasets over networks
40-
41-
Additional features include:
13+
The R `{arrow}` package provides access to many of the features of the [Apache Arrow C++ library](https://arrow.apache.org/docs/cpp/index.html) for R users. The goal of arrow is to provide an Arrow C++ backend to `{dplyr}`, and access to the Arrow C++ library through familiar base R and tidyverse functions, or `{R6}` classes.
4214

43-
- Zero-copy data sharing between R and Python
44-
- Fine control over column types to work seamlessly
45-
with databases and data warehouses
46-
- Support for compression codecs including Snappy, gzip, Brotli,
47-
Zstandard, LZ4, LZO, and bzip2
48-
- Access and manipulate Arrow objects through low-level bindings
49-
to the C++ library
50-
- Toolkit for building connectors to other applications
51-
and services that use Arrow
15+
To learn more about the Apache Arrow project, see the parent documentation of the [Arrow Project](https://arrow.apache.org/). The Arrow project provides functionality for a wide range of data analysis tasks to store, process and move data fast. See the [read/write article](articles/read_write.html) to learn about reading and writing data files, [data wrangling](articles/data_wrangling.html) to learn how to use dplyr syntax with arrow objects, and the [function documentation](reference/acero.html) for a full list of supported functions within dplyr queries.
5216

5317
## Installation
5418

55-
Most R users will probably want to install the latest release of arrow
56-
from CRAN:
19+
The latest release of arrow can be installed from CRAN. In most cases installing the latest release should work without requiring any additional system dependencies, especially if you are using
20+
Windows or macOS.
5721

58-
``` r
22+
```r
5923
install.packages("arrow")
6024
```
6125

6226
Alternatively, if you are using conda you can install arrow from conda-forge:
6327

64-
``` shell
28+
```sh
6529
conda install -c conda-forge --strict-channel-priority r-arrow
6630
```
6731

68-
In most cases installing the latest release should work without
69-
requiring any additional system dependencies, especially if you are using
70-
Window or a Mac. For those users, CRAN hosts binary packages that contain
71-
the Arrow C++ library upon which the arrow package relies, and no
72-
additional steps should be required.
73-
7432
There are some special cases to note:
7533

76-
- On macOS, the R you use with Arrow should match the architecture of the machine you are using. If you're using an ARM (aka M1, M2, etc.) processor use R compiled for arm64. If you're using an Intel based mac, use R compiled for x86. Using R and Arrow compiled for Intel based macs on an ARM based mac will result in segfaults and crashes.
34+
- On macOS, the R you use with Arrow should match the architecture of the machine you are using. If you're using an ARM (aka M1, M2, etc.) processor use R compiled for arm64. If you're using an Intel based mac, use R compiled for x86. Using R and Arrow compiled for Intel based macs on an ARM based mac will result in segfaults and crashes.
35+
36+
- On Linux the installation process can sometimes be more involved because CRAN does not host binaries for Linux. For more information please see the [installation guide](articles/install.html).
37+
38+
- If you are compiling arrow from source, please note that as of version 10.0.0, arrow requires C++17 to build. This has implications on Windows and CentOS 7. For Windows users it means you need to be running an R version of 4.0 or later. On CentOS 7, it means you need to install a newer compiler than the default system compiler gcc. See the [installation details article](https://arrow.apache.org/docs/r/articles/developers/install_details.html) for guidance.
39+
40+
- Development versions of arrow are released nightly. For information on how to install nightly builds, please see the [installing nightly builds](articles/install_nightly.html) article.
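
The macOS architecture note and the C++17 requirement above can be checked before installing. The following is a minimal sketch for a POSIX shell, not part of arrow itself: `normalize_arch`, `arch_matches`, and `supports_cxx17` are hypothetical helper names, and the usage comments assume `Rscript` and a `c++` compiler driver are on the `PATH`.

```sh
# Hypothetical helper: map common architecture spellings to one name so
# that "arm64"/"aarch64" and "x86_64"/"x86-64"/"amd64" compare equal.
normalize_arch() {
  case "$1" in
    arm64|aarch64) echo "arm64" ;;
    x86_64|x86-64|amd64) echo "x86_64" ;;
    *) echo "$1" ;;
  esac
}

# Hypothetical helper: exit status 0 when the two architecture names agree.
arch_matches() {
  [ "$(normalize_arch "$1")" = "$(normalize_arch "$2")" ]
}

# Hypothetical helper: exit status 0 when the given compiler (default: c++)
# can build a tiny program that uses a C++17 library feature.
supports_cxx17() {
  cxx="${1:-c++}"
  tmpdir=$(mktemp -d) || return 1
  printf '%s\n' \
    '#include <optional>' \
    'int main() { std::optional<int> x{1}; return *x - 1; }' \
    > "$tmpdir/check.cpp"
  if "$cxx" -std=c++17 "$tmpdir/check.cpp" -o "$tmpdir/check" >/dev/null 2>&1; then
    status=0
  else
    status=$?
  fi
  rm -rf "$tmpdir"
  return "$status"
}

# Example usage (assumes Rscript is installed):
#   machine_arch=$(uname -m)
#   r_arch=$(Rscript -e 'cat(R.version$arch)')
#   arch_matches "$machine_arch" "$r_arch" || echo "R does not match this machine's architecture"
#   supports_cxx17 || echo "default compiler cannot build C++17; see the installation details article"
```

Note that `uname -m` reports the architecture seen by the current shell process, so the result may differ if the terminal itself runs under emulation.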
41+
42+
## What can the arrow package do?
43+
44+
The Arrow C++ library is composed of different parts, each of which serves a specific purpose. The arrow package provides bindings to the C++ functionality for a wide range of data analysis
45+
tasks.
46+
47+
It allows users to read and write data in a variety of formats:
7748

78-
- On Linux the installation process can sometimes be more involved because
79-
CRAN does not host binaries for Linux. For more information please see the [installation guide](https://arrow.apache.org/docs/r/articles/install.html).
49+
- Read and write Parquet files, an efficient and widely used columnar format
50+
- Read and write Arrow (formerly known as Feather) files, a format optimized for speed and
51+
interoperability
52+
- Read and write CSV files with excellent speed and efficiency
53+
- Read and write multi-file and larger-than-memory datasets
54+
- Read JSON files
8055

81-
- If you are compiling arrow from source, please note that as of version
82-
10.0.0, arrow requires C++17 to build. This has implications on Windows and
83-
CentOS 7. For Windows users it means you need to be running an R version of
84-
4.0 or later. On CentOS 7, it means you need to install a newer compiler
85-
than the default system compiler gcc 4.8. See the [installation details article](https://arrow.apache.org/docs/r/articles/developers/install_details.html) for guidance. Note that
86-
this does not affect users who are installing a binary version of the package.
56+
It provides access to remote filesystems and servers:
8757

88-
- Development versions of arrow are released nightly. Most users will not
89-
need to install nightly builds, but if you do please see the article on [installing nightly builds](https://arrow.apache.org/docs/r/articles/install_nightly.html) for more information.
58+
- Read and write files in Amazon S3 and Google Cloud Storage buckets
59+
- Connect to Arrow Flight servers to transport large datasets over networks
9060

91-
## Arrow resources
61+
Additional features include:
62+
63+
- Manipulate and analyze Arrow data with dplyr verbs
64+
- Zero-copy data sharing between R and Python
65+
- Fine control over column types to work seamlessly with databases and data warehouses
66+
- Toolkit for building connectors to other applications and services that use Arrow
67+
68+
## What is Apache Arrow?
69+
70+
Apache Arrow is a cross-language development platform for in-memory and
71+
larger-than-memory data. It specifies a standardized language-independent
72+
columnar memory format for flat and hierarchical data, organized for efficient
73+
analytic operations on modern hardware. It also provides computational libraries
74+
and zero-copy streaming, messaging, and interprocess communication.
75+
76+
This package exposes an interface to the Arrow C++ library, enabling access to
77+
many of its features in R. It provides low-level access to the Arrow C++ library
78+
API and higher-level access through a dplyr backend and familiar R functions.
9279

93-
In addition to the official [Arrow R package documentation](https://arrow.apache.org/docs/r/), the [Arrow for R cheatsheet](https://github.com/apache/arrow/blob/-/r/cheatsheet/arrow-cheatsheet.pdf), and the [Apache Arrow R Cookbook](https://arrow.apache.org/cookbook/r/index.html) are useful resources for getting started with arrow.
80+
81+
## Arrow resources
82+
83+
There are a few additional resources that you may find useful for getting started with arrow:
84+
85+
- The official [Arrow R package documentation](https://arrow.apache.org/docs/r/)
86+
- [Arrow for R cheatsheet](https://github.com/apache/arrow/blob/-/r/cheatsheet/arrow-cheatsheet.pdf)
87+
- [Apache Arrow R Cookbook](https://arrow.apache.org/cookbook/r/index.html)
88+
- R for Data Science [Chapter on Arrow](https://r4ds.hadley.nz/arrow)
89+
- [Awesome Arrow R](https://github.com/thisisnic/awesome-arrow-r)
9490

9591
## Getting help
9692

93+
We welcome questions, discussion, and contributions from users of the
94+
arrow package. For information about mailing lists and other venues
95+
for engaging with the Arrow developer and user communities, please see
96+
the [Apache Arrow Community](https://arrow.apache.org/community/) page.
97+
9798
If you encounter a bug, please file an issue with a minimal reproducible
9899
example on [GitHub issues](https://github.com/apache/arrow/issues).
99100
Log in to your GitHub account, click on **New issue** and select the type of
@@ -104,11 +105,8 @@ features** section of the [Contributing to Apache
104105
Arrow](https://arrow.apache.org/docs/developers/#contributing) page
105106
in the Arrow developer documentation.
106107

107-
We welcome questions, discussion, and contributions from users of the
108-
arrow package. For information about mailing lists and other venues
109-
for engaging with the Arrow developer and user communities, please see
110-
the [Apache Arrow Community](https://arrow.apache.org/community/) page.
108+
## Code of Conduct
111109

112-
Please note that all participation in the Apache Arrow project is
110+
Please note that all participation in the Apache Arrow project is
113111
governed by the Apache Software Foundation's [code of
114112
conduct](https://www.apache.org/foundation/policies/conduct.html).

r/_pkgdown.yml

Lines changed: 5 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -57,10 +57,10 @@ home:
5757
sidebar:
5858
structure:
5959
- project
60-
- implementations
6160
- links
6261
- license
6362
- community
63+
- implementations
6464
- citation
6565
- authors
6666
- dev
@@ -85,6 +85,10 @@ home:
8585
[R](index.html) <br>
8686
[Ruby](https://github.com/apache/arrow/blob/main/ruby/README.md) <br>
8787
[Rust](https://docs.rs/crate/arrow/latest)
88+
community:
89+
title: Community
90+
text: >
91+
[Code of conduct](https://www.apache.org/foundation/policies/conduct.html)
8892
8993
navbar:
9094
bg: black
