GH-35875: [R] Update Readme (#40148)

dgreiss · thisisnic · amoeba · web-flow · commit 54ff758a4570 · 2024-03-14T21:40:35.000-04:00
### Rationale for this change

#35875, #35082, and #32895 make a number of recommendations to update the Readme.

### What changes are included in this PR?

Rewording and reorganizing the Readme and sidebar.

### Are these changes tested?

n/a

### Are there any user-facing changes?

Yes

* Closes: #35875

Lead-authored-by: David Greiss <david.dgreiss@gmail.com>
Co-authored-by: Nic Crane <thisisnic@gmail.com>
Co-authored-by: Bryce Mecum <petridish@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
diff --git a/r/README.md b/r/README.md
@@ -1,99 +1,100 @@
 # arrow <img src="https://arrow.apache.org/img/arrow-logo_hex_black-txt_white-bg.png" align="right" alt="" width="120" />
 
+<!-- badges: start -->
+
 [![cran](https://www.r-pkg.org/badges/version-last-release/arrow)](https://cran.r-project.org/package=arrow)
 [![CI](https://github.com/apache/arrow/workflows/R/badge.svg?event=push)](https://github.com/apache/arrow/actions?query=workflow%3AR+branch%3Amain+event%3Apush)
 [![conda-forge](https://img.shields.io/conda/vn/conda-forge/r-arrow.svg)](https://anaconda.org/conda-forge/r-arrow)
 
-[Apache Arrow](https://arrow.apache.org/) is a cross-language
-development platform for in-memory and larger-than-memory data. It specifies a standardized
-language-independent columnar memory format for flat and hierarchical
-data, organized for efficient analytic operations on modern hardware. It
-also provides computational libraries and zero-copy streaming, messaging,
-and interprocess communication.
-
-The arrow R package exposes an interface to the Arrow C++ library,
-enabling access to many of its features in R. It provides low-level
-access to the Arrow C++ library API and higher-level access through a
-`{dplyr}` backend and familiar R functions.
-
-## What can the arrow package do?
-
-The arrow package provides functionality for a wide range of data analysis
-tasks. It allows users to read and write data in a variety formats:
+<!-- badges: end -->
 
--   Read and write Parquet files, an efficient and widely used columnar format
--   Read and write Arrow (formerly known as Feather) files, a format optimized for speed and
-    interoperability
--   Read and write CSV files with excellent speed and efficiency
--   Read and write multi-file and larger-than-memory datasets
--   Read JSON files
+## Overview
 
-It provides data analysis tools for both in-memory and larger-than-memory data sets
-
--   Analyze and process larger-than-memory datasets
--   Manipulate and analyze Arrow data with dplyr verbs
-
-It provides access to remote filesystems and servers
-
--   Read and write files in Amazon S3 and Google Cloud Storage buckets
--   Connect to Arrow Flight servers to transport large datasets over networks  
-    
-Additional features include:
+The R `{arrow}` package provides access to many of the features of the [Apache Arrow C++ library](https://arrow.apache.org/docs/cpp/index.html) for R users. The goal of arrow is to provide an Arrow C++ backend to `{dplyr}`, and access to the Arrow C++ library through familiar base R and tidyverse functions, or `{R6}` classes.
 
--   Zero-copy data sharing between R and Python
--   Fine control over column types to work seamlessly
-    with databases and data warehouses
--   Support for compression codecs including Snappy, gzip, Brotli,
-    Zstandard, LZ4, LZO, and bzip2
--   Access and manipulate Arrow objects through low-level bindings
-    to the C++ library
--   Toolkit for building connectors to other applications
-    and services that use Arrow
+To learn more about the Apache Arrow project, see the parent documentation of the [Arrow Project](https://arrow.apache.org/). The Arrow project provides functionality for a wide range of data analysis tasks to store, process and move data fast. See the [read/write article](articles/read_write.html) to learn about reading and writing data files, [data wrangling](articles/data_wrangling.html) to learn how to use dplyr syntax with arrow objects, and the [function documentation](reference/acero.html) for a full list of supported functions within dplyr queries.
 
 ## Installation
 
-Most R users will probably want to install the latest release of arrow 
-from CRAN:
+The latest release of arrow can be installed from CRAN. In most cases installing the latest release should work without requiring any additional system dependencies, especially if you are using
+Windows or macOS.
 
-``` r
+```r
 install.packages("arrow")
 ```
 
 Alternatively, if you are using conda you can install arrow from conda-forge:
 
-``` shell
+```sh
 conda install -c conda-forge --strict-channel-priority r-arrow
 ```
 
-In most cases installing the latest release should work without 
-requiring any additional system dependencies, especially if you are using 
-Window or a Mac. For those users, CRAN hosts binary packages that contain 
-the Arrow C++ library upon which the arrow package relies, and no 
-additional steps should be required.
-
 There are some special cases to note:
 
-- On macOS, the R you use with Arrow should match the architecture of the machine you are using. If you're using an ARM (aka M1, M2, etc.) processor use R compiled for arm64. If you're using an Intel based mac, use R compiled for x86. Using R and Arrow compiled for Intel based macs on an ARM based mac will result in segfaults and crashes. 
+- On macOS, the R you use with Arrow should match the architecture of the machine you are using. If you're using an ARM (aka M1, M2, etc.) processor, use R compiled for arm64. If you're using an Intel-based Mac, use R compiled for x86. Using R and Arrow compiled for Intel-based Macs on an ARM-based Mac will result in segfaults and crashes.
+
+- On Linux the installation process can sometimes be more involved because CRAN does not host binaries for Linux. For more information please see the [installation guide](articles/install.html).
+
+- If you are compiling arrow from source, please note that as of version 10.0.0, arrow requires C++17 to build. This has implications on Windows and CentOS 7. For Windows users it means you need to be running an R version of 4.0 or later. On CentOS 7, it means you need to install a newer compiler than the default system compiler gcc. See the [installation details article](https://arrow.apache.org/docs/r/articles/developers/install_details.html) for guidance.
+
+- Development versions of arrow are released nightly. For information on how to install nightly builds, please see the [installing nightly builds](articles/install_nightly.html) article.
+
+## What can the arrow package do?
+
+The Arrow C++ library is composed of different parts, each of which serves a specific purpose. The arrow package provides bindings to the C++ functionality for a wide range of data analysis
+tasks.
+
+It allows users to read and write data in a variety of formats:
 
-- On Linux the installation process can sometimes be more involved because 
-CRAN does not host binaries for Linux. For more information please see the [installation guide](https://arrow.apache.org/docs/r/articles/install.html).
+- Read and write Parquet files, an efficient and widely used columnar format
+- Read and write Arrow (formerly known as Feather) files, a format optimized for speed and
+  interoperability
+- Read and write CSV files with excellent speed and efficiency
+- Read and write multi-file and larger-than-memory datasets
+- Read JSON files
 
-- If you are compiling arrow from source, please note that as of version 
-10.0.0, arrow requires C++17 to build. This has implications on Windows and
-CentOS 7. For Windows users it means you need to be running an R version of 
-4.0 or later. On CentOS 7, it means you need to install a newer compiler 
-than the default system compiler gcc 4.8. See the [installation details article](https://arrow.apache.org/docs/r/articles/developers/install_details.html) for guidance. Note that 
-this does not affect users who are installing a binary version of the package.
+It provides access to remote filesystems and servers:
 
-- Development versions of arrow are released nightly. Most users will not 
-need to install nightly builds, but if you do please see the article on [installing nightly builds](https://arrow.apache.org/docs/r/articles/install_nightly.html) for more information.
+- Read and write files in Amazon S3 and Google Cloud Storage buckets
+- Connect to Arrow Flight servers to transport large datasets over networks
 
-## Arrow resources 
+Additional features include:
+
+- Manipulate and analyze Arrow data with dplyr verbs
+- Zero-copy data sharing between R and Python
+- Fine control over column types to work seamlessly with databases and data warehouses
+- Toolkit for building connectors to other applications and services that use Arrow
+
+## What is Apache Arrow?
+
+Apache Arrow is a cross-language development platform for in-memory and
+larger-than-memory data. It specifies a standardized language-independent
+columnar memory format for flat and hierarchical data, organized for efficient
+analytic operations on modern hardware. It also provides computational libraries
+and zero-copy streaming, messaging, and interprocess communication.
+
+This package exposes an interface to the Arrow C++ library, enabling access to
+many of its features in R. It provides low-level access to the Arrow C++ library
+API and higher-level access through a dplyr backend and familiar R functions.
 
-In addition to the official [Arrow R package documentation](https://arrow.apache.org/docs/r/), the [Arrow for R cheatsheet](https://github.com/apache/arrow/blob/-/r/cheatsheet/arrow-cheatsheet.pdf), and the [Apache Arrow R Cookbook](https://arrow.apache.org/cookbook/r/index.html) are useful resources for getting started with arrow.
+
+## Arrow resources
+
+There are a few additional resources that you may find useful for getting started with arrow:
+
+- The official [Arrow R package documentation](https://arrow.apache.org/docs/r/)
+- [Arrow for R cheatsheet](https://github.com/apache/arrow/blob/-/r/cheatsheet/arrow-cheatsheet.pdf)
+- [Apache Arrow R Cookbook](https://arrow.apache.org/cookbook/r/index.html)
+- R for Data Science [Chapter on Arrow](https://r4ds.hadley.nz/arrow)
+- [Awesome Arrow R](https://github.com/thisisnic/awesome-arrow-r)
 
 ## Getting help
 
+We welcome questions, discussion, and contributions from users of the
+arrow package. For information about mailing lists and other venues
+for engaging with the Arrow developer and user communities, please see
+the [Apache Arrow Community](https://arrow.apache.org/community/) page.
+
 If you encounter a bug, please file an issue with a minimal reproducible
 example on [GitHub issues](https://github.com/apache/arrow/issues).
 Log in to your GitHub account, click on **New issue** and select the type of
@@ -104,11 +105,8 @@ features** section of the [Contributing to Apache
 Arrow](https://arrow.apache.org/docs/developers/#contributing) page
 in the Arrow developer documentation.
 
-We welcome questions, discussion, and contributions from users of the
-arrow package. For information about mailing lists and other venues
-for engaging with the Arrow developer and user communities, please see
-the [Apache Arrow Community](https://arrow.apache.org/community/) page.
+## Code of Conduct
 
-Please note that all participation in the Apache Arrow project is 
+Please note that all participation in the Apache Arrow project is
 governed by the Apache Software Foundation's [code of
 conduct](https://www.apache.org/foundation/policies/conduct.html).
diff --git a/r/_pkgdown.yml b/r/_pkgdown.yml
@@ -57,10 +57,10 @@ home:
   sidebar:
     structure:
       - project
-      - implementations
       - links
       - license
       - community
+      - implementations
       - citation
       - authors
       - dev
@@ -85,6 +85,10 @@ home:
           [R](index.html) <br>
           [Ruby](https://github.com/apache/arrow/blob/main/ruby/README.md) <br>
           [Rust](https://docs.rs/crate/arrow/latest)
+      community:
+        title: Community
+        text: >
+          [Code of conduct](https://www.apache.org/foundation/policies/conduct.html)
 
 navbar:
   bg: black