Topological data analysis (TDA) relies heavily on mature libraries like PHAT, Dionysus, and GUDHI. While these libraries have interfaces to Python and, through the {TDA} package, R, they have been developed primarily by and for statistical topologists. As TDA matures and standard workflows emerge, the need arises for more accessible and modular implementations. The SciKit-TDA project, an extension of SciPy, is underway in Python for this purpose. The tdaverse collection is intended to meet these needs in R through a Tidyverse lens.
The Tidyverse consists of numerous R packages that are built upon a shared set of syntactic and grammatical conventions and designed to interface naturally with each other. With its sibling collection Tidymodels, it provides a comprehensive toolkit for building advanced data analysis and modeling pipelines. The goal of tdaverse is to provide the data structures, computational engines, statistical models, and visualization tools needed to efficiently explore and analyze topological data in R and to integrate these tasks into tidy workflows.
Methods for detecting topological structure from point cloud data sets are often validated by applying them to point clouds sampled from spaces with known topology. Functions that generate such samples are therefore valuable to developers of topological–statistical software. The goal of {tdaunif} is to assemble a comprehensive collection of such samplers for convenient use.
In addition to testing TDA software, {tdaunif} will be used with {simplextree} to generate geometric random simplicial complexes, and on its own as an educational tool for the study of manifolds of dimension 3 and higher.
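"Uniform" here should mean uniform with respect to surface area, which naive uniform angles do not give on a curved surface. The sketch below shows one standard way to get it for a torus in R^3, rejection sampling on the angle pair. It is written in Python purely for illustration and is not the {tdaunif} API; the function name `sample_torus_uniform` and its parameters `R` (center-to-tube distance) and `r` (tube radius) are hypothetical, and it assumes `R > r`.

```python
import math
import random

def sample_torus_uniform(n, R=2.0, r=1.0, seed=None):
    """Draw n points uniformly (w.r.t. surface area) from a torus in R^3.

    Hypothetical illustration, not the tdaunif API. The area element is
    proportional to R + r*cos(theta), so a uniformly drawn angle pair is
    accepted with probability (R + r*cos(theta)) / (R + r). Assumes R > r.
    """
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        theta = rng.uniform(0.0, 2.0 * math.pi)  # angle around the tube
        phi = rng.uniform(0.0, 2.0 * math.pi)    # angle around the central axis
        # Accept with probability (R + r*cos(theta)) / (R + r).
        if rng.uniform(0.0, R + r) < R + r * math.cos(theta):
            rho = R + r * math.cos(theta)        # distance from the z-axis
            points.append((rho * math.cos(phi), rho * math.sin(phi), r * math.sin(theta)))
    return points
```

Without the rejection step the sample would be overdense on the inner side of the torus. That bias does not change the topology, but it can distort statistics computed from the point cloud.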
An R package aimed at simplifying computation for simplicial complexes. The package provides R bindings to a simplex tree data structure implemented in C++11 and exported as an Rcpp module. Instances can be created from abstract or geometric data and exported and imported via serialization, and they can be efficiently inspected, queried, modified, and traversed using both Rcpp and S3 methods. The underlying library implementation also exports a C++ header, which can be specified as a dependency and used in other packages via {Rcpp} attributes.
{simplextree} will interface with other packages for various tasks: to sample geometric complexes based on arbitrary manifolds with {tdaunif}, to construct and update the nerves of mappers in {Mapper}, and to perform computations involving simplicial complexes stored in other formats via {interplex}.
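For readers new to the data structure, a simplex tree stores each simplex as a path of sorted vertex labels in a trie. Inserting a simplex therefore means inserting all of its faces as paths. The Python toy below only illustrates that idea. It is not the {simplextree} C++ or R interface, and the class and method names are hypothetical.

```python
from itertools import combinations

class SimplexTree:
    """Toy simplex tree (hypothetical names): a trie of sorted vertex labels.

    Every root-to-node path spells one simplex, e.g. 1 -> 2 -> 3 is {1, 2, 3}.
    """

    def __init__(self):
        self.root = {}

    def insert(self, simplex):
        """Insert a simplex together with all of its faces."""
        vertices = sorted(set(simplex))
        for k in range(1, len(vertices) + 1):
            for face in combinations(vertices, k):  # faces come out sorted
                node = self.root
                for v in face:
                    node = node.setdefault(v, {})

    def find(self, simplex):
        """Return True if the simplex is stored in the tree."""
        node = self.root
        for v in sorted(set(simplex)):
            if v not in node:
                return False
            node = node[v]
        return True

    def simplices(self):
        """Yield every stored simplex in depth-first (lexicographic) order."""
        def walk(node, prefix):
            for v in sorted(node):
                face = prefix + (v,)
                yield face
                yield from walk(node[v], face)
        yield from walk(self.root, ())
```

For example, inserting the triangle `[1, 2, 3]` and the edge `[3, 4]` stores 9 simplices. A production implementation adds further links (for example between nodes carrying the same label) to make cofaces and other queries fast.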
An R interface, via {Rcpp}, to the C++ persistent homology engines Ripser and Cubical Ripser. It can be used as a convenient and efficient tool in TDA pipelines involving point cloud data (Ripser) or image and volume data (Cubical Ripser).
{ripserr} is designed as a minimal standalone package and will be called to compute persistence data when underlying simplicial filtrations are not needed.
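As a rough picture of what such an engine returns, the degree-0 part of Vietoris–Rips persistent homology can be computed with a union-find pass over edges sorted by length. Every point is born at 0, and a component dies whenever an edge merges it into another. The Python sketch below covers only that degree-0 case, with edges entering at their Euclidean length. It is not how Ripser handles higher degrees and is not the {ripserr} interface; `rips_h0_deaths` is a hypothetical name.

```python
import math
from itertools import combinations

def rips_h0_deaths(points):
    """Finite death times of degree-0 classes in a Vietoris-Rips filtration.

    Hypothetical illustration (Kruskal's algorithm with union-find). An edge
    enters at its Euclidean length, and each merge of two components kills
    one class. The remaining class never dies.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # edge joins two components: one class dies at d
            parent[ri] = rj
            deaths.append(d)
    return deaths             # n - 1 finite deaths
```

These death times coincide with single-linkage merge heights, which is one reason degree-0 persistence is often read as a summary of clustering structure.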
An R interface to the ReebGraphPairing Java program, which pairs the critical points of Reeb graphs. Reeb graphs may be represented as {igraph} or {network} objects or using a new minimal S3 class. The resulting pairings can then be post-processed into extended persistent homology.
A helper package of low-level persistent homology tools (PH utilities) to be shared by multiple tdaverse packages. It currently provides an S3 class for persistence data and a {cpp11} interface to the Hera library for computing Wasserstein distances. (Persistence is phutil.)
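To make the quantity concrete, the q-Wasserstein distance between two persistence diagrams is an optimal matching in which any point may also be matched to the diagonal. The Python sketch below computes it by brute force for tiny diagrams, assuming the L-infinity ground metric. Hera instead uses efficient approximate matching algorithms, and this is not the {phutil} interface; the function name is hypothetical.

```python
from itertools import permutations

def wasserstein(dgm1, dgm2, q=2.0):
    """Brute-force q-Wasserstein distance between two tiny persistence diagrams.

    Hypothetical illustration. Points are (birth, death) pairs compared in the
    L-infinity metric. Each diagram is padded with one diagonal slot (None)
    per point of the other, so any point may be matched to the diagonal, and
    every bijection is tried (factorial cost, so only a few points).
    """
    def to_diag(p):
        return (p[1] - p[0]) / 2.0  # L-infinity distance from p to the diagonal

    def cost(x, y):
        if x is None and y is None:
            return 0.0              # diagonal-to-diagonal matches are free
        if x is None:
            return to_diag(y) ** q
        if y is None:
            return to_diag(x) ** q
        return max(abs(x[0] - y[0]), abs(x[1] - y[1])) ** q

    a = list(dgm1) + [None] * len(dgm2)
    b = list(dgm2) + [None] * len(dgm1)
    best = min(sum(cost(a[i], b[j]) for i, j in enumerate(perm))
               for perm in permutations(range(len(b))))
    return best ** (1.0 / q)
```

For example, `wasserstein([(0, 2)], [(0, 2.5)])` matches the two points to each other at cost 0.5. Matching both to the diagonal would cost more.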
A collection of coercers between different data structures that encode simplicial complexes (including graphs/networks), inspired by {intergraph}.
{interplex} enables tdaverse users to couple functionality from other packages into their workflows, for example layout algorithms from {igraph} and simplicial filtrations from GUDHI (via {reticulate}).
A {recipes} and {dials} extension that provides pre-processing steps to compute persistent homology from data and to calculate vectorizations of persistence diagrams. It relies on {TDA} and {ripserr} to compute PH and on the {TDAvec} package to compute vectorizations.
{tdarec} enables Tidymodels users to seamlessly build topological transformations into their machine learning workflows.
{TDAstats} was originally designed with three goals in mind: to compute persistent homology, to visualize persistence data, and to perform topological statistical inference between data sets. Since its release, this functionality has been superseded by {ripserr}, {ggtda}, {phutil}, and {inphr}. Because it is widely used and has few dependencies, {TDAstats} will be maintained as legacy software.
The {landmark} package provides functions to calculate landmark sets for finite metric spaces using the maxmin procedure (for fixed-radius balls) or an adaptation of it for rank data (for roughly fixed-cardinality nearest neighborhoods). These procedures can also return membership lists for the covers centered at the landmarks. These cover-construction engines will be invoked by {Mapper} and by other cover-based constructions.
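The maxmin procedure is a greedy farthest-point rule: start from a seed point, then repeatedly add the point farthest from the current landmarks. The Python sketch below stops after a fixed number of landmarks and reports the covering radius, so balls of that radius around the landmarks cover every point. A fixed-radius variant would instead stop once the covering radius falls below a threshold. This is not the {landmark} R interface, and `maxmin_landmarks` and `seed_index` are hypothetical names.

```python
import math

def maxmin_landmarks(points, num, seed_index=0):
    """Greedy maxmin landmark selection on a finite point cloud.

    Hypothetical illustration. Returns the landmark indices in selection order
    and the covering radius (largest distance from a point to its nearest
    landmark).
    """
    landmarks = [seed_index]
    # dist_to_set[i] = distance from point i to its nearest landmark so far
    dist_to_set = [math.dist(p, points[seed_index]) for p in points]
    while len(landmarks) < min(num, len(points)):
        nxt = max(range(len(points)), key=dist_to_set.__getitem__)  # farthest point
        landmarks.append(nxt)
        dist_to_set = [min(d, math.dist(p, points[nxt]))
                       for d, p in zip(dist_to_set, points)]
    return landmarks, max(dist_to_set)
```

On the 1-dimensional points `(0,), (1,), (2,), (10,)` with three landmarks, the sketch picks indices 0, 3, 2 in that order, leaving a covering radius of 1.0.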
The {Mapper} package provides a set of tools for computing the mapper construction. Previous versions of this package included the simplex tree class and the maxmin procedure, which have been or are being spun off and expanded as the {simplextree} and {landmark} packages.
An {Rcpp} interface to the Persistence Landscapes Toolbox. The C++ class for persistence landscapes is exposed as an Rcpp module and wrapped as an S4 class. Vector space operations and additional routines are provided through R.
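For reference, the k-th persistence landscape function takes, at each t, the k-th largest of the "tent" values max(0, min(t - b, d - t)) over the (b, d) pairs of a diagram. The Python sketch below evaluates that definition directly at a single point. It does not reflect the Persistence Landscapes Toolbox's exact piecewise-linear representation or the package's S4 class, and the function name is hypothetical.

```python
def landscape(diagram, k, t):
    """Value of the k-th persistence landscape function lambda_k at t (k >= 1).

    Hypothetical illustration. Each (birth, death) pair contributes the tent
    max(0, min(t - birth, death - t)). lambda_k(t) is the k-th largest tent
    value, or 0 if the diagram has fewer than k pairs.
    """
    tents = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram), reverse=True)
    return tents[k - 1] if k <= len(tents) else 0.0
```

Evaluating `landscape(dgm, k, t)` over a fixed grid of t values gives a fixed-length vector. This is one common way landscapes are turned into features for the vector space operations and models mentioned above.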
Statistical transformations, geometric constructions, and other {ggplot2} elements for publication-quality plots of data arising from topological objects and models. Persistent homology can be computed for continuous functions and Reeb graphs as well as point clouds, and ggtda layers are in development for numerous plot types that have been proposed to gain insight from persistence data. In addition, ggtda provides layers to conveniently plot ball covers, Vietoris–Rips complexes, and Čech complexes for 2-dimensional point clouds.
Covers of data sets are ubiquitous in lower-level topological methods, including mapper-like constructions.
In order to allow more flexible implementations, the object-oriented package {Cover} would spin off the CoverRef R6 class from {Mapper} and introduce tools for efficiently storing and analyzing towers and other aggregates of covers.
A great advantage of GUDHI is the ability to work directly with simplicial filtrations, including constructing them from raw data and computing persistence data from them. {ripserr} sidesteps these objects, but such computations can be performed using {TDA}. The idea for this package is to port different engines for computing and processing filtrations behind a common interface, analogously to {parsnip}.
Analogous to {tidygraph}, this package would provide a "tidy" API to print, summarize, annotate, and perhaps visualize simplicial complexes and filtrations.
To learn more and contribute to package design or development, please visit the GitHub repositories and consider commenting on or creating an issue! Or check this list of low-, medium-, and high-hanging fruit.
- Raoul R. Wadhwa (Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Ohio, USA)
- Matt Piekenbrock (Khoury College of Computer Sciences, Northeastern University, Massachusetts, USA)
- Jason Cory Brunson (Laboratory for Systems Medicine, Division of Pulmonary, Critical Care, and Sleep Medicine, University of Florida, Florida, USA)
- James Otto (Department of Biometrics, Alcon Laboratories, Texas, USA)
- Aymeric Stamm (Jean Leray Mathematics Laboratory, National Center for Scientific Research, France)