Data wrangling

Anton Antonov
MathematicaForPrediction at GitHub
MathematicaVsR project at GitHub
November, 2016

Introduction

This project has multiple sub-projects for the different data wrangling tasks needed in statistics (machine learning and data mining).

Comparison

Data wrangling in R is heavily influenced by the creation (publication and description) of the packages "plyr", [1,2], and "reshape2", [3].

R needs a package like "plyr" because of its central data structures (vectors, lists, data frames) and its complicated system of functions for transforming between those data structures. (See, for example, Circle 4 of the book "The R inferno", [4].) In Mathematica the functionalities in "plyr" are easily programmed with common, base Mathematica functions.
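The split-apply-combine strategy described in [2], which "plyr" implements, can be sketched in a few lines. The sketch below is in Python purely as a neutral illustration; it is not code from this project, and the function name `split_apply_combine`, the record layout, and the toy data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def split_apply_combine(records, key, value, func):
    """Group `records` by `key`, apply `func` to each group's `value`s,
    and combine the per-group results into one dictionary."""
    groups = defaultdict(list)
    # Split: collect the values of each group under its key.
    for rec in records:
        groups[rec[key]].append(rec[value])
    # Apply and combine: one summary value per group.
    return {group: func(values) for group, values in groups.items()}


# Hypothetical toy data, made up for this illustration.
records = [
    {"species": "setosa", "petal_length": 1.0},
    {"species": "setosa", "petal_length": 2.0},
    {"species": "virginica", "petal_length": 5.0},
    {"species": "virginica", "petal_length": 7.0},
]

result = split_apply_combine(records, "species", "petal_length", mean)
print(result)
```

This is roughly the pattern that `ddply` in "plyr" applies to data frames, and that `GroupBy` provides in Mathematica; the exact correspondence depends on the data structures involved.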

Nevertheless, the know-how of data wrangling in R is much more streamlined -- both in base functions and packages -- and there are multiple easy-to-find resources on the Internet for doing particular data wrangling tasks with R.

A list of some basic comparison documents and codes.

References

[1] Hadley Wickham, "plyr: Tools for Splitting, Applying and Combining Data", CRAN. Also see http://had.co.nz/plyr/.

[2] Hadley Wickham, "The Split-Apply-Combine Strategy for Data Analysis", Journal of Statistical Software, Volume 40, Issue 1, 2011.

[3] Hadley Wickham, "reshape2: Flexibly Reshape Data: A Reboot of the Reshape Package", CRAN.

[4] Patrick Burns, "The R inferno", 2012, free PDF link.