Following mlpack convention, we split data sets that may come as one common data frame containing both the predictors and the dependent variable into two files. This makes it easy to load them into two distinct (temporary) tables, from which the functions added here read them.
This is a variant of the standard R example in which the trees data set is used in logs. We simply
save X and y after the log() transformation shown in other R examples, i.e.
> data(trees)
> X <- with(trees, cbind(log(Girth), log(Height)))
> write.csv(X, "trees_x.csv", row.names=FALSE)
> y <- with(trees, cbind(log(Volume)))
> write.csv(y, "trees_y.csv", row.names=FALSE)
The reference given in help(trees) is Atkinson, A. C. (1985) Plots, Transformations and
Regression. Oxford University Press. The help page actually points to a somewhat different multiplicative model for
tree volume; we use the data here only as a minimal example.
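As a possible cross-check for whatever the functions added here return on `trees_x.csv` and `trees_y.csv`, the same log-log regression can be fitted with base R's `lm()`. This is a minimal sketch; it recomputes the columns from the built-in data set rather than reading the CSV files back, and it uses a plain vector for y so that `lm()` fits a single-response model.

```r
# Sketch: reference fit of the log-log model with base R, for comparison only.
data(trees)
X <- with(trees, cbind(log(Girth), log(Height)))  # same columns as trees_x.csv
y <- with(trees, log(Volume))                     # same values as trees_y.csv
fit <- lm(y ~ X)                                  # intercept plus two slopes
print(coef(fit))
```

The intercept and the two slopes from this fit are what a linear regression on the exported files should reproduce, up to numerical precision.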
This well-known data set is included in R (see help(iris)) as well as in the UC Irvine Machine Learning Repository, and
is described on this page in more detail. We copied the
[mlpack data][mldata] files, which are already split into features and labels, and use .csv for both files.
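The iris files used here were copied from mlpack, not generated. For readers who want a comparable pair from R's built-in copy, the sketch below writes the four measurements and the species as a separate label file, following the same pattern as the trees example. The file names `iris_x.csv` and `iris_y.csv` are hypothetical, and the labels are encoded as 0-based integers because mlpack classifiers expect labels in the range 0 to k-1. The mlpack files may still differ in row order, header line, and exact label coding.

```r
# Sketch only: NOT how the mlpack iris files were produced.
data(iris)
X <- iris[, 1:4]                                         # four numeric measurements
y <- data.frame(label = as.integer(iris$Species) - 1L)   # setosa=0, versicolor=1, virginica=2
write.csv(X, "iris_x.csv", row.names = FALSE)            # hypothetical file names
write.csv(y, "iris_y.csv", row.names = FALSE)
```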
This is the standard mlpack example for random forests. The data set originates from the UC
Irvine Machine Learning Repository and is described on this
page in more detail. We took the 10,000-row subset provided on the mlpack data
page, i.e. we did not sample it ourselves. We split the labels in the 55th column off into their
own file and kept the other 54 columns as features, so the two files are now 10,000 x 54 and 10,000 x 1, respectively.
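The column split described above can be sketched in R as follows. The helper `split_labels()` is hypothetical, and so are the input and output file names in the usage line. The sketch assumes the downloaded file has no header row (hence `header = FALSE`); adjust that if the file you have does.

```r
# Hypothetical helper: move one label column of a CSV into its own file and
# write the remaining columns as the feature file.
split_labels <- function(infile, xfile, yfile, label_col) {
  df <- read.csv(infile, header = FALSE)        # assumes no header row
  stopifnot(label_col <= ncol(df))
  write.csv(df[, -label_col, drop = FALSE], xfile, row.names = FALSE)  # features
  write.csv(df[, label_col, drop = FALSE], yfile, row.names = FALSE)   # labels
  invisible(c(nrow(df), ncol(df) - 1L))         # rows, number of feature columns
}

# Usage, assuming the 10k-row subset was saved under this (hypothetical) name:
# split_labels("covertype-small.csv", "covertype_x.csv", "covertype_y.csv", 55)
```

For the subset described here, the returned dimensions should be 10,000 rows and 54 feature columns.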