This package includes special functions and datasets for a course on Exploratory Data Analysis based on John Tukey's EDA text.
It can be installed from Jim Albert's GitHub site:
library(remotes)
install_github("bayesball/LearnEDAfunctions")
The only prerequisite packages are dplyr and ggplot2.
The LearnEDAfunctions package is loaded by the library() function:
library(LearnEDAfunctions)
The EDA course also uses several functions from other packages. These packages must be installed before the functions can be used:
- stem.leaf() from the aplpack package
- rootogram() from the vcd package
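Before calling these functions, it can help to check whether the two packages are already installed. The following is a minimal sketch in base R; the variable names `needed` and `not_installed` are illustrative and are not part of LearnEDAfunctions.

```r
# Packages that supply stem.leaf() and rootogram()
needed <- c("aplpack", "vcd")

# requireNamespace() returns FALSE (quietly) when a package is not installed
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!is_installed]

if (length(not_installed) > 0) {
  # Report what is missing; install with install.packages() when convenient
  message("Not installed: ", paste(not_installed, collapse = ", "))
}
```

Any package reported here can then be installed with install.packages().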
The package provides the following datasets, along with the fit.gaussian function:
- act.scores.06.07: ACT scores of states in the US
- baseball.attendance: Attendance for baseball teams
- batting.history: Yearly batting statistics for Major League Baseball
- beatles: Lengths of songs from Beatles' albums
- boston.marathon: Boston Marathon completion times of women of different ages
- boston.marathon.wtimes: Boston Marathon winning times
- braves.attendance: Attendance at games of a baseball team
- car.measurements: Car measurements
- church.2way: Church attendance as a two-way table
- church.tseries: Time series of worship attendance
- college.ratings: Ratings of National Universities in the U.S.
- farms: Number of farms in the states of the U.S.
- fit.gaussian: Fitting a Gaussian curve to binned data
- football: American football scores
- gestation.periods: Gestation periods for different animals
- grandma.19.40: Grandma's Marathon completion times
- heaviest.fish: Fish world record catches
- home.prices: Home sales prices in the U.S.
- homeruns.2000: Team home run numbers for different seasons
- homeruns.61: Home run counts in 1961
- immigrants: Immigrant counts to the U.S.
- island.areas: Areas of islands from different continents
- lake: Lake measurements
- mortality.rates: Infant mortality rates of countries
- olympics.run: Olympics running times
- olympics.speed.skating: Olympics speed skating times
- olympics.swim: Olympics swimming times
- pitching.history: Yearly pitching statistics for Major League Baseball
- pop.change: Population change for all states in the U.S.
- pop.densities: Population densities of states for different years
- pop.england: England population
- rent.prices: Rent prices in different cities
- salaries: Salaries of different professions in different cities
- snowfall: Snowfall amounts of two cities
- studentdata: Student dataset
- temperatures: Temperatures for different cities
- tukey.24a: Tukey straightening exercise 24a
- tukey.24b: Tukey straightening exercise 24b
- tukey.26a: Tukey straightening exercise 26a
- tukey.26b: Tukey straightening exercise 26b
- tukey.26c: Tukey straightening exercise 26c
- us.pop: Population of the United States
Fits a Gaussian curve to binned data.
data <- rt(200,df=5)
bins <- pretty(range(data))
g.mean <- 0
g.sd <- 1
fit.gaussian(data, bins, g.mean, g.sd)
Half-slope ratio.
sx <- c(10,30,50)
sy <- c(5,8,20)
half.slope.ratio(sx, sy, 1, 1)
half.slope.ratio(sx, sy, -0.5, -0.5)
Hanning a sequence.
plot(WWWusage)
plot(smooth(WWWusage, kind="3RSS"))
plot(han(smooth(WWWusage, kind="3RSS")))
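For readers unfamiliar with hanning, the idea can be sketched in base R. Hanning replaces each interior value with a running weighted average using weights 1/4, 1/2, 1/4, and copies the two end values unchanged. The helper name `hanning_sketch` is hypothetical and is not part of LearnEDAfunctions; the package's han() may differ in details such as how the end values are handled.

```r
# Hypothetical helper (not from LearnEDAfunctions): hanning in base R
hanning_sketch <- function(x) {
  n <- length(x)
  if (n < 3) return(x)  # too short to smooth
  out <- x
  # Interior points: weighted running mean with weights 1/4, 1/2, 1/4
  out[2:(n - 1)] <- (x[1:(n - 2)] + 2 * x[2:(n - 1)] + x[3:n]) / 4
  out                   # first and last values are copied unchanged
}

hanned <- hanning_sketch(c(1, 5, 3, 7, 9))
hanned
# 1.0 3.5 4.5 6.5 9.0
```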
Hinkley's quick method.
raw <- state.x77[, "Population"]
hinkley(raw)
logs <- log(raw)
hinkley(logs)
Finds outliers by group using Tukey's rule.
lval_plus(beatles, time, album)
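As a rough illustration of Tukey's rule, the sketch below flags values lying more than 1.5 times the spread beyond the lower or upper quartile, separately within each group. It is an assumption-laden stand-in, not the package's implementation: the names `tukey_outliers`, `values`, and `group` are made up for this example, and quantile() quartiles are used in place of Tukey's fourths, so results may differ slightly from lval_plus().

```r
# Hypothetical helper (not from LearnEDAfunctions): flag outliers by the 1.5 x spread rule
tukey_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # quartiles approximate the fourths
  step <- 1.5 * (q[2] - q[1])                     # one "step" is 1.5 times the spread
  x < q[1] - step | x > q[2] + step               # TRUE for values outside the fences
}

# Made-up data: two groups of five values each
values <- c(2.2, 2.4, 2.5, 2.6, 9.0, 3.0, 3.1, 3.2, 3.3, 3.4)
group <- rep(c("a", "b"), each = 5)

# Apply the rule within each group and restore the original order
flags <- unsplit(lapply(split(values, group), tukey_outliers), group)
values[flags]
# 9
```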
Letter values.
lval(rnorm(100))
Matched transformations.
raw <- state.x77[, "Population"]
matched.roots <- mtrans(raw, 0.5)
matched.logs <- mtrans(raw, 0)
boxplot(data.frame(raw, matched.roots,
                   matched.logs))
Plot of an additive fit.
temps <- matrix(data=c(50, 30, 35, 21, 38,
                       73, 58, 65, 57, 63,
                       88, 83, 89, 84, 86,
                       73, 62, 68, 59, 66),
                nrow=5, ncol=4,
                dimnames=list(c("Atlanta",
                                "Detroit", "Kansas City",
                                "Minneapolis", "Philadelphia"),
                              c("January", "April",
                                "July", "October")))
fit <- medpolish(temps)
plot2way(fit$row + fit$overall, fit$col,
         dimnames(temps)[[1]],
         dimnames(temps)[[2]])
Power transformation.
power.t(c(3, 6, 5, 4, 7), 0.5)
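One common convention for a power transformation family is (x^p - 1)/p for nonzero p, with log(x) used when p = 0. The sketch below implements that convention in base R. Whether power.t() uses exactly this scaling is an assumption here, and the name `power_sketch` is hypothetical.

```r
# Hypothetical helper (not from LearnEDAfunctions): a scaled power transformation
power_sketch <- function(x, p) {
  if (p == 0) {
    log(x)             # p = 0 corresponds to the log transformation
  } else {
    (x^p - 1) / p      # scaled power, so the family varies smoothly in p
  }
}

power_sketch(c(3, 6, 5, 4, 7), 0.5)  # root-type re-expression
power_sketch(c(3, 6, 5, 4, 7), 0)    # log re-expression
```

The scaling by p keeps the transformed values comparable across different powers, which is convenient when trying several re-expressions of the same batch.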
Computation of a resistant line.
df <- data.frame(x = 1:10,
                 y = 3 * (1:10) + rnorm(10))
rline(y ~ x, df, iter=5)
Spread versus level plot.
spread_level_plot(beatles, time, album)
Symmetry plot.
symplot(rnorm(100))
# symmetry plot for exponential data
symplot(rexp(100))