nestedLogit/vignettes/plotting-ggplot.Rmd at 7fa607e6d891591eeb2b201598862fef3197bd1b · friendly/nestedLogit

History

304 lines (251 loc) · 10.9 KB

Raw

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

277

278

279

280

281

282

283

284

285

286

287

288

289

290

291

292

293

294

295

296

297

298

299

300

301

302

---

title: "Plotting nestedLogit models with ggplot2"

author: "Michael Friendly and John Fox"

date: "`r Sys.Date()`"

package: nestedLogit

output:

rmarkdown::html_vignette:

fig_caption: yes

bibliography: ["references.bib", "packages.bib"]

csl: apa.csl

vignette: >

%\VignetteIndexEntry{Plotting nestedLogit models with ggplot2}

%\VignetteEngine{knitr::rmarkdown}

%\VignetteEncoding{UTF-8}

---

```{r, include = FALSE}

knitr::opts_chunk$set(

collapse = TRUE,

message = FALSE,

warning = FALSE,

fig.align = "center",

fig.height = 6,

fig.width = 7,

fig.path = "fig/",

dev = "png",

comment = "#>"

)

# save some typing

knitr::set_alias(w = "fig.width",

h = "fig.height",

cap = "fig.cap")

# colorize text

colorize <- function(x, color) {

if (knitr::is_latex_output()) {

sprintf("\\textcolor{%s}{%s}", color, x)

} else if (knitr::is_html_output()) {

sprintf("<span style='color: %s;'>%s</span>", color,

} else x

}

set.seed(47)

.opts <- options(digits = 4)

# packages to be cited here. Code at the end automatically updates packages.bib

#to.cite <- c("ggplot2", "geomtextpath", "equatiomatic")

```

Load the packages we'll use here:

```{r setup}

library(nestedLogit) # Nested Dichotomy Logistic Regression Models

library(knitr) # A General-Purpose Package for Dynamic Report Generation in R

library(dplyr) # A Grammar of Data Manipulation

library(tidyr) # Tidy Messy Data

library(ggplot2) # Create Elegant Data Visualisations Using the Grammar of Graphics

library(geomtextpath) # Curved Text in 'ggplot2'

```

The main vignette illustrated the basic plot method, `plot.nestedLogit()` in the package.

However, for better control of the details and possibly more pleasing graphs,

it is useful to describe how graphs can be constructed using `ggplot2` [@R-ggplot2]. We'll use the

same example of women's labor force participation, using the original dichotomies:

```{r wlf-model}

data(Womenlf, package = "carData")

comparisons <- logits(work=dichotomy("not.work", c("parttime", "fulltime")),

full=dichotomy("parttime", "fulltime"))

wlf.nested <- nestedLogit(partic ~ hincome + children,

dichotomies = comparisons,

data=Womenlf)

```

The advantages of this approach are that

* you can handle more complicated designs with more predictors by faceting,

* it allows you to plot either the fitted probabilities or their transformed logits,

* you can also obtain plots for log odds corresponding to each of the dichotomies which comprise the nested logit model.

As we will illustrate, this provides a nice visual interpretation of the alternative

specification of dichotomies for the `Womenlf` data discussed in the section

"Alternative models for the `Womenlf` data" of the main vignette.

## Fitted probabilities

To draw a plot, it is sufficient to calculate predicted probabilities over a grid of

values of the predictor variables. Here, we select a range of 0 - 45 in steps of 5,

combined with the two values of `children`.

```{r pred.nested}

new <- expand.grid(hincome=seq(0, 45, by = 5),

children=c("absent", "present"))

pred.nested <- predict(wlf.nested, newdata = new)

names(pred.nested)

```

As explained in `help(predict.nestedLogit)`, the predict method returns a complicated structure --

a list of four data frames corresponding to the predicted probabilities for the response categories,

the corresponding logits, and each of their standard errors.

```{r}

head(pred.nested[["p"]])

```

However, `ggplot` wants the data in long format.

This is easily done using the `as.data.frame()` method, which also includes the values of the predictors in the `newdata` data set:

```{r}

plotdata <- as.data.frame(pred.nested, newdata=new)

head(plotdata)

```

## Plotting with `ggplot2`

Then we can plot probability against one predictor, use `color` to distinguish the levels of the response

(`partic`) and facet the plot by `children`. The point-wise standard errors are drawn

in a 68% confidence envelope using `geom_ribbon()`. We've also plotted the predicted values as points to

show where the predictions are obtained.

```{r wlf-ggplot-p1}

#| out.width = "100%",

#| fig.height = 4,

#| fig.cap = "**ggplot**: Predicted probabilities of working at all or working part time or full time versus husband's income, by `children`"

theme_set(theme_bw(base_size = 14))

gg1 <- ggplot(plotdata,

aes(x=hincome, y=p, color=response)) +

geom_line(linewidth = 2) +

geom_point(size = 1.5, shape = 16, color = "black") +

labs(x="Husband's Income", y= "Probability") +

facet_wrap(~ children, labeller = label_both) +

geom_ribbon(aes(ymin=p - se.p,

ymax=p + se.p,

fill = response), alpha = 0.3)

gg1

```

It is noteworthy that the confidence envelopes are wider for not-working women at higher levels of husband's income, where there are fewer observations.

### Direct labels

Plot legends are somewhat hard to read and take up unnecessary space in the plot, so it is often better

to label the curves directly. The `geomtextpath` package [@R-geomtextpath] produces a nicer plot.

```{r wlf-ggplot-p2}

#| out.width = "100%",

#| fig.height = 4,

#| fig.cap = "The same plot using direct labels on the curves rather than a legend."

gg1 + geom_textline(aes(label = response),

hjust = -0.01, vjust=-0.5, size=5) +

theme(legend.position = "none")

```

### Plotting log-odds

The advantage of the data structure returned by `as.data.frame()` is that you can just as easily plot the predicted probabilities on the scale of log-odds

($\text{logit}(p) = \log(p / (1-p))$),

using the `logit` and

`logit.se` components.

```{r wlf-ggplot-logit}

#| out.width = "100%",

#| fig.height = 4,

#| fig.cap = "Predicted log odds of working at all or working part time or full time versus husband's income, by `children`"

ggplot(plotdata,

aes(x=hincome, y=logit, color=response)) +

geom_line(linewidth = 2) +

geom_point(size = 1.5, shape = 16, color = "black") +

labs(x="Husband's Income", y= "Log Odds") +

facet_wrap(~ children, labeller = label_both) +

geom_ribbon(aes(ymin=logit - se.logit,

ymax=logit + se.logit,

fill = response), alpha = 0.3) +

geom_textline(aes(label = response),

hjust = -0.01, vjust=-0.5, size=5) +

theme(legend.position = "none")

```

## Predicted logit values for the dichotomies

The nested logit model `wlf.nested` comprises the two binary logistic regression

models for the `work` and `full` dichotomies. We can plot these as follows.

```{r}

names(models(wlf.nested))

```

The `predict()` method can also generate predicted values and their standard errors for the logits of these dichotomies, using the `model = "dichotomies"` argument:

```{r}

pred.dichot <- predict(wlf.nested, newdata = new,

model = "dichotomies")

str(pred.dichot)

```

Transforming this to a data frame, we get an analogous result for plotting:

```{r}

plotlogit <- as.data.frame(pred.dichot, newdata = new)

head(plotlogit)

```

Then, plot the `logit` vs. husband's income, with separate curves for the two

sub-models:

```{r wlf-ggplot-dichot1}

#| out.width = "100%",

#| fig.height = 4,

#| fig.cap = "Predicted logits for the two dichotomies, `work` and `full` versus `hincome`, by `children`"

ggplot(plotlogit,

aes(x=hincome, y=logit, color=response)) +

geom_line(linewidth = 2) +

geom_point(size = 1.5, shape = 16, color = "black") +

labs(x="Husband's Income", y= "Log Odds") +

facet_wrap(~ children, labeller = label_both) +

geom_ribbon(aes(ymin=logit - se.logit,

ymax=logit + se.logit,

fill = response), alpha = 0.3) +

geom_textline(aes(label = response),

hjust = -0.01, vjust=-0.5, size=5) +

theme(legend.position = "none")

```

Or, interchanging the roles of `children` and `response`, we can plot these the other way, faceting by `response`.

```{r wlf-ggplot-dichot2}

#| out.width = "100%",

#| fig.height = 4,

#| fig.cap = "Predicted logits for the two dichotomies, `work` and `full` versus `hincome`, by `response`"

ggplot(plotlogit,

aes(x=hincome, y=logit, color=children)) +

geom_line(linewidth = 2) +

geom_point(size = 1.5, shape = 16, color = "black") +

labs(x="Husband's Income", y= "Log Odds") +

facet_wrap(~ response, labeller = label_both) +

geom_ribbon(aes(ymin=logit - se.logit,

ymax=logit + se.logit,

fill = children), alpha = 0.3) +

geom_textline(aes(label = children),

hjust = -0.01, vjust=-0.5, size=5) +

theme(legend.position = "none")

```

This nicely illustrates the nature of the fitted logit models: The lines in each panel

have the same slopes for the two levels of `children`, differing only in their intercepts.

The `full` distinction between working full-time vs. part-time decreases faster with husband's

income than for the `work` dichotomy between not working at all and working either

part-time or full-time.

## Alternative model

In the main vignette we mentioned that

an alternative set of nested dichotomies first contrasts full-time work with the other categories, {full-time} vs. {not working, part-time}, and then {not working} vs. {part-time}.

```{r alt-model}

wlf.nested.alt <- nestedLogit(partic ~ hincome + children,

logits(full=dichotomy(nonfulltime=c("not.work", "parttime"), "fulltime"),

part=dichotomy("not.work", "parttime")),

data=Womenlf)

```

Proceeding in the same way as above, we get predicted logits and standard errors for each

of the dichotomies.

```{r}

pred.dichot.alt <- predict(wlf.nested.alt, newdata = new,

model = "dichotomies")

plotlogit.alt <- as.data.frame(pred.dichot.alt, newdata = new)

head(plotlogit.alt)

```

Plotting these as before:

```{r wlf-ggplot-alt1}

#| out.width = "100%",

#| fig.height = 4,

#| fig.cap = "Predicted logits for the two dichotomies, `work` and `full` versus `hincome`, by `response`"

ggplot(plotlogit.alt,

aes(x=hincome, y=logit, color=children)) +

geom_line(linewidth = 2) +

geom_point(size = 1.5, shape = 16, color = "black") +

labs(x="Husband's Income", y= "Log Odds") +

facet_wrap(~ response, labeller = label_both) +

geom_ribbon(aes(ymin=logit - se.logit,

ymax=logit + se.logit,

fill = children), alpha = 0.3) +

geom_textline(aes(label = children),

hjust = -0.01, vjust=-0.5, size=5) +

theme(legend.position = "none")

```

It's apparent that the alternative model produces a simpler description of the data: The predictors husband's income and presence of children affect the decision to work full-time, but not the decision to work part-time among those who aren't engaged in full-time work.

In particular it is clear that neither husband's income nor having young children has any effect on the

decision to work part-time.

```{r, include = FALSE}

options(.opts)

```

## References

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FilesExpand file tree

plotting-ggplot.Rmd

Latest commit

History

plotting-ggplot.Rmd

File metadata and controls