-
Notifications
You must be signed in to change notification settings - Fork 2
Expand file tree
/
Copy pathplotting-ggplot.Rmd
More file actions
304 lines (251 loc) · 10.9 KB
/
plotting-ggplot.Rmd
File metadata and controls
304 lines (251 loc) · 10.9 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
---
title: "Plotting nestedLogit models with ggplot2"
author: "Michael Friendly and John Fox"
date: "`r Sys.Date()`"
package: nestedLogit
output:
rmarkdown::html_vignette:
fig_caption: yes
bibliography: ["references.bib", "packages.bib"]
csl: apa.csl
vignette: >
%\VignetteIndexEntry{Plotting nestedLogit models with ggplot2}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
message = FALSE,
warning = FALSE,
fig.align = "center",
fig.height = 6,
fig.width = 7,
fig.path = "fig/",
dev = "png",
comment = "#>"
)
# save some typing
knitr::set_alias(w = "fig.width",
h = "fig.height",
cap = "fig.cap")
# colorize text
colorize <- function(x, color) {
if (knitr::is_latex_output()) {
sprintf("\\textcolor{%s}{%s}", color, x)
} else if (knitr::is_html_output()) {
sprintf("<span style='color: %s;'>%s</span>", color,
x)
} else x
}
set.seed(47)
.opts <- options(digits = 4)
# packages to be cited here. Code at the end automatically updates packages.bib
#to.cite <- c("ggplot2", "geomtextpath", "equatiomatic")
```
Load the packages we'll use here:
```{r setup}
library(nestedLogit) # Nested Dichotomy Logistic Regression Models
library(knitr) # A General-Purpose Package for Dynamic Report Generation in R
library(dplyr) # A Grammar of Data Manipulation
library(tidyr) # Tidy Messy Data
library(ggplot2) # Create Elegant Data Visualisations Using the Grammar of Graphics
library(geomtextpath) # Curved Text in 'ggplot2'
```
The main vignette illustrated the basic plot method, `plot.nestedLogit()` in the package.
However, for better control of the details and possibly more pleasing graphs,
it is useful to describe how graphs can be constructed using `ggplot2` [@R-ggplot2]. We'll use the
same example of women's labor force participation, using the original dichotomies:
```{r wlf-model}
data(Womenlf, package = "carData")
comparisons <- logits(work=dichotomy("not.work", c("parttime", "fulltime")),
full=dichotomy("parttime", "fulltime"))
wlf.nested <- nestedLogit(partic ~ hincome + children,
dichotomies = comparisons,
data=Womenlf)
```
The advantages of this approach are that
* you can handle more complicated designs with more predictors by faceting,
* it allows you to plot either the fitted probabilities or their transformed logits,
* you can also obtain plots for log odds corresponding to each of the dichotomies which comprise the nested logit model.
As we will illustrate, this provides a nice visual interpretation of the alternative
specification of dichotomies for the `Womenlf` data discussed in the section
"Alternative models for the `Womenlf` data" of the main vignette.
## Fitted probabilities
To draw a plot, it is sufficient to calculate predicted probabilities over a grid of
values of the predictor variables. Here, we select a range of 0 - 45 in steps of 5,
combined with the two values of `children`.
```{r pred.nested}
new <- expand.grid(hincome=seq(0, 45, by = 5),
children=c("absent", "present"))
pred.nested <- predict(wlf.nested, newdata = new)
names(pred.nested)
```
As explained in `help(predict.nestedLogit)`, the predict method returns a complicated structure --
a list of four data frames corresponding to the predicted probabilities for the response categories,
the corresponding logits, and each of their standard errors.
```{r}
head(pred.nested[["p"]])
```
However, `ggplot` wants the data in long format.
This is easily done using the `as.data.frame()` method, which also includes the values of the predictors in the `newdata` data set:
```{r}
plotdata <- as.data.frame(pred.nested, newdata=new)
head(plotdata)
```
## Plotting with `ggplot2`
Then we can plot probability against one predictor, use `color` to distinguish the levels of the response
(`partic`) and facet the plot by `children`. The point-wise standard errors are drawn
in a 68% confidence envelope using `geom_ribbon()`. We've also plotted the predicted values as points to
show where the predictions are obtained.
```{r wlf-ggplot-p1}
#| out.width = "100%",
#| fig.height = 4,
#| fig.cap = "**ggplot**: Predicted probabilities of working at all or working part time or full time versus husband's income, by `children`"
theme_set(theme_bw(base_size = 14))
gg1 <- ggplot(plotdata,
aes(x=hincome, y=p, color=response)) +
geom_line(linewidth = 2) +
geom_point(size = 1.5, shape = 16, color = "black") +
labs(x="Husband's Income", y= "Probability") +
facet_wrap(~ children, labeller = label_both) +
geom_ribbon(aes(ymin=p - se.p,
ymax=p + se.p,
fill = response), alpha = 0.3)
gg1
```
It is noteworthy that the confidence envelopes are wider for not-working women at higher levels of husband's income, where there are fewer observations.
### Direct labels
Plot legends are somewhat hard to read and take up unnecessary space in the plot, so it is often better
to label the curves directly. The `geomtextpath` package [@R-geomtextpath] produces a nicer plot.
```{r wlf-ggplot-p2}
#| out.width = "100%",
#| fig.height = 4,
#| fig.cap = "The same plot using direct labels on the curves rather than a legend."
gg1 + geom_textline(aes(label = response),
hjust = -0.01, vjust=-0.5, size=5) +
theme(legend.position = "none")
```
### Plotting log-odds
The advantage of the data structure returned by `as.data.frame()` is that you can just as easily plot the predicted probabilities on the scale of log-odds
($\text{logit}(p) = \log(p / (1-p))$),
using the `logit` and
`logit.se` components.
```{r wlf-ggplot-logit}
#| out.width = "100%",
#| fig.height = 4,
#| fig.cap = "Predicted log odds of working at all or working part time or full time versus husband's income, by `children`"
ggplot(plotdata,
aes(x=hincome, y=logit, color=response)) +
geom_line(linewidth = 2) +
geom_point(size = 1.5, shape = 16, color = "black") +
labs(x="Husband's Income", y= "Log Odds") +
facet_wrap(~ children, labeller = label_both) +
geom_ribbon(aes(ymin=logit - se.logit,
ymax=logit + se.logit,
fill = response), alpha = 0.3) +
geom_textline(aes(label = response),
hjust = -0.01, vjust=-0.5, size=5) +
theme(legend.position = "none")
```
## Predicted logit values for the dichotomies
The nested logit model `wlf.nested` comprises the two binary logistic regression
models for the `work` and `full` dichotomies. We can plot these as follows.
```{r}
names(models(wlf.nested))
```
The `predict()` method can also generate predicted values and their standard errors for the logits of these dichotomies, using the `model = "dichotomies"` argument:
```{r}
pred.dichot <- predict(wlf.nested, newdata = new,
model = "dichotomies")
str(pred.dichot)
```
Transforming this to a data frame, we get an analogous result for plotting:
```{r}
plotlogit <- as.data.frame(pred.dichot, newdata = new)
head(plotlogit)
```
Then, plot the `logit` vs. husband's income, with separate curves for the two
sub-models:
```{r wlf-ggplot-dichot1}
#| out.width = "100%",
#| fig.height = 4,
#| fig.cap = "Predicted logits for the two dichotomies, `work` and `full` versus `hincome`, by `children`"
ggplot(plotlogit,
aes(x=hincome, y=logit, color=response)) +
geom_line(linewidth = 2) +
geom_point(size = 1.5, shape = 16, color = "black") +
labs(x="Husband's Income", y= "Log Odds") +
facet_wrap(~ children, labeller = label_both) +
geom_ribbon(aes(ymin=logit - se.logit,
ymax=logit + se.logit,
fill = response), alpha = 0.3) +
geom_textline(aes(label = response),
hjust = -0.01, vjust=-0.5, size=5) +
theme(legend.position = "none")
```
Or, interchanging the roles of `children` and `response`, we can plot these the other way, faceting by `response`.
```{r wlf-ggplot-dichot2}
#| out.width = "100%",
#| fig.height = 4,
#| fig.cap = "Predicted logits for the two dichotomies, `work` and `full` versus `hincome`, by `response`"
ggplot(plotlogit,
aes(x=hincome, y=logit, color=children)) +
geom_line(linewidth = 2) +
geom_point(size = 1.5, shape = 16, color = "black") +
labs(x="Husband's Income", y= "Log Odds") +
facet_wrap(~ response, labeller = label_both) +
geom_ribbon(aes(ymin=logit - se.logit,
ymax=logit + se.logit,
fill = children), alpha = 0.3) +
geom_textline(aes(label = children),
hjust = -0.01, vjust=-0.5, size=5) +
theme(legend.position = "none")
```
This nicely illustrates the nature of the fitted logit models: The lines in each panel
have the same slopes for the two levels of `children`, differing only in their intercepts.
The `full` distinction between working full-time vs. part-time decreases faster with husband's
income than for the `work` dichotomy between not working at all and working either
part-time or full-time.
## Alternative model
In the main vignette we mentioned that
an alternative set of nested dichotomies first contrasts full-time work with the other categories, {full-time} vs. {not working, part-time}, and then {not working} vs. {part-time}.
```{r alt-model}
wlf.nested.alt <- nestedLogit(partic ~ hincome + children,
logits(full=dichotomy(nonfulltime=c("not.work", "parttime"), "fulltime"),
part=dichotomy("not.work", "parttime")),
data=Womenlf)
```
Proceeding in the same way as above, we get predicted logits and standard errors for each
of the dichotomies.
```{r}
pred.dichot.alt <- predict(wlf.nested.alt, newdata = new,
model = "dichotomies")
plotlogit.alt <- as.data.frame(pred.dichot.alt, newdata = new)
head(plotlogit.alt)
```
Plotting these as before:
```{r wlf-ggplot-alt1}
#| out.width = "100%",
#| fig.height = 4,
#| fig.cap = "Predicted logits for the two dichotomies, `work` and `full` versus `hincome`, by `response`"
ggplot(plotlogit.alt,
aes(x=hincome, y=logit, color=children)) +
geom_line(linewidth = 2) +
geom_point(size = 1.5, shape = 16, color = "black") +
labs(x="Husband's Income", y= "Log Odds") +
facet_wrap(~ response, labeller = label_both) +
geom_ribbon(aes(ymin=logit - se.logit,
ymax=logit + se.logit,
fill = children), alpha = 0.3) +
geom_textline(aes(label = children),
hjust = -0.01, vjust=-0.5, size=5) +
theme(legend.position = "none")
```
It's apparent that the alternative model produces a simpler description of the data: The predictors husband's income and presence of children affect the decision to work full-time, but not the decision to work part-time among those who aren't engaged in full-time work.
In particular it is clear that neither husband's income nor having young children has any effect on the
decision to work part-time.
```{r, include = FALSE}
options(.opts)
```
## References