{"id":6954,"date":"2022-11-03T07:00:49","date_gmt":"2022-11-03T11:00:49","guid":{"rendered":"http:\/\/datacolada.org\/?p=6954"},"modified":"2022-11-29T09:20:06","modified_gmt":"2022-11-29T14:20:06","slug":"105","status":"publish","type":"post","link":"https:\/\/datacolada.org\/105","title":{"rendered":"[105] Meaningless Means #1: The Average Effect<BR> of Nudging Is d = .43"},"content":{"rendered":"<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><em>This post is the second in a series (see its introduction: <a href=\"https:\/\/datacolada.org\/104\" target=\"_blank\" rel=\"noopener\">htm<\/a>) arguing that meta-analytic means are often meaningless, because (1) they include results from invalid tests of the research question of interest to the meta-analyst, and (2) they average across fundamentally incommensurable results. In this post we focus primarily on problem (2), though problem (1) makes one or two guest appearances. <\/em><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><em>We did not choose this meta-analysis because it is atypical or especially problematic. We chose it because we believe it is typical and representatively problematic. Indeed, our critique is aimed at the enterprise of meta-analysis rather than at these particular meta-analysts, who, as far as we can tell, were simply doing what meta-analysts are instructed to do.<\/em><\/span><\/p>\n<hr \/>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">Authors of a recent PNAS article meta-analyzed 447 effects of \u201cnudging\u201d on behavior (<a href=\"https:\/\/www.pnas.org\/doi\/abs\/10.1073\/pnas.2107346118\">.htm<\/a>). 
In the abstract, the authors report \u201cthat choice architecture interventions [i.e., nudges] overall promote behavior change with a small to medium effect size of Cohen\u2019s d = 0.43\u201d, and that \u201cthe effectiveness of choice architecture interventions varies significantly as a function of technique and domain\u201d.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">This paper has gotten a fair bit of attention, both because of its claims about the average effects of nudges, and because letters published in PNAS, responding to this article, have proposed that the overall average effect may be much smaller \u2013 perhaps as low as zero \u2013 when correcting for publication bias (<a href=\"https:\/\/www.pnas.org\/doi\/10.1073\/pnas.2200300119\">letter 1<\/a> | <a href=\"https:\/\/www.pnas.org\/doi\/10.1073\/pnas.2200732119\">letter 2<\/a>). Indeed, letter 1 is titled, \u201cNo evidence for nudging after adjusting for publication bias.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">In this post, we walk through a few studies included in this meta-analysis, and in so doing illustrate the problems with meta-analytic averages. We argue that these problems render meaningless both the published averages as well as the (ostensibly) bias-corrected versions of those averages.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong>Combining Incommensurable Results<br \/>\n<\/strong>In this post, we focus our discussion on the fact that this meta-analysis averages across incommensurable results. That doesn\u2019t mean that the authors ignored differences across studies. On the contrary, they conducted two separate moderator analyses, one that breaks down the results by nudge type (e.g., defaults vs. reminders) and one that breaks them down by domain (e.g., finance vs. food). 
In this section, we review studies <em>within <\/em>two of these categories, essentially asking whether it makes sense to average within a subset of studies investigating reminders or a subset of studies investigating food decisions.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">Obviously, we did not approach this task by reading all ~200 studies. In the interest of (our) time and to maximize informativeness, we decided in advance to focus on the extremes: the biggest and smallest effect size estimates within each category. Do these studies provide valid answers to the question of interest to the meta-analyst? Does it make sense to combine them? Let\u2019s take a look.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong><u>Reminders<br \/>\n<\/u><\/strong>The meta-analysts defined a reminder as anything that increases \"the attentional salience of desirable behavior to overcome inattention due to information overload\u201d [<a href=\"#footnote_0_6954\" id=\"identifier_0_6954\" class=\"footnote-link footnote-identifier-link\" title=\"This definition comes from a taxonomy developed by M&uuml;nscher et al. (2016, .htm).\">1<\/a>]. Figure 4 from the PNAS paper, reprinted below, shows that the average effect of reminders is d = .29. 
Let\u2019s review the studies generating the smallest and largest effect sizes and consider whether it makes sense to average across them.<br \/>\n<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-6957 aligncenter\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/Mertens-et-al.-Nudges-By-Intervention-Figure-Highlighted.png\" alt=\"\" width=\"468\" height=\"426\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/Mertens-et-al.-Nudges-By-Intervention-Figure-Highlighted.png 2475w, https:\/\/datacolada.org\/wp-content\/uploads\/Mertens-et-al.-Nudges-By-Intervention-Figure-Highlighted-300x273.png 300w, https:\/\/datacolada.org\/wp-content\/uploads\/Mertens-et-al.-Nudges-By-Intervention-Figure-Highlighted-1024x931.png 1024w, https:\/\/datacolada.org\/wp-content\/uploads\/Mertens-et-al.-Nudges-By-Intervention-Figure-Highlighted-768x698.png 768w, https:\/\/datacolada.org\/wp-content\/uploads\/Mertens-et-al.-Nudges-By-Intervention-Figure-Highlighted-1536x1396.png 1536w, https:\/\/datacolada.org\/wp-content\/uploads\/Mertens-et-al.-Nudges-By-Intervention-Figure-Highlighted-2048x1862.png 2048w, https:\/\/datacolada.org\/wp-content\/uploads\/Mertens-et-al.-Nudges-By-Intervention-Figure-Highlighted-850x773.png 850w\" sizes=\"auto, (max-width: 468px) 100vw, 468px\" \/><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong><em>1. The Smallest Reminder Effect: Something Is The \u201cDish of The Day\u201d in the State of Denmark<br \/>\n<\/em><\/strong>In a study published in 2020 in the European Journal of Nutrition (<a href=\"https:\/\/link.springer.com\/article\/10.1007\/s00394-019-01903-y\" target=\"_blank\" rel=\"noopener\">.htm<\/a>), researchers reported a quasi-experiment conducted in four countries: Denmark, France, Italy, and the UK. 
Student groups were assigned, but not <em>randomly<\/em> assigned, to one of two foodservice conditions [<a href=\"#footnote_1_6954\" id=\"identifier_1_6954\" class=\"footnote-link footnote-identifier-link\" title=\"The original authors randomly assigned time slots, not individuals, to one of the foodservice conditions. There were very few timeslots (e.g., two time slots at three different schools in Denmark), and one cannot confidently expect the differences between, say, students who eat in an earlier time slot vs. those who eat in a later time slot, to even out over such a small number of time slots (e.g., maybe 2 out of 3 earlier time slots were assigned to the dish-of-the-day condition, and those who attend earlier time slots are less likely to prefer veggie balls). In addition, there is a problem of non-independence of observations within each time slot; students may be more likely to choose veggie balls if another student in their time slot chooses veggie balls. In general, we do not believe any causal inferences from this study are justified.\">2<\/a>]. 
As shown in Figure 1 of the original article, the key intervention was whether the vegetarian option was labeled as the \u201cdish of the day\u201d or not:<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-6958 aligncenter\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/Veggie-Balls-Dish-Of-The-Day-Figure.png\" alt=\"\" width=\"869\" height=\"457\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/Veggie-Balls-Dish-Of-The-Day-Figure.png 3437w, https:\/\/datacolada.org\/wp-content\/uploads\/Veggie-Balls-Dish-Of-The-Day-Figure-300x158.png 300w, https:\/\/datacolada.org\/wp-content\/uploads\/Veggie-Balls-Dish-Of-The-Day-Figure-1024x539.png 1024w, https:\/\/datacolada.org\/wp-content\/uploads\/Veggie-Balls-Dish-Of-The-Day-Figure-768x404.png 768w, https:\/\/datacolada.org\/wp-content\/uploads\/Veggie-Balls-Dish-Of-The-Day-Figure-1536x808.png 1536w, https:\/\/datacolada.org\/wp-content\/uploads\/Veggie-Balls-Dish-Of-The-Day-Figure-2048x1077.png 2048w, https:\/\/datacolada.org\/wp-content\/uploads\/Veggie-Balls-Dish-Of-The-Day-Figure-850x447.png 850w\" sizes=\"auto, (max-width: 869px) 100vw, 869px\" \/><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">The key measure was whether students chose the vegetarian option.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">The original authors reported the results from each country separately, and thus so did the meta-analysts. The most negative effect of reminders in the meta-analysis (d = -.1223) represents the result that emerged in Denmark (N = 84), when only 4 out of 37 (10.8%) students in the intervention condition chose the vegetarian option vs. 
7 out of 47 (14.9%) in the control condition [<a href=\"#footnote_2_6954\" id=\"identifier_2_6954\" class=\"footnote-link footnote-identifier-link\" title=\"When you pool across countries (N = 362) &ndash; and there may be good reasons not to do this (e.g., if the intervention or setting was meaningfully different in the four countries) &ndash; the effect is directionally positive: 15.5% choosing the vegetarian option in the intervention condition vs. 12.7% in the control condition.\">3<\/a>], [<a href=\"#footnote_3_6954\" id=\"identifier_3_6954\" class=\"footnote-link footnote-identifier-link\" title=\"It is worth noting that the original authors published this result despite finding no effect of the dish-of-the-day intervention. Publishing null results like this is not typical.\">4<\/a>].<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">Note that it is not clear that this study should be included in the meta-analysis, since, as emphasized in the original article, students were not randomly assigned to condition, rendering these results correlational.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong><em>2. The Largest Reminder Effect: Reminding (Whites) To Go To Sleep<br \/>\n<\/em><\/strong>A study published in 2017 in <em>Sleep Health<\/em> (<a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S2352721816301267\">.htm<\/a>) randomly assigned high school adolescents (N = 46) to either receive twice-daily text reminders to go to sleep on time (e.g., \u201cBedtime goals for tonight: Dim light at 9:30pm. 
Try getting into bed at 10:30pm.\u201d) or to receive no reminders about sleep.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">All participants wore a sleep monitor to measure sleep duration.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">In the overall analysis, the authors found no effect of the text message reminders on sleep duration (<em>p<\/em> = .471). But then they found a significant interaction with race\/ethnicity: reminders significantly increased the sleep hours of the 20 (non-Hispanic) Whites in the sample (d = 1.18, <em>p<\/em> = .028), and had no significant effect on other ethnic groups (<em>p <\/em>= .094, opposite direction).<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">The meta-analysis includes only the p &lt;\u00a0 .05 result for (non-Hispanic) Whites, not the p &gt; .05 result for the whole sample, and not the p &gt; .05 result for minority students. In so doing, the meta-analysis introduced a form of publication bias \u2013 selectively including only the analyses that were statistically significant \u2013 that was not present in the original study [<a href=\"#footnote_4_6954\" id=\"identifier_4_6954\" class=\"footnote-link footnote-identifier-link\" title=\"The original authors did not directly report the results for the subsample of minority students. The meta-analysts told us that they contacted the original authors to try to get these results and did not receive a reply.\">5<\/a>].<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">At this point, you may want to pause to consider some questions. Do you think these studies represent valid tests of the effects of reminders? Do you think it makes sense to average these studies together? 
What would that average tell you?<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong><u>Food Choices<br \/>\n<\/u><\/strong>In this section, we look at studies investigating nudges of food choices, because the abstract identified those as having the biggest effect (an average of d = .65). Specifically, the abstract reports that \u201cFood choices are particularly responsive to choice architecture interventions, with effect sizes up to 2.5 times larger than those in other behavioral domains.\u201d But does it make sense to average across studies investigating food nudges?<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong><em>The Smallest Food Choice Effect: Location, Location, Location (of the Risotto Primavera)<br \/>\n<\/em><\/strong>Although the authors report that nudges targeting food choices are the most effective, some of the food nudges in the meta-analysis did not work. Indeed, the least successful food nudge involves a moderately <em>negative effect<\/em>, indicating that the nudge <em>backfired <\/em>to the tune of d = -.24. In other words, the manipulation caused a nontrivial effect in <em>the wrong direction<\/em>. Let\u2019s try to understand what happened.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">This study came from an article published in 2018 in the journal <em>Appetite <\/em>(.<a href=\"https:\/\/doi.org\/10.1016\/j.appet.2018.02.006\">htm<\/a>). In that study, 750 participants on Prolific were asked to \u201cimagine they were catching up with a friend in a nice restaurant during the week\u201d (p. 193), and to select from a menu that contained 2 vegetarian and 6 non-vegetarian dinner options. 
The dependent variable was whether they selected a vegetarian option.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">The researchers randomly assigned participants to one of four variations of the menu. There was a \u201cControl Menu\u201d and three modified menus. The negative effect at issue here involved a comparison between the \u201cControl Menu\u201d and a \u201cVegetarian Menu\u201d, for which \u201cthe vegetarian options are placed in a separate section of the menu\u201d (p. 192). Let\u2019s look at those two menus (the yellow highlighting is ours):<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-6959 aligncenter\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/Menu.png\" alt=\"\" width=\"515\" height=\"360\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/Menu.png 515w, https:\/\/datacolada.org\/wp-content\/uploads\/Menu-300x210.png 300w\" sizes=\"auto, (max-width: 515px) 100vw, 515px\" \/><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">The <em>Appetite <\/em>authors included this manipulation because they noticed that some restaurants group all vegetarian options together, and the authors reportedly had the intuition that this may <em>decrease <\/em>demand for vegetarian options [<a href=\"#footnote_5_6954\" id=\"identifier_5_6954\" class=\"footnote-link footnote-identifier-link\" title=\"By &ldquo;signaling that this section [of the menu] is not for the non-vegetarians&rdquo; (p. 192).\">6<\/a>]. 
Since what they observed was what they expected, it is not obvious that the effect here should be negative (d = -.24) rather than positive (d = <strong>+<\/strong>.24)\u2026 [<a href=\"#footnote_6_6954\" id=\"identifier_6_6954\" class=\"footnote-link footnote-identifier-link\" title=\"For example, imagine you have a control condition in which it is easy to engage in a behavior and then an experimental condition in which it is harder to engage in the behavior. If making things harder reduces the behavior, that should probably be coded as a positive effect rather than as a negative effect.\">7<\/a>].<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">(As a separate issue, there is a confound in the design here. See footnote: [<a href=\"#footnote_7_6954\" id=\"identifier_7_6954\" class=\"footnote-link footnote-identifier-link\" title=\"In the Vegetarian Menu condition, both vegetarian options are listed last, whereas in the Control Menu one of the vegetarian options is listed first. Thus, this finding may reflect an effect of presentation order rather than an effect of signaling. 
We learned from the second author that in a subsequent pre-registered study (.htm), published after the meta-analysts collected their data, they eliminated this confound (by counterbalancing order), obtaining a significant effect in the same direction.\">8<\/a>].)<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">Regardless, what this study finds is that altering the menu in this particular way produced a small(ish) (positive or negative) effect on food choices.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong><em>The Largest Food Choice Effect: Eating More When There Is More To Eat<br \/>\n<\/em><\/strong>The biggest positive effect of a nudge on food choices comes from a 2004 paper published in <em>Obesity Research <\/em>(.<a href=\"https:\/\/onlinelibrary.wiley.com\/doi\/full\/10.1038\/oby.2004.64\">htm<\/a>). This study, which was conducted at \u201ca public cafeteria-style restaurant on a university campus\u201d, asked the question: Do people eat more food when they are given more food?<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">As the authors report, \u201cOn 10 days over 5 months, we covertly recorded the food intake of customers who purchased a baked pasta entr\u00e9e from a serving line at lunch. On 5 of the days, the portion size of the entr\u00e9e was the standard (100%) portion, and on 5 different days, the size was increased to 150% of the standard portion.\u201d The customers and those who worked in the cafeteria did not know the portions were varied, and the price was the same regardless of portion size. 
The finding was, um, not subtle: People ate more calories of pasta when they were given more pasta, d = 3.08.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">Imagine that these were the only two food choice studies in the meta-analysis. Do you think it makes sense to average across them, and to conclude that the average effect of nudges on food choices is d = (3.08-.24)\/2 = 1.42?<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong>Conclusion #1: Meaningless Means<br \/>\n<\/strong>Imagine someone tells you that they are planning to implement a reminder or a food-domain nudge, and then asks you to forecast how effective it would be. Would you just say d = .29 or d = .65 and walk away? Of course not. What you\u2019d do is start asking questions. Things like, \u201cWhat kind of nudge are you going to use?\u201d and \u201cWhat is the behavior you are trying to nudge?\u201d and \u201cWhat is the context?\u201d Basically, you\u2019d say, \u201cTell me <em>exactly<\/em> what you are planning to do.\u201d And based on your knowledge of the literature, as well as logic and experience, you\u2019d say something like, \u201cWell, the best studies investigating the effects of text message reminders show a positive but small effect\u201d or \u201cGiving people a lot less to eat is likely to get them to eat a lot less, at least in the near term.\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">What you wouldn\u2019t do is consult an average of a bunch of different effects involving a bunch of different manipulations and a bunch of different measures and a bunch of different contexts. And why not? 
Because that average is meaningless.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong>Conclusion #2: What Is \u201cThe\u201d Effect of Nudges?<br \/>\n<\/strong>What are the implications for the debate about the effect size of nudges?<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">In the commentary claiming, in its title, that there is \u201cno evidence\u201d for the effectiveness of nudges, the authors do acknowledge that \u201csome nudges might be effective\u201d, but mostly they emphasize that after correcting for publication bias the average effect of nudging on behavior is indistinguishable from zero. Really? Surely, many of the things we call \u201cnudges\u201d have large effects on behavior. Giving people more to eat can exert a gigantic influence on how much they eat, just as interventions that make it easier to perform a behavior often make people much more likely to engage in that behavior. For example, people are much more likely to be organ donors when they are defaulted into becoming organ donors [<a href=\"#footnote_8_6954\" id=\"identifier_8_6954\" class=\"footnote-link footnote-identifier-link\" title=\"Another example from a JAMA paper not in the meta-analysis (.htm): defaults have large effects on doctors prescribing generic drugs.\">9<\/a>]. For the average effect to be zero, you\u2019d either need to dispute the reality of these effects, or you\u2019d need to observe nudges that backfire in a way that offsets those effects. This doesn\u2019t seem plausible to us.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">The commentary authors also claim that nudges occurring within the finance domain have an effect size of <em>zero <\/em>and that there <em>is no heterogeneity within this domain<\/em>. Think about what this means. 
It means that any nudge that tries to influence financial decisions must have an effect size of zero. So a nudge that defaults people into a particular kind of auto insurance (J Risk Unc. 1993; <a href=\"https:\/\/link.springer.com\/article\/10.1007\/BF01065313\">.htm<\/a>) has the same true effect of zero(!) as a nudge that reminds people that they wrote an honor code to pay off a loan that they were already defaulted into re-paying (JEBO 2017, <a href=\"https:\/\/www.sciencedirect.com\/science\/article\/abs\/pii\/S0167268117301002\">.htm<\/a>). Again, this doesn\u2019t seem plausible. Much more plausible is that those publication bias corrections don\u2019t work (see DataColada [<a class=\"broken_link\" href=\"https:\/\/datacolada.org\/30\/\">30<\/a>],[<a href=\"https:\/\/datacolada.org\/58\" target=\"_blank\" rel=\"noopener\">58<\/a>],[<a href=\"https:\/\/datacolada.org\/59\" target=\"_blank\" rel=\"noopener\">59<\/a>])\u00a0and that some nudges work more than others [<a href=\"#footnote_9_6954\" id=\"identifier_9_6954\" class=\"footnote-link footnote-identifier-link\" title=\"This part of our conclusion aligns with a statement in the commentary from Szaszi et al: &ldquo;instead of focusing on average effects, we need to understand when and where some nudges have huge positive effects and why others are not able to repeat those successes.&rdquo;\">10<\/a>].<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">It is also worth thinking about publication bias in this domain. Yes, you will get traditional publication bias, where researchers fail to report analyses and studies that find no effect of (usually subtle) nudges. But you could imagine also getting an opposite form of publication bias, where researchers only study or publish nudges that have surprising effects, because the effects of large nudges are too obvious. 
For example, many researchers may not run studies investigating the effects of defaults on behavior, because those effects are already known.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">In sum, we believe that many nudges undoubtedly exert real and meaningful effects on behavior. But you won\u2019t learn that \u2013 or which ones \u2013 by computing a bunch of averages, or adjusting those averages for publication bias. Instead, you have to read the studies and do some thinking.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong>Coda<br \/>\n<\/strong>In this post, we reviewed only four effects from a meta-analysis of 447, but you just need four to see the problems [<a href=\"#footnote_10_6954\" id=\"identifier_10_6954\" class=\"footnote-link footnote-identifier-link\" title=\"We did not cherry pick these studies. It takes a *long* time to do this, and we don&rsquo;t have lots of time. We also didn&rsquo;t want to make this post longer than it is. We invite interested readers to conduct their own audit(s) of this or other meta-analyses. At least in cases where the meta-analyst is averaging across very different kinds of studies, we&rsquo;d be very surprised if you don&rsquo;t uncover the same problems that we document here.\">11<\/a>]. In some cases, the studies are not valid tests of the research question. 
Moreover, regardless of their validity, averaging across them generates a number that has no discernible meaning.<\/span><\/p>\n<p><span style=\"font-family: helvetica, arial, sans-serif;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-376\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo-300x145.jpg\" alt=\"Wide logo\" width=\"78\" height=\"38\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo-300x145.jpg 300w, https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo.jpg 320w\" sizes=\"auto, (max-width: 78px) 100vw, 78px\" \/><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #0000ff; font-family: helvetica, arial, sans-serif;\"><strong>Author feedback:<br \/>\n<\/strong>Our policy (.<a href=\"https:\/\/datacolada.org\/feedback_policy\">htm<\/a>) is to share drafts of blog posts with authors whose work we discuss, in order to solicit suggestions for things we should change prior to posting, and to invite them to write a response that we link to at the end of the post. We contacted (i) the authors of the PNAS meta-analysis, (ii) the authors of the two PNAS letters, and (iii) the authors of the individual studies we use as examples.<br \/>\n<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-size: 12pt; font-family: helvetica, arial, sans-serif; color: blue;\">1) The authors of the meta-analysis provided valuable feedback that helped us improve the accuracy of this post. <\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-size: 11pt; font-family: helvetica, arial, sans-serif; color: blue;\"><span style=\"font-size: 12pt;\">2) Sz\u00e1szi et al., authors of the commentary entitled, \u201cNo reason to expect large and consistent effects of nudge interventions,\u201d provided the following response: (<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/Szaszi-et-al.-Response-To-Colada-105.pdf\">.pdf<\/a>). 
Andrew Gelman, another author of that commentary, wrote the following response: (<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/Gelman-Response-To-Colada-105.pdf\">.pdf<\/a>).\u00a0<\/span> <\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><span style=\"color: #0000ff;\">3) Max Maier, an author of the commentary entitled, \u201cNo evidence for nudging after adjusting for publication bias,\u201d provided a 180-word response<span style=\"color: #0000ff;\"> (<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/Maier-et-al.-Response-To-Colada-105.pdf\">.pdf<\/a>), as <\/span>well as <\/span><span style=\"color: #0000ff;\">a link to a longer blog post they wrote in response: (<a href=\"https:\/\/www.bayesianspectacles.org\/rejoinder-no-evidence-for-nudging\/\">.htm<\/a>). <\/span><span style=\"color: #0000ff;\">We believe their response<\/span> <span style=\"color: #0000ff;\">incorrectly equates a meaningless mean &#8211; what <em>we<\/em> are worried about &#8211; with the mean of a variable with high variance (what meta-analysts' tools model). To see the difference between meaningless means and heterogeneous means, imagine computing the average height of three structures in a city block. One structure is 2 stories high, another 2 meters high, and a third is 2 Taylor Swifts high. The average structure is 2 units high, SD=0. That mean is homogeneous but meaningless. In contrast, the average height of all members of Taylor Swift's family, measured in inches, is meaningful but heterogeneous.<\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #0000ff; font-family: helvetica, arial, sans-serif;\">4) Armando Perez-Cueto, author of the \u201cDish-of-the-day\u201d study, provided helpful feedback on the post and helpful details about that study. Dario Krpan, author of the Vegetarian menus study, provided helpful feedback on the post. 
And Barbara Rolls, author of the portion size study, provided helpful feedback on the post as well as this response:<\/span><\/p>\n<blockquote><p><span style=\"color: #0000ff; font-family: helvetica, arial, sans-serif;\"><em>My lab does not work on nudging so I am surprised that our careful demonstration that portion size affects intake outside a lab would be included in such a meta-analysis. There was no nudging involved in the study. Covertly in the context of a cafeteria, the pasta dish was either bigger or smaller.\u00a0 There was no choice except to order pasta rather than something else, and there was no nudge.\u00a0<\/em><\/span><\/p>\n<p><span style=\"color: #0000ff; font-family: helvetica, arial, sans-serif;\"><em>Like you, I share concerns about such meta-analyses and systematic reviews. Prescriptions of what is to be included miss the nuances and do not adequately account for study quality (or in this case even relevance to the question under review).\u00a0<\/em><\/span><\/p><\/blockquote>\n<p><span style=\"color: #0000ff; font-family: helvetica, arial, sans-serif;\">No other authors responded to our request for feedback.<\/span><\/p>\n<hr \/>\n<p><span style=\"font-family: helvetica, arial, sans-serif;\"><strong>Footnotes<\/strong><\/span><\/p>\n<ol class=\"footnotes\">\n<li id=\"footnote_0_6954\" class=\"footnote\">This definition comes from a taxonomy developed by M\u00fcnscher et al. (2016, <a href=\"https:\/\/onlinelibrary.wiley.com\/doi\/full\/10.1002\/bdm.1897\">.htm<\/a>). [<a href=\"#identifier_0_6954\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_1_6954\" class=\"footnote\">The original authors randomly assigned time slots, not individuals, to one of the foodservice conditions. 
There were very few time slots (e.g., two time slots at three different schools in Denmark), and one cannot confidently expect the differences between, say, students who eat in an earlier time slot vs. those who eat in a later time slot, to even out over such a small number of time slots (e.g., maybe 2 out of 3 earlier time slots were assigned to the dish-of-the-day condition, and those who attend earlier time slots are less likely to prefer veggie balls). In addition, there is a problem of non-independence of observations within each time slot; students may be more likely to choose veggie balls if another student in their time slot chooses veggie balls. In general, we do not believe any causal inferences from this study are justified. [<a href=\"#identifier_1_6954\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_2_6954\" class=\"footnote\">When you pool across countries (N = 362) \u2013 and there may be good reasons <span style=\"text-decoration: underline;\">not<\/span> to do this (e.g., if the intervention or setting was meaningfully different in the four countries) \u2013 the effect is directionally positive: 15.5% choosing the vegetarian option in the intervention condition vs. 12.7% in the control condition. [<a href=\"#identifier_2_6954\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_3_6954\" class=\"footnote\">It is worth noting that the original authors published this result despite finding no effect of the dish-of-the-day intervention. Publishing null results like this is not typical. [<a href=\"#identifier_3_6954\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_4_6954\" class=\"footnote\">The original authors did not directly report the results for the subsample of minority students. The meta-analysts told us that they contacted the original authors to try to get these results and did not receive a reply.
[<a href=\"#identifier_4_6954\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_5_6954\" class=\"footnote\">By \u201csignaling that this section [of the menu] is not for the non-vegetarians\u201d (p. 192). [<a href=\"#identifier_5_6954\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_6_6954\" class=\"footnote\">For example, imagine you have a control condition in which it is easy to engage in a behavior and then an experimental condition in which it is harder to engage in the behavior. If making things harder reduces the behavior, that should probably be coded as a positive effect rather than as a negative effect. [<a href=\"#identifier_6_6954\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_7_6954\" class=\"footnote\">In the Vegetarian Menu condition, both vegetarian options are listed last, whereas in the Control Menu one of the vegetarian options is listed first. Thus, this finding may reflect an effect of presentation order rather than an effect of signaling. We learned from the second author that in a subsequent pre-registered study (<a href=\"https:\/\/www.sciencedirect.com\/science\/article\/abs\/pii\/S0272494419308011\">.htm<\/a>), published after the meta-analysts collected their data, they eliminated this confound (by counterbalancing order), obtaining a significant effect in the same direction. [<a href=\"#identifier_7_6954\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_8_6954\" class=\"footnote\">Another example from a JAMA paper not in the meta-analysis (<a href=\"https:\/\/jamanetwork.com\/journals\/jamainternalmedicine\/article-abstract\/2520677\">.htm<\/a>): defaults have large effects on doctors prescribing generic drugs.
[<a href=\"#identifier_8_6954\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_9_6954\" class=\"footnote\">This part of our conclusion aligns with a statement in the commentary from Szaszi et al.: \u201cinstead of focusing on average effects, we need to understand when and where some nudges have huge positive effects and why others are not able to repeat those successes.\u201d [<a href=\"#identifier_9_6954\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_10_6954\" class=\"footnote\">We did not cherry-pick these studies. It takes a *long* time to do this, and we don\u2019t have lots of time. We also didn\u2019t want to make this post longer than it is. We invite interested readers to conduct their own audit(s) of this or other meta-analyses. At least in cases where the meta-analyst is averaging across very different kinds of studies, we\u2019d be very surprised if you don\u2019t uncover the same problems that we document here. [<a href=\"#identifier_10_6954\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>This post is the second in a series (see its introduction: htm) arguing that meta-analytic means are often meaningless, because (1) they include results from invalid tests of the research question of interest to the meta-analyst, and (2) they average across fundamentally incommensurable results.
In this post we focus primarily on problem (2), though problem&#8230;<\/p>\n","protected":false},"author":13,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"_wp_rev_ctl_limit":""},"categories":[75],"tags":[],"class_list":["post-6954","post","type-post","status-publish","format-standard","hentry","category-meta-analysis"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/6954","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/users\/13"}],"replies":[{"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/comments?post=6954"}],"version-history":[{"count":6,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/6954\/revisions"}],"predecessor-version":[{"id":7169,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/6954\/revisions\/7169"}],"wp:attachment":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/media?parent=6954"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/categories?post=6954"},{"taxonomy":"post_tag",
"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/tags?post=6954"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}