{"id":586,"date":"2014-06-04T07:00:41","date_gmt":"2014-06-04T11:00:41","guid":{"rendered":"http:\/\/datacolada.org\/?p=586"},"modified":"2020-02-11T16:24:23","modified_gmt":"2020-02-11T21:24:23","slug":"23-ceiling-effects-and-replications","status":"publish","type":"post","link":"https:\/\/datacolada.org\/23","title":{"rendered":"[23] Ceiling Effects and Replications"},"content":{"rendered":"<p>A recent failure to replicate led to an attention-grabbing debate in psychology.<\/p>\n<p>As\u00a0you may expect from university professors, some of it\u00a0involved data.\u00a0 As you may not expect from university professors, much of it involved saying mean\u00a0things that would get a child sent\u00a0to the principal's office (.<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/gilbertTweets.pdf\">pdf<\/a>).<\/p>\n<p>The hostility in the debate has obscured an interesting empirical question. This post aims to\u00a0answer that interesting empirical question. [<a href=\"#footnote_0_586\" id=\"identifier_0_586\" class=\"footnote-link footnote-identifier-link\" title=\"This blogpost was drafted&nbsp;on Thursday May 29th and was sent to original and replication authors for feedback, offering also an opportunity to comment. The dialogue with Simone Schnall lasted until June 3rd, which is why it appears only today. In the interim Tal Yarkoni&nbsp;and Yoel Inbar, among others, posted their own independent analyses.\">1<\/a>]\n<p><strong>Ceiling effect<br \/>\n<\/strong>The replication (<a href=\"https:\/\/psycnet.apa.org\/fulltext\/2014-20922-011.html\">.html<\/a>) was pre-registered; it was evaluated and approved by peers, including the original authors, before being run. 
The predicted effect was not obtained in either of the two replication studies.<\/p>\n<p>The sole issue of contention regarding the data (<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/Data-Both-Experiments-Morality-and-cleanliness.xlsx\">.xlsx<\/a>) is that nearly twice as many\u00a0respondents gave the highest possible answer in the replication as in the original study (about 41% vs about 23%). \u00a0In a forthcoming commentary (<a href=\"https:\/\/psycnet.apa.org\/record\/2014-38072-009\">.html<\/a>), the original author proposes a \"<em>ceiling effect<\/em>\" explanation: it is hard to increase something that is already very high.<\/p>\n<p>I re-analyzed the original and replication data to assess this sensible\u00a0concern.<br \/>\n<span style=\"color: blue; font-size: 13px;\">My read is that the evidence is strongly inconsistent with the<\/span><em style=\"color: blue; font-size: 13px;\"> ceiling effect<\/em><span style=\"color: blue; font-size: 13px;\">\u00a0explanation.<\/span><\/p>\n<p><strong>The experiments<br \/>\n<\/strong>In the original paper (<a href=\"https:\/\/journals.sagepub.com\/doi\/10.1111\/j.1467-9280.2008.02227.x\">.html<\/a>), participants rated six \"dilemmas\" involving moral judgments (e.g., <em>How wrong is it to keep money found in a lost wallet?<\/em>). These judgments were predicted to become less harsh for\u00a0people primed with cleanliness (Study 1) or who had just washed their hands (Study 2).<\/p>\n<p><strong>The new analysis<br \/>\n<\/strong>In a paper with\u00a0Joe and Leif (<a href=\"http:\/\/papers.ssrn.com\/sol3\/papers.cfm?abstract_id=2351926\">SSRN<\/a>),\u00a0we showed\u00a0that a prominent failure to replicate in economics was invalidated by a ceiling effect. I use the same key analysis here. 
[<a href=\"#footnote_1_586\" id=\"identifier_1_586\" class=\"footnote-link footnote-identifier-link\" title=\"Actually, in that paper it was a&nbsp;floor effect\">2<\/a>]\n<p>It consists of going beyond comparing means, examining instead all observations. The stylized figures below give the intuition. They plot the cumulative percentage of observations for each value of the dependent variable.<\/p>\n<p>The first shows an effect across the board: there is a gap between the curves throughout.<br \/>\nThe third shows the absence of an effect: the curves perfectly overlap.<\/p>\n<p><a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/Example-Figure.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-601\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/Example-Figure.png\" alt=\"Example Figure\" width=\"933\" height=\"234\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/Example-Figure.png 1251w, https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/Example-Figure-300x75.png 300w, https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/Example-Figure-1024x257.png 1024w, https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/Example-Figure-900x225.png 900w\" sizes=\"auto, (max-width: 933px) 100vw, 933px\" \/><\/a>The middle figure captures what a ceiling effect looks like. All\u00a0values above 2 were brought down to 2 so the lines overlap there, but below the ceiling the gap is still easy to notice.<\/p>\n<p>Let's now look at real data. Study 1 first: [<a href=\"#footnote_2_586\" id=\"identifier_2_586\" class=\"footnote-link footnote-identifier-link\" title=\"The x-axis on these graphs had a typo that we were alerted to by Alex Perrone in August, 2014. 
The current version is correct\">3<\/a>]\n<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/Ori1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-714 size-medium\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/Ori1-300x218.png\" alt=\"Ori1\" width=\"300\" height=\"218\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/Ori1-300x218.png 300w, https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/Ori1.png 474w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a>\u00a0 <a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/Rep1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-715 size-medium\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/Rep1-300x218.png\" alt=\"Rep1\" width=\"300\" height=\"218\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/Rep1-300x218.png 300w, https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/Rep1.png 474w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><br \/>\nIt is easy to spot the effect in the original data.<br \/>\nIt is just as easy\u00a0to spot the absence of an effect in the replication.<\/p>\n<p>Study 2 is more compelling,<br \/>\n<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/05\/Ori22.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-604\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/05\/Ori22.png\" alt=\"Ori2\" width=\"311\" height=\"225\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/05\/Ori22.png 522w, https:\/\/datacolada.org\/wp-content\/uploads\/2014\/05\/Ori22-300x217.png 300w\" sizes=\"auto, (max-width: 311px) 100vw, 311px\" \/><\/a>\u00a0<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/05\/Rep22.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-603\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/05\/Rep22.png\" 
alt=\"Rep2\" width=\"311\" height=\"225\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/05\/Rep22.png 522w, https:\/\/datacolada.org\/wp-content\/uploads\/2014\/05\/Rep22-300x217.png 300w\" sizes=\"auto, (max-width: 311px) 100vw, 311px\" \/><\/a><\/p>\n<p>In the Original the effect is largest\u00a0in the 4-6 range.\u00a0In the Replication about 60% of the data is in that range, far from the ceiling of 7. But still there is no gap between the lines.<\/p>\n<p><strong>Ceiling analysis by original author<br \/>\n<\/strong>In her forthcoming commentary (<a href=\"https:\/\/psycnet.apa.org\/fulltext\/2014-38072-008.pdf\">.pdf<\/a>), the original author computes effect size as a <em>percentage<\/em> and shows it\u00a0to be smaller in scenarios\u00a0with higher baseline levels\u00a0(see her <a href=\"https:\/\/datacolada.org\/?attachment_id=602#main\">Figure 1<\/a>). She interprets this as evidence of a ceiling effect.<br \/>\n<span style=\"color: #3366ff;\">I don't think that's right.<\/span><\/p>\n<p>Dividing something by increasingly larger numbers leads to increasingly smaller ratios, with or without a ceiling. Imagine\u00a0the effect were\u00a0constant, completely unaffected by ceiling effects. Say, a 1-point increase in the morality scale in <em>every<\/em> scenario. This constant effect would be a smaller % in scenarios with a larger baseline; going from 2 to 3 is a 50% increase, whereas going from 9 to 10 is only an 11% increase. [<a href=\"#footnote_3_586\" id=\"identifier_3_586\" class=\"footnote-link footnote-identifier-link\" title=\"She actually divides by the share of observations at ceiling, but the same intuition and arithmetic apply.\">4<\/a>]\n<p>If a store-owner gives you $5 off any item, buying\u00a0a $25 calculator gets you a 20% discount, whereas buying\u00a0a $100 jacket gets you only a 5% discount. 
But there is no ceiling; you are getting $5 off in both cases.<\/p>\n<p>To eliminate the arithmetic confound, I redid this\u00a0analysis with effect size defined as the difference of means, rather than %, and there was no association between effect size and share of answers at the boundary across scenarios\u00a0(see calculations,\u00a0<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/06\/Schnell-analysis-with-absolute-effect-size.xlsx\">.xlsx<\/a>).<\/p>\n<p><strong>Ceiling analysis by replicators<br \/>\n<\/strong>In their rejoinder (<a href=\"https:\/\/osf.io\/5mi8t\/\">.pdf<\/a>), the replicators counter by dropping all observations at the ceiling and showing the results are still not significant.<br \/>\n<span style=\"color: #3366ff; font-size: 13px;\">I don't think that's right either.<\/span><\/p>\n<p><span style=\"font-size: 13px;\">Dropping observations at\u00a0the boundary lowers power whether there is a ceiling effect or not, <em>by a lot<\/em>. \u00a0In simulations, I saw drops of 30 percentage points and more, say from\u00a050% to 20% power (<a href=\"http:\/\/urisohn.com\/sohn_files\/BlogAppendix\/Colada23.CeilingEffects.R\">R Code<\/a>). So not getting an effect this way does not support the absence of a ceiling effect problem.<\/span><\/p>\n<p><strong>Tobit<br \/>\n<\/strong>To formally take ceiling effects into account one can use the Tobit model (common in economics for censored data, see <a href=\"http:\/\/en.wikipedia.org\/wiki\/Tobit_model\">Wikipedia<\/a>). A feature of this approach is that it allows analyzing the data at the scenario level, where the ceiling effect would actually be happening. I ran Tobits on all datasets. The replications still had tiny effect sizes (&lt;1\/20th the size of the original), with p-values&gt;.8 (<a href=\"http:\/\/urisohn.com\/sohn_files\/BlogAppendix\/Colada23\/\">Stata code<\/a>). 
[<a href=\"#footnote_4_586\" id=\"identifier_4_586\" class=\"footnote-link footnote-identifier-link\" title=\"I treat the experiment as nested, with 6 repeated-measures for each participant, one per scenario\">5<\/a>]\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-376\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo-300x145.jpg\" alt=\"Wide logo\" width=\"78\" height=\"38\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo-300x145.jpg 300w, https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo.jpg 320w\" sizes=\"auto, (max-width: 78px) 100vw, 78px\" \/><\/p>\n<p><strong>Authors' response<br \/>\n<\/strong>Our policy at DataColada is to give drafts of our posts to authors whose work we cover before posting, asking for feedback and providing an opportunity to comment. This causes delays (see footnote 1) but avoids misunderstandings.<\/p>\n<p><span style=\"color: #000000;\">The replication authors, <a href=\"https:\/\/web.archive.org\/web\/20170928134923\/https:\/\/psychology.msu.edu\/people\/faculty\/donnel59\">Brent Donnellan<\/a>, <a href=\"https:\/\/web.archive.org\/web\/20150201000000*\/http:\/\/www.spsp.org\/members\/?id=16632735\">Felix Cheung<\/a>, and David Johnson, suggested minor modifications to analyses and writing. They are reflected in the version you just read.<\/span><\/p>\n<p>The original author, <a href=\"http:\/\/www.psychol.cam.ac.uk\/directory\/ss877@cam.ac.uk\">Simone Schnall<\/a>, also suggested a few edits and\u00a0asked me to include this comment from her:<\/p>\n<blockquote><p>Your analysis still does not acknowledge the key fact: There are significantly more extreme scores in the replication data (38.5% in Study 1, and 44.0% in Study 2) than in the original data. The Tobin analysis is a model-based calculation and makes certain assumptions; it is not based on the empirical data. 
In the presence of so many extreme scores a null result remains inconclusive.<\/p><\/blockquote>\n<ol class=\"footnotes\">\n<li id=\"footnote_0_586\" class=\"footnote\">This blogpost was drafted\u00a0on Thursday May 29th and was sent to original and replication authors for feedback, offering also an opportunity to comment. The dialogue with Simone Schnall lasted until June 3rd, which is why it appears only today. In the interim <a href=\"http:\/\/www.talyarkoni.org\/blog\/2014\/06\/01\/there-is-no-ceiling-effect-in-johnson-cheung-donnellan-2014\/\">Tal Yarkoni<\/a>\u00a0and <a href=\"http:\/\/yorl.tumblr.com\/post\/87428392426\/ceiling-effects\">Yoel Inbar<\/a>, among others, posted their own independent analyses. [<a href=\"#identifier_0_586\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_1_586\" class=\"footnote\">Actually, in that paper it was a\u00a0<em>floor<\/em> effect [<a href=\"#identifier_1_586\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_2_586\" class=\"footnote\">The x-axis on these graphs had a typo that we were alerted to by Alex Perrone in August, 2014. The current version is correct [<a href=\"#identifier_2_586\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_3_586\" class=\"footnote\">She actually divides by the share of observations at ceiling, but the same intuition and arithmetic apply. 
[<a href=\"#identifier_3_586\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_4_586\" class=\"footnote\">I treat the experiment as nested, with 6 repeated-measures for each participant, one per scenario [<a href=\"#identifier_4_586\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>A recent failure to replicate led to an attention-grabbing debate in psychology. As\u00a0you may expect from university professors, some of it\u00a0involved data.\u00a0 As you may not expect from university professors, much of it involved saying mean\u00a0things that would get a child sent\u00a0to the principal's office (.pdf). The hostility in the debate has obscured an interesting&#8230;<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"_wp_rev_ctl_limit":""},"categories":[18],"tags":[],"class_list":["post-586","post","type-post","status-publish","format-standard","hentry","category-replication"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/586","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/d
atacolada.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/comments?post=586"}],"version-history":[{"count":6,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/586\/revisions"}],"predecessor-version":[{"id":4762,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/586\/revisions\/4762"}],"wp:attachment":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/media?parent=586"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/categories?post=586"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/tags?post=586"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}