{"id":1038,"date":"2015-10-28T07:00:45","date_gmt":"2015-10-28T11:00:45","guid":{"rendered":"http:\/\/datacolada.org\/?p=1038"},"modified":"2020-02-11T22:55:05","modified_gmt":"2020-02-12T03:55:05","slug":"42_accepting_the_null","status":"publish","type":"post","link":"https:\/\/datacolada.org\/42","title":{"rendered":"[42] Accepting the Null: Where to Draw the Line?"},"content":{"rendered":"<p>We typically ask if an effect exists.\u00a0 But sometimes we want to ask if it does not.<\/p>\n<p>For example, how many of the \u201cfailed\u201d replications in the recent reproducibility project published in <em>Science<\/em> (.<a href=\"http:\/\/web.archive.org\/web\/20151026060815\/http:\/\/etiennelebel.com\/documents\/osc(2015,science).pdf\">pdf<\/a>) suggest the absence of an effect?<\/p>\n<p>Data have noise, so we can never say \u2018the effect is<em> exactly <\/em>zero.\u2019 \u00a0We can only say \u2018the effect is <em>basically<\/em> zero.\u2019 What we do is\u00a0draw a line close to zero and if we are confident the effect is below the line, we accept the null.<br \/>\n<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2015\/10\/Pic-accept-null-whiteboard-3.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1039\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2015\/10\/Pic-accept-null-whiteboard-3.jpg\" alt=\"Drawing on whiteboard with confidence intervals that do and do not include the line\" width=\"1280\" height=\"720\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2015\/10\/Pic-accept-null-whiteboard-3.jpg 1280w, https:\/\/datacolada.org\/wp-content\/uploads\/2015\/10\/Pic-accept-null-whiteboard-3-300x169.jpg 300w, https:\/\/datacolada.org\/wp-content\/uploads\/2015\/10\/Pic-accept-null-whiteboard-3-1024x576.jpg 1024w\" sizes=\"auto, (max-width: 1280px) 100vw, 1280px\" \/><\/a>We can draw the line via Bayes or via <em>p<\/em>-values, it does not matter very much.\u00a0The line is what really matters. 
How far from zero is it? What moves it up and down?<\/p>\n<p>In this post I describe 4 ways to draw the line, and then pit the top-2 against each other.<\/p>\n<p><strong>Way 1. Absolutely small<br \/>\n<\/strong>The oldest approach draws the line based on absolute size. Say, diets leading to losing less than 2 pounds have an effect of basically zero.\u00a0Economists do this often. For instance, a recent World Bank paper (.<a href=\"http:\/\/web.archive.org\/web\/20151026061223\/https:\/\/www.openknowledge.worldbank.org\/bitstream\/handle\/10986\/9348\/WPS6073.txt?sequence=2\">html<\/a>) reads<\/p>\n<blockquote><p><em>\u201cThe impact of financial literacy on the average remittance frequency has a 95 percent confidence interval [\u22124.3%, +2.5%] \u2026. We consider this a relatively <strong><u>precise zero<\/u><\/strong> effect, ruling out large positive or negative effects of training\u201d (emphasis added)<br \/>\n<\/em><em style=\"color: #999999; font-size: 8pt; line-height: 1.6em;\">(Dictionary note. Remittance: immigrants sending money home).<\/em><\/p><\/blockquote>\n<p>In much of behavioral science effects of <em>any<\/em> size can be of theoretical interest, and sample sizes are too small to obtain tight confidence intervals, making this approach unviable in principle and in practice [<a href=\"#footnote_0_1038\" id=\"identifier_0_1038\" class=\"footnote-link footnote-identifier-link\" title=\"e.g., we need n=1500 per cell to have a confidence interval entirely within d&lt;.1 and d&gt;-.1\">1<\/a>].<\/p>\n<p><strong>Way 2. 
Undetectably Small<br \/>\n<\/strong>In our first <em>p<\/em>-curve paper with Joe and Leif (<a href=\"http:\/\/papers.ssrn.com\/sol3\/papers.cfm?abstract_id=2256237\">SSRN<\/a>), and in my \u201cSmall Telescopes\u201d paper on evaluating replications (<a href=\"http:\/\/urisohn.com\/sohn_files\/wp\/wordpress\/wp-content\/uploads\/2019\/01\/small-telescopes-detectability-published.pdf\">.pdf<\/a>), we draw the line based on detectability.<\/p>\n<p>We don\u2019t draw the line where we stop caring about effects.<br \/>\nWe draw the line where we stop being able to detect them.<\/p>\n<p>Say an original study with n=50 finds people can feel the future.\u00a0A replication with n=125 \u2018fails,\u2019 getting an effect estimate of d=0.01, <em>p<\/em>=.94.\u00a0Data are noisy, so the confidence interval goes all the way up to <em>d<\/em>=.2.\u00a0That\u2019s a respectably big feeling-the-future effect we are not ruling out. So we cannot say the effect is absolutely small.<br \/>\n<img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1042 size-large\" style=\"border: 1px solid #000000;\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2015\/10\/example-1024x656.png\" alt=\"example\" width=\"550\" height=\"352\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2015\/10\/example-1024x656.png 1024w, https:\/\/datacolada.org\/wp-content\/uploads\/2015\/10\/example-300x192.png 300w, https:\/\/datacolada.org\/wp-content\/uploads\/2015\/10\/example.png 1142w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><br \/>\nThe original study, with just n=50, however, is unable to detect that small an effect (it would have &lt;18% power). So we accept the null: the null that the effect is either zero or undetectably small by existing studies.<\/p>\n<p><strong>Way 3. 
Smaller than expected <em>in general<br \/>\n<\/em><\/strong>Bayesian hypothesis testing runs a horse race between two hypotheses:<\/p>\n<p>Hypothesis 1 (null): \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0The effect is exactly zero.<br \/>\nHypothesis 2 (alternative): The effect is one of those moderately sized ones [<a href=\"#footnote_1_1038\" id=\"identifier_1_1038\" class=\"footnote-link footnote-identifier-link\" title=\"The tests don&rsquo;t formally assume the effects are moderately large, rather they assume distributions of effect size, say N(0,1). These distributions include tiny effects, even zero, but they also include very large effects, e.g., d&gt;1 as probable possibilities.&nbsp; It is hard to have intuitions for what assuming a distribution entails. So for brevity and clarity I just say they assume the effect is moderately large.\">2<\/a>].<\/p>\n<p>When data clearly favor 1 more than 2, we accept the null.\u00a0The bigger the effects Hypothesis 2 includes, the further from zero we draw the line, the more likely we accept the null [<a href=\"#footnote_2_1038\" id=\"identifier_2_1038\" class=\"footnote-link footnote-identifier-link\" title=\"Bayesians don&#039;t accept and reject hypotheses, instead, the evidence supports one or another hypothesis. I will use the term accept anyway.\">3<\/a>].<\/p>\n<p>The default Bayesian test, commonly used by Bayesian advocates in psychology, draws the line too far from zero (for my taste). Reasonably powered studies of moderately big effects wrongly accept the null of zero effect too often (see Colada[<a href=\"https:\/\/datacolada.org\/2015\/04\/09\/35-the-default-bayesian-test-is-prejudiced-against-small-effects\/\">35<\/a>]) [<a href=\"#footnote_3_1038\" id=\"identifier_3_1038\" class=\"footnote-link footnote-identifier-link\" title=\"This is fixable in principle, just define another alternative. 
If someone proposes a new Bayesian test, ask them &ldquo;what line around zero is it drawing?&rdquo;&nbsp; Even without understanding Bayesian statistics you can evaluate if you like the line the test generates or not.\">4<\/a>].<\/p>\n<p><strong>Way 4. Smaller than expected <em>this time<br \/>\n<\/em><\/strong>A new Bayesian approach to evaluating replications, by Verhagen and Wagenmakers (2014 .<a href=\"http:\/\/web.archive.org\/web\/20151026061603\/http:\/\/www.ejwagenmakers.com\/2014\/VerhagenWagenmakers2014.pdf\">pdf<\/a>), pits a different Hypothesis 2 against the null.\u00a0Its Hypothesis 2 is what a Bayesian observer would predict for the replication after seeing the Original (with some\u00a0assumed prior).<\/p>\n<p>Similar to Way 3, the bigger the effect seen in the original is, the bigger the effect we expect in the replication, and hence the further from zero we draw the line. Importantly, here the line moves based on what we observed in the original, not (only) on what we arbitrarily choose to consider reasonable to expect. The approach is the handsome cousin of testing if effect size differs between original and replication.<\/p>\n<p><strong>Small Telescope vs Expected <em>This Time<\/em> (Way 2 vs Way 4)<br \/>\n<\/strong>I compared the conclusions both approaches arrive at when applied to the 100 replications from that <em>Science<\/em> paper.\u00a0The results are similar but far from identical: <em>r<\/em> = .9 across all replications, and <em>r<\/em> = .72 among <em>n.s. <\/em>ones (<a href=\"http:\/\/urisohn.com\/sohn_files\/BlogAppendix\/Colada42\/%5b42%5d%20-%20Accepting%20the%20Null%20-%20R%20-%20Reproducibility%20project%20small%20telescope%20vs%20bayesian%202015%2010%2026.R\">R Code<\/a>). 
Focusing on situations where the two lead to opposite conclusions is useful for understanding each better [<a href=\"#footnote_4_1038\" id=\"identifier_4_1038\" class=\"footnote-link footnote-identifier-link\" title=\"Alex Etz in a blogpost (.html) reported the Bayesian analysis of the 100 replications; I used some of his results here.\">5<\/a>]<span style=\"font-size: 13.3333px;\">,<\/span>[<a href=\"#footnote_5_1038\" id=\"identifier_5_1038\" class=\"footnote-link footnote-identifier-link\" title=\"These are the Spearman correlations between the p-value testing the null that the original had at least 33% power and the Bayes Factor described above.\">6<\/a>].<\/p>\n<p>In Study 7 in the <em>Science<\/em> paper,<br \/>\nThe Original estimated a monstrous <em>d<\/em>=2.14 with N=99 participants total.<br \/>\nThe Replication estimated a small\u00a0\u00a0\u00a0 <em>d<\/em>=0.26, with a minuscule N=14.<\/p>\n<p>The Small Telescopes approach is irked by the small sample of the replication. Its wide\u00a0confidence interval includes effects as big as <em>d<\/em>=1.14, giving the original &gt;99% power. We cannot rule out detectable effects, so the replication is <span style=\"color: #0000ff;\">inconclusive<\/span>.<\/p>\n<p>The Bayesian observer, in contrast, draws a line quite far from zero after seeing the massive Original effect size. The line, indeed, is at a remarkable\u00a0<em>d<\/em>=.8. Replications with smaller effect size estimates, anything smaller than large, 'support the null.' 
Because the replication estimate is <em>d<\/em>=.26, it strongly\u00a0<span style=\"color: #ff0000;\">supports the null<\/span>.<\/p>\n<p>A hypothetical scenario where they disagree in the opposite direction (<a href=\"http:\/\/urisohn.com\/sohn_files\/BlogAppendix\/Colada42\/%5b42%5d%20-%20Accepting%20the%20Null%20-%20R%20-%20Example%202%20-%202015%2010%2026.R\">R Code<\/a>):<br \/>\nOriginal.\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 N=40, \u00a0 \u00a0 \u00a0 d=.7<br \/>\nReplication.\u00a0 N=5000, d=.1<\/p>\n<p>The Small Telescopes approach asks if the replication rejects an effect big enough to be detectable by the original. Yes. <em>d<\/em>=.1 cannot be studied with N=40.<span style=\"color: #ff0000;\"> Null Accepted\u00a0<\/span>[<a href=\"#footnote_6_1038\" id=\"identifier_6_1038\" class=\"footnote-link footnote-identifier-link\" title=\"Technically it is the upper end of the confidence interval that we consider when evaluating the power of the original sample; it goes up to d=.14, but I used d=.1 to keep things simpler\">7<\/a>].<\/p>\n<p>Interestingly, that small N=40 pushes the Bayesian in the opposite direction. An original with N=40 changes her beliefs about the effect very little, so d=.1 in the replication is not that surprising vs. the Original, but it is incompatible with d=0 given the large sample size, <span style=\"color: #0000ff;\">null rejected.<\/span><\/p>\n<p><span style=\"color: #993366;\">I find myself agreeing with the\u00a0Small Telescopes' line more than any other. 
But that's a matter of taste, not fact.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-376\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo-300x145.jpg\" alt=\"Wide logo\" width=\"78\" height=\"38\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo-300x145.jpg 300w, https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo.jpg 320w\" sizes=\"auto, (max-width: 78px) 100vw, 78px\" \/><\/p>\n<hr \/>\n<span style=\"font-size: 10pt;\"><strong>Footnotes.<\/strong><\/span><\/p>\n<ol class=\"footnotes\">\n<li id=\"footnote_0_1038\" class=\"footnote\">e.g., we need n=1500 per cell to have a confidence interval entirely within <em>d<\/em>&lt;.1 and d&gt;-.1 [<a href=\"#identifier_0_1038\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_1_1038\" class=\"footnote\">The tests don\u2019t formally assume the effects are moderately large, rather they assume distributions of effect size, say N(0,1). These distributions include tiny effects, even zero, but they also include very large effects, e.g., d&gt;1 as probable possibilities.\u00a0 It is hard to have intuitions for what assuming a distribution entails. So for brevity and clarity I just say they assume the effect is moderately large. [<a href=\"#identifier_1_1038\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_2_1038\" class=\"footnote\">Bayesians don't accept and reject hypotheses, instead, the evidence supports one or another hypothesis. I will use the term accept anyway. [<a href=\"#identifier_2_1038\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_3_1038\" class=\"footnote\">This is fixable in principle, just define another alternative. 
If someone proposes a new Bayesian test, ask them \u201cwhat line around zero is it drawing?\u201d\u00a0 Even without understanding Bayesian statistics you can evaluate if you like the line the test generates or not. [<a href=\"#identifier_3_1038\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_4_1038\" class=\"footnote\">Alex Etz in a blogpost (.<a href=\"http:\/\/alexanderetz.com\/2015\/08\/30\/the-bayesian-reproducibility-project\/\">html<\/a>) reported the Bayesian analysis of the 100 replications; I used some of his results here. [<a href=\"#identifier_4_1038\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_5_1038\" class=\"footnote\">These are the Spearman correlations between the <em>p<\/em>-value testing the null that the original had at least 33% power and the Bayes Factor described above. [<a href=\"#identifier_5_1038\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_6_1038\" class=\"footnote\">Technically it is the upper end of the confidence interval that we consider when evaluating the power of the original sample; it goes up to d=.14, but I used d=.1 to keep things simpler. [<a href=\"#identifier_6_1038\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>We typically ask if an effect exists.\u00a0 But sometimes we want to ask if it does not. For example, how many of the \u201cfailed\u201d replications in the recent reproducibility project published in Science (.pdf) suggest the absence of an effect? 
Data have noise, so we can never say \u2018the effect is exactly zero.\u2019 \u00a0We can&#8230;<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":true,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"_wp_rev_ctl_limit":""},"categories":[78,77],"tags":[],"class_list":["post-1038","post","type-post","status-publish","format-standard","hentry","category-bayes","category-hard_stats"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/1038","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/comments?post=1038"}],"version-history":[{"count":5,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/1038\/revisions"}],"predecessor-version":[{"id":4793,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/1038\/revisions\/4793"}],"wp:attachment":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/media?parent=1038"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/cate
gories?post=1038"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/tags?post=1038"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}