{"id":8206,"date":"2024-09-16T07:00:25","date_gmt":"2024-09-16T11:00:25","guid":{"rendered":"https:\/\/datacolada.org\/?p=8206"},"modified":"2024-09-16T09:22:15","modified_gmt":"2024-09-16T13:22:15","slug":"120-off-label-smirnov-how-many-subjects-show-an-effect-in-between-subject-experiments","status":"publish","type":"post","link":"https:\/\/datacolada.org\/120","title":{"rendered":"[120] Off-Label Smirnov: How Many Subjects Show an Effect in Between-Subjects Experiments?"},"content":{"rendered":"<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">There is a classic statistical test known as the Kolmogorov-Smirnov (KS) test (<a href=\"https:\/\/en.wikipedia.org\/wiki\/Kolmogorov%E2%80%93Smirnov_test\" target=\"_blank\" rel=\"noopener\">Wikipedia<\/a>).<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">This post is about an off-label use of the KS-test that I don\u2019t think people know about (not even Kolmogorov or Smirnov), and which seems useful for experimentalists in behavioral science and beyond (most useful, I think, for clinical trials and field experiments of policies that could backfire on some people).<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">If I\u2019m wrong and the off-label use is known, well, that\u2019s a little embarrassing, but please let me know.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">Like the t-test, the KS-test can be used to compare two samples<sup class=\"modern-footnotes-footnote modern-footnotes-footnote--hover-on-desktop \" data-mfn=\"1\" data-mfn-post-scope=\"000000006ec4fb70000000004e14a77f_8206\"><a href=\"javascript:void(0)\"  title=\"The KS-test can also be used to compare one sample to a theoretical distribution, e.g., to assess if a sample is statistically significantly not-normally distributed. 
We relied on it when assessing whether the Ariely insurance driving data were uniformly distributed, see footnote 5 in Colada[98]\u00a0 \"  role=\"button\" aria-pressed=\"false\" aria-describedby=\"mfn-content-000000006ec4fb70000000004e14a77f_8206-1\">1<\/a><\/sup><span id=\"mfn-content-000000006ec4fb70000000004e14a77f_8206-1\" role=\"tooltip\" class=\"modern-footnotes-footnote__note\" tabindex=\"0\" data-mfn=\"1\">The KS-test can also be used to compare one sample to a theoretical distribution, e.g., to assess if a sample is statistically significantly not-normally distributed. We relied on it when assessing whether the Ariely insurance driving data were uniformly distributed, see footnote 5 in <a href=\"https:\/\/datacolada.org\/98\" target=\"_blank\" rel=\"noopener\">Colada[98]<\/a>\u00a0 <\/span>.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">While the t-test involves only differences of <em>means, <\/em>the KS-test considers any type of difference across samples. 
So, like the t-test, the KS-test could be statistically significant because the sample means are quite different, but, unlike the t-test, the KS-test could also be statistically significant because the sample variances are quite different, or the share of observations that are exactly 0 is quite different, etc.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">The KS test is seldom used, but when it is used, it is used as a <em>test, <\/em>as a way to obtain a <em>p<\/em>-value quantifying whether things are significantly different across groups<sup class=\"modern-footnotes-footnote modern-footnotes-footnote--hover-on-desktop \" data-mfn=\"2\" data-mfn-post-scope=\"000000006ec4fb70000000004e14a77f_8206\"><a href=\"javascript:void(0)\"  title=\"Or one group, see Footnote 1\"  role=\"button\" aria-pressed=\"false\" aria-describedby=\"mfn-content-000000006ec4fb70000000004e14a77f_8206-2\">2<\/a><\/sup><span id=\"mfn-content-000000006ec4fb70000000004e14a77f_8206-2\" role=\"tooltip\" class=\"modern-footnotes-footnote__note\" tabindex=\"0\" data-mfn=\"2\">Or one group, see Footnote 1<\/span>.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">But, there is an off-label use of the KS-test, one that does not use it as a test, but as a way to <em>estimate<\/em> something. Specifically, it can estimate the share of observations in a between-subjects design that are impacted by the manipulation. So, instead of asking \u201cIs the <em>average<\/em> of the dependent variable higher in A than in B?\u201d, we can ask \u201c<em>How many<\/em> people have a higher dependent variable in A than in B?\u201d 
This is noteworthy because it involves a <em><u>between<\/u><\/em>-subjects design, where people are only in A or only in B, so we don't know whether any given individual has a higher DV in A or in B<sup class=\"modern-footnotes-footnote modern-footnotes-footnote--hover-on-desktop \" data-mfn=\"3\" data-mfn-post-scope=\"000000006ec4fb70000000004e14a77f_8206\"><a href=\"javascript:void(0)\"  title=\"Technically, the KS test does not estimate the share showing an effect; it bounds it, meaning it tells you its smallest possible value, rather than its expected value. More on this later.\"  role=\"button\" aria-pressed=\"false\" aria-describedby=\"mfn-content-000000006ec4fb70000000004e14a77f_8206-3\">3<\/a><\/sup><span id=\"mfn-content-000000006ec4fb70000000004e14a77f_8206-3\" role=\"tooltip\" class=\"modern-footnotes-footnote__note\" tabindex=\"0\" data-mfn=\"3\">Technically, the KS test does not estimate the share showing an effect; it bounds it, meaning it tells you its smallest possible value, rather than its expected value. 
More on this later.<\/span>.<\/span><\/p>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/Vodka-single.png\"><img decoding=\"async\" class=\"size-full wp-image-8207 aligncenter\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/Vodka-single.png\" alt=\"\" width=\"200\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/Vodka-single.png 1024w, https:\/\/datacolada.org\/wp-content\/uploads\/Vodka-single-300x300.png 300w, https:\/\/datacolada.org\/wp-content\/uploads\/Vodka-single-150x150.png 150w, https:\/\/datacolada.org\/wp-content\/uploads\/Vodka-single-768x768.png 768w, https:\/\/datacolada.org\/wp-content\/uploads\/Vodka-single-850x850.png 850w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong>What does the KS-test do?<br \/>\n<\/strong><\/span><span style=\"font-family: helvetica, arial, sans-serif;\">The KS test compares the entire distribution of values between samples (the CDFs). It quantifies the biggest observed difference in those distributions. Let's illustrate with data from an old paper of mine (.<a href=\"http:\/\/urisohn.com\/sohn_files\/papers\/uncertainty_effect_published.pdf\" target=\"_blank\" rel=\"noopener\">pdf<\/a>) in which participants provided their valuation, in a between-subjects design, for either a $50 gift certificate, or for a 50:50 gamble that for sure paid either that $50 certificate or a $100 one (people, puzzlingly, pay <em>less<\/em> for the lottery, the objectively superior alternative; this is known as the \u201cUncertainty Effect\u201d.<a href=\"https:\/\/academic.oup.com\/qje\/article-abstract\/121\/4\/1283\/1855222\" target=\"_blank\" rel=\"noopener\">htm<\/a>). The figure below has the CDFs. For example, if we were to draw a horizontal line (that unfortunately Stata didn't draw) at the 50<sup>th<\/sup> percentile, the medians are about $10 and $35. 
About half the people pay $10 or less for the lottery, and about half the people pay $35 or less for the $50 gift certificate.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong><a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/CDF-uncertainty-effect-annotated.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-8208\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/CDF-uncertainty-effect-annotated.png\" alt=\"\" width=\"500\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/CDF-uncertainty-effect-annotated.png 1630w, https:\/\/datacolada.org\/wp-content\/uploads\/CDF-uncertainty-effect-annotated-300x215.png 300w, https:\/\/datacolada.org\/wp-content\/uploads\/CDF-uncertainty-effect-annotated-1024x733.png 1024w, https:\/\/datacolada.org\/wp-content\/uploads\/CDF-uncertainty-effect-annotated-768x549.png 768w, https:\/\/datacolada.org\/wp-content\/uploads\/CDF-uncertainty-effect-annotated-1536x1099.png 1536w, https:\/\/datacolada.org\/wp-content\/uploads\/CDF-uncertainty-effect-annotated-850x608.png 850w\" sizes=\"(max-width: 1630px) 100vw, 1630px\" \/><\/a><br \/>\nFig 1.<\/strong> Distribution of valuations and the D+ and D- in a KS-test.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">OK, let's now get to the KS test. It computes the biggest positive and negative differences in these CDFs. The biggest positive difference is about 55%, seen around $10.\u00a0<\/span><span style=\"font-family: helvetica, arial, sans-serif;\">In the KS-test that's the <em><strong>D<sup>+<\/sup><\/strong><\/em>\u00a0(test) statistic, the biggest positive difference. 
<\/span><span style=\"font-family: helvetica, arial, sans-serif;\">There is also the D<sup>&#8211;<\/sup>, flagged here with the 2<sup>nd<\/sup> arrow; <em><strong>D<sup>&#8211; <\/sup><\/strong><\/em>= 18%.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">Researchers and statisticians usually ignore those D+ and D- values; they usually do not even report them. But it is those D+ and D- that give us the off-label use. Those D values bound <\/span><span style=\"font-family: helvetica, arial, sans-serif;\">the share of people showing each effect. Returning to Figure 1 above, the Ds mean that <em>at least<\/em> 55% of participants pay more for the $50 gift certificate than for the lottery, and <em>at least<\/em> 18% of participants do the opposite. <\/span><\/p>\n<p style=\"text-align: justify;\"><strong><span style=\"font-family: helvetica, arial, sans-serif; color: #0000ff;\">The KS-test off-label use involves using D+ and D- as bounds on the share of people impacted positively vs negatively by treatment.<\/span><\/strong><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">I should say I did not discover or prove this fact; a paper published in <em>Econometric Theory <\/em>in 2010<\/span><span style=\"font-family: helvetica, arial, sans-serif;\"> did. But the authors do not seem to have realized that that's what they proved (they do not mention the KS test in their paper). 
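The D+ and D- just described are easy to compute directly from the two empirical CDFs. Below is a minimal numpy sketch; the data are made up (normal distributions standing in for the paper's valuations), and the variable names are my own, not anything from the original paper.

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up stand-ins for the two between-subjects conditions
# (not the paper's actual data)
certificate = rng.normal(35, 12, size=200)  # valuations of the $50 certificate
lottery = rng.normal(16, 14, size=200)      # valuations of the lottery

# Empirical CDFs of both samples, evaluated at every observed value
grid = np.sort(np.concatenate([certificate, lottery]))
cdf_cert = np.searchsorted(np.sort(certificate), grid, side="right") / certificate.size
cdf_lott = np.searchsorted(np.sort(lottery), grid, side="right") / lottery.size

# D+ : biggest gap with the lottery CDF on top -> bounds the share of
#      people who pay MORE for the certificate than for the lottery
d_plus = np.max(cdf_lott - cdf_cert)
# D- : biggest gap the other way (floored at 0) -> bounds the reverse effect
d_minus = np.max(np.append(cdf_cert - cdf_lott, 0.0))

print(f"at least {d_plus:.0%} pay more for the certificate")
print(f"at least {d_minus:.0%} pay more for the lottery")
```

With scipy installed, `scipy.stats.ks_2samp(certificate, lottery)` would return the familiar two-sided statistic, max(D+, D-), along with its *p*-value; the one-sided Ds themselves usually go unreported, which is the point of this post.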
I wrote my personal journey to acquiring this tidbit of information, but it is long, so I put it behind the green button.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong><input type='hidden' bg_collapse_expand='69f1b8a9e00189054366694' value='69f1b8a9e00189054366694'><input type='hidden' id='bg-show-more-text-69f1b8a9e00189054366694' value='How did I learn this?'><input type='hidden' id='bg-show-less-text-69f1b8a9e00189054366694' value='close personal journey'><button id='bg-showmore-action-69f1b8a9e00189054366694' class='bg-showmore-plg-button bg-green-button  '   style=\" color:#4a4949;\">How did I learn this?<\/button><div id='bg-showmore-hidden-69f1b8a9e00189054366694' ><\/strong><\/span><\/p>\n<blockquote>\n<p style=\"text-align: justify\"><span style=\"font-family: helvetica, arial, sans-serif\">When working on the paper with the data for Figure 1, many years ago, I wanted to estimate how many participants paid more for the gift-certificate than the lottery, and how many did the opposite. But the data were between-subjects, any given participant only valued one of the two, so this was not possible. But I thought perhaps I could \u201cbound\u201d it, so that I could say \"at least x% of people paid more for the lottery, and at least y% paid less for it\". And I developed an algorithm that did just that. Details in footnote<sup class=\"modern-footnotes-footnote modern-footnotes-footnote--hover-on-desktop \" data-mfn=\"4\" data-mfn-post-scope=\"000000006ec4fb70000000004e14a77f_8206\"><a href=\"javascript:void(0)\"  title=\"The algorithm involved graph theory and matching participants across conditions ('imagine subject 1 in the control is subject 17 in the treatment, would they have paid more or less for the lottery?), and doing that for every participant. It did not require actually doing all possible combinations, the algorithm sorted the data optimally and did it just once. 
\"  role=\"button\" aria-pressed=\"false\" aria-describedby=\"mfn-content-000000006ec4fb70000000004e14a77f_8206-4\">4<\/a><\/sup><span id=\"mfn-content-000000006ec4fb70000000004e14a77f_8206-4\" role=\"tooltip\" class=\"modern-footnotes-footnote__note\" tabindex=\"0\" data-mfn=\"4\">The algorithm involved graph theory and matching participants across conditions ('imagine subject 1 in the control is subject 17 in the treatment, would they have paid more or less for the lottery?), and doing that for every participant. It did not require actually doing all possible combinations, the algorithm sorted the data optimally and did it just once. <\/span>. I then submitted the algorithm idea as a separate paper to the AER (an econ journal). A kind and wise reviewer pointed out that a paper had <em>just<\/em> been published in the journal \"<em>Econometric Theory<\/em>\" providing a more general (and less clunky, my words) solution to the same problem: bounding share of participants showing an effect in between subject designs. I obviously downloaded that paper immediately (.<a href=\"https:\/\/sci-hub.se\/10.1017\/S0266466609990168\">pdf<\/a>). It was too technical for me to follow in enough detail, but I determined the reviewer was right, and I dropped the project.<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"font-family: helvetica, arial, sans-serif\">But, while scratching my head looking at that <em>Econometric Theory<\/em> paper, at \"lemma 2.1\" in particular, something caught my eye which appears to have missed the eyes of the authors of the paper (and anyone else who may have read it). 
Their formula to find bounds for participants being impacted positively and negatively by treatment, is the same formula as that of the D+ and D- in the KS test.<\/span><\/p>\n<p style=\"text-align: justify\"><span style=\"font-family: helvetica, arial, sans-serif\">Here is a visual approximation to what happened on my desk<sup class=\"modern-footnotes-footnote modern-footnotes-footnote--hover-on-desktop \" data-mfn=\"5\" data-mfn-post-scope=\"000000006ec4fb70000000004e14a77f_8206\"><a href=\"javascript:void(0)\"  title=\"In Lemma 2.1 we need to make delta=0 to get the KS D+ and D- statistic. Delta is the size of the effect, so if delta=0, we are asking how many show a positive vs negative effect. If we make delta=1 we bound the share of participants showing an effect of at least $1, vs less than $1. It is easy to achieve that with the KS-test too, just add delta to the condition, and rerun the KS-test; that is, the KS test can also bound the share of people showing an effect of a particular size or larger \"  role=\"button\" aria-pressed=\"false\" aria-describedby=\"mfn-content-000000006ec4fb70000000004e14a77f_8206-5\">5<\/a><\/sup><span id=\"mfn-content-000000006ec4fb70000000004e14a77f_8206-5\" role=\"tooltip\" class=\"modern-footnotes-footnote__note\" tabindex=\"0\" data-mfn=\"5\">In Lemma 2.1 we need to make delta=0 to get the KS D+ and D- statistic. Delta is the size of the effect, so if delta=0, we are asking how many show a positive vs negative effect. If we make delta=1 we bound the share of participants showing an effect of at least $1, vs less than $1. 
It is easy to achieve that with the KS-test too, just add delta to the condition, and rerun the KS-test; that is, the KS test can also bound the share of people showing an effect of a particular size or larger <\/span>:<\/span><\/p>\n<p><a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/callout-ks-test.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-8209\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/callout-ks-test.png\" alt=\"\" width=\"600\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/callout-ks-test.png 1698w, https:\/\/datacolada.org\/wp-content\/uploads\/callout-ks-test-300x167.png 300w, https:\/\/datacolada.org\/wp-content\/uploads\/callout-ks-test-1024x569.png 1024w, https:\/\/datacolada.org\/wp-content\/uploads\/callout-ks-test-768x427.png 768w, https:\/\/datacolada.org\/wp-content\/uploads\/callout-ks-test-1536x854.png 1536w, https:\/\/datacolada.org\/wp-content\/uploads\/callout-ks-test-850x473.png 850w\" sizes=\"(max-width: 1698px) 100vw, 1698px\" \/><\/a><\/p>\n<p style=\"text-align: justify\"><span style=\"font-family: helvetica, arial, sans-serif\">I thought \u201cone day I will write about this\u201d, and now, some 15 years later, is that day.<\/span><\/p>\n<\/blockquote>\n<p style=\"text-align: justify\"><span style=\"font-family: helvetica, arial, sans-serif\"><strong><\/div><\/strong><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong>An intuition for this off-label use<br \/>\n<\/strong>The intuition behind the D+ and D- being bounds is actually straightforward, and in my nerdy opinion, intellectually stimulating. Let\u2019s do an example. If we start with a binary dependent variable things are easier.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">Let\u2019s say we are studying whether people pay more for red or white wine. 
We run a study where people are offered either a red wine or a white wine, and are asked: \u201cWould you pay $24 for this bottle?\u201d<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/evaluating-wine.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-8210\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/evaluating-wine.png\" alt=\"\" width=\"200\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/evaluating-wine.png 1024w, https:\/\/datacolada.org\/wp-content\/uploads\/evaluating-wine-300x300.png 300w, https:\/\/datacolada.org\/wp-content\/uploads\/evaluating-wine-150x150.png 150w, https:\/\/datacolada.org\/wp-content\/uploads\/evaluating-wine-768x768.png 768w, https:\/\/datacolada.org\/wp-content\/uploads\/evaluating-wine-850x850.png 850w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">Imagine the share of \u2018yesses\u2019 is 72% for red and 42% for white.<br \/>\nHow many people pay more for red than for white? Well, we don\u2019t know, because for any person we only see their answer for <em>either<\/em> red or white. But we can bound it. <\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">The only fact to keep in mind is that we estimate a 30% change in the share of people who are willing to pay $24. It's useful to spell out what that 30% of people entails. Those are people that would pay less than $24 for white wine, but would pay more than $24 for red wine. 
For example, someone who would pay $10 for white and $30 for red is in that group.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">It\u2019s <em>possible<\/em> that 100% of people pay more for red wine in general, but only 30% are in the sweet spot that crosses over $24. For example, if Marjorie pays $10 for white and $20 for red, she shows an effect, but not in our sample, as she would not buy either wine. If Marco pays $26 for white and $30 for red, same idea: he exhibits an effect that we do not observe. Because some people exhibit an effect that's not observable, it's possible that 100% of people exhibit the effect.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">It\u2019s similarly <em>possible <\/em>that 75% of people exhibit an effect but we observe only 30%.<br \/>\n<\/span><span style=\"font-family: helvetica, arial, sans-serif;\">It\u2019s similarly <em>possible <\/em>that 50% of people exhibit an effect but we observe only 30%.<br \/>\n<\/span><span style=\"font-family: helvetica, arial, sans-serif;\">Many things are possible, but usefully, some are <em><span style=\"text-decoration: underline;\">impossible<\/span><\/em><sup class=\"modern-footnotes-footnote modern-footnotes-footnote--hover-on-desktop \" data-mfn=\"6\" data-mfn-post-scope=\"000000006ec4fb70000000004e14a77f_8206\"><a href=\"javascript:void(0)\"  title=\"For ease of exposition I use the term possible\/impossible, but this is ignoring sampling error, so the statements are probabilistic, requiring statistical inference. 
One way of thinking about it is that D+ and D- are estimates with random error, so D+ is the estimated bound, with random error\"  role=\"button\" aria-pressed=\"false\" aria-describedby=\"mfn-content-000000006ec4fb70000000004e14a77f_8206-6\">6<\/a><\/sup><span id=\"mfn-content-000000006ec4fb70000000004e14a77f_8206-6\" role=\"tooltip\" class=\"modern-footnotes-footnote__note\" tabindex=\"0\" data-mfn=\"6\">For ease of exposition I use the term possible\/impossible, but this is ignoring sampling error, so the statements are probabilistic, requiring statistical inference. One way of thinking about it is that D+ and D- are estimates with random error, so D+ is the <em>estimated<\/em> bound, with random error<\/span>.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">For example, it is not possible that only 10% of people pay more for red wine. <\/span><span style=\"font-family: helvetica, arial, sans-serif;\">It\u2019s not possible because if only 10% of people paid more for red wine, then we would not see 30% more people paying more than $24 for red than for white, we would see at most a 10% increase, if <em>everyone<\/em> was in the sweetspot that values white &lt;$24 and red &gt;$24.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">This is why we can bound the share showing the effect at 30%.<br \/>\nIf 30% more people would pay $24 for red but not for white, then at least 30% of people would pay more for red wine. When you put it this way, it's kind of&#8230; &#8230;duh! (at least to someone who first thought of this 14 years ago).<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">Alright, but what if the dependent variable is continuous though? Can we still bound it?<br \/>\nYes! And it's similarly intuitive. 
You <em>dichotomize<\/em> the DV, then do the same duh thing.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">Say in our wine study we asked <em>how much <\/em>people would pay for either the red or the white wine, instead of asking only whether they would pay $24. So we get our data, which is a bunch of dollar values instead of yesses and nos: some $29 here, some $7 there. How do we use dichotomization to find the bounds?<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">Say we dichotomize at $10. We compute how many dollar amounts are above and below that in the two conditions, and find that 80% of people pay $10 or more for red wine, and 55% for white wine. OK, so the duh thing tells us that at least 25% of people pay more for red. Fine, but why $10? That was arbitrary. OK. We can try <em>every possible price<\/em>. Dichotomize at $1, then at $2, then at $3&#8230;. For each dichotomization you have one estimate; take the biggest of those numbers, and you have D+ from the KS test: the biggest vertical gap in the CDF. And hopefully you now see why that number bounds the share of people showing the effect, because my attempt at an explanation ends with this period.\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">This may feel <em>p<\/em>-hacky: the KS test is finding the cutoff by trying everything in the data. Well, it <em>is<\/em> <em>p<\/em>-hacky.<br \/>\n<\/span><span style=\"font-family: helvetica, arial, sans-serif;\">But the KS-test takes that into account when it computes its <em>p<\/em>-value; it adjusts for the 'multiple comparisons' so to speak. 
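The try-every-cutoff procedure can also be checked numerically. Here is a sketch with invented wine-study numbers (gamma-distributed dollar amounts, an assumption purely for illustration): dichotomizing at every observed price and keeping the biggest red-minus-white gap recovers the same number as the largest vertical gap between the CDFs.

```python
import numpy as np

rng = np.random.default_rng(7)
# Invented willingness-to-pay data for the wine example
red = rng.gamma(4, 6, size=150)    # dollars offered for red wine
white = rng.gamma(3, 5, size=150)  # dollars offered for white wine

# Dichotomize at every observed price: share willing to pay at least c
# in each condition, keep the biggest red-minus-white difference
cutoffs = np.sort(np.concatenate([red, white]))
best = max(np.mean(red >= c) - np.mean(white >= c) for c in cutoffs)

# The same number, obtained as D+: the biggest vertical CDF gap
cdf_red = np.searchsorted(np.sort(red), cutoffs, side="left") / red.size
cdf_white = np.searchsorted(np.sort(white), cutoffs, side="left") / white.size
d_plus = np.max(cdf_white - cdf_red)

print(np.isclose(best, d_plus))  # True: the scan and the KS D+ coincide
```

The explicit loop also makes the *p*-hacking flavor concrete: D+ is simply the most favorable cutoff the data have to offer.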
Indeed, it is usually considered a conservative test (under-rejects the null).<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong>In Sum<\/strong><\/span><span style=\"font-family: helvetica, arial, sans-serif;\"><br \/>\nThe KS-test computes all the possible dichotomizations you could do of the dependent variable, and keeps track of the two dichotomizations producing the biggest differences between conditions in the cumulative shares of observations. These values, D+ and D- respectively, are bounds on the share of people influenced by the treatment positively and negatively.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong>Is this useful?<br \/>\n<\/strong><\/span><span style=\"font-family: helvetica, arial, sans-serif;\">Maybe. <\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\">It\u2019s a bound, so it will tend to be a low number. That is, the true value could be that 70% of people are influenced, and the bound could easily be just 20%. Bounds are like that. \"At least 1 person will love this post\" is not that informative a statement about how good this post is. So I lean to thinking this is more interesting than useful. Though it is an empirical question that depends on the shape of the distributions produced by treatments. I suspect it is potentially useful in two scenarios.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong>Scenario 1<\/strong>: when the majority of people in a study show an effect in one direction, but a minority of people show a strong effect in the opposite direction, and you care to realize this may be happening. For example, some antidepressants help most patients but exacerbate depression in a minority. 
The off-label Smirnov would be useful there to pick up that heterogeneity and help prevent the kind of disasters that have happened. So it seems potentially useful for clinical trials (most people get less depressed, some people get very depressed)<sup class=\"modern-footnotes-footnote modern-footnotes-footnote--hover-on-desktop \" data-mfn=\"7\" data-mfn-post-scope=\"000000006ec4fb70000000004e14a77f_8206\"><a href=\"javascript:void(0)\"  title=\"would this be more or less useful than quantile tests? I don't know\"  role=\"button\" aria-pressed=\"false\" aria-describedby=\"mfn-content-000000006ec4fb70000000004e14a77f_8206-7\">7<\/a><\/sup><span id=\"mfn-content-000000006ec4fb70000000004e14a77f_8206-7\" role=\"tooltip\" class=\"modern-footnotes-footnote__note\" tabindex=\"0\" data-mfn=\"7\">would this be more or less useful than quantile tests? I don't know<\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: helvetica, arial, sans-serif;\"><strong>Scenario 2:<\/strong> to remind us that when we show a difference of means we may be focusing on an effect that is shown by a minority of participants. 
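To see how the off-label Smirnov could flag a harmed minority, here is a small simulation; all numbers are invented (an 85/15 mix of helped and harmed simulated patients, with lower scores meaning less depressed).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
# Invented clinical-trial-style data: lower score = less depressed
control = rng.normal(50, 10, size=n)
helped = rng.normal(40, 10, size=850)     # 85%: treatment helps
backfired = rng.normal(75, 10, size=150)  # 15%: treatment backfires badly
treatment = np.concatenate([helped, backfired])

# Empirical CDFs evaluated at every observed value
grid = np.sort(np.concatenate([control, treatment]))
cdf_c = np.searchsorted(np.sort(control), grid, side="right") / control.size
cdf_t = np.searchsorted(np.sort(treatment), grid, side="right") / treatment.size

d_plus = np.max(cdf_t - cdf_c)   # at least this share improved
d_minus = np.max(cdf_c - cdf_t)  # at least this share got worse

print(f"mean effect: {treatment.mean() - control.mean():+.1f} (looks like improvement)")
print(f"D+ = {d_plus:.2f}, D- = {d_minus:.2f}")
```

The mean difference alone suggests the treatment helps; D- reveals that a nontrivial share of patients must have been made worse off.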
If we got into the habit of reporting off-label KS, we might put more effort into determining whether our findings apply to most, some, or few people.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-376\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo-300x145.jpg\" alt=\"Wide logo\" width=\"78\" height=\"38\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo-300x145.jpg 300w, https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo.jpg 320w\" sizes=\"auto, (max-width: 78px) 100vw, 78px\" \/><\/p>\n<hr \/>\n<p style=\"text-align: justify;\"><span style=\"color: #0000ff;\"><strong>Author feedback<br \/>\n<\/strong>Our policy (.<a style=\"color: #0000ff;\" href=\"https:\/\/datacolada.org\/feedback_policy\">htm<\/a>) is to share drafts of blog posts with authors whose work we discuss, in order to solicit suggestions for things we should change prior to posting. I emailed the authors of the <em>Econometric Theory<\/em> paper, but they did not reply. 
I did not contact Kolmogorov or Smirnov as they died in 1987 and 1966 respectively.<\/span><\/p>\n<h3 class=\"modern-footnotes-list-heading modern-footnotes-list-heading--hide-for-print\">Footnotes<\/h3><ul class=\"modern-footnotes-list modern-footnotes-list--hide-for-print\"><li><span>1<\/span><div>The KS-test can also be used to compare one sample to a theoretical distribution, e.g., to assess if a sample is statistically significantly not-normally distributed. We relied on it when assessing whether the Ariely insurance driving data were uniformly distributed, see footnote 5 in <a href=\"https:\/\/datacolada.org\/98\" target=\"_blank\" rel=\"noopener\">Colada[98]<\/a>\u00a0 <\/div><\/li><li><span>2<\/span><div>Or one group, see Footnote 1<\/div><\/li><li><span>3<\/span><div>Technically, the KS test does not estimate the share showing an effect; it bounds it, meaning it tells you its smallest possible value, rather than its expected value. More on this later.<\/div><\/li><li><span>4<\/span><div>The algorithm involved graph theory and matching participants across conditions ('imagine subject 1 in the control is subject 17 in the treatment, would they have paid more or less for the lottery?'), and doing that for every participant. It did not require actually doing all possible combinations; the algorithm sorted the data optimally and did it just once. <\/div><\/li><li><span>5<\/span><div>In Lemma 2.1 we need to make delta=0 to get the KS D+ and D- statistic. Delta is the size of the effect, so if delta=0, we are asking how many show a positive vs negative effect. If we make delta=1 we bound the share of participants showing an effect of at least $1, vs less than $1. 
It is easy to achieve that with the KS-test too: just add delta to one condition and rerun the KS-test. That is, the KS-test can also bound the share of people showing an effect of a particular size or larger.<\/div><\/li><li><span>6<\/span><div>For ease of exposition I use the terms possible\/impossible, but this ignores sampling error, so the statements are probabilistic and require statistical inference. One way of thinking about it is that D+ and D- are estimates with random error, so D+ is the <em>estimated<\/em> bound, with random error.<\/div><\/li><li><span>7<\/span><div>Would this be more or less useful than quantile tests? I don't know.<\/div><\/li><\/ul>","protected":false},"excerpt":{"rendered":"<p>There is a classic statistical test known as the Kolmogorov-Smirnov (KS) test (Wikipedia). This post is about an off-label use of the KS-test that I don\u2019t think people know about (not even Kolmogorov or Smirnov), and which seems useful for experimentalists in behavioral science and beyond (most useful, I think, for clinical trials and 
field&#8230;<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"_wp_rev_ctl_limit":""},"categories":[41,77],"tags":[],"class_list":["post-8206","post","type-post","status-publish","format-standard","hentry","category-just-fun","category-hard_stats"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/8206","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/comments?post=8206"}],"version-history":[{"count":5,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/8206\/revisions"}],"predecessor-version":[{"id":8280,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/8206\/revisions\/8280"}],"wp:attachment":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/media?parent=8206"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/categories?post=8206"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dat
acolada.org\/wp-json\/wp\/v2\/tags?post=8206"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
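
The D+/delta-shift idea described in footnotes 3 and 5 can be sketched numerically. This is my own minimal illustration, not code from the post: it assumes higher outcomes indicate a positive effect, computes the one-sided KS statistic D+ directly from the empirical CDFs, and then applies the footnote-5 trick of shifting the treatment sample by delta before recomputing, to bound the share of participants with an effect of at least delta.

```python
import numpy as np

def d_plus(control, treatment):
    """One-sided KS statistic D+ = sup_v [F_control(v) - F_treatment(v)].

    Per the post's off-label reading, D+ lower-bounds the share of
    participants with a positive treatment effect (it is a bound, not
    an estimate of that share; see footnote 3).
    """
    control = np.sort(np.asarray(control, dtype=float))
    treatment = np.sort(np.asarray(treatment, dtype=float))
    grid = np.concatenate([control, treatment])
    # Empirical CDFs of each sample, evaluated on the pooled values
    F_c = np.searchsorted(control, grid, side="right") / control.size
    F_t = np.searchsorted(treatment, grid, side="right") / treatment.size
    return float(np.max(F_c - F_t))

# Simulated between-subjects experiment: every subject's true effect is +0.5
rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, 5000)
treatment = rng.normal(0.0, 1.0, 5000) + 0.5

d0 = d_plus(control, treatment)        # bound on share with any positive effect
d1 = d_plus(control, treatment - 1.0)  # footnote 5: shift by delta=1 first
print(f"share with any positive effect is at least {d0:.2f}")
print(f"share with effect of at least $1 is at least {d1:.2f}")
```

Note that the true effect here is +0.5 for everyone, yet D+ comes out well below 1: as footnote 3 says, the statistic bounds the share from below rather than estimating it. The delta-shifted bound is near zero, consistent with no subject having an effect of $1 or more.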