{"id":1531,"date":"2017-04-12T07:00:15","date_gmt":"2017-04-12T11:00:15","guid":{"rendered":"http:\/\/datacolada.org\/?p=1531"},"modified":"2024-11-13T02:27:31","modified_gmt":"2024-11-13T07:27:31","slug":"59-pet-peese-is-not-like-homeopathy","status":"publish","type":"post","link":"https:\/\/datacolada.org\/59","title":{"rendered":"[59] PET-PEESE Is Not Like Homeopathy"},"content":{"rendered":"<p>PET-PEESE is a meta-analytical tool that seeks to correct for publication bias. In a footnote in my previous post (<a href=\"https:\/\/datacolada.org\/58\">.htm<\/a>), I referred to it as the homeopathy of meta-analysis. That was unfair and inaccurate.<\/p>\n<p><em>Unfair <\/em>because, in the style of our President, I just called PET-PEESE a name instead of describing what I believed was wrong with it. I deviated from one of my rules for 'menschplaining' (.<a href=\"https:\/\/datacolada.org\/52\">htm<\/a>): \"Don't label, describe.\"<\/p>\n<p><em>Inaccurate <\/em>because skeptics of homeopathy merely propose that it is ineffective, not harmful. But my argument is not that PET-PEESE is merely ineffective; I believe it is also harmful. It doesn't just fail to correct for publication bias, it <em>adds<\/em> substantial bias where none exists.<\/p>\n<p><span style=\"color: #ff0000; font-size: 10pt;\"><span style=\"color: #0000ff;\">note: A few hours after this blog went live, James Pustejovsky (.<span style=\"color: #993366;\"><a style=\"color: #993366;\" href=\"https:\/\/education.utexas.edu\/faculty\/james_pustejovsky\">htm<\/a><\/span>) identified a typo in the R Code which affects some results. I have already updated the code and figures below. 
(I archived the original post: .<span style=\"color: #993366;\"><a style=\"color: #993366;\" href=\"http:\/\/web.archive.org\/web\/20170412222418\/https:\/\/datacolada.org\/59\">htm<\/a><\/span>).<\/span><br \/>\n<\/span><\/p>\n<p><strong>PET-PEESE in a NUT-SHELL<br \/>\n<\/strong>Tom Stanley (.<a href=\"https:\/\/www.hendrix.edu\/maer-network\/default.aspx?id=15184\">htm<\/a>), later joined by Hristos Doucouliagos, developed PET-PEESE in various papers that have each accumulated 100-400 Google cites (.<a href=\"http:\/\/doi.org\/10.1111\/j.1468-0084.2007.00487.x\" target=\"_blank\" rel=\"noopener noreferrer\">pdf<\/a> | .<a href=\"http:\/\/doi.org\/10.1002\/jrsm.1095\" target=\"_blank\" rel=\"noopener noreferrer\">htm<\/a>). The procedure consists of running a meta-regression: a regression in which studies are the unit of analysis, with effect size as the dependent variable and its variance as the key predictor [<a href=\"#footnote_0_1531\" id=\"identifier_0_1531\" class=\"footnote-link footnote-identifier-link\" title=\"Actually, that&#039;s just PEESE; PET uses the standard error as the predictor\">1<\/a>]. The clever insight by Stanley &amp; Doucouliagos is that the intercept of this regression is the effect we would expect in the absence of noise, thus, our estimate of the -publication bias corrected- true effect [<a href=\"#footnote_1_1531\" id=\"identifier_1_1531\" class=\"footnote-link footnote-identifier-link\" title=\"With PET-PEESE one runs both regressions. If PET is significant, one uses PEESE; if PET is not significant, one uses PET (!). \">2<\/a>].<\/p>\n<p><strong>PET-PEESE in Psychology<\/strong><br \/>\nPET-PEESE was developed with the meta-analysis of economics papers in mind (regressions with non-standardized effects). 
It is possible that some of the problems identified here, which concern meta-analyses of standardized effect sizes (Cohen's d), do not extend to such settings [<a href=\"#footnote_2_1531\" id=\"identifier_2_1531\" class=\"footnote-link footnote-identifier-link\" title=\"Though a working paper by Alinaghi and Reed suggests PET-PEESE performs poorly there as well .pdf\">3<\/a>].<\/p>\n<p style=\"text-align: left;\">Psychologists have started using PET-PEESE recently. For instance, in meta-analyses about religious primes (.<a href=\"http:\/\/www.psy.miami.edu\/ehblab\/ImplicitReligiousPrimesDictatorGame.pdf\">pdf<\/a>), working memory training (.<a href=\"https:\/\/web.archive.org\/web\/20180124191819\/http:\/\/pubmedcentralcanada.ca\/pmcc\/articles\/PMC4302828\/\" target=\"_blank\" rel=\"noopener noreferrer\">htm<\/a>), and the personality of computer whizzes (.<a href=\"http:\/\/www.sciencedirect.com\/science\/article\/pii\/S0092656615300052\" target=\"_blank\" rel=\"noopener noreferrer\">htm<\/a>). Probably the most famous example is Carter et al.'s meta-analysis of ego depletion, published in JEP:G (.<a href=\"https:\/\/psycnet.apa.org\/doi\/10.1037\/xge0000083\" target=\"_blank\" rel=\"noopener noreferrer\">htm<\/a>).<\/p>\n<p style=\"text-align: left;\">In this post I share simulation results that suggest we should not take PET-PEESE estimates, at least of psychological research, seriously. It arrives at wholly invalid estimates under too many plausible circumstances. Statistical tools need to be generally valid, or at least valid under predictable circumstances. PET-PEESE, to my understanding, is neither [<a href=\"#footnote_3_1531\" id=\"identifier_3_1531\" class=\"footnote-link footnote-identifier-link\" title=\"I shared an early draft of this paper with various peers, including Daniel Lakens and Stanley himself. They both pointed me to a recent paper in SPPS by Stanley (.htm). It identifies conditions under which PET-PEESE gives bad results. 
The problems I identify here are different, and much more general than those identified there. Moreover, results presented here seem to directly contradict the conclusions from the SPPS paper. For instance, Stanley proposes that if the observed heterogeneity in studies is I2&lt;80% we should trust PET-PEESE, and yet, in none of the simulations I present here, with utterly invalid results, is I2&gt;80%; thus I would suggest to readers to not follow that advice. Stanley (.htm) also points out that when there are 20 or fewer studies PET-PEESE should not be used; all my simulations assume 100 studies, and the results do not improve with a smaller sample of studies.\">4<\/a>].<\/p>\n<p style=\"text-align: left;\"><strong>Results<br \/>\n<\/strong>Let's start with a baseline case for which PET-PEESE does OK: there is no publication bias, every study examines the exact same effect size, and sample sizes are distributed uniformly between n=12 and n=120 per cell. Below we see that when the true effect is d=0, PET-PEESE correctly estimates it as d\u0302=0, and as d gets larger, d\u0302 gets larger (<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/DataColada59-2017-04-11-corrected-typo-in-var.d.r\">R Code<\/a>).<\/p>\n<p><a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f1.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-1724\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f1.png\" alt=\"\" width=\"400\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f1.png 600w, https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f1-300x240.png 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>About 2 years ago, Will Gervais evaluated PET-PEESE in a thorough blog post (.<a href=\"http:\/\/web.archive.org\/web\/20170326200626\/http:\/\/willgervais.com\/blog\/2015\/6\/25\/putting-pet-peese-to-the-test-1\">htm<\/a>) (which I have cited in papers a few times). 
He found that in the presence of publication bias PET-PEESE did not perform well, but that in the absence of publication bias it at least did not make things worse. The simulations depicted above are not that different from his.<\/p>\n<p>Recently, however, and by happenstance, I realized that Gervais got lucky with the simulations (or I guess PET-PEESE got lucky) [<a href=\"#footnote_4_1531\" id=\"identifier_4_1531\" class=\"footnote-link footnote-identifier-link\" title=\"In particular, when preparing Colada[58] I simulated meta-analyses where, instead of choosing sample size at random, as the funnel-plot assumes, researchers choose larger samples to study smaller effects. I found truly spectacularly poor performance by PET-PEESE, much worse than trim-and-fill. Thinking about it, I realized that if researchers do any sort of power calculations, even intuitive or based on experience, then a symmetric distribution of effect sizes leads to an asymmetric distribution of sample sizes. See this illustrative figure (R Code):\n\nSo it seemed worth checking if asymmetry alone, even if researchers were to set sample size at random, led to worse performance for PET-PEESE. And it did. \">5<\/a>]. If we deviate slightly from some of the specifics of the ideal scenario in any of several directions, PET-PEESE no longer performs well even in the absence of publication bias.<\/p>\n<p>For example, imagine that sample sizes don't go all the way up to n=120 per cell; instead, they go up to only n=50 per cell (as is commonly the case with lab studies) [<a href=\"#footnote_5_1531\" id=\"identifier_5_1531\" class=\"footnote-link footnote-identifier-link\" title=\"e.g., using d.f. 
in t-test from scraped studies as data, back in 2010, the median n in Psych Science was about 18, and around 85% of studies were n&lt;50\">6<\/a>]:<\/p>\n<p><a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f2.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-1725\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f2.png\" alt=\"\" width=\"400\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f2.png 600w, https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f2-300x240.png 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p style=\"text-align: justify; text-justify: inter-ideograph;\">A surprisingly consequential assumption involves the symmetry of sample sizes across studies. If there are more small-n than large-n studies, or vice versa, PET-PEESE's performance suffers quite a bit. For example, if sample sizes look like this:<\/p>\n<p><a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/hist1.png\"><img decoding=\"async\" class=\"alignnone wp-image-1540 size-full\" style=\"border: 0.5px solid #000000;\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/hist1.png\" alt=\"\" width=\"400\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/hist1.png 600w, https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/hist1-300x240.png 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>then PET-PEESE looks like this:<\/p>\n<p><a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f3.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-1726\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f3.png\" alt=\"\" width=\"400\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f3.png 600w, https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f3-300x240.png 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><span style=\"color: #808080;\"><strong><span style=\"font-size: 8pt;\"><br \/>\n<span style=\"font-size: 12pt; color: #999999;\">Micro-appendix<\/span><\/span><\/strong><br \/>\n<span style=\"font-size: 12pt; color: #999999;\"> 1) It looks <em><span style=\"text-decoration: underline;\">worse<\/span> <\/em>if there are more big n than small n studies (.<span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/e1-2.png\" target=\"_blank\" rel=\"noopener noreferrer\">png<\/a><\/span>). <\/span><br \/>\n<span style=\"font-size: 12pt; color: #999999;\"> 2) Even if studies have n=50 to n=120, there is noticeable bias if <em>n<\/em> is skewed across studies (.<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/e2-2017-06-15.png\">png<\/a>) <\/span><br \/>\n<\/span><\/p>\n<p>Real meta-analyses are likely, I believe, to have skewed n distributions. E.g., this is what the distribution looked like in that ego-depletion paper (note: it plots total N, not per-cell n):<\/p>\n<p><a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/Carter-histogram.png\"><img decoding=\"async\" class=\"alignnone wp-image-1591 size-full\" style=\"border: 1px solid #000000;\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/Carter-histogram.png\" alt=\"\" width=\"400\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/Carter-histogram.png 674w, https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/Carter-histogram-300x252.png 300w\" sizes=\"(max-width: 674px) 100vw, 674px\" \/><\/a><\/p>\n<p>So far we have assumed all studies have the exact same effect size, say all studies in the d=.4 bin are exactly d=.4. In real life different studies have different effects. For example, a meta-analysis of ego-depletion may include studies with stronger and weaker manipulations that lead to, say, d=.5 and d=.3 respectively. <em>On average<\/em> the effect may be d=.4, but it moves around. 
Let's see what happens if across studies the effect size has a standard deviation of SD=.2.<\/p>\n<p><a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f4.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-1727\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f4.png\" alt=\"\" width=\"400\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f4.png 600w, https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f4-300x240.png 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p><span style=\"color: #808080;\"><strong><span style=\"font-size: 8pt;\"><span style=\"font-size: 12pt; color: #999999;\">Micro-appendix<\/span><\/span><\/strong><br \/>\n<span style=\"font-size: 12pt; color: #999999;\"> 3) If big n studies are more common than small ns: .<\/span><a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/e3-2.png\" target=\"_blank\" rel=\"noopener noreferrer\">png<br \/>\n<\/a><\/span><span style=\"color: #808080;\"><span style=\"font-size: 12pt; color: #999999;\">4) If n=12 to n=120 instead of just n=50, .<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/e4-2.png\" target=\"_blank\" rel=\"noopener noreferrer\">png<\/a><\/span><\/span><\/p>\n<p><strong>Most troubling scenario<br \/>\n<\/strong>Finally, here is what happens when there is publication bias (only observe <em>p<\/em>&lt;.05)<\/p>\n<p><a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f5.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-1728\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f5.png\" alt=\"\" width=\"400\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f5.png 600w, https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/f5-300x240.png 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><br \/>\n<span style=\"color: #808080;\"><strong><span style=\"font-size: 8pt;\"><span style=\"font-size: 12pt; color: 
#999999;\">Micro-appendix<\/span><\/span><\/strong><br \/>\n<span style=\"font-size: 12pt; color: #999999;\"> With publication bias,<br \/>\n5) If <em>n<\/em> goes up to n=120: .<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/e5-2.png\" target=\"_blank\" rel=\"noopener noreferrer\">png<\/a><br \/>\n6) If n is uniform n=12 to n=50 .<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/e6-2.png\" target=\"_blank\" rel=\"noopener noreferrer\">png<\/a><br \/>\n7) If d is homogeneous, sd(d)=0 .<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/e7-2.png\" target=\"_blank\" rel=\"noopener noreferrer\">png<\/a> <\/span><\/span><\/p>\n<p>It does not seem prudent to rely on PET-PEESE, in any way, for analyzing psychological research. It's an invalid tool under too many scenarios.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-376\" style=\"border: 0px solid #000000;\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo-300x145.jpg\" alt=\"Wide logo\" width=\"78\" height=\"38\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo-300x145.jpg 300w, https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo.jpg 320w\" sizes=\"auto, (max-width: 78px) 100vw, 78px\" \/><\/p>\n<hr \/>\n<p><span style=\"color: #0000ff;\"><strong>Author feedback.<br \/>\n<\/strong>Our policy is to share early drafts of our post with authors whose work we discuss. I shared this post with the creators of PET-PEESE, and also with others familiar with it: Will Gervais, Daniel Lakens, Joe Hilgard, Evan Carter, Mike McCullough and Bob Reed. Their feedback helped me identify an important error in my R Code, avoid some statements that seemed unfair, and become aware of the recent SPPS paper by Tom Stanley (see footnote 4). 
During this process I also learned, to my dismay, that people seem to believe <strong>-incorrectly-<\/strong> that <em>p<\/em>-curve is invalidated under heterogeneity of effect size. A future post will discuss this issue; impatient readers can check out our p-curve papers, especially Figure 1 in our first paper (<span style=\"color: #800000;\"><a style=\"color: #800000;\" href=\"https:\/\/papers.ssrn.com\/sol3\/papers.cfm?abstract_id=2256237\">here<\/a><\/span>) and Figure S2 in our second (<span style=\"color: #800000;\"><a style=\"color: #800000;\" href=\"http:\/\/p-curve.com\/Supplement\/Supplement_pcurve2.pdf\">here<\/a><\/span>), which already address it, though evidently not compellingly enough.<br \/>\n<\/span><\/p>\n<p><span style=\"color: #0000ff;\">Last but not least, everyone I contacted was offered an opportunity to reply within this post. Both Tom Stanley (<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/Response-by-Tom-Sanley-to-Data-Colada.pdf\">.<span style=\"color: #ff0000;\">pdf<\/span><\/a>) and Joe Hilgard (.<span style=\"color: #800000;\"><a style=\"color: #800000;\" href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/Response-by-Joe-Hilgard-to-Colada-59.pdf\">pdf<\/a><\/span>) did.<br \/>\n<\/span><\/p>\n<p><strong>Footnotes. <\/strong><\/p>\n<ol class=\"footnotes\">\n<li id=\"footnote_0_1531\" class=\"footnote\">Actually, that's just PEESE; PET uses the standard error as the predictor [<a href=\"#identifier_0_1531\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_1_1531\" class=\"footnote\">With PET-PEESE one runs both regressions. If PET is significant, one uses PEESE; if PET is not significant, one uses PET (!). 
[<a href=\"#identifier_1_1531\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_2_1531\" class=\"footnote\">Though a working paper by Alinaghi and Reed suggests PET-PEESE performs poorly there as well .<a href=\"https:\/\/web.archive.org\/web\/20190430201935\/http:\/\/www.econ.canterbury.ac.nz\/RePEc\/cbt\/econwp\/1626.pdf\">pdf<\/a> [<a href=\"#identifier_2_1531\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_3_1531\" class=\"footnote\">I shared an early draft of this post with various peers, including Daniel Lakens and Stanley himself. They both pointed me to a recent paper in SPPS by Stanley (.<a href=\"https:\/\/doi.org\/10.1177%2F1948550617693062\">htm<\/a>). It identifies conditions under which PET-PEESE gives bad results. The problems I identify here are different, and much more general than those identified there. Moreover, results presented here seem to directly contradict the conclusions from the SPPS paper. For instance, Stanley proposes that if the observed heterogeneity in studies is I<sup>2<\/sup>&lt;80% we should trust PET-PEESE, and yet, in none of the simulations I present here, with utterly invalid results, is I<sup>2<\/sup>&gt;80%; thus I would suggest that readers not follow that advice. Stanley (.<a href=\"https:\/\/doi.org\/10.1177%2F1948550617693062\">htm<\/a>) also points out that when there are 20 or fewer studies PET-PEESE should not be used; all my simulations assume 100 studies, and the results do not improve with a <em>smaller<\/em> sample of studies. [<a href=\"#identifier_3_1531\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_4_1531\" class=\"footnote\">In particular, when preparing Colada[58] I simulated meta-analyses where, instead of choosing sample size at random, as the funnel-plot assumes, researchers choose larger samples to study smaller effects. 
I found truly spectacularly poor performance by PET-PEESE, much worse than trim-and-fill. Thinking about it, I realized that if researchers do any sort of power calculations, even intuitive or based on experience, then a symmetric distribution of effect sizes leads to an asymmetric distribution of sample sizes. See this illustrative figure (<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/R-Symmetry-of-n-vs-d.r\">R Code<\/a>):<br \/>\n<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/symmetry-of-n-vs-d.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-1699\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/symmetry-of-n-vs-d.png\" alt=\"\" width=\"300\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/symmetry-of-n-vs-d.png 948w, https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/symmetry-of-n-vs-d-300x184.png 300w, https:\/\/datacolada.org\/wp-content\/uploads\/2017\/04\/symmetry-of-n-vs-d-768x472.png 768w\" sizes=\"(max-width: 948px) 100vw, 948px\" \/><\/a><br \/>\nSo it seemed worth checking if asymmetry alone, even if researchers were to set sample size at random, led to worse performance for PET-PEESE. And it did. [<a href=\"#identifier_4_1531\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_5_1531\" class=\"footnote\">e.g., using d.f. in t-test from scraped studies as data, back in 2010, the median n in Psych Science was about 18, and around 85% of studies were n&lt;50 [<a href=\"#identifier_5_1531\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>PET-PEESE is a meta-analytical tool that seeks to correct for publication bias. In a footnote in my previous post (.htm), I referred to it as the homeopathy of meta-analysis. That was unfair and inaccurate. 
Unfair because, in the style of our President, I just called PET-PEESE a name instead of describing what I believed was&#8230;<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"_wp_rev_ctl_limit":""},"categories":[72,75,65],"tags":[],"class_list":["post-1531","post","type-post","status-publish","format-standard","hentry","category-file-drawer","category-meta-analysis","category-p-curve"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/1531","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/comments?post=1531"}],"version-history":[{"count":7,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/1531\/revisions"}],"predecessor-version":[{"id":5367,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/1531\/revisions\/5367"}],"wp:attachment":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/media?parent=1531"}],"wp:term":[{"taxonomy":"category","embeddable
":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/categories?post=1531"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/tags?post=1531"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
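The PET-PEESE procedure described in the post (a meta-regression of effect size on its standard error for PET, or on its variance for PEESE, with the intercept taken as the bias-corrected estimate, and PEESE used only when the PET intercept is significant) can be sketched in a few lines. The post's own simulations are in R (linked above); the sketch below is an illustrative Python version, where the inverse-variance weighting, the one-tailed 1.645 cutoff, and the Cohen's-d variance approximation are my assumptions rather than the post's code.

```python
import numpy as np

def wls_intercept(y, x, w):
    """Weighted least-squares regression of y on x; returns the intercept and its standard error."""
    X = np.column_stack([np.ones_like(x), x])
    XtWX = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (w * y))
    resid = y - X @ beta
    sigma2 = (w * resid ** 2).sum() / (len(y) - 2)  # weighted residual variance
    cov = sigma2 * np.linalg.inv(XtWX)
    return beta[0], np.sqrt(cov[0, 0])

def pet_peese(d, v):
    """Conditional PET-PEESE estimate, as described in the post's footnotes 1-2:
    run PET (d on standard error); if its intercept is significant, report the
    PEESE intercept (d on variance) instead. The one-tailed .05 cutoff here is
    an assumption of this sketch."""
    w = 1.0 / v
    b_pet, se_pet = wls_intercept(d, np.sqrt(v), w)
    if b_pet / se_pet > 1.645:
        return wls_intercept(d, v, w)[0]  # PEESE intercept
    return b_pet                          # PET intercept

# Toy baseline meta-analysis like the post's first scenario: 100 studies,
# true d = 0.4, no publication bias, per-cell n uniform on 12..120.
rng = np.random.default_rng(1)
n = rng.integers(12, 121, size=100).astype(float)
v = 2 / n + 0.4 ** 2 / (4 * n)   # approximate variance of Cohen's d (two cells of size n)
d = rng.normal(0.4, np.sqrt(v))  # observed effect sizes
print(round(pet_peese(d, v), 2)) # should land near the true d = 0.4
```

The post's simulations then perturb exactly the ingredients above, capping n at 50, skewing the distribution of n, adding sd(d)=.2 heterogeneity, or censoring to p&lt;.05, and show the intercept extrapolation going wrong.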