{"id":4535,"date":"2019-11-20T07:00:33","date_gmt":"2019-11-20T12:00:33","guid":{"rendered":"http:\/\/datacolada.org\/?p=4535"},"modified":"2020-02-11T23:59:25","modified_gmt":"2020-02-12T04:59:25","slug":"80-interaction-effects-need-interaction-controls","status":"publish","type":"post","link":"https:\/\/datacolada.org\/80","title":{"rendered":"[80] Interaction Effects Need Interaction Controls"},"content":{"rendered":"<p style=\"text-align: justify;\">In a recent referee report I argued something I have argued in several reports before:<em> if the effect of interest in a regression is an interaction, the control variables addressing possible confounds should be interactions as well.<\/em> In this post I explain that argument using as a working example a 2011 QJE paper (.<a href=\"https:\/\/academic.oup.com\/qje\/article-abstract\/126\/1\/103\/1903433\">htm<\/a>) that examined domestic violence following NFL games.<\/p>\n<p style=\"text-align: justify;\">I chose that paper because it provides an intuitive setting to explain the need for interaction controls, and because it is the paper that first prompted me to think of this issue (several years ago).<\/p>\n<p style=\"text-align: left;\"><strong>The 2011 QJE paper<br \/>\n<\/strong>This excerpt from the abstract summarizes the key finding:<br \/>\n<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/short-abstract.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-4537 size-full\" style=\"border: 1px solid #000000;\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/short-abstract.png\" alt=\"\" width=\"500\" height=\"175\" \/><\/a><br \/>\n<span style=\"font-size: 10pt; color: #999999;\"><em>\u00a0 \u00a0 \u00a0 \u00a0<u>Codebook.<\/u><\/em><\/span><br \/>\n<span style=\"font-size: 10pt; color: #999999;\"><em>\u00a0 \u00a0 \u00a0 Pregame point spread<\/em>: Number of points the favorite team is expected to win by.<\/span><br \/>\n<span style=\"font-size: 10pt; color: 
<em>">
#999999;\"><em>\u00a0 \u00a0 \u00a0Local viewing audience<\/em>: How many locals watched the game on TV.<\/span><\/p>\n<p>Table IV in the paper has the key result:<br \/>\n<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/Taboe-IV.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-4538 size-full\" style=\"border: 1px solid #000000;\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/Taboe-IV.png\" alt=\"\" width=\"600\" height=\"694\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">Each column is a different (Poisson) regression with domestic violence after an NFL game as the dependent variable. As we move right, the regressions include more controls.<\/p>\n<p style=\"text-align: justify;\">The first row shows the estimated effect of a surprising vs. expected loss. The point estimate is about 0.10, corresponding to the 10% increase in violence mentioned in the abstract (scroll up).<\/p>\n<p style=\"text-align: justify;\"><strong>Viewership: a confound that worried the original authors<br \/>\n<\/strong>As the QJE paper notes, one concern with the finding is that more people watch games when their team is expected to win (see footnote for details of the authors\u2019 original discussion of this issue [<a href=\"#footnote_0_4535\" id=\"identifier_0_4535\" class=\"footnote-link footnote-identifier-link\" title=\"Discussion by QJE 2011 authors of role of viewership.\nSection III.D in the paper deals with the viewership issue in two paragraphs. The authors note that a regression predicting viewership with ex-ante spread implies that going from a -4 to a +4 spread is associated with just a 1 percentage point increase in viewership, and thus &ldquo;we infer that any differential reaction to the outcomes of predicted wins versus predicted losses is unlikely to be attributable to changes in viewership.&rdquo; (p.124). 
This seems sensible to me and a reasonable guess in the absence of a better way of addressing this issue, but including an interaction control of viewership with loss vs gain is a better way of addressing it.\">1<\/a>]). This means that more people watch surprising losses than watch expected losses. We thus have two explanations for greater violence after surprising losses:<\/p>\n<p style=\"text-align: justify;\">Explanation 1: surprising losses are more enraging.<br \/>\nExplanation 2: surprising losses are watched by more people.<\/p>\n<p style=\"text-align: justify;\">Column (5) in Table IV above attempts to tease these two apart by controlling for viewership (the \"Nielsen rating\" row). But the point of this post is that such a control is not sufficient.<\/p>\n<p style=\"text-align: justify;\">If the confound we were worried about was that the bigger the number of fans <span style=\"color: #0000ff;\">watching any game<\/span>, the more violence there is, then controlling for the Nielsen rating as a main effect would be the right solution.\u00a0 But we are concerned with something different. The confound we are worried about is that the bigger the number of fans <span style=\"color: #0000ff;\">watching a loss<\/span>, the more violence there is.<\/p>\n<p style=\"text-align: justify;\">More generally, because the effect of interest is an interaction, any alternative explanation for it must also involve an interaction. In this particular case, the effect of interest involves an interaction with the team losing, and therefore any confound must also involve an interaction with the team losing. 
We therefore want to allow viewership to have a different effect for games won vs. lost, which means adding the term <em><span style=\"color: #0000ff;\">Nielsen Rating*Loss<\/span><\/em> as a predictor.<\/p>\n<p style=\"text-align: justify;\">I make the same point with simple regression equations in this footnote [<a href=\"#footnote_1_4535\" id=\"identifier_1_4535\" class=\"footnote-link footnote-identifier-link\" title=\"Presenting the problem with generic regression equations.\nSay a paper is interested in the interaction between x &amp; z, captured by the term c in (1):\n(1) y=ax+bz+cxz\nWhen authors are concerned that a third variable, w, is correlated with both x and y, estimating (2) is not enough to account for this confound:\n(2) y=ax+bz+cxz+dw\nInstead, one must estimate (3):\n(3) y=ax+bz+cxz+dw+ewz\nThe 2011 QJE paper, and the several papers I have reviewed that motivated this post, estimated (2).&nbsp;\n\n\n\n&lt;end of Colada[80]&gt;\">2<\/a>].<\/p>\n<p style=\"text-align: justify;\"><strong>This is a relatively common issue.<br \/>\n<\/strong>I browsed the last two issues of some top economics journals (AEJ: Applied, AER, &amp; QJE) and found several papers with regressions in which an interaction effect was of interest and controls for possible confounds were included. But I did not find any that included interaction controls. 
The need for interaction controls does not seem to be something applied researchers are generally aware of.<\/p>\n<p style=\"text-align: justify;\"><strong>Punchline<\/strong>.<br \/>\nWhen the coefficient of interest is an interaction and confounds are a concern, <em>interacted<\/em> controls are needed.<\/p>\n<p style=\"text-align: justify;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-376\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo-300x145.jpg\" alt=\"Wide logo\" width=\"78\" height=\"38\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo-300x145.jpg 300w, https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo.jpg 320w\" sizes=\"auto, (max-width: 78px) 100vw, 78px\" \/><\/p>\n<hr \/>\n<p style=\"text-align: justify;\"><span style=\"color: #0000ff;\"><strong>Author feedback<\/strong><\/span><br \/>\n<span style=\"color: #0000ff;\">Our policy (.<a style=\"color: #0000ff;\" href=\"https:\/\/web.archive.org\/web\/20181120152002\/https:\/\/clas.ucdenver.edu\/hbsc\/meng-li\">htm<\/a>) is to share drafts of blog posts that discuss someone else's work with those authors, to solicit suggestions for things we should change prior to posting. I shared a draft with David Card and Gordon Dahl. They did not suggest any changes, but Gordon reported the results with the suggested interaction controls, writing [I added the approximate p-values in [ ] for readers who may be interested in them]:<\/span><\/p>\n<p><span style=\"color: #0000ff;\">\"<em>adding these interactions does not appreciably change the point estimate on the upset loss variable: it becomes .095 (.056) [approx. p-value=.09].\u00a0 This compares to the estimates reported in the QJE paper of .100 (.0310)\u00a0 [approx. 
p-value=.001] when we only control for Nielsen rating and .096 (.031) when we don't control for Nielsen rating at all.\u00a0 Interestingly, the coefficients on rating*loss and rating*win are almost identical: .0031 versus .0034, respectively.\u00a0 Moreover, our test for loss aversion (upset loss = &#8211; upset win) has a p-value of .02 now, whereas before it was .01.\u00a0 So our conclusion is that adding in the interaction terms with ratings results in less precise estimates, but doesn't change the estimates appreciably<\/em>.\"\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"color: #0000ff;\">I thank David and Gordon for their time, and for dusting off the Stata code from nearly 10 years ago for this post.<\/span><\/p>\n<p><span style=\"color: #0000ff;\">Shortly after this post went live, Dominique Muller (<a href=\"https:\/\/twitter.com\/dom_muller\" target=\"_blank\" rel=\"noopener noreferrer\">@dom_muller<\/a>) alerted me to this JESP (.<a style=\"color: #0000ff;\" href=\"https:\/\/www.sciencedirect.com\/science\/article\/abs\/pii\/S0022103103001598\">htm<\/a>) paper, which deals with this very issue and documents that psychology papers typically do not include interaction controls.<\/span><\/p>\n<p><strong>Footnotes.<\/strong><\/p>\n<ol class=\"footnotes\">\n<li id=\"footnote_0_4535\" class=\"footnote\"><em>Discussion by QJE 2011 authors of role of viewership.<br \/>\n<\/em><span style=\"font-size: 10pt;\">Section III.D in the paper deals with the viewership issue in two paragraphs. The authors note that a regression predicting viewership with ex-ante spread implies that going from a -4 to a +4 spread is associated with just a 1 percentage point increase in viewership, and thus \u201c<em>we infer that any differential reaction to the outcomes of predicted wins versus predicted losses is unlikely to be attributable to changes in viewership.<\/em>\u201d (p.124). 
This seems sensible to me and a reasonable guess in the absence of a better way of addressing this issue, but including an interaction control of viewership with loss vs gain <em>is<\/em> a better way of addressing it.<\/span> [<a href=\"#identifier_0_4535\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_1_4535\" class=\"footnote\"><em>Presenting the problem with generic regression equations.<br \/>\n<\/em><span style=\"font-size: 10pt;\">Say a paper is interested in the interaction between x &amp; z, captured by the term <strong>c<\/strong> in (1):<\/span>\n<p style=\"text-align: justify;\"><span style=\"font-size: 10pt;\">(1) y=<strong>a<\/strong>x+<strong>b<\/strong>z+<strong>c<\/strong>xz<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-size: 10pt;\">When authors are concerned that a third variable, <em>w<\/em>, is correlated with both x and y, estimating (2) is not enough to account for this confound:<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-size: 10pt;\">(2) y=<strong>a<\/strong>x+<strong>b<\/strong>z+<strong>c<\/strong>xz+<strong>d<\/strong>w<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-size: 10pt;\">Instead, one must estimate (3):<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-size: 10pt;\">(3) y=<strong>a<\/strong>x+<strong>b<\/strong>z+<strong>c<\/strong>xz+<strong>d<\/strong>w+<span style=\"color: #0000ff;\"><strong>e<\/strong>wz<\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-size: 10pt;\">The 2011 QJE paper, and the several papers I have reviewed that motivated this post, estimated (2).\u00a0<\/span><\/p>\n<hr \/>\n<p style=\"text-align: center;\"><span style=\"font-size: 10pt; color: #808080;\">&lt;end of Colada[80]&gt; [<a href=\"#identifier_1_4535\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/span><\/p><\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>In a recent referee 
report I argued something I have argued in several reports before: if the effect of interest in a regression is an interaction, the control variables addressing possible confounds should be interactions as well. In this post I explain that argument using as a working example a 2011 QJE paper (.htm) that&#8230;<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"_wp_rev_ctl_limit":""},"categories":[80,77],"tags":[],"class_list":["post-4535","post","type-post","status-publish","format-standard","hentry","category-interactions","category-hard_stats"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/4535","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/comments?post=4535"}],"version-history":[{"count":5,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/4535\/revisions"}],"predecessor-version":[{"id":4809,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\
/4535\/revisions\/4809"}],"wp:attachment":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/media?parent=4535"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/categories?post=4535"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/tags?post=4535"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}