{"id":5716,"date":"2020-10-01T07:00:58","date_gmt":"2020-10-01T11:00:58","guid":{"rendered":"http:\/\/datacolada.org\/?p=5716"},"modified":"2020-10-01T08:33:30","modified_gmt":"2020-10-01T12:33:30","slug":"92-data-replicada-8-is-the-left-digit-bias-stronger-when-prices-are-presented-side-by-side","status":"publish","type":"post","link":"https:\/\/datacolada.org\/92","title":{"rendered":"[92] Data Replicada #8: Is The Left-Digit Bias Stronger When Prices Are Presented Side-By-Side?"},"content":{"rendered":"<p style=\"text-align: justify;\">In the eighth installment of <a href=\"https:\/\/datacolada.org\/81\">Data Replicada<\/a>, we report our attempt to replicate a recently published <em>Journal of Marketing Research <\/em>(JMR) article entitled, \u201cThe Left-Digit Bias: When and Why Are Consumers Penny Wise and Pound Foolish?\u201d (<a href=\"https:\/\/journals.sagepub.com\/doi\/full\/10.1177\/0022243720932532\">.htm<\/a>).<\/p>\n<p style=\"text-align: justify;\">In this paper, the authors offer insight into a previously documented observation known as the <em>left-digit bias<\/em>, whereby consumers tend to give greater weight to the left-most digit when comparing two prices. This means, for example, that consumers tend to treat a price difference of $4.00 vs. $2.99 as larger than $4.01 vs. $3.00 [<a href=\"#footnote_0_5716\" id=\"identifier_0_5716\" class=\"footnote-link footnote-identifier-link\" title=\"In our Data Colada Seminar Series, Devin Pope recently presented very compelling evidence for the left digit bias among Lyft riders. You can watch that talk here: https:\/\/www.youtube.com\/watch?v=9uUPd313vYk.\">1<\/a>]. 
The authors\u2019 key claim is that this bias is greater when the prices are presented side-by-side, in a way that makes them easier to compare, than sequentially, on two separate but consecutive screens.<\/p>\n<p style=\"text-align: justify;\">This JMR paper contains five MTurk studies plus an additional study in which the authors analyzed scanner panel data. We chose to replicate Study 1 (N = 145) because it represented the simplest test of the authors\u2019 hypothesis, and because its effect size was among the largest in the paper [<a href=\"#footnote_1_5716\" id=\"identifier_1_5716\" class=\"footnote-link footnote-identifier-link\" title=\"Study 3 had a larger effect but was much more complicated.\">2<\/a>]. In this study, participants saw two brands of peanut butter, a premium brand and a store brand, along with their prices. The authors manipulated two factors. First, the left-digit price difference was either large ($4.00 vs. $2.99) or small ($4.01 vs. $3.00). Second, the peanut butters (and their prices) were either presented adjacently on the same screen, or one after the other on different screens. Participants evaluated the relative price of the store brand on a scale ranging from <em>very low <\/em>to <em>very high<\/em>. The researchers found that consumers showed a greater left-digit bias \u2013 evaluating the store brand to be relatively lower when it was $2.99 than when it was $3.00 \u2013 when the products and prices were presented on the same screen rather than sequentially.<\/p>\n<p style=\"text-align: justify;\">We contacted the authors to request the materials needed to conduct a replication. They were extremely forthcoming, thorough, and polite. They immediately shared the original Qualtrics file that they used to conduct that study, and we used it to conduct our replication. They also promptly answered a few follow-up questions. 
It is also worth noting that the authors had publicly posted their data (<a href=\"https:\/\/osf.io\/5zbgw\/\">OSF link<\/a>), which allowed us to easily access key statistics and to verify that their results are reproducible (and they are). We are very grateful to them for their help, professionalism, and transparency.<\/p>\n<p style=\"text-align: justify;\"><strong><u>The Replications<\/u><\/strong><\/p>\n<p style=\"text-align: justify;\">We ran two identical replications. The first was run on MTurk using the same criteria the original authors specified. The second was run on MTurk using a new feature that screens for only high-quality \u201cCloudResearch Approved Participants\u201d [<a href=\"#footnote_2_5716\" id=\"identifier_2_5716\" class=\"footnote-link footnote-identifier-link\" title=\"Specifically, in the first replication, we used MTurkers with at least a 98% approval rating and at least 1,000 HITs completed. In the second replication, we used only &ldquo;CloudResearch Approved Participants&rdquo; with at least a 98% approval rating and at least 1,000 HITs completed.\">3<\/a>].<\/p>\n<p style=\"text-align: justify;\">In the pre-registered replications, we used the same survey as in the original study, and therefore the same instructions, procedures, images, and questions. These studies did not deviate from the original study in any discernible way, except that our consent form was necessarily different, our exclusion rules were slightly different [<a href=\"#footnote_3_5716\" id=\"identifier_3_5716\" class=\"footnote-link footnote-identifier-link\" title=\"For quality control, we pre-registered to exclude all observations associated with duplicate MTurk IDs or IP Addresses, and to exclude those whose actual MTurk IDs were different than their reported IDs.\">4<\/a>], and we added an attention check to the last page of the survey to help us measure participant quality. 
After exclusions, we wound up with 1,099 participants in Replication 1 (~7.5 times the original sample size) and 1,555 participants in Replication 2 (~10.7 times the original sample size) [<a href=\"#footnote_4_5716\" id=\"identifier_4_5716\" class=\"footnote-link footnote-identifier-link\" title=\"We decided to increase the sample size in Replication 2 after observing marginally significant results in Replication 1. We went so big on both sample sizes because we know from experience (and math) that you often need very large samples to detect attenuated interactions (see Data Colada[17]).\">5<\/a>]. You can (easily!) access all of our pre-registrations, surveys, materials, data, and code on ResearchBox (<a href=\"https:\/\/researchbox.org\/37\">.htm<\/a>).<\/p>\n<p style=\"text-align: justify;\">In each study, participants saw five pairs of products alongside their prices. Each pair was presented on its own page(s) of the survey, and included a premium brand and a store brand called Great Value. The first four product pairs \u2013 ketchup, tuna, brown rice, and mayonnaise \u2013 were filler items, and the fifth pair \u2013 peanut butter \u2013 was the critical item. For each pair, participants were asked: \u201cCompared to the premium price, the price of the store brand is . . 
.\u201d (1 = Very Low; 7 = Very High).<\/p>\n<p style=\"text-align: justify;\">There were two manipulations.<\/p>\n<p style=\"text-align: justify;\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-5717 aligncenter\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/Sokolova-Conditions-Image.png\" alt=\"\" width=\"3179\" height=\"2237\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/Sokolova-Conditions-Image.png 3179w, https:\/\/datacolada.org\/wp-content\/uploads\/Sokolova-Conditions-Image-300x211.png 300w, https:\/\/datacolada.org\/wp-content\/uploads\/Sokolova-Conditions-Image-1024x721.png 1024w, https:\/\/datacolada.org\/wp-content\/uploads\/Sokolova-Conditions-Image-768x540.png 768w, https:\/\/datacolada.org\/wp-content\/uploads\/Sokolova-Conditions-Image-1536x1081.png 1536w, https:\/\/datacolada.org\/wp-content\/uploads\/Sokolova-Conditions-Image-2048x1441.png 2048w, https:\/\/datacolada.org\/wp-content\/uploads\/Sokolova-Conditions-Image-850x598.png 850w\" sizes=\"auto, (max-width: 3179px) 100vw, 3179px\" \/><\/p>\n<p style=\"text-align: justify;\">First, as shown above, participants in the \u201cSame Screen\u201d condition saw the two products\/prices on the same screen, with the premium brand always presented above the store brand. On that same screen, participants were asked to complete the dependent measure. Those in the \u201cOne At A Time\u201d condition instead saw the two products\/prices on consecutive screens, with the premium brand always presented first and the store brand always presented second. The dependent measure was then presented on a subsequent screen [<a href=\"#footnote_5_5716\" id=\"identifier_5_5716\" class=\"footnote-link footnote-identifier-link\" title=\"Actually, in the &ldquo;One At A Time&rdquo; condition, the products and measures were presented across five screens rather than three. The first screen showed the premium brand and its price. The second screen displayed an asterisk for one second. 
The third screen showed the store brand and its price. The fourth screen displayed another asterisk for one second. And then the fifth screen presented the dependent measure. According to the authors, &ldquo;The asterisk was used to clear participants&rsquo; visuospatial sketchpads and make it more difficult for them to retain precise perceptual representations in memory (Baddeley and Hitch 1974).&rdquo;\">6<\/a>].<\/p>\n<p style=\"text-align: justify;\">Second, the prices of the peanut butters varied between conditions. In the Large Left Difference condition (shown above), the premium brand was priced at $4.00 and the store brand was priced at $2.99. In the Small Left Difference condition (not shown), the premium was priced at $4.01 and the store brand was priced at $3.00.<\/p>\n<p style=\"text-align: justify;\"><strong><u>Results<\/u><\/strong><\/p>\n<p style=\"text-align: justify;\">Here are the original vs. replication results. Note that there is a left-digit bias whenever a green bar is higher than the adjacent blue bar:<\/p>\n<p style=\"text-align: justify;\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-5718 aligncenter\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/means_plot-all-data-2.png\" alt=\"\" width=\"3000\" height=\"1800\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/means_plot-all-data-2.png 3000w, https:\/\/datacolada.org\/wp-content\/uploads\/means_plot-all-data-2-300x180.png 300w, https:\/\/datacolada.org\/wp-content\/uploads\/means_plot-all-data-2-1024x614.png 1024w, https:\/\/datacolada.org\/wp-content\/uploads\/means_plot-all-data-2-768x461.png 768w, https:\/\/datacolada.org\/wp-content\/uploads\/means_plot-all-data-2-1536x922.png 1536w, https:\/\/datacolada.org\/wp-content\/uploads\/means_plot-all-data-2-2048x1229.png 2048w, https:\/\/datacolada.org\/wp-content\/uploads\/means_plot-all-data-2-850x510.png 850w\" sizes=\"auto, (max-width: 3000px) 100vw, 3000px\" \/><\/p>\n<p style=\"text-align: 
Here are the big">
justify;\">Here are the big takeaways.<\/p>\n<p style=\"text-align: justify;\">First, you can see that we find marginally significant support for the authors\u2019 key interaction in both replications. The left-digit bias is directionally larger when the prices are presented on the same screen than when the prices are presented on different screens. Although the evidence is much weaker than in the original study, we are inclined to believe that the original effect is real.<\/p>\n<p style=\"text-align: justify;\">Second, if we didn\u2019t observe a significant interaction with 1,100-1,600 participants, then a study with only 145 participants is unlikely to reliably detect this interaction. Indeed, our analyses show that the original study had only 9.3% power to detect the Replication 1 result and 7.8% power to detect the Replication 2 result. You\u2019d need 3,000 participants (750 per cell) to have an 80% chance of detecting the Replication 1 result, and 4,500 participants to have an 80% chance of detecting the Replication 2 result [<a href=\"#footnote_6_5716\" id=\"identifier_6_5716\" class=\"footnote-link footnote-identifier-link\" title=\"You really should let all of this sink in. First, to study things like this you need really gigantic samples. Second, the difference in your required sample size between a true b = .31 (Replication 1) and a true b = .21 (Replication 2) is 1,500 participants! In our field, we tend to be entirely indifferent to an effect size difference of that magnitude. 
But that seemingly meaningless difference can dramatically affect how expensive (or possible) it is to study something.\">7<\/a>].<\/p>\n<p style=\"text-align: justify;\">Third, using CloudResearch Approved Participants (in Replication 2) seemed to strengthen the left-digit bias in the Same Screen condition, perhaps because these participants are more attentive (for evidence see this footnote: [<a href=\"#footnote_7_5716\" id=\"identifier_7_5716\" class=\"footnote-link footnote-identifier-link\" title=\"We added an attention check to the very end of the survey. Specifically, we asked participants, &ldquo;Which product category were you NOT asked about in this survey?&rdquo; The response options included the five categories that were presented plus one &ndash; Jam &ndash; that wasn&rsquo;t. Echoing our findings from Data Replicada #7, we found that the CloudResearch Approved Participants (87.1%) were more likely to pass the attention check than were those in Replication 1 (78.5%). It is often dangerous to remove participants based on an attention check that comes after the key manipulations and measures, but in case you are curious: Removing participants who failed the check made the results of Replication 1 slightly weaker (b = .25, p = .198) and the results of Replication 2 slightly stronger (b = .28, p = .038).\">8<\/a>]). But it did not increase the size of the authors\u2019 interaction effect. Because of this, we think it is unlikely that our smaller effect size is driven by our somehow recruiting lower quality MTurkers.<\/p>\n<p style=\"text-align: justify;\"><strong><u>Conclusion<\/u><\/strong><\/p>\n<p style=\"text-align: justify;\">In sum, we think the totality of the evidence suggests that the left-digit bias is larger when the products\/prices are presented on the same screen than when they are presented sequentially. 
At the same time, our evidence suggests that the effect is much smaller than the original authors reported, and that thousands of participants may be required to reliably produce it.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-3417 aligncenter\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2018\/12\/Narrow-colada-logo.png\" alt=\"\" width=\"60\" height=\"93\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2018\/12\/Narrow-colada-logo.png 242w, https:\/\/datacolada.org\/wp-content\/uploads\/2018\/12\/Narrow-colada-logo-194x300.png 194w\" sizes=\"auto, (max-width: 60px) 100vw, 60px\" \/><\/p>\n<hr \/>\n<p><span style=\"color: #0000ff;\"><strong>Author Feedback<\/strong><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: georgia, palatino, serif; font-size: 12pt; color: #0000ff;\">When we reached out to the authors for comment, they had this to say:<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: georgia, palatino, serif; font-size: 12pt; color: #0000ff;\">\"We are very happy to learn that you replicated our results. We are not surprised that the general pattern reported in the paper replicates: prior to and during the review process we internally replicated all of the reported studies. In addition, following your earlier email, we successfully replicated study 1 in August 2020. In that study (N = 404), we observed a significant left-digit bias in the \"Side-By-Side\" condition (M<sub>3.00\u00a0<\/sub>= 3.38 vs. M<sub>2.99\u00a0<\/sub>= 2.85; b=-.53, SE=.19, t=-2.79, p=.006) and no left-digit bias in the \"One At A Time\" condition (M<sub>3.00<\/sub>=3.18 vs. M<sub>2.99<\/sub>=3.18; b=.00, SE=.20, t=0.02, p=.981). The critical interaction test produced b=.53, SE=.27, t=1.96, p=.051. 
Our data set is posted on the <a href=\"https:\/\/osf.io\/5zbgw\/\"><span style=\"color: #0000ff;\">Open Science Framework<\/span><\/a>.\"<\/span><\/p>\n<p><strong>Footnotes.<\/strong><\/p>\n<p><strong>\u00a0<\/strong><\/p>\n<ol class=\"footnotes\">\n<li id=\"footnote_0_5716\" class=\"footnote\">In our Data Colada Seminar Series, Devin Pope recently presented very compelling evidence for the left digit bias among Lyft riders. You can watch that talk here: <a href=\"https:\/\/www.youtube.com\/watch?v=9uUPd313vYk\">https:\/\/www.youtube.com\/watch?v=9uUPd313vYk<\/a>. [<a href=\"#identifier_0_5716\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_1_5716\" class=\"footnote\">Study 3 had a larger effect but was much more complicated. [<a href=\"#identifier_1_5716\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_2_5716\" class=\"footnote\">Specifically, in the first replication, we used MTurkers with at least a 98% approval rating and at least 1,000 HITs completed. In the second replication, we used only \u201cCloudResearch Approved Participants\u201d with at least a 98% approval rating and at least 1,000 HITs completed. [<a href=\"#identifier_2_5716\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_3_5716\" class=\"footnote\">For quality control, we pre-registered to exclude all observations associated with duplicate MTurk IDs or IP Addresses, and to exclude those whose actual MTurk IDs were different than their reported IDs. [<a href=\"#identifier_3_5716\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_4_5716\" class=\"footnote\">We decided to increase the sample size in Replication 2 after observing marginally significant results in Replication 1. 
We went so big on both sample sizes because we know from experience (and math) that you often need very large samples to detect attenuated interactions (see <a href=\"https:\/\/datacolada.org\/17\">Data Colada[17]<\/a>). [<a href=\"#identifier_4_5716\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_5_5716\" class=\"footnote\">Actually, in the \u201cOne At A Time\u201d condition, the products and measures were presented across five screens rather than three. The first screen showed the premium brand and its price. The second screen displayed an asterisk for one second. The third screen showed the store brand and its price. The fourth screen displayed another asterisk for one second. And then the fifth screen presented the dependent measure. According to the authors, \u201cThe asterisk was used to clear participants\u2019 visuospatial sketchpads and make it more difficult for them to retain precise perceptual representations in memory (Baddeley and Hitch 1974).\u201d [<a href=\"#identifier_5_5716\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_6_5716\" class=\"footnote\">You really should let all of this sink in. First, to study things like this you need really gigantic samples. Second, the difference in your required sample size between a true b = .31 (Replication 1) and a true b = .21 (Replication 2) is 1,500 participants! In our field, we tend to be entirely indifferent to an effect size difference of that magnitude. But that seemingly meaningless difference can dramatically affect how expensive (or possible) it is to study something. [<a href=\"#identifier_6_5716\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_7_5716\" class=\"footnote\">We added an attention check to the very end of the survey. 
Specifically, we asked participants, \u201cWhich product category were you NOT asked about in this survey?\u201d The response options included the five categories that were presented plus one \u2013 Jam \u2013 that wasn\u2019t. Echoing our findings from Data Replicada #7, we found that the CloudResearch Approved Participants (87.1%) were more likely to pass the attention check than were those in Replication 1 (78.5%). It is often dangerous to remove participants based on an attention check that comes after the key manipulations and measures, but in case you are curious: Removing participants who failed the check made the results of Replication 1 slightly weaker (b = .25, p = .198) and the results of Replication 2 slightly stronger (b = .28, p = .038). [<a href=\"#identifier_7_5716\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>In the eighth installment of Data Replicada, we report our attempt to replicate a recently published Journal of Marketing Research (JMR) article entitled, \u201cThe Left-Digit Bias: When and Why Are Consumers Penny Wise and Pound Foolish?\u201d (.htm). 
In this paper, the authors offer insight into a previously documented observation known as the left-digit bias, whereby&#8230;<\/p>\n","protected":false},"author":8,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"_wp_rev_ctl_limit":""},"categories":[81,4,18],"tags":[],"class_list":["post-5716","post","type-post","status-publish","format-standard","hentry","category-data-replicada","category-paper","category-replication"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/5716","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/comments?post=5716"}],"version-history":[{"count":4,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/5716\/revisions"}],"predecessor-version":[{"id":5755,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/5716\/revisions\/5755"}],"wp:attachment":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/media?parent=5716"}],"wp:term":[{"taxonomy":"category","embeddable":t
rue,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/categories?post=5716"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/tags?post=5716"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}