{"id":1341,"date":"2017-01-03T07:03:46","date_gmt":"2017-01-03T12:03:46","guid":{"rendered":"http:\/\/datacolada.org\/?p=1341"},"modified":"2021-12-17T01:06:09","modified_gmt":"2021-12-17T06:06:09","slug":"56-twarking-test-weighting-results-known","status":"publish","type":"post","link":"https:\/\/datacolada.org\/56","title":{"rendered":"[56] TWARKing: Test-Weighting After Results are Known"},"content":{"rendered":"<p>On the last class of the semester I hold a \u201ctown-hall\u201d meeting: an open discussion about how to improve the course (content, delivery, grading, etc.). I follow up with a required online poll to \u201cvote\u201d on proposed changes [<a href=\"#footnote_0_1341\" id=\"identifier_0_1341\" class=\"footnote-link footnote-identifier-link\" title=\"Like Brexit, the poll in OID290 is not binding\">1<\/a>].<br \/>\n<img decoding=\"async\" class=\"alignnone size-full wp-image-1342\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/F1.png\" alt=\"f1\" width=\"0\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/F1.png 780w, https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/F1-300x182.png 300w, https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/F1-768x466.png 768w\" sizes=\"(max-width: 780px) 100vw, 780px\" \/><br \/>\nGrading in my class is old-school. Two tests, each 40%, homework 20% (graded mostly on a completion 1\/0 scale). The downside of this model is that those who do poorly early on get demotivated. Also, a bit of bad luck in a test hurts a lot. During the latest town-hall the idea of having multiple quizzes and dropping the worst was popular. One problem with <em>this<\/em> model is that students can blow off a quiz entirely. 
After the town-hall I thought of <em><u>why<\/u><\/em> students loved the drop-1 idea and whether I could capture the same psychological benefit with a smaller pedagogical loss.<\/p>\n<p>I came up with TWARKing: assigning test weights after results are known [<a href=\"#footnote_1_1341\" id=\"identifier_1_1341\" class=\"footnote-link footnote-identifier-link\" title=\"Obviously the name is inspired by &lsquo;HARKing&rsquo;: hypothesizing after results are known.&nbsp; The similarity to Twerking, in contrast, is unintentional, and, given the sincerity of the topic, probably unfortunate.\">2<\/a>]. With TWARKing, instead of each test counting 40% for every student, whichever test an individual student did better on gets more weight: if Julie does better on Test 1 than on Test 2, her Test 1 gets 45% and her Test 2 gets 35%; Jason did better on Test 2, so his Test 2 gets 45% and his Test 1 gets 35% [<a href=\"#footnote_2_1341\" id=\"identifier_2_1341\" class=\"footnote-link footnote-identifier-link\" title=\"I presume someone already does this; not claiming novelty\">3<\/a>]. 
Dropping a quiz becomes a special case of TWARKing: the worst quiz gets 0% weight.<\/p>\n<p><strong>It polls well<\/strong><br \/>\nI expected TWARKing to do well in the online poll but was worried students would fall prey to competition-neglect, so I wrote a long question stacking the deck against TWARKing:<br \/>\n<a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/Question.png\"><img decoding=\"async\" class=\"alignnone wp-image-1346 size-full\" style=\"border: 1px solid #000000;\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/Question.png\" alt=\"question\" width=\"500\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/Question.png 657w, https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/Question-300x142.png 300w\" sizes=\"(max-width: 657px) 100vw, 657px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/F1.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-1342\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/F1.png\" alt=\"f1\" width=\"500\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/F1.png 780w, https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/F1-300x182.png 300w, https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/F1-768x466.png 768w\" sizes=\"(max-width: 780px) 100vw, 780px\" \/><\/a><\/p>\n<p>70% of students were in favor, only 15% against (N=92; only 3 students did not complete the poll).<\/p>\n<p>The poll is not anonymous, so I looked at how TWARKing attitudes are correlated with actual performance.<\/p>\n<p><a href=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/F2-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1344\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/F2-1.png\" alt=\"f2\" width=\"1456\" height=\"409\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/F2-1.png 1456w, 
https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/F2-1-300x84.png 300w, https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/F2-1-768x216.png 768w, https:\/\/datacolada.org\/wp-content\/uploads\/2017\/01\/F2-1-1024x288.png 1024w\" sizes=\"auto, (max-width: 1456px) 100vw, 1456px\" \/><\/a><\/p>\n<p>Panel A shows that students doing better like TWARKing less, but the effect is not as strong as I would have expected. Students liking it 5\/5 perform in the bottom 40%; those liking it 2\/5 are in the top 40%.<\/p>\n<p>Panel B shows that students with more uneven performance do like TWARKing more, but the effect is small and unimpressive (Spearman's r=.21, <em>p<\/em>=.044).<\/p>\n<p>For Panel C I recomputed what the final grades would have been had TWARKing been implemented this semester and checked how the change in ranking correlated with support of TWARKing. It did not. Maybe it was asking too much for this to work, as students did not yet know their Test 2 scores.<\/p>\n<p>My read is that students cannot anticipate whether it will help or hurt them, and they generally like it all the same.<\/p>\n<p><strong>TWARKing could be pedagogically superior.<br \/>\n<\/strong>Tests serve two main roles: motivating students and measuring performance. I think TWARKing could be better on both fronts.<\/p>\n<p><em><strong>Better measurement<\/strong>. <\/em>My tests tend to include insight-type questions: students either nail them or fail them. It is hard to get lucky in my tests, I think; hard to get a high score despite not knowing the material. But it is easy, unfortunately, to get unlucky: to get no points on a topic you had a decent understanding of [<a href=\"#footnote_3_1341\" id=\"identifier_3_1341\" class=\"footnote-link footnote-identifier-link\" title=\"Students can still get lucky if I happen to ask on a topic they prepared better for.\">4<\/a>].\u00a0 Giving more weight to the highest test is hence giving more weight to the more accurate of the two tests. 
\u00a0So it could <em><u>improve<\/u><\/em> the overall validity of the grade.\u00a0 A student who gets a 90 and a 70 is, I presume, better than one getting 80 in both tests.<\/p>\n<p>This reminded me of what Shugan &amp; Mitra (2009 .<a href=\"http:\/\/doi.org\/10.1287\/mnsc.1080.0907\">htm<\/a>) label the \u201cAnna Karenina effect\u201d in their under-appreciated paper (11 Google cites). <em>Their<\/em> Anna Karenina effect (there are a few, each different from the others) occurs when less favorable outcomes carry less information than more favorable ones; in those situations, measures other than the average, e.g., the max, perform better for out-of-sample prediction. [<a href=\"#footnote_4_1341\" id=\"identifier_4_1341\" class=\"footnote-link footnote-identifier-link\" title=\"They provide calibrations with real data in sports, academia and movie ratings. Check the paper out.\">5<\/a>]<\/p>\n<p>To get an intuition for this Anna Karenina effect: think about what contains more information, a marathon runner\u2019s best vs worst running time? A researcher\u2019s most vs least cited paper?<\/p>\n<p>Note that one can TWARK within a test, giving more weight to each student\u2019s highest-scoring answer. I will.<\/p>\n<p><em><strong>Motivation<\/strong>. <\/em>After doing very poorly in a test it must be very motivating to feel that if you study hard you can make this bad performance count less. I speculate that with TWARKing, students who underperform in Test 1 are less likely to be demotivated for Test 2 (I will test this next semester, but without random assignment\u2026). 
\u00a0TWARKing has the magical psychological property that the gains are very concrete, every single student gets a higher average with TWARKing than without, and they see that; the losses, in contrast, are abstract and unverifiable (you don\u2019t see the students who benefited more than you did, leading to a net-loss in ranking).<\/p>\n<p><strong>Bottom line<\/strong><br \/>\nStudents seem to really like TWARKing.<br \/>\nIt may make things better for measurement.<br \/>\nIt may improve motivation.<\/p>\n<p>A free happiness boost.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-376\" src=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo-300x145.jpg\" alt=\"Wide logo\" width=\"78\" height=\"38\" srcset=\"https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo-300x145.jpg 300w, https:\/\/datacolada.org\/wp-content\/uploads\/2014\/02\/Wide-logo.jpg 320w\" sizes=\"auto, (max-width: 78px) 100vw, 78px\" \/><\/p>\n<hr \/>\n<strong>Footnotes.<\/strong><\/p>\n<ol class=\"footnotes\">\n<li id=\"footnote_0_1341\" class=\"footnote\">Like Brexit, the poll in OID290 is not binding [<a href=\"#identifier_0_1341\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_1_1341\" class=\"footnote\">Obviously the name is inspired by \u2018HARKing\u2019: hypothesizing after results are known.\u00a0 The similarity to <em>Twerking, <\/em>in contrast, is unintentional, and, given the sincerity of the topic, probably unfortunate. 
[<a href=\"#identifier_1_1341\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_2_1341\" class=\"footnote\">I presume someone already does this; not claiming novelty [<a href=\"#identifier_2_1341\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_3_1341\" class=\"footnote\">Students can still get lucky if I happen to ask on a topic they prepared better for. [<a href=\"#identifier_3_1341\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<li id=\"footnote_4_1341\" class=\"footnote\">They provide calibrations with real data in sports, academia and movie ratings. Check the paper out. [<a href=\"#identifier_4_1341\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>On the last class of the semester I hold a \u201ctown-hall\u201d meeting; an open discussion about how to improve the course (content, delivery, grading, etc). I follow up with a required online poll to \u201cvote\u201d on proposed changes [1]. Grading in my class is old-school. 
Two tests, each 40%, homeworks 20% (graded mostly on a completion&#8230;<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"_wp_rev_ctl_limit":""},"categories":[41,28],"tags":[],"class_list":["post-1341","post","type-post","status-publish","format-standard","hentry","category-just-fun","category-teaching"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/1341","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/comments?post=1341"}],"version-history":[{"count":5,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/1341\/revisions"}],"predecessor-version":[{"id":6518,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/posts\/1341\/revisions\/6518"}],"wp:attachment":[{"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/media?parent=1341"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/categories?post=1341"},{"t
axonomy":"post_tag","embeddable":true,"href":"https:\/\/datacolada.org\/wp-json\/wp\/v2\/tags?post=1341"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}