Yet again I see a post that has "0 down vote favorite" as the first line of the question. This happens because the OP copied and pasted an existing question and caught the vote section along with it. Can we either block submission of these or at least raise an auto flag? I have tried, and I cannot think of a valid question or answer that would be harmed if it could not start with "N down vote favorite".
- (66) The same would apply to answers. The number of plagiarised posts I've deleted that included that text at the start... – Martijn Pieters (Mod), Jan 31, 2017 at 19:26
- (1) @MartijnPieters Updated the title to include answers. Don't want to forget about those. – NathanOliver, Jan 31, 2017 at 19:28
- (46) If you'd block posting posts that start with that, users would just remove it to post their posts, which would make it harder to detect it as a copy. So auto-flag would be a better option. – Floern, Jan 31, 2017 at 19:33
- 671 results ... are you sure you want to spend dev time on that? – rene, Jan 31, 2017 at 19:35
- (1) @rene: more false positives than true matches. – Martijn Pieters (Mod), Jan 31, 2017 at 19:38
- (19) @rene But that doesn't count the number of those that have been deleted. My gut says the majority have been. – NathanOliver, Jan 31, 2017 at 19:39
- (12) You don't dare to play the deleted posts card. You know I have no defense against that! – rene, Jan 31, 2017 at 19:40
- (8) @NathanOliver - I see 430 matches for deleted posts containing "down vote favorite". – Brad Larson (Mod), Jan 31, 2017 at 19:42
- (1) @br Thanks for the info. I guess it is not as big of an issue as I thought. – NathanOliver, Jan 31, 2017 at 19:43
- (11) 430 deleted posts with the exact string "down vote favorite", many of which are answers despite being copied from questions. Go figure. – BoltClock (Mod), Feb 1, 2017 at 4:02
- (9) Wouldn't it make more sense to automatically detect and raise a flag for large portions of copy-pasted content? It isn't the "N down vote favorite" that's the problem, it's the fact that the rest of the post is a copy-paste. A >95% match should raise an auto-flag. – Cody Gray (Mod), Feb 1, 2017 at 4:39
- (1) @CodyGray I remember it was proposed to prevent plagiarism on tag wikis, but it was shot down due to false positives and expense (not sure if those claims are still valid). – Braiam, Feb 1, 2017 at 14:55
- (4) Perhaps this may be a nice feature for one of the chatroom bots? – DavidG, Feb 2, 2017 at 10:39
- (8) I can't bring myself to upvote, because you didn't start this with "0 down vote favorite". – philipxy, Feb 2, 2017 at 11:25
- 58.1 million results on google "0 down vote favorite" – Andrew, Feb 2, 2017 at 17:03
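Cody Gray's suggestion in the comments above, auto-flagging a post that is a >95% match for existing content, could be sketched roughly as follows. This is only an illustration using Python's standard `difflib`; the function names and the sample strings are hypothetical, and nothing here reflects how the site actually detects copies.

```python
from difflib import SequenceMatcher

# Threshold taken from the ">95% match" figure suggested in the comments.
THRESHOLD = 0.95


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two texts."""
    # autojunk=False turns off difflib's heuristic that treats very frequent
    # characters as junk, which can distort the ratio on long inputs.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def looks_copied(new_post: str, existing_post: str,
                 threshold: float = THRESHOLD) -> bool:
    """Return True if new_post is a near-verbatim copy of existing_post."""
    return similarity(new_post, existing_post) > threshold


# Hypothetical sample data: an existing post and a copy with the vote text.
original = "How do I reverse a list in Python? " * 10
copied = "0 down vote favorite " + original

print(looks_copied(copied, original))
print(looks_copied("Completely unrelated text about SQL joins.", original))
```

Comparing every new post against every existing one this way would be costly, which echoes the expense concern Braiam raises above; a real system would need some way to narrow down the candidates first.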
2 Answers
We could blacklist strings (both in titles and bodies) that match the regex:
down vote favorite
That would prevent this sort of copy-and-paste error. But it would not prevent people from noticing the problem and removing (or modifying) the offending string. Since the string seems to be a really strong signal of plagiarism (and carelessness), it's counterproductive to block it unless the system is overwhelmed with these contributions. As rene demonstrated, that's not the case. The handful of extant posts were gone before I started looking at the question.
So, yes, this is an incredibly annoying behavior. But it's also relatively rare.
That said, Shog9 added the blacklist while I was answering. So that's fine too. ;-)
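For illustration, here is a minimal Python sketch of the check described at the top of this answer: the quoted regex applied to both title and body. The function name is hypothetical, and this is not the expression that was actually blacklisted, which is not shown in this thread.

```python
import re

# The regex quoted in the answer, searched anywhere in the text.
BLACKLIST = re.compile(r"down vote favorite")


def is_blocked(title: str, body: str) -> bool:
    """Return True if either the title or the body matches the blacklist."""
    return any(BLACKLIST.search(text) for text in (title, body))


# A careless copy is caught; the same post with the string removed is not.
print(is_blocked("Sorting a list", "0 down vote favorite How do I sort a list?"))
print(is_blocked("Sorting a list", "How do I sort a list?"))
```

The second call shows the trade-off mentioned above: once the poster deletes the offending string, the copied post passes the check and the signal is lost.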
- 'Body cannot contain ""'? – user4639281, Feb 3, 2017 at 1:21
- (6) Not wild about telling people which bits of text to remove here, @Tiny. As rene notes, this ain't gonna hit very many posts, so any value would be in discouraging lazy posters - and this seems sufficiently discouraging. – Feb 3, 2017 at 1:23
- Ahhhhhhh, I see – user4639281, Feb 3, 2017 at 1:24
- (1) @TinyGiant: Shog's regex skills are superior to mine. The best I could do was quote the entire body, which doesn't leave much room for a useful warning. (And might give away the game a bit sooner.) – Feb 3, 2017 at 1:26
- I suppose the "Body cannot contain" part is hard-coded, @Shog9? Otherwise, the message would be clearer if you just omitted it altogether. – Feb 3, 2017 at 7:54
- Yeah, that bit is mandatory for blacklisted terms in the body, @Cody. – Feb 3, 2017 at 17:20
- How did this get through? ([Metasmoke report for those with <10k)(metasmoke.erwaysoftware.com/post/56595)) – NobodyNada, Feb 8, 2017 at 21:23
- @NobodyNada: The regex is anchored to the start of the post. (That answer started with a "[" to make the link.) – Feb 8, 2017 at 21:44
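The anchoring behavior described in the last comment can be illustrated with a short Python sketch. Both patterns are assumptions made up for this example (the real blacklisted expression is not shown in the thread), and the helper names are hypothetical.

```python
import re

# Hypothetical start-anchored pattern: a score followed by "down vote".
ANCHORED = re.compile(r"^\s*-?\d+\s+down vote")
# The same pattern without the anchor, for comparison.
UNANCHORED = re.compile(r"-?\d+\s+down vote")


def anchored_hit(text: str) -> bool:
    """Return True only if the vote text appears at the very start."""
    return ANCHORED.search(text) is not None


def unanchored_hit(text: str) -> bool:
    """Return True if the vote text appears anywhere in the text."""
    return UNANCHORED.search(text) is not None


plain = "0 down vote favorite How do I parse this?"
with_link = "[0 down vote favorite How do I parse this?"

print(anchored_hit(plain))         # vote text at position 0
print(anchored_hit(with_link))     # a leading "[" shifts the vote text
print(unanchored_hit(with_link))   # found anywhere in the post
```

A single leading character is enough to slip past the anchored pattern. The unanchored variant would catch that post, but it would also match ordinary sentences such as "I got 1 down vote today".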
I'm all for quality.
Given the low number of occurrences of such posts:
- current visible (81)
- first revisions (253) [1]
- deleted (430)
I don't think blocking them would help an awful lot.
I wouldn't spend developer time on implementing this right now.
[1] IKR that this doesn't include deleted posts
- Your "first revision" SQL has SELECT TOP 10 at the top. Any link with 10 rows in the resultset? – Mat, Jan 31, 2017 at 20:26
- I left that intentionally there @Mat. You earned the waffles... – rene, Jan 31, 2017 at 20:29
- If there actually are only 10, then the deleted data is necessary - I've seen more than that just myself, and the ones I saw pretty much all needed serious work (to say the least). – Mat, Jan 31, 2017 at 20:32
- Yeah, I'm trying to get past the 2 minute timeout on SEDE, hold on ... – rene, Jan 31, 2017 at 20:33
- (9) Much less than I expected. But I guess that's science... thanks for the analysis. – Mat, Jan 31, 2017 at 21:02
- (2) The visible ones are down to 1 closed question. Looks like the meta effect. – RamenChef, Feb 2, 2017 at 16:42
- Maybe there could be a bot to grab these, google for the duplicates, and post CVs to the SOCVR room? – John Dvorak, Feb 2, 2017 at 17:27
- (3) @JanDvorak I just learned Gutenberg over in SO Botics is doing just that? – rene, Feb 2, 2017 at 17:34
- Oh, nice. I guess I should hang out more in there. – John Dvorak, Feb 2, 2017 at 17:34
- (2) I have a 27" display in front of my eyes. I have problems reading your last part. Are you sure you need that much <sub>? :D – Christian Gollhardt, Feb 2, 2017 at 21:59
- @Christian No, needs moar <sub> – user4639281, Feb 3, 2017 at 1:22
- (2) The expression I blacklisted is a bit more fuzzy than the search, so would've hit a few hundred more attempts (including attempts at copying answers, which don't include "favorite"). False-positive rate still crushingly low, so what the hell, might as well. – Feb 3, 2017 at 1:46
- If you're unhappy with your 27" display, I'll be happy to take it off your hands, @Christian. :-) – Feb 3, 2017 at 7:55
