Use content classification systems for better SPAM detection #10151
alecslupu wants to merge 14 commits into decidim:fature/prepare-analyzer-events
Conversation
Great work so far. There are still some rough corners that you may already be aware of. I've added some comments and suggestions for cleaning up the code and improving its functionality.
It also turned out that this approach works fairly well, at least for comment spam, after some extra "clean content" training using the non-moderated comments. There are some changes we need to make in the analysis (i.e. assigning weights) to make it perform better, as the analyzer is not optimal right now. I also ran the same analysis against the bayes classifier alone to get a comparison.
I tested this against several datasets:
- From one of our instances:
  - 3254 spam comments
  - 6223 blocked user accounts
  - 615 clean comments (that should not be flagged)
- From a second instance: 308 clean comments, to test training the classifier with the first dataset of clean comments
- Metadecidim public user accounts

With these datasets I had the following results. In the analysis I used CLD to detect the language of the content, and the classifier would give the content a lower score in case the language is Finnish.
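To make the discussion above concrete, here is a minimal, self-contained sketch of a word-frequency naive Bayes scorer of the kind being compared. This is a hypothetical illustration, not the Decidim::Tools::Ai implementation or the ClassifierReborn API; the CLD language-detection step mentioned above would run before scoring and adjust the final score, but is left out here for brevity.

```ruby
# Hypothetical minimal naive Bayes spam/ham scorer (illustration only).
class TinyBayes
  def initialize
    @counts = { spam: Hash.new(0), ham: Hash.new(0) }
    @totals = Hash.new(0)
  end

  # Count word occurrences per category during training.
  def train(category, text)
    text.downcase.scan(/[[:alpha:]]+/).each do |word|
      @counts[category][word] += 1
      @totals[category] += 1
    end
  end

  # Pick the category with the highest smoothed log-probability.
  def classify(text)
    scores = @counts.keys.map do |category|
      score = text.downcase.scan(/[[:alpha:]]+/).sum do |word|
        # Add-one smoothing avoids log(0) for unseen words.
        Math.log((@counts[category][word] + 1.0) / (@totals[category] + 1.0))
      end
      [category, score]
    end
    scores.max_by { |_, score| score }.first
  end
end

bayes = TinyBayes.new
bayes.train(:spam, "buy cheap pills online casino winner")
bayes.train(:ham, "thanks for the update on the participatory budget")
puts bayes.classify("cheap casino pills") # prints "spam"
```

The real classifier is trained on far larger corpora, which is exactly why the amount and language of the pre-training data dominates the results below.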
First instance
Pre-training data shipped with the module
Using bayes classifier alone:
- 99.78% spam comments flagged as spam
- 99.29% blocked users flagged as spam (based on profile description)
- Error rate: 93.17% (clean comments flagged as spam)
Using Decidim::Tools::Ai::SpamContent::Classifier:
- 11.34% spam comments flagged as spam
- 15.54% blocked users flagged as spam (based on profile description)
- Error rate: 1.79% (clean comments flagged as spam)
So, high error rates so far, but then I added additional training with the clean comments data.
Pre-training data shipped with the module + extra language specific training
Using bayes classifier alone:
- 98.46% spam comments flagged as spam
- 99.33% blocked users flagged as spam (based on profile description)
- Error rate: 1.79% (clean comments flagged as spam)
Using Decidim::Tools::Ai::SpamContent::Classifier:
- 11.34% spam comments flagged as spam
- 15.54% blocked users flagged as spam (based on profile description)
- Error rate: 0% (clean comments flagged as spam)
So, better results using the bayes classifier alone, but that is understandable since I trained it with this exact dataset. Let's try it with another set of comments.
Second instance
With the second instance I only tested the error rate, both with the pre-training data alone and with the extra training data from the first instance.
Pre-training data shipped with the module
Using bayes classifier alone:
- Error rate: 95.13% (clean comments flagged as spam)
Using Decidim::Tools::Ai::SpamContent::Classifier:
- Error rate: 3.57% (clean comments flagged as spam)
Pre-training data shipped with the module + extra language specific training (from the first instance)
Using bayes classifier alone:
- Error rate: 11.69% (clean comments flagged as spam)
Using Decidim::Tools::Ai::SpamContent::Classifier:
- Error rate: 2.27% (clean comments flagged as spam)
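The "error rate" figures above are the share of known-clean comments that the classifier flags as spam, i.e. the false positive rate. A small sketch of that metric (the `error_rate` helper and the stub classifier are illustrative names, not part of the module):

```ruby
# Hypothetical: percentage of known-clean texts flagged as spam.
def error_rate(clean_texts, classifier)
  flagged = clean_texts.count { |text| classifier.classify(text) == :spam }
  (flagged.to_f / clean_texts.size * 100).round(2)
end

# Stub classifier that flags anything mentioning "casino", just to
# demonstrate the metric against a known-clean sample set.
stub = Object.new
def stub.classify(text)
  text.include?("casino") ? :spam : :ham
end

puts error_rate(["nice proposal", "visit my casino", "agreed", "+1"], stub) # prints "25.0"
```

The spam-detection percentages earlier in the comment are computed the same way, just over the known-spam datasets instead.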
Metadecidim
I fetched the users from the API and scraped their profile descriptions using automation to get a sample set of analysis data.
In total, there were 11705 records to be analyzed that had some content in their profile description.
The bayes classifier alone flagged 99.25% of the users as spam accounts, and Decidim::Tools::Ai::SpamContent::Classifier flagged 9.23%. Note that this analysis only included users who have something written in the about section of their profile, which left about 4742 users unanalyzed (i.e. they would be considered clean).
I ran this test with the same pre-training data plus the clean data from the first instance, but the Finnish training data is likely rather irrelevant here.
I would not be surprised if about 10% of the user accounts at Metadecidim were spam accounts; this would match our findings from some other instances. But clearly, for this case, the bayes classifier would need more pre-training in English, Catalan and Spanish to work better for this dataset. I did not have any such data available, so I couldn't test how it would perform with more training in these languages.
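One way to handle the multi-language pre-training gap described above is to route training samples into per-language corpora before training. A hypothetical sketch, where `detect_language` is a trivial keyword stub standing in for CLD:

```ruby
# Trivial stand-in for CLD language detection (illustration only):
# classifies text as Spanish if it contains common Spanish function words.
def detect_language(text)
  text.match?(/\b(el|la|de|que)\b/) ? :es : :en
end

# One spam/ham corpus per detected language.
corpora = Hash.new { |hash, lang| hash[lang] = { spam: [], ham: [] } }

def add_sample(corpora, category, text)
  corpora[detect_language(text)][category] << text
end

add_sample(corpora, :spam, "buy cheap watches now")
add_sample(corpora, :ham, "la propuesta de presupuesto")
puts corpora.keys.inspect # prints "[:en, :es]"
```

With per-language corpora, adding English, Catalan or Spanish training data later only affects the scoring of content detected in those languages.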
Resolved review threads (outdated) on:
- decidim-comments/app/commands/decidim/comments/create_comment.rb
- decidim-comments/app/commands/decidim/comments/update_comment.rb
- decidim-tools-ai/spec/event_handlers/user/user_changes_profile_data_spec.rb
- decidim-tools-ai/lib/decidim/tools/ai/spam_content/classifier.rb
- decidim-tools-ai/lib/decidim/tools/ai/resources/user_base_entity.rb
I did a bit more research on the Metadecidim users' public data, and it turns out the above analysis is not actually far off: around 99% of the users who had a profile description are actual spam accounts.

Anyways, I did further analysis and fed some more sample data to the bayes classifier. I added publicly available spam email data, which I also translated into Spanish and Catalan, as I was thinking there was something wrong with the classification just based on the numbers. Then I made a datasheet of all the analyzed accounts with their contents, which I went through manually, and it seems the classifier was doing its job mostly correctly: most of the users were actually profile spammers. With the additional training data, the bayes classifier flagged about 99.19% of the users as spammers. So even with this data the bayes classifier seems to work quite well.

One note to make here is that it has also made mistakes with the data. Some users whom I manually identified as real users have also been classified as spammers, but this is likely a very small subset of the data. And we can improve on those if we mark these users as "ham" for the next analysis round. We could likely find these users pretty easily by looking at the most active users in Metadecidim who haven't been spamming. I would expect most of these profile spammer accounts to have no activity on the platform (or only a few spam comments).
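The "find real users by activity" feedback loop suggested above could be sketched like this. All names and the data shape are hypothetical; the point is simply to select active, non-flagged users as "ham" candidates for the next training round:

```ruby
# Hypothetical: pick "ham" candidates as the most active users that were
# not flagged as spam, so their profile texts can be fed back as clean
# training data for the next analysis round.
users = [
  { name: "a", activity: 120, flagged: false, about: "Urbanist, Barcelona" },
  { name: "b", activity: 0,   flagged: true,  about: "cheap pills" },
  { name: "c", activity: 45,  flagged: false, about: "Participation nerd" }
]

ham_candidates = users
  .reject { |user| user[:flagged] }        # drop already-flagged accounts
  .sort_by { |user| -user[:activity] }     # most active first
  .first(10)                               # cap the manual review batch

puts ham_candidates.map { |user| user[:name] }.inspect # prints ["a", "c"]
```

Since most profile spammers have little or no platform activity, sorting by activity should surface mostly genuine users for manual confirmation before retraining.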
@ahukkanen, the concept of having that expanded would be: using the …
Commits:
- Add Gitlab action workflow
- Patch the generator
- Running linters
- Gemfiles
- Add language service
- Normalize gems
- Add BayesStrategy
- Add Bayes Analyzer
- Refactor strategy initialization process
- …dim into ale-add-spam-detection
🎩 What? Why?
This PR adds a spam detection mechanism, created as a standalone bundle that can also be installed in older Decidim installations. Please refer to decidim-tools-ai/Readme.md for configuration details.
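Installing the bundle in an existing installation would presumably look like the usual Decidim module setup. The gem name below is an assumption based on the `decidim-tools-ai` directory; the Readme.md mentioned above is the authoritative source for the actual configuration.

```ruby
# Gemfile (hypothetical entry; gem name assumed from the module directory)
gem "decidim-tools-ai"
```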
📌 Related Issues
Link your PR to an issue
Testing
📷 Screenshots
Please add screenshots of the changes you're proposing
