Add AI engine for spam #11319
…are-analyzer-events
Add Gitlab action workflow Patch the generator Running linters Gemfiles
Add language service Normalize gems
* Add BayesStrategy * Add Bayes Analyzer * Refactor strategy initialization process
…are-analyzer-events
…dim into ale-add-spam-detection
* Add event handlers and spec data * Fixing failing specs * Fix Category error in untrain * fix decidim ai tests
* Add Strategy module * Add more namespaces
* Add resources to be analyzed * mend
@andreslucena No need on my end. Last time the technical implementation seemed fine. The only thing to add is that, as I have written in some of the previous reviews, I would not constantly feed the classifier with the moderated data as "spam". For example, in the instances we manage, spammers are taking other people's actual comments and injecting their spam into them. When the classifier is fed this data as "spam", it will eventually start to classify genuine content as spam with higher probability. I have written more details in the broader context of solving the spam issue at #10038. As a content classification engine, this implementation is fine (but it would require more pre-training data to actually work well).
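The poisoning effect described above can be illustrated with a toy example. This is a minimal sketch, assuming a hand-rolled multinomial naive Bayes classifier with Laplace smoothing; `TinyBayes`, its methods, and the sample texts are hypothetical and are not the classifier or data used by decidim-ai.

```ruby
# Toy multinomial naive Bayes with Laplace smoothing.
# Hypothetical illustration only; not the decidim-ai implementation.
class TinyBayes
  def initialize
    @word_counts = { spam: Hash.new(0), ham: Hash.new(0) }
    @doc_counts = { spam: 0, ham: 0 }
  end

  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each { |word| @word_counts[category][word] += 1 }
  end

  # Returns P(spam | text) as a Float between 0 and 1.
  def spam_probability(text)
    scores = %i[spam ham].to_h { |category| [category, log_score(category, text)] }
    max = scores.values.max
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = scores.transform_values { |score| Math.exp(score - max) }
    weights[:spam] / weights.values.sum
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z]+/)
  end

  def vocabulary_size
    (@word_counts[:spam].keys | @word_counts[:ham].keys).size
  end

  def log_score(category, text)
    total_docs = @doc_counts.values.sum
    total_words = @word_counts[category].values.sum
    prior = Math.log((@doc_counts[category] + 1.0) / (total_docs + 2.0))
    tokenize(text).sum(prior) do |word|
      Math.log((@word_counts[category][word] + 1.0) / (total_words + vocabulary_size))
    end
  end
end

genuine = "I think the new bike lane on main street would make commuting safer"

classifier = TinyBayes.new
classifier.train(:ham, "the new park on main street needs better lighting")
classifier.train(:ham, "a safer bike route would help commuting families")
classifier.train(:spam, "buy cheap pills online best price click here now")
classifier.train(:spam, "win free money fast click this link today now")

before = classifier.spam_probability(genuine)

# A spammer copies the genuine comment and appends a payload. A moderator
# hides it, and the whole text is fed back to the classifier as spam.
3.times { classifier.train(:spam, "#{genuine} buy cheap pills online click here") }

after = classifier.spam_probability(genuine)
puts format("genuine comment spam probability: before %.3f, after %.3f", before, after)
```

In this toy setup the untouched genuine comment scores as more spam-like after the poisoned training step, because every word of the copied comment has been counted as spam evidence.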
Wow, I actually missed those comments. Great job researching this topic! So, to mitigate the problems that Antti found, and to merge this without future issues, we'd need to:
Can you do that @alecslupu? Thanks. Then, after this PR is (finally) merged, we can focus on the other aspects of the "Steps forward" explanation from #10038 (comment)
I am not sure what you mean by this.
Done in f3b8f48
I forgot how this actually works, to be honest xD This is a manual task that needs to be done by the admin. According to what Antti mentions in his comment, spammers are reusing actual content to "confuse" these kinds of filters. We can leave it in for now, but once we have the new datasets for moderations we can review this approach (i.e. remove this task if it isn't necessary and could be counterproductive)
Co-authored-by: Andrés Pereira de Lucena <andreslucena@users.noreply.github.com>
…dim into ale-add-spam-detection
andreslucena
left a comment
👏🏽 👏🏽
Thanks for your patience on this long-standing PR!
@andreslucena To me it doesn't seem manual:
decidim/decidim-ai/lib/decidim/ai/engine.rb Lines 32 to 43 in 39fbb94
decidim/decidim-ai/lib/decidim/ai/engine.rb Lines 24 to 26 in 39fbb94
The classifier is always fed the moderated content automatically when it is hidden from the website.
You're right! @alecslupu can you remove what Antti is mentioning, please?
@ahukkanen the part you initially mentioned:
decidim/decidim-ai/lib/decidim/ai/engine.rb Lines 32 to 43 in 39fbb94
is actually the part that submits the data to be analyzed. It does not do the spam/ham training. The following is the part that automatically adds the data hidden by an admin to the spam base:
decidim/decidim-ai/lib/decidim/ai/engine.rb Lines 24 to 26 in 39fbb94
I will remove only this bit:
decidim/decidim-ai/lib/decidim/ai/engine.rb Lines 24 to 26 in 39fbb94
Partially correct, but at the very bottom of the snippet there is logic that trains the user profile classifier with the blocked user's data.
Tackled here: #13550
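The distinction discussed in this thread, between the handler that submits content for analysis and the handler that trains the classifier on hidden content, could be sketched as follows. This is only an illustration under stated assumptions: `EventBus`, `RecordingClassifier`, `wire_spam_detection`, the event names, and the `auto_train_on_hide` flag are all hypothetical and do not reflect the actual wiring in `engine.rb`.

```ruby
# Hypothetical in-process event bus; the real engine.rb wiring may differ.
class EventBus
  def initialize
    @handlers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event, &handler)
    @handlers[event] << handler
  end

  def publish(event, payload)
    # fetch avoids creating an empty handler list for unknown events.
    @handlers.fetch(event, []).each { |handler| handler.call(payload) }
  end
end

# Stand-in classifier that only records what it was asked to do.
class RecordingClassifier
  attr_reader :analyzed, :trained

  def initialize
    @analyzed = []
    @trained = []
  end

  def analyze(text)
    @analyzed << text
  end

  def train(category, text)
    @trained << [category, text]
  end
end

# Analysis is always wired. Training on hidden content is opt-in, so
# moderated text is not fed back as spam unless explicitly enabled.
def wire_spam_detection(bus, classifier, auto_train_on_hide: false)
  bus.subscribe("resource.created") { |payload| classifier.analyze(payload[:body]) }
  return unless auto_train_on_hide

  bus.subscribe("resource.hidden") { |payload| classifier.train(:spam, payload[:body]) }
end

bus = EventBus.new
classifier = RecordingClassifier.new
wire_spam_detection(bus, classifier)

bus.publish("resource.created", { body: "A new comment" })
bus.publish("resource.hidden", { body: "A new comment" })

puts "analyzed: #{classifier.analyzed.size}, trained: #{classifier.trained.size}"
```

Keeping the two subscriptions separate means the automatic training step can be dropped without touching the analysis path.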
🎩 What? Why?
This PR adds the spam detection mechanism, built as a standalone bundle that can also be installed in older Decidim installations. Please refer to decidim-tools-ai/Readme.md for configuration details.
