Add AI engine for spam #11319

Merged
andreslucena merged 134 commits into decidim:develop from i-need-another-coffee:ale-add-spam-detection
Oct 16, 2024
Conversation

@alecslupu
Contributor

@alecslupu commented Jul 22, 2023

🎩 What? Why?

This PR adds the spam detection mechanism, created as a standalone bundle that can also be installed in older Decidim installations. Please refer to decidim-tools-ai/Readme.md for configuration details.

📌 Related Issues

Link your PR to an issue

Testing

  1. Follow the installation instructions in the readme file
  2. Index the data
  3. Create some content and check whether it gets marked as spam

📷 Screenshots

Please add screenshots of the changes you're proposing
Description

alecslupu added 26 commits July 13, 2023 19:00
Add Gitlab action workflow

Patch the generator

Running linters

Gemfiles
Add language service
Normalize gems
* Add BayesStrategy

* Add Bayes Analyzer

* Refactor strategy initialization process
* Add event handlers and spec data

* Fixing failing specs

* Fix Category error in untrain

* fix decidim ai tests
* Add Strategy module

* Add more namespaces
* Add resources to be analyzed

* mend
@ahukkanen
Contributor

@andreslucena No need on my end. Last time the technical implementation seemed fine.

The only thing to add is that, as I have written in some of the previous reviews, I would not feed the classifier constantly with the moderated data as "spam". For example, in the instances we manage, spammers take other people's actual comments and inject their spam into them. When the classifier gets fed this data as "spam", it will eventually start to classify genuine content as SPAM with higher probability.

I have written more details in the broader context of solving the SPAM issue at #10038. For content classification engine, this implementation is fine (but would require more pre-training data to actually work well).
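The poisoning effect described above can be sketched with a toy Bayes classifier. This is an illustration only, not Decidim's implementation: `TinyBayes`, its tokenizer, and the sample texts are all hypothetical.

```ruby
# Toy Laplace-smoothed naive Bayes, illustrating how training copied
# genuine text as "spam" makes similar genuine content score as more
# spam-like. A sketch only, not Decidim's classifier.
class TinyBayes
  def initialize
    @counts = { spam: Hash.new(0), ham: Hash.new(0) }
    @totals = Hash.new(0)
    @vocab = {}
  end

  def train(category, text)
    tokenize(text).each do |word|
      @counts[category][word] += 1
      @totals[category] += 1
      @vocab[word] = true
    end
  end

  # Log-likelihood ratio; higher means "more spam-like".
  def spam_score(text)
    tokenize(text).sum do |word|
      p_spam = (@counts[:spam][word] + 1.0) / (@totals[:spam] + @vocab.size)
      p_ham  = (@counts[:ham][word] + 1.0) / (@totals[:ham] + @vocab.size)
      Math.log(p_spam / p_ham)
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/\w+/)
  end
end

bayes = TinyBayes.new
bayes.train(:ham, "great proposal about the new park")
bayes.train(:spam, "buy cheap pills now")

before = bayes.spam_score("great new park proposal")

# A spammer copies a genuine comment and appends a link; the moderation
# pipeline then trains the whole text as spam.
bayes.train(:spam, "great proposal about the new park visit cheap-pills.example")

after = bayes.spam_score("great new park proposal")
# The same genuine wording now scores as more spam-like than before.
```

Feeding the copied comment to the spam class raises the spam likelihood of every word it contains, which is exactly why genuine content starts drifting toward the spam side.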

@andreslucena
Member

I have written more details in the broader context of solving the SPAM issue at #10038. For content classification engine, this implementation is fine (but would require more pre-training data to actually work well).

Wow, I actually missed those comments. Great job with researching this topic!

So, for mitigating the problems that Antti found, and to merge this without future issues, we'd need to:

  1. Remove the current datasets, as we don't have a large enough amount of content for this to actually be useful
  2. Remove the feature of training with the new moderations, as the spammers are using "good" content, so we will start having lots of false positives

Can you do that @alecslupu? Thanks

Then, after this PR is (finally) merged, we can focus on the other aspects of the "Steps forward" explanation from #10038 (comment)

@alecslupu
Contributor Author

Remove the feature of training with the new moderations, as the spammers are using "good" content, so we will start having lots of false positives

I am not sure what you mean by this.

@alecslupu
Contributor Author

Remove the current datasets, as we don't have a large enough amount of content for this to actually be useful

Done in f3b8f48

github-actions[bot]
github-actions bot previously approved these changes Oct 15, 2024
@andreslucena
Member

Remove the feature of training with the new moderations, as the spammers are using "good" content, so we will start having lots of false positives

I am not sure what you mean by this.

I forgot how this actually works to be honest xD

This is a manual task that needs to be done by the admin. According to what Antti mentioned in his comment, spammers are reusing actual content to "confuse" these kinds of filters. We can leave it in for now, but once we have the new datasets for moderations we can review this approach (i.e. remove this task if it isn't necessary and could be counterproductive)

github-actions[bot]
github-actions bot previously approved these changes Oct 16, 2024
alecslupu and others added 2 commits October 16, 2024 10:13
Co-authored-by: Andrés Pereira de Lucena <andreslucena@users.noreply.github.com>
github-actions[bot]
github-actions bot previously approved these changes Oct 16, 2024
Member

@andreslucena left a comment


👏🏽 👏🏽

Thanks for your patience on this long-standing PR!

@ahukkanen
Contributor

I forgot how this actually works to be honest xD

This is a manual task that needs to be done by the admin.

@andreslucena To me it doesn't seem manual:

```ruby
Decidim::EventsManager.subscribe("decidim.update_account:after") do |_event_name, data|
  Decidim::Ai::SpamDetection::UserSpamAnalyzerJob.perform_later(data[:resource])
end

Decidim::EventsManager.subscribe("decidim.update_user_group:after") do |_event_name, data|
  Decidim::Ai::SpamDetection::UserSpamAnalyzerJob.perform_later(data[:resource])
end

Decidim::EventsManager.subscribe("decidim.create_user_group:after") do |_event_name, data|
  Decidim::Ai::SpamDetection::UserSpamAnalyzerJob.perform_later(data[:resource])
end

Decidim::EventsManager.subscribe("decidim.admin.block_user:after") do |_event_name, data|
  Decidim::Ai::SpamDetection::TrainUserDataJob.perform_later(data[:resource])
end

Decidim::EventsManager.subscribe("decidim.admin.hide_resource:after") do |_event_name, data|
  Decidim::Ai::SpamDetection::TrainHiddenResourceDataJob.perform_later(data[:resource])
end
```

The classifier is always fed the moderated content automatically when it is hidden from the website.
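The wiring in that snippet follows a publish/subscribe pattern: hiding a resource fires an event, and a subscriber enqueues a training job. A minimal plain-Ruby sketch of the idea (the `EventBus` class, event name, and handler body here are hypothetical stand-ins, not Decidim's API):

```ruby
# Minimal publish/subscribe bus sketching the pattern behind
# Decidim::EventsManager.subscribe. EventBus and the handler body
# are illustrative stand-ins, not Decidim code.
class EventBus
  def initialize
    @handlers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event_name, &block)
    @handlers[event_name] << block
  end

  def publish(event_name, data)
    @handlers[event_name].each { |handler| handler.call(event_name, data) }
  end
end

bus = EventBus.new
training_queue = []

# Analogue of "decidim.admin.hide_resource:after" enqueueing
# TrainHiddenResourceDataJob: hiding a resource feeds it to training.
bus.subscribe("admin.hide_resource:after") do |_event_name, data|
  training_queue << data[:resource]
end

bus.publish("admin.hide_resource:after", resource: "hidden comment")
# training_queue now contains the hidden resource.
```

Because the subscriber fires on every hide event, no admin action beyond the ordinary moderation flow is involved, which is the point being made here.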

@andreslucena
Member

@andreslucena To me it doesn't seem manual:

You're right!

@alecslupu can you remove what Antti is mentioning, please?

@alecslupu
Contributor Author

@ahukkanen the part you initially mentioned:

```ruby
Decidim::EventsManager.subscribe("decidim.update_account:after") do |_event_name, data|
  Decidim::Ai::SpamDetection::UserSpamAnalyzerJob.perform_later(data[:resource])
end

Decidim::EventsManager.subscribe("decidim.update_user_group:after") do |_event_name, data|
  Decidim::Ai::SpamDetection::UserSpamAnalyzerJob.perform_later(data[:resource])
end

Decidim::EventsManager.subscribe("decidim.create_user_group:after") do |_event_name, data|
  Decidim::Ai::SpamDetection::UserSpamAnalyzerJob.perform_later(data[:resource])
end

Decidim::EventsManager.subscribe("decidim.admin.block_user:after") do |_event_name, data|
  Decidim::Ai::SpamDetection::TrainUserDataJob.perform_later(data[:resource])
end
```

Is actually the part that submits the data to be analyzed. It does not do the training of spam / ham.

The following is the one that automatically adds the data hidden by admin to spam base.

```ruby
Decidim::EventsManager.subscribe("decidim.admin.hide_resource:after") do |_event_name, data|
  Decidim::Ai::SpamDetection::TrainHiddenResourceDataJob.perform_later(data[:resource])
end
```
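The distinction being drawn can be sketched as two roles against one classifier: analyzer jobs only read the model to score content, while trainer jobs mutate it. The `Classifier` class and its word-overlap rule below are toy stand-ins, not Decidim's spam detection.

```ruby
# Illustrative split between the "analyze" and "train" responsibilities
# discussed above. The Classifier and its matching rule are hypothetical,
# not Decidim's implementation.
class Classifier
  def initialize
    @spam_examples = []
  end

  # Mutating path: the role of a TrainHiddenResourceDataJob-style job.
  def train_spam(text)
    @spam_examples << text
  end

  # Read-only path: the role of a UserSpamAnalyzerJob-style job.
  # Flags text sharing at least two words with any known spam example.
  def spam?(text)
    @spam_examples.any? { |spam| (spam.split & text.split).size >= 2 }
  end
end

classifier = Classifier.new
classifier.train_spam("buy cheap pills now")

flagged = classifier.spam?("cheap pills for sale now") # shares three words
genuine = classifier.spam?("notes from the assembly")  # shares none
```

Keeping the two paths separate is what makes it possible to remove the automatic training hook while leaving the analysis hooks intact.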

@alecslupu
Contributor Author

I will remove only this bit.

```ruby
Decidim::EventsManager.subscribe("decidim.admin.hide_resource:after") do |_event_name, data|
  Decidim::Ai::SpamDetection::TrainHiddenResourceDataJob.perform_later(data[:resource])
end
```

@ahukkanen
Contributor

Is actually the part that submits the data to be analyzed. It does not do the training of spam / ham.

Partially correct but at the very bottom of the snippet there is logic that trains the user profile classifier with the blocked user's data.

@alecslupu
Contributor Author

Is actually the part that submits the data to be analyzed. It does not do the training of spam / ham.

Partially correct but at the very bottom of the snippet there is logic that trains the user profile classifier with the blocked user's data.

tackled here: #13550


Labels

configuration, dependencies (pull requests that update a dependency file or issues about updating dependencies), module: ai

Projects

No open projects
Archived in project

Development

Successfully merging this pull request may close these issues.

Use content classification systems for better SPAM detection

7 participants