Skip to content

Refactor cleanup, firehol, and mass_scanners cronjobs to use repositories. Addresses #633#698

Merged
regulartim merged 3 commits intointelowlproject:developfrom
opbot-xd:refactor/phase2-cronjob-repositories
Jan 13, 2026
Merged

Refactor cleanup, firehol, and mass_scanners cronjobs to use repositories. Addresses #633#698
regulartim merged 3 commits intointelowlproject:developfrom
opbot-xd:refactor/phase2-cronjob-repositories

Conversation

@opbot-xd
Copy link
Copy Markdown
Contributor

Description

This PR implements Phase 2 of the repository pattern refactoring by migrating the cleanup.py, firehol.py, and mass_scanners.py cronjobs to use the repository pattern instead of direct Django ORM calls.

Changes Summary

New Repositories Created:

  1. FireHolRepository (greedybear/cronjobs/repositories/firehol.py)

    • get_or_create(): Get existing or create new FireHol blocklist entry
    • save(): Persist FireHol entry
    • delete_old_entries(): Delete entries older than specified date
    • cleanup_old_entries(): Cleanup with configurable retention days (default 30)
  2. MassScannerRepository (greedybear/cronjobs/repositories/mass_scanner.py)

    • get_by_ip(): Retrieve mass scanner by IP address
    • create(): Create new mass scanner entry
    • save(): Persist mass scanner entry
    • exists(): Check if mass scanner exists by IP

Extended Existing Repositories:

  1. IocRepository - Added cleanup methods:

    • delete_old_iocs(): Delete IOC records older than cutoff date
    • update_ioc_reputation(): Update IP reputation for existing IOC
  2. CowrieSessionRepository - Added cleanup methods:

    • delete_old_command_sequences(): Delete old command sequences
    • delete_incomplete_sessions(): Delete sessions without start_time
    • delete_sessions_without_login(): Delete old sessions without login attempts
    • delete_sessions_without_commands(): Delete old sessions without commands

Refactored Cronjobs:

  1. CleanUp (greedybear/cronjobs/cleanup.py)

    • Now uses IocRepository and CowrieSessionRepository
    • Replaced all direct IOC.objects, CommandSequence.objects, CowrieSession.objects calls
    • Added dependency injection via __init__(ioc_repo=None, cowrie_repo=None)
  2. FireHolCron (greedybear/cronjobs/firehol.py)

    • Now uses FireHolRepository
    • Replaced direct FireHolList.objects calls
    • Added dependency injection via __init__(firehol_repo=None)
  3. MassScannersCron (greedybear/cronjobs/mass_scanners.py)

    • Now uses MassScannerRepository and IocRepository
    • Replaced direct MassScanner.objects and IOC.objects calls
    • Added dependency injection via __init__(mass_scanner_repo=None, ioc_repo=None)
    • Fixed logging to occur after DB transaction completes

Testing:

  • Added 16 comprehensive new tests covering all new repository methods
  • Tests include edge cases (empty results, missing entries, etc.)
  • All 359 tests passing (verified in Docker environment)

Related issues

Addresses #633 (Phase 2)

Type of change

  • Bug fix (non-breaking change which fixes an issue) - Fixed logging order in mass_scanners.py
  • New feature (non-breaking change which adds functionality) - Repository pattern implementation
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist

  • I have read and understood the rules about how to Contribute to this project.
  • The pull request is for the branch develop.
  • I have added documentation of the new features.
  • Linter (Ruff) gave 0 errors. Pre-commit hooks passed.
  • I have added tests for the feature/bug I solved. All the tests (new and old ones) gave 0 errors. (359/359 tests passing)
  • If changes were made to an existing model/serializer/view, the docs were updated and regenerated.
  • If the GUI has been modified: N/A (backend only changes)

…se repositories

- Extend IocRepository with cleanup methods:
  - delete_old_iocs(): Delete IOCs older than cutoff date
  - update_ioc_reputation(): Update IP reputation for existing IOCs

- Extend CowrieSessionRepository with cleanup methods:
  - delete_old_command_sequences(): Delete old command sequences
  - delete_incomplete_sessions(): Delete sessions without start_time
  - delete_sessions_without_login(): Delete old sessions without login
  - delete_sessions_without_commands(): Delete old sessions without commands

- Create FireHolRepository for blocklist management:
  - get_or_create(): Get existing or create new FireHol entry
  - cleanup_old_entries(): Delete entries older than retention days

- Create MassScannerRepository for mass scanner tracking:
  - get_by_ip(), create(), save(), exists()

- Refactor cronjobs to use repositories:
  - CleanUp: Use IocRepository and CowrieSessionRepository
  - FireHolCron: Use FireHolRepository
  - MassScannersCron: Use MassScannerRepository and IocRepository

- Add comprehensive test coverage (16 new tests)
- All 359 tests passing

Following Phase 1 best practices:
- All imports at module top
- Ternary operators for cleaner code
- Dependency injection for testability
- Log after DB operations complete

Addresses Phase 2 of issue intelowlproject#633
@opbot-xd
Copy link
Copy Markdown
Contributor Author

Note: Logging Issue in whatsmyip.py (Out of Scope)

During development, I noticed a logging inconsistency in greedybear/cronjobs/whatsmyip.py where the log statement occurs before the database transaction completes:

# Current code (line 18):
self.log.info(f"added new whatsmyip domain {domain=}")
WhatsMyIPDomain(domain=domain).save()

This should be:

WhatsMyIPDomain(domain=domain).save()
self.log.info(f"added new whatsmyip domain {domain=}")

@mlodic
Copy link
Copy Markdown
Member

mlodic commented Jan 12, 2026

this is a minor change, feel free to add it directly here. Thanks for reporting

@regulartim
Copy link
Copy Markdown
Collaborator

Looks good at first glance. 👍 I will review tomorrow!

Ensures that the log message 'added new whatsmyip domain' only appears
after the database transaction has successfully completed.
Adresses maintainer feedback to fix logging consistency.
Copilot AI review requested due to automatic review settings January 12, 2026 15:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements Phase 2 of the repository pattern refactoring, migrating the cleanup.py, firehol.py, and mass_scanners.py cronjobs to use repository classes instead of direct Django ORM calls. This improves testability and maintainability by introducing proper separation of concerns between business logic and data access.

Changes:

  • Created two new repository classes (FireHolRepository and MassScannerRepository) with complete CRUD operations
  • Extended existing IocRepository and CowrieSessionRepository with cleanup-related methods
  • Refactored three cronjobs to use dependency injection and repository pattern
  • Fixed logging order bug to ensure logs occur after database transactions complete
  • Added 16 comprehensive tests covering all new repository methods

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
greedybear/cronjobs/repositories/firehol.py New repository for FireHol blocklist data access with cleanup methods
greedybear/cronjobs/repositories/mass_scanner.py New repository for mass scanner data access with CRUD operations
greedybear/cronjobs/repositories/ioc.py Extended with cleanup methods (delete_old_iocs, update_ioc_reputation)
greedybear/cronjobs/repositories/cowrie_session.py Extended with cleanup methods for sessions and command sequences
greedybear/cronjobs/repositories/init.py Updated to export new repository classes
greedybear/cronjobs/cleanup.py Refactored to use IocRepository and CowrieSessionRepository with dependency injection
greedybear/cronjobs/firehol.py Refactored to use FireHolRepository with dependency injection
greedybear/cronjobs/mass_scanners.py Refactored to use MassScannerRepository and IocRepository with dependency injection; fixed logging order
greedybear/cronjobs/whatsmyip.py Fixed logging order to occur after database save
tests/test_repositories.py Added 16 comprehensive tests for all new repository methods with edge case coverage

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@@ -1,4 +1,6 @@
from greedybear.cronjobs.repositories.cowrie_session import *
from greedybear.cronjobs.repositories.elastic import *
from greedybear.cronjobs.repositories.firehol import *
Copy link

Copilot AI Jan 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import pollutes the enclosing namespace, as the imported module greedybear.cronjobs.repositories.firehol does not define 'all'.
Import pollutes the enclosing namespace, as the imported module repositories.firehol does not define 'all'.

Suggested change
from greedybear.cronjobs.repositories.firehol import *
import greedybear.cronjobs.repositories.firehol as firehol

Copilot uses AI. Check for mistakes.
from greedybear.cronjobs.repositories.elastic import *
from greedybear.cronjobs.repositories.firehol import *
from greedybear.cronjobs.repositories.ioc import *
from greedybear.cronjobs.repositories.mass_scanner import *
Copy link

Copilot AI Jan 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import pollutes the enclosing namespace, as the imported module greedybear.cronjobs.repositories.mass_scanner does not define 'all'.
Import pollutes the enclosing namespace, as the imported module repositories.mass_scanner does not define 'all'.

Suggested change
from greedybear.cronjobs.repositories.mass_scanner import *
from greedybear.cronjobs.repositories import mass_scanner

Copilot uses AI. Check for mistakes.
@opbot-xd
Copy link
Copy Markdown
Contributor Author

I noticed that GitHub Copilot automatically requested a review on this PR. Is this a new automated workflow in the repository settings? Just checking to ensure I didn't trigger anything by accident!

@mlodic
Copy link
Copy Markdown
Member

mlodic commented Jan 12, 2026

yes, I just enabled it as as experimental

Copy link
Copy Markdown
Collaborator

@regulartim regulartim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! 👍 I only found some minor, cosmetic stuff.

Comment on lines +15 to +28
def get_by_ip(self, ip_address: str) -> MassScanner | None:
"""
Retrieve a mass scanner entry by IP address.

Args:
ip_address: IP address to look up.

Returns:
The matching MassScanner entry, or None if not found.
"""
try:
return MassScanner.objects.get(ip_address=ip_address)
except MassScanner.DoesNotExist:
return None
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, maybe this is stupid but where do we need this method? I can't find any reference (except for tests).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right! I added get_by_ip() following a common repository pattern (providing basic CRUD operations), but the actual MassScannersCron only uses exists().

Should I remove get_by_ip() from MassScannerRepository since it's not needed? I can keep just the methods that are actually used: exists(), create(), and save().

Let me know your preference!

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get your point, but I would rather not have unused methods in the code base. So I would suggest to remove it.

Comment on lines +30 to +68
def create(self, ip_address: str, reason: str = "") -> MassScanner:
"""
Create a new mass scanner entry.

Args:
ip_address: IP address of the mass scanner.
reason: Optional reason/comment about the scanner.

Returns:
The newly created MassScanner instance.
"""
scanner = MassScanner(ip_address=ip_address, reason=reason)
scanner.save()
return scanner

def save(self, scanner: MassScanner) -> MassScanner:
"""
Save a MassScanner entry to the database.

Args:
scanner: MassScanner instance to save.

Returns:
The saved MassScanner instance.
"""
scanner.save()
return scanner

def exists(self, ip_address: str) -> bool:
"""
Check if a mass scanner entry exists for the given IP.

Args:
ip_address: IP address to check.

Returns:
True if the entry exists, False otherwise.
"""
return MassScanner.objects.filter(ip_address=ip_address).exists()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these three method be condensed to a single get_or_create method?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely! That's a much cleaner approach.

I can refactor MassScannerRepository to use a single get_or_create() method (similar to how FireHolRepository works), which will:

  1. Check if the entry exists
  2. Return it if found
  3. Create and return it if not found

This will also simplify the calling code in MassScannersCron from:

if not self.mass_scanner_repo.exists(ip_address):
    self.mass_scanner_repo.create(ip_address, reason)

To just:

self.mass_scanner_repo.get_or_create(ip_address, reason)

@regulartim
Copy link
Copy Markdown
Collaborator

regulartim commented Jan 12, 2026

Do you think it is time to split up the test_repositories.py ? It will have almost 1000 lines after this merge.
(But not in this PR. I would open a separate issue for that.)

Simplified MassScannerRepository by replacing create(), save(), get_by_ip(),
and exists() methods with a single get_or_create() method, following the same
pattern as FireHolRepository.

Benefits:
- Cleaner API (one method instead of four)
- Simpler calling code in MassScannersCron
- Consistent with Django's get_or_create pattern
- Reduces code duplication

Updated tests to verify:
- Creating new entries
- Returning existing entries without duplicates
- Handling entries with and without reasons

Addresses maintainer feedback on PR intelowlproject#698.
@opbot-xd
Copy link
Copy Markdown
Contributor Author

Great idea! I agree that test_repositories.py is getting quite large.

Splitting it up would make it more maintainable. A logical split could be:

  • test_ioc_repository.py - IOC-related tests
  • test_cowrie_session_repository.py - Cowrie session tests
  • test_sensor_repository.py - Sensor tests
  • test_elastic_repository.py - Elasticsearch tests
  • test_firehol_repository.py - FireHol tests
  • test_mass_scanner_repository.py - Mass scanner tests

I'm happy to tackle that refactoring in a separate PR once this one is merged.

@opbot-xd
Copy link
Copy Markdown
Contributor Author

Done! I've refactored MassScannerRepository to use a single get_or_create() method.

Changes:

  • Removed get_by_ip(), create(), save(), and exists()
  • Replaced with get_or_create() following the same pattern as FireHolRepository
  • Updated MassScannersCron to use the simpler API
  • Updated tests accordingly

@opbot-xd opbot-xd requested a review from regulartim January 13, 2026 12:56
@opbot-xd
Copy link
Copy Markdown
Contributor Author

Thanks for the Copilot review!

Regarding the wildcard import warnings for firehol.py and mass_scanner.py not defining __all__:

This is intentionally ignored in this codebase. The project's Ruff configuration (.github/configurations/python_linters/.ruff.toml) explicitly ignores F403 (wildcard imports):

ignore = [
    # F403: Allow wildcard imports in __init__.py files
    "F403",
    ...
]

Following the established codebase patterns here.

@regulartim
Copy link
Copy Markdown
Collaborator

This is intentionally ignored in this codebase. The project's Ruff configuration (.github/configurations/python_linters/.ruff.toml) explicitly ignores F403 (wildcard imports):

@mlodic Do you think we can make Copilot respect our Ruff config?

Copy link
Copy Markdown
Collaborator

@regulartim regulartim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Top 👍

@regulartim regulartim merged commit 0520584 into intelowlproject:develop Jan 13, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants