Refactor cleanup, firehol, and mass_scanners cronjobs to use repositories. Addresses #633#698
Conversation
…se repositories - Extend IocRepository with cleanup methods: - delete_old_iocs(): Delete IOCs older than cutoff date - update_ioc_reputation(): Update IP reputation for existing IOCs - Extend CowrieSessionRepository with cleanup methods: - delete_old_command_sequences(): Delete old command sequences - delete_incomplete_sessions(): Delete sessions without start_time - delete_sessions_without_login(): Delete old sessions without login - delete_sessions_without_commands(): Delete old sessions without commands - Create FireHolRepository for blocklist management: - get_or_create(): Get existing or create new FireHol entry - cleanup_old_entries(): Delete entries older than retention days - Create MassScannerRepository for mass scanner tracking: - get_by_ip(), create(), save(), exists() - Refactor cronjobs to use repositories: - CleanUp: Use IocRepository and CowrieSessionRepository - FireHolCron: Use FireHolRepository - MassScannersCron: Use MassScannerRepository and IocRepository - Add comprehensive test coverage (16 new tests) - All 359 tests passing Following Phase 1 best practices: - All imports at module top - Ternary operators for cleaner code - Dependency injection for testability - Log after DB operations complete Addresses Phase 2 of issue intelowlproject#633
Note: Logging Issue in whatsmyip.py (Out of Scope)During development, I noticed a logging inconsistency in # Current code (line 18):
self.log.info(f"added new whatsmyip domain {domain=}")
WhatsMyIPDomain(domain=domain).save()This should be: WhatsMyIPDomain(domain=domain).save()
self.log.info(f"added new whatsmyip domain {domain=}") |
|
this is a minor change, feel free to add it directly here. Thanks for reporting |
|
Looks good at first glance. 👍 I will review tomorrow! |
Ensures that the log message 'added new whatsmyip domain' only appears after the database transaction has successfully completed. Adresses maintainer feedback to fix logging consistency.
There was a problem hiding this comment.
Pull request overview
This PR implements Phase 2 of the repository pattern refactoring, migrating the cleanup.py, firehol.py, and mass_scanners.py cronjobs to use repository classes instead of direct Django ORM calls. This improves testability and maintainability by introducing proper separation of concerns between business logic and data access.
Changes:
- Created two new repository classes (FireHolRepository and MassScannerRepository) with complete CRUD operations
- Extended existing IocRepository and CowrieSessionRepository with cleanup-related methods
- Refactored three cronjobs to use dependency injection and repository pattern
- Fixed logging order bug to ensure logs occur after database transactions complete
- Added 16 comprehensive tests covering all new repository methods
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| greedybear/cronjobs/repositories/firehol.py | New repository for FireHol blocklist data access with cleanup methods |
| greedybear/cronjobs/repositories/mass_scanner.py | New repository for mass scanner data access with CRUD operations |
| greedybear/cronjobs/repositories/ioc.py | Extended with cleanup methods (delete_old_iocs, update_ioc_reputation) |
| greedybear/cronjobs/repositories/cowrie_session.py | Extended with cleanup methods for sessions and command sequences |
| greedybear/cronjobs/repositories/init.py | Updated to export new repository classes |
| greedybear/cronjobs/cleanup.py | Refactored to use IocRepository and CowrieSessionRepository with dependency injection |
| greedybear/cronjobs/firehol.py | Refactored to use FireHolRepository with dependency injection |
| greedybear/cronjobs/mass_scanners.py | Refactored to use MassScannerRepository and IocRepository with dependency injection; fixed logging order |
| greedybear/cronjobs/whatsmyip.py | Fixed logging order to occur after database save |
| tests/test_repositories.py | Added 16 comprehensive tests for all new repository methods with edge case coverage |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -1,4 +1,6 @@ | |||
| from greedybear.cronjobs.repositories.cowrie_session import * | |||
| from greedybear.cronjobs.repositories.elastic import * | |||
| from greedybear.cronjobs.repositories.firehol import * | |||
There was a problem hiding this comment.
Import pollutes the enclosing namespace, as the imported module greedybear.cronjobs.repositories.firehol does not define 'all'.
Import pollutes the enclosing namespace, as the imported module repositories.firehol does not define 'all'.
| from greedybear.cronjobs.repositories.firehol import * | |
| import greedybear.cronjobs.repositories.firehol as firehol |
| from greedybear.cronjobs.repositories.elastic import * | ||
| from greedybear.cronjobs.repositories.firehol import * | ||
| from greedybear.cronjobs.repositories.ioc import * | ||
| from greedybear.cronjobs.repositories.mass_scanner import * |
There was a problem hiding this comment.
Import pollutes the enclosing namespace, as the imported module greedybear.cronjobs.repositories.mass_scanner does not define 'all'.
Import pollutes the enclosing namespace, as the imported module repositories.mass_scanner does not define 'all'.
| from greedybear.cronjobs.repositories.mass_scanner import * | |
| from greedybear.cronjobs.repositories import mass_scanner |
|
I noticed that GitHub Copilot automatically requested a review on this PR. Is this a new automated workflow in the repository settings? Just checking to ensure I didn't trigger anything by accident! |
|
yes, I just enabled it as as experimental |
regulartim
left a comment
There was a problem hiding this comment.
Excellent! 👍 I only found some minor, cosmetic stuff.
| def get_by_ip(self, ip_address: str) -> MassScanner | None: | ||
| """ | ||
| Retrieve a mass scanner entry by IP address. | ||
|
|
||
| Args: | ||
| ip_address: IP address to look up. | ||
|
|
||
| Returns: | ||
| The matching MassScanner entry, or None if not found. | ||
| """ | ||
| try: | ||
| return MassScanner.objects.get(ip_address=ip_address) | ||
| except MassScanner.DoesNotExist: | ||
| return None |
There was a problem hiding this comment.
Sorry, maybe this is stupid but where do we need this method? I can't find any reference (except for tests).
There was a problem hiding this comment.
You're absolutely right! I added get_by_ip() following a common repository pattern (providing basic CRUD operations), but the actual MassScannersCron only uses exists().
Should I remove get_by_ip() from MassScannerRepository since it's not needed? I can keep just the methods that are actually used: exists(), create(), and save().
Let me know your preference!
There was a problem hiding this comment.
I get your point, but I would rather not have unused methods in the code base. So I would suggest to remove it.
| def create(self, ip_address: str, reason: str = "") -> MassScanner: | ||
| """ | ||
| Create a new mass scanner entry. | ||
|
|
||
| Args: | ||
| ip_address: IP address of the mass scanner. | ||
| reason: Optional reason/comment about the scanner. | ||
|
|
||
| Returns: | ||
| The newly created MassScanner instance. | ||
| """ | ||
| scanner = MassScanner(ip_address=ip_address, reason=reason) | ||
| scanner.save() | ||
| return scanner | ||
|
|
||
| def save(self, scanner: MassScanner) -> MassScanner: | ||
| """ | ||
| Save a MassScanner entry to the database. | ||
|
|
||
| Args: | ||
| scanner: MassScanner instance to save. | ||
|
|
||
| Returns: | ||
| The saved MassScanner instance. | ||
| """ | ||
| scanner.save() | ||
| return scanner | ||
|
|
||
| def exists(self, ip_address: str) -> bool: | ||
| """ | ||
| Check if a mass scanner entry exists for the given IP. | ||
|
|
||
| Args: | ||
| ip_address: IP address to check. | ||
|
|
||
| Returns: | ||
| True if the entry exists, False otherwise. | ||
| """ | ||
| return MassScanner.objects.filter(ip_address=ip_address).exists() |
There was a problem hiding this comment.
Can these three method be condensed to a single get_or_create method?
There was a problem hiding this comment.
Absolutely! That's a much cleaner approach.
I can refactor MassScannerRepository to use a single get_or_create() method (similar to how FireHolRepository works), which will:
- Check if the entry exists
- Return it if found
- Create and return it if not found
This will also simplify the calling code in MassScannersCron from:
if not self.mass_scanner_repo.exists(ip_address):
self.mass_scanner_repo.create(ip_address, reason)To just:
self.mass_scanner_repo.get_or_create(ip_address, reason)|
Do you think it is time to split up the |
Simplified MassScannerRepository by replacing create(), save(), get_by_ip(), and exists() methods with a single get_or_create() method, following the same pattern as FireHolRepository. Benefits: - Cleaner API (one method instead of four) - Simpler calling code in MassScannersCron - Consistent with Django's get_or_create pattern - Reduces code duplication Updated tests to verify: - Creating new entries - Returning existing entries without duplicates - Handling entries with and without reasons Addresses maintainer feedback on PR intelowlproject#698.
|
Great idea! I agree that Splitting it up would make it more maintainable. A logical split could be:
I'm happy to tackle that refactoring in a separate PR once this one is merged. |
|
Done! I've refactored Changes:
|
|
Thanks for the Copilot review! Regarding the wildcard import warnings for This is intentionally ignored in this codebase. The project's Ruff configuration ( ignore = [
# F403: Allow wildcard imports in __init__.py files
"F403",
...
]Following the established codebase patterns here. |
@mlodic Do you think we can make Copilot respect our Ruff config? |
Description
This PR implements Phase 2 of the repository pattern refactoring by migrating the
cleanup.py,firehol.py, andmass_scanners.pycronjobs to use the repository pattern instead of direct Django ORM calls.Changes Summary
New Repositories Created:
FireHolRepository (
greedybear/cronjobs/repositories/firehol.py)get_or_create(): Get existing or create new FireHol blocklist entrysave(): Persist FireHol entrydelete_old_entries(): Delete entries older than specified datecleanup_old_entries(): Cleanup with configurable retention days (default 30)MassScannerRepository (
greedybear/cronjobs/repositories/mass_scanner.py)get_by_ip(): Retrieve mass scanner by IP addresscreate(): Create new mass scanner entrysave(): Persist mass scanner entryexists(): Check if mass scanner exists by IPExtended Existing Repositories:
IocRepository - Added cleanup methods:
delete_old_iocs(): Delete IOC records older than cutoff dateupdate_ioc_reputation(): Update IP reputation for existing IOCCowrieSessionRepository - Added cleanup methods:
delete_old_command_sequences(): Delete old command sequencesdelete_incomplete_sessions(): Delete sessions without start_timedelete_sessions_without_login(): Delete old sessions without login attemptsdelete_sessions_without_commands(): Delete old sessions without commandsRefactored Cronjobs:
CleanUp (
greedybear/cronjobs/cleanup.py)IocRepositoryandCowrieSessionRepositoryIOC.objects,CommandSequence.objects,CowrieSession.objectscalls__init__(ioc_repo=None, cowrie_repo=None)FireHolCron (
greedybear/cronjobs/firehol.py)FireHolRepositoryFireHolList.objectscalls__init__(firehol_repo=None)MassScannersCron (
greedybear/cronjobs/mass_scanners.py)MassScannerRepositoryandIocRepositoryMassScanner.objectsandIOC.objectscalls__init__(mass_scanner_repo=None, ioc_repo=None)Testing:
Related issues
Addresses #633 (Phase 2)
Type of change
Checklist
develop.Ruff) gave 0 errors. Pre-commit hooks passed.