Track IoC-Sensor relationship. Closes #779#784
Track IoC-Sensor relationship. Closes #779#784regulartim merged 11 commits intointelowlproject:developfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR implements sensor tracking for Indicators of Compromise (IoCs) by establishing a many-to-many relationship between the IOC and Sensor models. This enhancement enables GreedyBear to track which T-Pot sensors detected which IoCs, addressing issue #779 for better visibility into threat detection sources.
Changes:
- Added
sensorsManyToManyField to the IOC model with corresponding database migration - Refactored
SensorRepositoryto use dict-based caching and renamedadd_sensor()toget_or_create_sensor()for clearer semantics - Enhanced extraction pipeline to capture sensor information from Elasticsearch hits and associate them with IoCs through the new
add_sensor_to_ioc()repository method
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| greedybear/models.py | Added sensors ManyToManyField to IOC model for tracking which sensors detected each IoC |
| greedybear/migrations/0035_add_sensors_to_ioc.py | Database migration adding sensors field and renaming TorExitNode index |
| greedybear/cronjobs/repositories/sensor.py | Refactored to use dict-based cache storing Sensor objects; renamed method to get_or_create_sensor |
| greedybear/cronjobs/repositories/ioc.py | Added add_sensor_to_ioc method for associating sensors with IoCs |
| greedybear/cronjobs/extraction/pipeline.py | Modified to extract sensor from t-pot_ip_ext field and include in hit dictionaries |
| greedybear/cronjobs/extraction/ioc_processor.py | Updated add_ioc to accept optional sensor parameter and associate it with IoCs |
| greedybear/cronjobs/extraction/utils.py | Changed iocs_from_hits to return tuples of (IOC, sensors list) |
| greedybear/cronjobs/extraction/strategies/generic.py | Updated to handle sensor lists, processing first sensor with IoC creation and adding remaining sensors separately |
| greedybear/cronjobs/extraction/strategies/cowrie.py | Updated scanner and payload extraction to handle sensors correctly |
| greedybear/admin.py | Added sensor_list display method and filter_horizontal for sensors field; improved Sensor admin with focused display fields |
| tests/test_sensor_repository.py | Updated all tests for renamed method and return value changes from boolean to Sensor object |
| tests/test_ioc_processor.py | Updated sensor filtering test to use new cache-based approach |
| tests/test_extraction_utils.py | Updated all tests to handle new tuple return format from iocs_from_hits |
| tests/test_extraction_strategies.py | Added test for multi-sensor handling and updated existing tests for new tuple format |
| tests/test_cowrie_extraction.py | Updated test for new tuple format |
| tests/greedybear/cronjobs/test_extraction_pipeline_grouping.py | Updated to use get_or_create_sensor method name |
| tests/greedybear/cronjobs/test_extraction_pipeline_e2e.py | Updated to use get_or_create_sensor method name |
| tests/init.py | Added cache initialization to ExtractionTestCase setUp for sensor filtering |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| migrations.RenameIndex( | ||
| model_name='torexitnode', | ||
| new_name='greedybear__ip_addr_6bc095_idx', | ||
| old_name='greedybear_ip_addr_tor_idx', | ||
| ), |
There was a problem hiding this comment.
This migration includes a RenameIndex operation on the torexitnode model that appears unrelated to adding sensors to IoC. The index renaming from 'greedybear_ip_addr_tor_idx' to 'greedybear__ip_addr_6bc095_idx' is a Django auto-generated naming change but doesn't relate to the IoC-Sensor relationship feature described in the PR. Consider separating this into its own migration to keep changes focused and easier to review/rollback if needed.
| migrations.RenameIndex( | |
| model_name='torexitnode', | |
| new_name='greedybear__ip_addr_6bc095_idx', | |
| old_name='greedybear_ip_addr_tor_idx', | |
| ), |
There was a problem hiding this comment.
Hi @regulartim what to do about this?
It got autogenerated when I ran makemigrations for the IoC-Sensor feature, Django picked up this orphaned change.
Should I create 2 migration files for this or let it be as it is?
There was a problem hiding this comment.
I made the pending migration and pushed it to develop. Good catch! 👍
There was a problem hiding this comment.
(This means you probably have to re-make your migration.)
Test Coverage ReportAll modified files have comprehensive test coverage:
Overall: 92.04% coverage across all modified files |
- Add ManyToMany relationship between IOC and Sensor models - Refactor SensorRepository to use dict cache and return Sensor objects - Rename add_sensor() to get_or_create_sensor() for clarity - Update extraction pipeline to capture and pass sensor data through strategies - Modify iocs_from_hits() to return (IOC, sensors) tuples - Update IocProcessor to accept sensor parameter and associate with IOCs - Add add_sensor_to_ioc() method to IocRepository - Fix bug: process each IoC once to avoid duplicate counter increments - Update admin panel to display sensor list for each IoC - Add comprehensive tests for sensor tracking (104 tests passing) This enables tracking which T-Pot sensors detected which IoCs, providing valuable forensic data in multi-sensor deployments.
…r and initialize cache
This migration adds the sensors field which depends on the TorExitNode index rename migration (0035) from upstream develop.
da0d921 to
22c2bb2
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Hi @regulartim the PR is ready for review. Please review it when you get time. Thanks :) |
regulartim
left a comment
There was a problem hiding this comment.
Hey @opbot-xd ! Good work as always! :) But we might be able to simplify the process.
- Remove add_sensor_to_ioc wrapper method from IocRepository (ManyToManyField.add is already idempotent) - Keep iocs_from_hits returning list[IOC] instead of tuples - Attach sensors to IOC via _sensors_to_add temporary attribute - Handle sensors in _merge_iocs for existing IOCs (consolidates merge logic) - Handle sensors after save for new IOCs - Use IOC.sensors.add(sensor) directly in strategies
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return None | ||
| sensor = Sensor(address=ip) | ||
| sensor.save() | ||
| self.cache.add(ip) | ||
| self.cache[ip] = sensor | ||
| self.log.info(f"added sensor {ip} to the database") |
There was a problem hiding this comment.
On a cache miss this always creates/saves a new Sensor row. Since Sensor.address isn’t unique, this can silently create duplicate Sensor records (and in multi-process/concurrent runs the in-memory cache won’t prevent duplicates). Consider using Sensor.objects.get_or_create(address=ip) (and update the cache from the returned instance).
There was a problem hiding this comment.
Copilot raised a valid point here about potential duplicate Sensor records in multi-process/concurrent Celery environments.
The issue: On a cache miss, the code creates a new Sensor. In multi-worker setups, two workers could simultaneously create sensors for the same IP since the in-memory cache isn't shared.
Proposed fixes:
- Use
Sensor.objects.get_or_create(address=ip)for atomic creation - Add
unique=Trueconstraint toSensor.addressin the model
My confusion: Should Sensor.address be unique? Logically it makes sense (each sensor IP should exist only once), but I want to confirm before adding a database constraint + migration.
@regulartim What do you think - should I add the unique constraint, or just use get_or_create as a simple fix?
There was a problem hiding this comment.
I agree. Currently we don't have a multi-worker setup, but that can happen in the future.
Should Sensor.address be unique?
Yes.
What do you think - should I add the unique constraint, or just use get_or_create as a simple fix?
Maybe both!? The unique constraint is the right thing to do on the database level. But then we still need to catch the described edge case on the application level, which can be done with get_or_create. Do you agree?
- Add unique=True to Sensor.address field (database-level protection) - Use Sensor.objects.get_or_create() for atomic, race-safe creation - Add get_queryset prefetch_related for N+1 query optimization in admin This prevents duplicate Sensor records in multi-worker Celery environments as suggested by @regulartim in code review.
regulartim
left a comment
There was a problem hiding this comment.
Just a few minor things left...
…emp attribute usage
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
greedybear/cronjobs/extraction/utils.py:103
- The PR description says
iocs_from_hits()returns tuples of(IOC, sensors list), but the implementation still returnslist[IOC]and stores sensors on a temporary_sensors_to_addattribute. Please either update the PR description/docs to match the implemented contract or adjust the function signature/return type to match the described behavior to avoid confusing future maintainers.
def iocs_from_hits(hits: list[dict]) -> list[IOC]:
"""
Convert Elasticsearch hits into IOC objects with associated sensors.
Groups hits by source IP, filters out non-global addresses, and
constructs IOC objects with aggregated data.
Enriches IOCs with FireHol categories at creation time to ensure
only fresh data is used.
Args:
hits: List of Elasticsearch hit dictionaries.
Returns:
List of IOC instances, one per unique source IP.
"""
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| migrations.AlterField( | ||
| model_name='sensor', | ||
| name='address', | ||
| field=models.CharField(max_length=15, unique=True), | ||
| ), |
There was a problem hiding this comment.
This migration adds a unique constraint to Sensor.address. If the database already contains duplicate sensor rows (which was possible before the constraint), applying this migration will fail. Consider adding a data migration step before AlterField to deduplicate existing Sensor records (and update any FK/M2M references if applicable) so the schema migration is safe to run on existing installations.
There was a problem hiding this comment.
This is a valid point by Copilot, what do you think?
There was a problem hiding this comment.
Agreed, that is a very valid point.
I have updated the migration file to include a data migration step (deduplicate_sensors) that runs before the AlterField operation.
This step identifies all Sensor records with duplicate addresses. Keeps the oldest record (lowest ID), Deletes the duplicates (safely, by ID).
This ensures the unique=True constraint can be applied safely to existing databases without raising an IntegrityError.
There was a problem hiding this comment.
changes are in this commit 3e0becc4c4a689bdffef3a56508416045cf1aaf5
There was a problem hiding this comment.
Sorry, I did not see that! 👍
regulartim
left a comment
There was a problem hiding this comment.
One last thing. Other than that, we are ready to merge! :)
| migrations.AlterField( | ||
| model_name='sensor', | ||
| name='address', | ||
| field=models.CharField(max_length=15, unique=True), | ||
| ), |
There was a problem hiding this comment.
This is a valid point by Copilot, what do you think?
| migrations.AlterField( | ||
| model_name='sensor', | ||
| name='address', | ||
| field=models.CharField(max_length=15, unique=True), | ||
| ), |
There was a problem hiding this comment.
Sorry, I did not see that! 👍
Description
This PR implements sensor tracking for Indicators of Compromise (IoCs) by adding a many-to-many relationship between IOC and Sensor models. This enables tracking which sensors detected which IoCs, providing better visibility into threat detection sources.
Database Changes:
sensorsManyToManyField to IOC model0036_add_sensors_to_ioc.pyRepository Layer:
SensorRepository: renamedadd_sensor()toget_or_create_sensor(), changed cache from set to dictadd_sensor_to_ioc()method toIOCRepositoryfor associating sensors with IoCsExtraction Pipeline:
IOCProcessor.add_ioc()to accept optional sensor parameterExtractionPipelineto capture sensor from Elasticsearch hits (t-pot_ip_extfield)iocs_from_hits()utility to return tuples of (IOC, sensors list)Admin Panel:
sensor_listmethodfilter_horizontalfor sensors fieldRelated issues
Closes #779
Type of change
Checklist
develop.Ruff) gave 0 errors. If you have correctly installed pre-commit, it does these checks and adjustments on your behalf.