Skip to content

Track IoC-Sensor relationship. Closes #779#784

Merged
regulartim merged 11 commits intointelowlproject:developfrom
opbot-xd:feature/ioc-sensor-relationship
Feb 10, 2026
Merged

Track IoC-Sensor relationship. Closes #779#784
regulartim merged 11 commits intointelowlproject:developfrom
opbot-xd:feature/ioc-sensor-relationship

Conversation

@opbot-xd
Copy link
Copy Markdown
Contributor

@opbot-xd opbot-xd commented Feb 5, 2026

Description

This PR implements sensor tracking for Indicators of Compromise (IoCs) by adding a many-to-many relationship between IOC and Sensor models. This enables tracking which sensors detected which IoCs, providing better visibility into threat detection sources.

Database Changes:

  • Added sensors ManyToManyField to IOC model
  • Created migration 0036_add_sensors_to_ioc.py

Repository Layer:

  • Refactored SensorRepository: renamed add_sensor() to get_or_create_sensor(), changed cache from set to dict
  • Added add_sensor_to_ioc() method to IOCRepository for associating sensors with IoCs

Extraction Pipeline:

  • Modified IOCProcessor.add_ioc() to accept optional sensor parameter
  • Updated ExtractionPipeline to capture sensor from Elasticsearch hits (t-pot_ip_ext field)
  • Modified iocs_from_hits() utility to return tuples of (IOC, sensors list)
  • Updated extraction strategies (Generic and Cowrie) to process IoC once with first sensor, then add remaining sensors separately (prevents duplicate counter increments)

Admin Panel:

  • Added sensor display in IOC admin with sensor_list method
  • Added filter_horizontal for sensors field

Related issues

Closes #779

Type of change

  • New feature (non-breaking change which adds functionality).

Checklist

  • I have read and understood the rules about how to Contribute to this project.
  • The pull request is for the branch develop.
  • I have added documentation of the new features.
  • Linter (Ruff) gave 0 errors. If you have correctly installed pre-commit, it does these checks and adjustments on your behalf.
  • I have added tests for the feature/bug I solved. All the tests (new and old ones) gave 0 errors.
  • If changes were made to an existing model/serializer/view, the docs were updated and regenerated (check CONTRIBUTE.md).
  • If the GUI has been modified:
    • I have a provided a screenshot of the result in the PR.
    • I have created new frontend tests for the new component or updated existing ones.

@opbot-xd opbot-xd marked this pull request as ready for review February 5, 2026 15:53
Copilot AI review requested due to automatic review settings February 5, 2026 15:53
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements sensor tracking for Indicators of Compromise (IoCs) by establishing a many-to-many relationship between the IOC and Sensor models. This enhancement enables GreedyBear to track which T-Pot sensors detected which IoCs, addressing issue #779 for better visibility into threat detection sources.

Changes:

  • Added sensors ManyToManyField to the IOC model with corresponding database migration
  • Refactored SensorRepository to use dict-based caching and renamed add_sensor() to get_or_create_sensor() for clearer semantics
  • Enhanced extraction pipeline to capture sensor information from Elasticsearch hits and associate them with IoCs through the new add_sensor_to_ioc() repository method

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
greedybear/models.py Added sensors ManyToManyField to IOC model for tracking which sensors detected each IoC
greedybear/migrations/0035_add_sensors_to_ioc.py Database migration adding sensors field and renaming TorExitNode index
greedybear/cronjobs/repositories/sensor.py Refactored to use dict-based cache storing Sensor objects; renamed method to get_or_create_sensor
greedybear/cronjobs/repositories/ioc.py Added add_sensor_to_ioc method for associating sensors with IoCs
greedybear/cronjobs/extraction/pipeline.py Modified to extract sensor from t-pot_ip_ext field and include in hit dictionaries
greedybear/cronjobs/extraction/ioc_processor.py Updated add_ioc to accept optional sensor parameter and associate it with IoCs
greedybear/cronjobs/extraction/utils.py Changed iocs_from_hits to return tuples of (IOC, sensors list)
greedybear/cronjobs/extraction/strategies/generic.py Updated to handle sensor lists, processing first sensor with IoC creation and adding remaining sensors separately
greedybear/cronjobs/extraction/strategies/cowrie.py Updated scanner and payload extraction to handle sensors correctly
greedybear/admin.py Added sensor_list display method and filter_horizontal for sensors field; improved Sensor admin with focused display fields
tests/test_sensor_repository.py Updated all tests for renamed method and return value changes from boolean to Sensor object
tests/test_ioc_processor.py Updated sensor filtering test to use new cache-based approach
tests/test_extraction_utils.py Updated all tests to handle new tuple return format from iocs_from_hits
tests/test_extraction_strategies.py Added test for multi-sensor handling and updated existing tests for new tuple format
tests/test_cowrie_extraction.py Updated test for new tuple format
tests/greedybear/cronjobs/test_extraction_pipeline_grouping.py Updated to use get_or_create_sensor method name
tests/greedybear/cronjobs/test_extraction_pipeline_e2e.py Updated to use get_or_create_sensor method name
tests/init.py Added cache initialization to ExtractionTestCase setUp for sensor filtering

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +13 to +17
migrations.RenameIndex(
model_name='torexitnode',
new_name='greedybear__ip_addr_6bc095_idx',
old_name='greedybear_ip_addr_tor_idx',
),
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This migration includes a RenameIndex operation on the torexitnode model that appears unrelated to adding sensors to IoC. The index renaming from 'greedybear_ip_addr_tor_idx' to 'greedybear__ip_addr_6bc095_idx' is a Django auto-generated naming change but doesn't relate to the IoC-Sensor relationship feature described in the PR. Consider separating this into its own migration to keep changes focused and easier to review/rollback if needed.

Suggested change
migrations.RenameIndex(
model_name='torexitnode',
new_name='greedybear__ip_addr_6bc095_idx',
old_name='greedybear_ip_addr_tor_idx',
),

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @regulartim what to do about this?
It got autogenerated when I ran makemigrations for the IoC-Sensor feature, Django picked up this orphaned change.
Should I create 2 migration files for this or let it be as it is?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the pending migration and pushed it to develop. Good catch! 👍

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This means you probably have to re-make your migration.)

@opbot-xd
Copy link
Copy Markdown
Contributor Author

opbot-xd commented Feb 5, 2026

Test Coverage Report

All modified files have comprehensive test coverage:

File Coverage Status
greedybear/models.py 100% Full coverage of new sensors ManyToManyField
greedybear/cronjobs/repositories/sensor.py 100% All methods including refactored get_or_create_sensor() tested
greedybear/cronjobs/repositories/ioc.py 93.67% New add_sensor_to_ioc() method fully tested
greedybear/cronjobs/extraction/ioc_processor.py 97.92% Sensor parameter handling covered
greedybear/cronjobs/extraction/pipeline.py 100% Sensor extraction from Elasticsearch hits tested
greedybear/cronjobs/extraction/strategies/generic.py 100% IoC-sensor association logic covered
greedybear/cronjobs/extraction/strategies/cowrie.py 83.56% Core sensor tracking functionality tested
greedybear/admin.py 88.64% Sensor display in admin panel covered

Overall: 92.04% coverage across all modified files

- Add ManyToMany relationship between IOC and Sensor models
- Refactor SensorRepository to use dict cache and return Sensor objects
- Rename add_sensor() to get_or_create_sensor() for clarity
- Update extraction pipeline to capture and pass sensor data through strategies
- Modify iocs_from_hits() to return (IOC, sensors) tuples
- Update IocProcessor to accept sensor parameter and associate with IOCs
- Add add_sensor_to_ioc() method to IocRepository
- Fix bug: process each IoC once to avoid duplicate counter increments
- Update admin panel to display sensor list for each IoC
- Add comprehensive tests for sensor tracking (104 tests passing)

This enables tracking which T-Pot sensors detected which IoCs,
providing valuable forensic data in multi-sensor deployments.
This migration adds the sensors field which depends on the TorExitNode
index rename migration (0035) from upstream develop.
Copilot AI review requested due to automatic review settings February 6, 2026 06:21
@opbot-xd opbot-xd force-pushed the feature/ioc-sensor-relationship branch from da0d921 to 22c2bb2 Compare February 6, 2026 06:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@opbot-xd opbot-xd requested a review from regulartim February 6, 2026 06:53
@opbot-xd
Copy link
Copy Markdown
Contributor Author

opbot-xd commented Feb 6, 2026

Hi @regulartim the PR is ready for review. Please review it when you get time. Thanks :)

Copy link
Copy Markdown
Collaborator

@regulartim regulartim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @opbot-xd ! Good work as always! :) But we might be able to simplify the process.

- Remove add_sensor_to_ioc wrapper method from IocRepository
  (ManyToManyField.add is already idempotent)
- Keep iocs_from_hits returning list[IOC] instead of tuples
- Attach sensors to IOC via _sensors_to_add temporary attribute
- Handle sensors in _merge_iocs for existing IOCs (consolidates merge logic)
- Handle sensors after save for new IOCs
- Use IOC.sensors.add(sensor) directly in strategies
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 38 to 42
return None
sensor = Sensor(address=ip)
sensor.save()
self.cache.add(ip)
self.cache[ip] = sensor
self.log.info(f"added sensor {ip} to the database")
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a cache miss this always creates/saves a new Sensor row. Since Sensor.address isn’t unique, this can silently create duplicate Sensor records (and in multi-process/concurrent runs the in-memory cache won’t prevent duplicates). Consider using Sensor.objects.get_or_create(address=ip) (and update the cache from the returned instance).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot raised a valid point here about potential duplicate Sensor records in multi-process/concurrent Celery environments.

The issue: On a cache miss, the code creates a new Sensor. In multi-worker setups, two workers could simultaneously create sensors for the same IP since the in-memory cache isn't shared.

Proposed fixes:

  1. Use Sensor.objects.get_or_create(address=ip) for atomic creation
  2. Add unique=True constraint to Sensor.address in the model

My confusion: Should Sensor.address be unique? Logically it makes sense (each sensor IP should exist only once), but I want to confirm before adding a database constraint + migration.

@regulartim What do you think - should I add the unique constraint, or just use get_or_create as a simple fix?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Currently we don't have a multi-worker setup, but that can happen in the future.

Should Sensor.address be unique?

Yes.

What do you think - should I add the unique constraint, or just use get_or_create as a simple fix?

Maybe both!? The unique constraint is the right thing to do on the database level. But then we still need to catch the described edge case on the application level, which can be done with get_or_create. Do you agree?

- Add unique=True to Sensor.address field (database-level protection)
- Use Sensor.objects.get_or_create() for atomic, race-safe creation
- Add get_queryset prefetch_related for N+1 query optimization in admin

This prevents duplicate Sensor records in multi-worker Celery environments
as suggested by @regulartim in code review.
@opbot-xd opbot-xd requested a review from regulartim February 9, 2026 18:52
Copy link
Copy Markdown
Collaborator

@regulartim regulartim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor things left...

Copilot AI review requested due to automatic review settings February 10, 2026 08:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

greedybear/cronjobs/extraction/utils.py:103

  • The PR description says iocs_from_hits() returns tuples of (IOC, sensors list), but the implementation still returns list[IOC] and stores sensors on a temporary _sensors_to_add attribute. Please either update the PR description/docs to match the implemented contract or adjust the function signature/return type to match the described behavior to avoid confusing future maintainers.
def iocs_from_hits(hits: list[dict]) -> list[IOC]:
    """
    Convert Elasticsearch hits into IOC objects with associated sensors.
    Groups hits by source IP, filters out non-global addresses, and
    constructs IOC objects with aggregated data.
    Enriches IOCs with FireHol categories at creation time to ensure
    only fresh data is used.

    Args:
        hits: List of Elasticsearch hit dictionaries.

    Returns:
        List of IOC instances, one per unique source IP.
    """

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +18 to +22
migrations.AlterField(
model_name='sensor',
name='address',
field=models.CharField(max_length=15, unique=True),
),
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This migration adds a unique constraint to Sensor.address. If the database already contains duplicate sensor rows (which was possible before the constraint), applying this migration will fail. Consider adding a data migration step before AlterField to deduplicate existing Sensor records (and update any FK/M2M references if applicable) so the schema migration is safe to run on existing installations.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a valid point by Copilot, what do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, that is a very valid point.

I have updated the migration file to include a data migration step (deduplicate_sensors) that runs before the AlterField operation.

This step identifies all Sensor records with duplicate addresses. Keeps the oldest record (lowest ID), Deletes the duplicates (safely, by ID).

This ensures the unique=True constraint can be applied safely to existing databases without raising an IntegrityError.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes are in this commit 3e0becc4c4a689bdffef3a56508416045cf1aaf5

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I did not see that! 👍

@opbot-xd opbot-xd requested a review from regulartim February 10, 2026 10:08
Copy link
Copy Markdown
Collaborator

@regulartim regulartim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last thing. Other than that, we are ready to merge! :)

Comment on lines +18 to +22
migrations.AlterField(
model_name='sensor',
name='address',
field=models.CharField(max_length=15, unique=True),
),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a valid point by Copilot, what do you think?

Comment on lines +18 to +22
migrations.AlterField(
model_name='sensor',
name='address',
field=models.CharField(max_length=15, unique=True),
),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I did not see that! 👍

@regulartim regulartim merged commit ee0d79c into intelowlproject:develop Feb 10, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants