expose tags in API responses and add tag-based filtering. Closes #522 by opbot-xd · Pull Request #899 · intelowlproject/GreedyBear

opbot-xd · 2026-02-27T19:03:32Z

Description

Implemented Phase 2 of the IP enrichment feature by exposing tags and enabling tag-based filtering.

Key changes:

Exposed tags natively on the feeds_advanced and feeds (public) JSON endpoints by batch-fetching tags and eliminating N+1 DB queries (Tag.objects.filter(ioc_id__in=...)).
Included tags natively via the EnrichmentSerializer (in the IOC object) to expose them on the /api/enrichment/ endpoint.
Enabled filtering directly by tag_key and matching via substring on tag_value exclusively on feeds_advanced (authenticated).
Public JSON endpoint (feeds) exposes the tags but strictly ignores tag queries. Text/CSV endpoints are intentionally untouched and return only IOC values.
Implemented comprehensive tests (tests/api/views/test_feeds_tags.py) for tag inclusion, structural validation, cross-IOC bleeding, filtering combinations (keys and values), and unauthenticated access locks. Test suite time also optimized by dropping redundant DB queries inside feeds_response.

Related issues

Closes #522

Type of change

Bug fix (non-breaking change which fixes an issue).
New feature (non-breaking change which adds functionality).
Breaking change (fix or feature that would cause existing functionality to not work as expected).
Chore (refactoring, dependency updates, CI/CD changes, code cleanup, docs-only changes).

Checklist

Please complete this checklist carefully. It helps guide your contribution and lets maintainers verify that all requirements are met.

Formalities

I have read and understood the rules about how to Contribute to this project.
I chose an appropriate title for the pull request in the form: <feature name>. Closes #999
My branch is based on develop.
The pull request is for the branch develop.
I have reviewed and verified any LLM-generated code included in this PR.

Docs and tests

I documented my code changes with docstrings and/or comments.
I have checked if my changes affect user-facing behavior that is described in in the docs. If so, I also created a pull request in the docs repository.
Linter (Ruff) gave 0 errors. If you have correctly installed pre-commit, it does these checks and adjustments on your behalf.
I have added tests for the feature/bug I solved.
All the tests gave 0 errors.

GUI changes

Ignore this section if you did not make any changes to the GUI.

I have provided a screenshot of the result in the PR.
I have created new frontend tests for the new component or updated existing ones.

Copilot

Pull request overview

This PR implements Phase 2 of the IP enrichment tagging feature by exposing IOC tags in JSON API responses and enabling tag-based filtering on the authenticated advanced feeds endpoint.

Changes:

Adds tags to IOC objects returned by /api/feeds/advanced/, /api/feeds/ (paginated JSON), and /api/enrichment.
Implements tag-based filtering (tag_key, tag_value) for /api/feeds/advanced/ only.
Adds a dedicated test suite to validate tag inclusion, structure, filtering behavior, and access controls.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
`api/views/utils.py`	Adds tag filter parameters, tag-based queryset filtering (advanced-only), and batch tag fetch to avoid N+1 queries in JSON feed responses.
`api/views/feeds.py`	Enables tag filtering for `feeds_advanced` and documents new query parameters.
`api/serializers.py`	Adds `TagSerializer`, exposes tags through `IOCSerializer`, and updates request/response serializers to accept/describe tag fields.
`tests/api/views/test_feeds_tags.py`	Introduces comprehensive tests for tag exposure and tag filtering behavior across feeds and enrichment endpoints.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

api/views/utils.py

api/views/feeds.py

api/serializers.py

tests/api/views/test_feeds_tags.py

opbot-xd · 2026-02-27T19:44:42Z

Hi @regulartim the PR is ready for review. Review it when you get time. Thanks.

regulartim · 2026-02-28T09:30:22Z

Hey @opbot-xd ! There is a merge conflict, probably due to a recent merge. Could you please solve it? I am planning to review Monday morning (european time 😉).

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (3)

greedybear/migrations/0038_tag.py:21

This migration file is being modified to add db_index=True on existing fields. In Django, migrations are intended to be immutable once merged/applied; editing an existing migration can leave already-migrated databases without the new indexes while fresh installs get them, causing schema drift.

Instead, revert changes to this existing migration and add a new migration (e.g. AlterField/AddIndex) that adds the desired indexes.
tests/api/views/test_feeds_tags.py:75

These filter tests don’t currently assert that IOCs with non-matching tags (e.g. self.ioc_2, which has a different tag_key) are excluded. As written, the test would still pass if the endpoint returned extra IOCs alongside the expected one.

Consider asserting that all returned IOCs match the requested tag criteria (or at least explicitly assert self.ioc_2 is not present) to prevent false positives.

    def test_200_filter_by_tag_key(self):
        """Filtering by tag_key should return only IOCs with matching tags."""
        response = self.client.get("/api/feeds/advanced/?tag_key=malware")
        self.assertEqual(response.status_code, 200)

        iocs = response.json()["iocs"]
        values = {i["value"] for i in iocs}
        self.assertIn(self.ioc.name, values)
        self.assertNotIn(self.ioc_domain.name, values)

api/views/utils.py:81

FeedRequestParams’ class docstring lists the supported attributes but doesn’t mention the newly added tag_key/tag_value fields. Updating the attribute list would keep the inline documentation accurate for future maintainers.

        self.verbose = query_params.get("verbose", "false").lower()
        self.paginate = query_params.get("paginate", "false").lower()
        self.format = query_params.get("format_", "json").lower()
        self.tag_key = query_params.get("tag_key", "").strip()
        self.tag_value = query_params.get("tag_value", "").strip()

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-28T22:33:59Z

greedybear/models.py

+    key = models.CharField(max_length=128, db_index=True)
+    value = models.CharField(max_length=256, db_index=True)


Adding a plain btree index on Tag.value is unlikely to help the new tag_value substring filter, since PostgreSQL typically cannot use a btree index for ILIKE/__icontains queries. This adds write/storage overhead without improving the main query.

If tag_value filtering needs to be performant, consider a trigram GIN/GiST index (pg_trgm) or changing the filter semantics to equality/prefix where an index can be used.

Hi @regulartim
When I was integrating Phase 2, my PR had previously created the 0038_add_tag_model migration file. Due to copilot review feedback, i needed to optimize the database queries by adding db_index=True directly into the Tag model table. To avoid polluting the DB migrations with two sequential model modifications back-to-back, I deleted 0038 and completely regenerated it as a clean new migration called 0038_tag.py.

In the meantime, develop was merged into my branch, bringing in a new migration 0039 which had 0038_add_tag_model hardcoded as its dependency. I manually updated dependencies = [('greedybear', '0038_tag')] inside 0039_ioc_attacker_country_sensor_country_and_more.py to fix this clash and restore the DB schema graph cleanly!

is it fine or would it create some complications?

Hey @opbot-xd ! :) I think it's better to just create a new migration. Changing old migration files after they landed in a branch that other people could use, might break their instances in unpredictable ways.

regulartim · 2026-03-02T07:25:15Z

I have checked if my changes affect user-facing behavior that is described in in the docs. If so, I also created a pull request in the docs repository.

I think your changes DO affect user-facing behavior as the API responses contain additional fields now, right?

api/views/feeds.py

api/views/utils.py

greedybear/migrations/0038_tag.py

api/views/utils.py

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

greedybear/migrations/0040_alter_tag_key.py

opbot-xd · 2026-03-02T17:19:04Z

I think your changes DO affect user-facing behavior as the API responses contain additional fields now, right?

Yes, would raise a PR at docs repo once this get merged. Updated the description.

Also sorry for late responses, was busy with university midterm exams.

regulartim · 2026-03-02T17:46:33Z

Also sorry for late responses, was busy with university midterm exams.

No problem, take your time! :) If you want to pause your work on this, feel free to just convert the PR to a draft again. This is not blocking anything and I think you are a reliable person, so I don't mind if you pause your work for a few days or even weeks.

opbot-xd · 2026-03-02T18:27:28Z

No problem, take your time! :) If you want to pause your work on this, feel free to just convert the PR to a draft again. This is not blocking anything and I think you are a reliable person, so I don't mind if you pause your work for a few days or even weeks.

Thanks for understanding! My exams are over now, so I’ll like to resume work on this.

regulartim

Hey @opbot-xd ! :) I still see a few issues here:

I am not sure about queriing tags in the feed_response function, see comment
there is a pending migration, you have to run makemigrations again

regulartim · 2026-03-03T09:26:06Z

api/views/utils.py

+            # Annotate tags via a DB JOIN rather than a Python-level join.
+            # `tags_json` avoids conflicting with the `tags` reverse FK on IOC.
+            # For paginated (list) paths the queryset is already evaluated, so we
+            # run a second targeted query on the slice IDs instead.
+            tags_lookup: dict = {}
+            if isinstance(iocs, list):
+                ioc_ids = [ioc.id for ioc in iocs if hasattr(ioc, "id")]
+                if ioc_ids:
+                    tag_rows = (
+                        IOC.objects.filter(id__in=ioc_ids)
+                        .annotate(
+                            tags_json=ArrayAgg(
+                                JSONObject(key=F("tags__key"), value=F("tags__value"), source=F("tags__source")),
+                                filter=Q(tags__isnull=False),
+                                default=Value([]),
+                                distinct=True,  # prevent duplication from the honeypot JOIN
+                            )
+                        )
+                        .values("id", "tags_json")
+                    )
+                    tags_lookup = {row["id"]: row["tags_json"] for row in tag_rows}
+            else:
+                iocs = iocs.annotate(
+                    tags_json=ArrayAgg(
+                        JSONObject(key=F("tags__key"), value=F("tags__value"), source=F("tags__source")),
+                        filter=Q(tags__isnull=False),
+                        default=Value([]),
+                        distinct=True,  # prevent duplication from the honeypot JOIN
+                    )
+                )
+                required_fields = tuple(f if f != "tags" else "tags_json" for f in required_fields)

-            # Collect values; `honeypots` will contain the list of associated honeypot names
-            iocs = (ioc_as_dict(ioc, required_fields) for ioc in iocs) if isinstance(iocs, list) else iocs.values(*required_fields)
-            for ioc in iocs:
+            # Generate dictionaries iterably to avoid holding both dicts and json_list in memory
+            iocs_iter: object
+            if isinstance(iocs, list):
+                iocs_iter = (ioc_as_dict(ioc, set(required_fields) | {"id"}) for ioc in iocs)  # id needed for tags_lookup
+            else:
+                iocs_iter = iocs.values(*required_fields).iterator(chunk_size=2000)
+            for ioc in iocs_iter:


Why do you fetch the tag here and not in get_queryset? There you could just join them like the general honeypots:

if not is_aggregated: iocs = iocs.filter(general_honeypot__active=True) iocs = iocs.annotate(honeypots=ArrayAgg("general_honeypot__name")) iocs = iocs.annotate(tags_json=ArrayAgg(...) iocs = iocs.order_by(feed_params.ordering) iocs = iocs[: int(feed_params.feed_size)]

This would be much simpler as we don't need to cover the case where iocs is no queryset.

Nice!
This approach is much cleaner! Though I had to also add the tags_json annotation to the repository methods (get_scanners_by_pks and get_recent_scanners) since they bypass get_queryset but still call feeds_response for ML scoring workflows.

The benefit is worth it though.

Changes made:

Moved tags_json annotation to get_queryset alongside honeypots

Removed tags_lookup dictionary logic from feeds_response

Added tags_json annotation to IocRepository.get_scanners_by_pks() and IocRepository.get_recent_scanners()

…ntelowlproject#522)

…ndex in 0040 - Rename 0038_tag.py → 0038_add_tag_model.py to match upstream/develop exactly - Remove db_index=True from 0038 (keep it identical to upstream) - Add 0040_tag_key_index.py: separate migration for Tag.key btree index on top of upstream's 0039 (GeoIP attacker_country/sensor.country)

…ject) for tags - Remove _prefetch_tags helper (manual reimplementation of prefetch_related) - Annotate tags directly on the queryset using ArrayAgg(JSONObject(key, value, source)) with Q(tags__isnull=False) to skip IOCs without tags and distinct=True to prevent duplication from the general_honeypot JOIN - For paginated (list) paths: run a targeted ArrayAgg query on the slice IDs - Use annotation name tags_json to avoid conflict with the tags reverse FK on IOC - Remove redundant comments, keep only non-obvious ones Addresses code review feedback from regulartim.

Pass tag_key and tag_value as explicit kwargs to get_queryset instead of storing them in FeedRequestParams. They are read directly from request.query_params only in feeds_advanced, so the standard feeds endpoint can never trigger tag filtering regardless of what query params are passed. Removes enable_tag_filtering flag entirely. Suggested by regulartim.

- Annotate tags_json in get_queryset alongside honeypots annotation - Simplifies feeds_response by removing dual list/queryset handling - Add tags_json to repository methods (get_scanners_by_pks, get_recent_scanners) - Eliminates separate query for paginated IOCs Addresses regulartim feedback.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

greedybear/cronjobs/repositories/ioc.py

api/views/utils.py

- Only annotate tags_json when format is JSON to avoid unnecessary JOINs and aggregation for txt/csv downloads - Add distinct=True to honeypots ArrayAgg in repository methods to prevent duplicate names when IOCs have multiple tags Addresses Copilot feedback.

regulartim

Looks good, thanks @opbot-xd ! :) Only minor details left.

regulartim · 2026-03-04T07:32:42Z

greedybear/cronjobs/repositories/ioc.py

If I understand your code correctly, the changes to this file should server the tags to the ML jobs, right? But there, they are thrown away again, because get_features does not know of them.

Also, I am not a 100% sure if it is a good idea to use the Tags for training. This might introduce weird biases. I had the similar doubts when I added the ip_reputation as a feature, but decided to include it after testing the inclusion thoroughly.

So, bottom line: I would prefer not to add the Tags to scoring for now. This would be a separate issue / PR requiring a lot of research and testing.

Thanks for pointing this out!
Updated the logic to handle both cases correctly:

if isinstance(iocs, list): has_tags_annotation = bool(iocs) and hasattr(iocs[0], "tags_json") else: has_tags_annotation = "tags_json" in getattr( iocs, "query", type("", (), {"annotations": {}})()" ).annotations

api/views/utils.py

…nd adapt API views to conditionally include it and truncate tag query parameters.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

api/views/utils.py:306

In the isinstance(iocs, list) branch, set(required_fields) is rebuilt for every IOC because it’s inside the generator expression. Precomputing the set once (e.g., fields_set = set(required_fields)) and reusing it will avoid repeated allocations when paginating large result sets.

            iocs_iter: object
            if isinstance(iocs, list):
                iocs_iter = (ioc_as_dict(ioc, set(required_fields)) for ioc in iocs)
            else:

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

api/views/feeds.py

…ers in advanced feed view.

regulartim

Alright, thanks for your work! :)

…lowlproject#522 (intelowlproject#899) * feat: expose tags in API responses and add tag-based filtering * feat(api): expose tags in API responses and add tag-based filtering (intelowlproject#522) * refactor(migrations): rename 0038 to match upstream and add Tag.key index in 0040 - Rename 0038_tag.py → 0038_add_tag_model.py to match upstream/develop exactly - Remove db_index=True from 0038 (keep it identical to upstream) - Add 0040_tag_key_index.py: separate migration for Tag.key btree index on top of upstream's 0039 (GeoIP attacker_country/sensor.country) * refactor(feeds): replace _prefetch_tags with DB-level ArrayAgg(JSONObject) for tags - Remove _prefetch_tags helper (manual reimplementation of prefetch_related) - Annotate tags directly on the queryset using ArrayAgg(JSONObject(key, value, source)) with Q(tags__isnull=False) to skip IOCs without tags and distinct=True to prevent duplication from the general_honeypot JOIN - For paginated (list) paths: run a targeted ArrayAgg query on the slice IDs - Use annotation name tags_json to avoid conflict with the tags reverse FK on IOC - Remove redundant comments, keep only non-obvious ones Addresses code review feedback from regulartim. * refactor(feeds): move tag_key/tag_value out of FeedRequestParams Pass tag_key and tag_value as explicit kwargs to get_queryset instead of storing them in FeedRequestParams. They are read directly from request.query_params only in feeds_advanced, so the standard feeds endpoint can never trigger tag filtering regardless of what query params are passed. Removes enable_tag_filtering flag entirely. Suggested by regulartim. * refactor(feeds): move tags_json annotation to get_queryset - Annotate tags_json in get_queryset alongside honeypots annotation - Simplifies feeds_response by removing dual list/queryset handling - Add tags_json to repository methods (get_scanners_by_pks, get_recent_scanners) - Eliminates separate query for paginated IOCs Addresses regulartim feedback. * perf: optimize tags_json annotation and fix honeypots deduplication - Only annotate tags_json when format is JSON to avoid unnecessary JOINs and aggregation for txt/csv downloads - Add distinct=True to honeypots ArrayAgg in repository methods to prevent duplicate names when IOCs have multiple tags Addresses Copilot feedback. * Refactor: Remove `tags_json` annotation from IOC repository queries and adapt API views to conditionally include it and truncate tag query parameters. * refactor: remove redundant assignment of verbose and paginate parameters in advanced feed view.

Copilot AI review requested due to automatic review settings February 27, 2026 19:03

Copilot started reviewing on behalf of opbot-xd February 27, 2026 19:04 View session

Copilot AI reviewed Feb 27, 2026

View reviewed changes

api/views/utils.py Outdated Show resolved Hide resolved

api/views/utils.py Outdated Show resolved Hide resolved

api/views/feeds.py Outdated Show resolved Hide resolved

api/serializers.py Outdated Show resolved Hide resolved

tests/api/views/test_feeds_tags.py Show resolved Hide resolved

Copilot AI review requested due to automatic review settings February 28, 2026 22:28

Copilot started reviewing on behalf of opbot-xd February 28, 2026 22:28 View session

Copilot AI reviewed Feb 28, 2026

View reviewed changes

regulartim requested changes Mar 2, 2026

View reviewed changes

api/views/feeds.py Outdated Show resolved Hide resolved

api/views/utils.py Outdated Show resolved Hide resolved

greedybear/migrations/0038_tag.py Show resolved Hide resolved

api/views/utils.py Outdated Show resolved Hide resolved

Copilot AI review requested due to automatic review settings March 2, 2026 16:54

opbot-xd force-pushed the feature/522-phase2-tag-feeds branch from 55a4e61 to 498807c Compare March 2, 2026 16:54

Copilot started reviewing on behalf of opbot-xd March 2, 2026 16:56 View session

Copilot AI reviewed Mar 2, 2026

View reviewed changes

greedybear/migrations/0040_alter_tag_key.py Show resolved Hide resolved

opbot-xd requested a review from regulartim March 2, 2026 18:20

regulartim requested changes Mar 3, 2026

View reviewed changes

opbot-xd added 6 commits March 4, 2026 07:37

feat: expose tags in API responses and add tag-based filtering

812de7c

feat(api): expose tags in API responses and add tag-based filtering (i…

ea4f88c

…ntelowlproject#522)

Copilot AI review requested due to automatic review settings March 4, 2026 02:12

opbot-xd force-pushed the feature/522-phase2-tag-feeds branch from cbcc2f1 to e4083e9 Compare March 4, 2026 02:12

Copilot started reviewing on behalf of opbot-xd March 4, 2026 02:13 View session

Copilot AI reviewed Mar 4, 2026

View reviewed changes

greedybear/cronjobs/repositories/ioc.py Outdated Show resolved Hide resolved

greedybear/cronjobs/repositories/ioc.py Outdated Show resolved Hide resolved

api/views/utils.py Outdated Show resolved Hide resolved

api/views/utils.py Show resolved Hide resolved

opbot-xd requested a review from regulartim March 4, 2026 03:01

regulartim requested changes Mar 4, 2026

View reviewed changes

Refactor: Remove tags_json annotation from IOC repository queries a…

7185782

…nd adapt API views to conditionally include it and truncate tag query parameters.

Copilot AI review requested due to automatic review settings March 4, 2026 11:46

Copilot started reviewing on behalf of opbot-xd March 4, 2026 11:47 View session

Copilot AI reviewed Mar 4, 2026

View reviewed changes

api/views/feeds.py Show resolved Hide resolved

refactor: remove redundant assignment of verbose and paginate paramet…

9a439bb

…ers in advanced feed view.

opbot-xd requested a review from regulartim March 4, 2026 12:03

regulartim approved these changes Mar 4, 2026

View reviewed changes

regulartim merged commit 6e5df77 into intelowlproject:develop Mar 4, 2026
4 checks passed

opbot-xd mentioned this pull request Mar 4, 2026

docs(GreedyBear): document tags in API responses, tag-based filtering, and enrichment config. Related to intelowlproject/GreedyBear#899 intelowlproject/docs#59

Open

4 tasks

		key = models.CharField(max_length=128, db_index=True)
		value = models.CharField(max_length=256, db_index=True)

Uh oh!

Conversation

opbot-xd commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related issues

Type of change

Checklist

Formalities

Docs and tests

GUI changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

opbot-xd commented Feb 27, 2026

Uh oh!

regulartim commented Feb 28, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

Uh oh!

opbot-xd Feb 28, 2026

Choose a reason for hiding this comment

Uh oh!

regulartim Mar 1, 2026

Choose a reason for hiding this comment

Uh oh!

regulartim commented Mar 2, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

opbot-xd commented Mar 2, 2026

Uh oh!

regulartim commented Mar 2, 2026

Uh oh!

opbot-xd commented Mar 2, 2026

Uh oh!

regulartim left a comment

Choose a reason for hiding this comment

Uh oh!

regulartim Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

regulartim Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

opbot-xd Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

regulartim left a comment

Choose a reason for hiding this comment

Uh oh!

regulartim Mar 4, 2026

opbot-xd commented Feb 27, 2026 •

edited

Loading