Skip to content

expose tags in API responses and add tag-based filtering. Closes #522#899

Merged
regulartim merged 9 commits intointelowlproject:developfrom
opbot-xd:feature/522-phase2-tag-feeds
Mar 4, 2026
Merged

expose tags in API responses and add tag-based filtering. Closes #522#899
regulartim merged 9 commits intointelowlproject:developfrom
opbot-xd:feature/522-phase2-tag-feeds

Conversation

@opbot-xd
Copy link
Copy Markdown
Contributor

@opbot-xd opbot-xd commented Feb 27, 2026

Description

Implemented Phase 2 of the IP enrichment feature by exposing tags and enabling tag-based filtering.

Key changes:

  • Exposed tags natively on the feeds_advanced and feeds (public) JSON endpoints by batch-fetching tags and eliminating N+1 DB queries (Tag.objects.filter(ioc_id__in=...)).
  • Included tags natively via the EnrichmentSerializer (in the IOC object) to expose them on the /api/enrichment/ endpoint.
  • Enabled filtering directly by tag_key and matching via substring on tag_value exclusively on feeds_advanced (authenticated).
  • Public JSON endpoint (feeds) exposes the tags but strictly ignores tag queries. Text/CSV endpoints are intentionally untouched and return only IOC values.
  • Implemented comprehensive tests (tests/api/views/test_feeds_tags.py) for tag inclusion, structural validation, cross-IOC bleeding, filtering combinations (keys and values), and unauthenticated access locks. Test suite time also optimized by dropping redundant DB queries inside feeds_response.

Related issues

Closes #522

Type of change

  • Bug fix (non-breaking change which fixes an issue).
  • New feature (non-breaking change which adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).
  • Chore (refactoring, dependency updates, CI/CD changes, code cleanup, docs-only changes).

Checklist

Please complete this checklist carefully. It helps guide your contribution and lets maintainers verify that all requirements are met.

Formalities

  • I have read and understood the rules about how to Contribute to this project.
  • I chose an appropriate title for the pull request in the form: <feature name>. Closes #999
  • My branch is based on develop.
  • The pull request is for the branch develop.
  • I have reviewed and verified any LLM-generated code included in this PR.

Docs and tests

  • I documented my code changes with docstrings and/or comments.
  • I have checked if my changes affect user-facing behavior that is described in in the docs. If so, I also created a pull request in the docs repository.
  • Linter (Ruff) gave 0 errors. If you have correctly installed pre-commit, it does these checks and adjustments on your behalf.
  • I have added tests for the feature/bug I solved.
  • All the tests gave 0 errors.

GUI changes

Ignore this section if you did not make any changes to the GUI.

  • I have provided a screenshot of the result in the PR.
  • I have created new frontend tests for the new component or updated existing ones.

Copilot AI review requested due to automatic review settings February 27, 2026 19:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements Phase 2 of the IP enrichment tagging feature by exposing IOC tags in JSON API responses and enabling tag-based filtering on the authenticated advanced feeds endpoint.

Changes:

  • Adds tags to IOC objects returned by /api/feeds/advanced/, /api/feeds/ (paginated JSON), and /api/enrichment.
  • Implements tag-based filtering (tag_key, tag_value) for /api/feeds/advanced/ only.
  • Adds a dedicated test suite to validate tag inclusion, structure, filtering behavior, and access controls.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
api/views/utils.py Adds tag filter parameters, tag-based queryset filtering (advanced-only), and batch tag fetch to avoid N+1 queries in JSON feed responses.
api/views/feeds.py Enables tag filtering for feeds_advanced and documents new query parameters.
api/serializers.py Adds TagSerializer, exposes tags through IOCSerializer, and updates request/response serializers to accept/describe tag fields.
tests/api/views/test_feeds_tags.py Introduces comprehensive tests for tag exposure and tag filtering behavior across feeds and enrichment endpoints.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@opbot-xd
Copy link
Copy Markdown
Contributor Author

Hi @regulartim the PR is ready for review. Review it when you get time. Thanks.

@regulartim
Copy link
Copy Markdown
Collaborator

Hey @opbot-xd ! There is a merge conflict, probably due to a recent merge. Could you please solve it? I am planning to review Monday morning (european time 😉).

Copilot AI review requested due to automatic review settings February 28, 2026 22:28
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (3)

greedybear/migrations/0038_tag.py:21

  • This migration file is being modified to add db_index=True on existing fields. In Django, migrations are intended to be immutable once merged/applied; editing an existing migration can leave already-migrated databases without the new indexes while fresh installs get them, causing schema drift.

Instead, revert changes to this existing migration and add a new migration (e.g. AlterField/AddIndex) that adds the desired indexes.
tests/api/views/test_feeds_tags.py:75

  • These filter tests don’t currently assert that IOCs with non-matching tags (e.g. self.ioc_2, which has a different tag_key) are excluded. As written, the test would still pass if the endpoint returned extra IOCs alongside the expected one.

Consider asserting that all returned IOCs match the requested tag criteria (or at least explicitly assert self.ioc_2 is not present) to prevent false positives.

    def test_200_filter_by_tag_key(self):
        """Filtering by tag_key should return only IOCs with matching tags."""
        response = self.client.get("/api/feeds/advanced/?tag_key=malware")
        self.assertEqual(response.status_code, 200)

        iocs = response.json()["iocs"]
        values = {i["value"] for i in iocs}
        self.assertIn(self.ioc.name, values)
        self.assertNotIn(self.ioc_domain.name, values)

api/views/utils.py:81

  • FeedRequestParams’ class docstring lists the supported attributes but doesn’t mention the newly added tag_key/tag_value fields. Updating the attribute list would keep the inline documentation accurate for future maintainers.
        self.verbose = query_params.get("verbose", "false").lower()
        self.paginate = query_params.get("paginate", "false").lower()
        self.format = query_params.get("format_", "json").lower()
        self.tag_key = query_params.get("tag_key", "").strip()
        self.tag_value = query_params.get("tag_value", "").strip()

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +194 to +195
key = models.CharField(max_length=128, db_index=True)
value = models.CharField(max_length=256, db_index=True)
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a plain btree index on Tag.value is unlikely to help the new tag_value substring filter, since PostgreSQL typically cannot use a btree index for ILIKE/__icontains queries. This adds write/storage overhead without improving the main query.

If tag_value filtering needs to be performant, consider a trigram GIN/GiST index (pg_trgm) or changing the filter semantics to equality/prefix where an index can be used.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @regulartim
When I was integrating Phase 2, my PR had previously created the 0038_add_tag_model migration file. Due to copilot review feedback, i needed to optimize the database queries by adding db_index=True directly into the Tag model table. To avoid polluting the DB migrations with two sequential model modifications back-to-back, I deleted 0038 and completely regenerated it as a clean new migration called 0038_tag.py.

In the meantime, develop was merged into my branch, bringing in a new migration 0039 which had 0038_add_tag_model hardcoded as its dependency. I manually updated dependencies = [('greedybear', '0038_tag')] inside 0039_ioc_attacker_country_sensor_country_and_more.py to fix this clash and restore the DB schema graph cleanly!

is it fine or would it create some complications?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @opbot-xd ! :) I think it's better to just create a new migration. Changing old migration files after they landed in a branch that other people could use, might break their instances in unpredictable ways.

@regulartim
Copy link
Copy Markdown
Collaborator

I have checked if my changes affect user-facing behavior that is described in in the docs. If so, I also created a pull request in the docs repository.

I think your changes DO affect user-facing behavior as the API responses contain additional fields now, right?

Copilot AI review requested due to automatic review settings March 2, 2026 16:54
@opbot-xd opbot-xd force-pushed the feature/522-phase2-tag-feeds branch from 55a4e61 to 498807c Compare March 2, 2026 16:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@opbot-xd
Copy link
Copy Markdown
Contributor Author

opbot-xd commented Mar 2, 2026

I think your changes DO affect user-facing behavior as the API responses contain additional fields now, right?

Yes, would raise a PR at docs repo once this get merged. Updated the description.

Also sorry for late responses, was busy with university midterm exams.

@regulartim
Copy link
Copy Markdown
Collaborator

Also sorry for late responses, was busy with university midterm exams.

No problem, take your time! :) If you want to pause your work on this, feel free to just convert the PR to a draft again. This is not blocking anything and I think you are a reliable person, so I don't mind if you pause your work for a few days or even weeks.

@opbot-xd opbot-xd requested a review from regulartim March 2, 2026 18:20
@opbot-xd
Copy link
Copy Markdown
Contributor Author

opbot-xd commented Mar 2, 2026

No problem, take your time! :) If you want to pause your work on this, feel free to just convert the PR to a draft again. This is not blocking anything and I think you are a reliable person, so I don't mind if you pause your work for a few days or even weeks.

Thanks for understanding! My exams are over now, so I’ll like to resume work on this.

Copy link
Copy Markdown
Collaborator

@regulartim regulartim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @opbot-xd ! :) I still see a few issues here:

  • I am not sure about queriing tags in the feed_response function, see comment
  • there is a pending migration, you have to run makemigrations again

Comment on lines +274 to +312
# Annotate tags via a DB JOIN rather than a Python-level join.
# `tags_json` avoids conflicting with the `tags` reverse FK on IOC.
# For paginated (list) paths the queryset is already evaluated, so we
# run a second targeted query on the slice IDs instead.
tags_lookup: dict = {}
if isinstance(iocs, list):
ioc_ids = [ioc.id for ioc in iocs if hasattr(ioc, "id")]
if ioc_ids:
tag_rows = (
IOC.objects.filter(id__in=ioc_ids)
.annotate(
tags_json=ArrayAgg(
JSONObject(key=F("tags__key"), value=F("tags__value"), source=F("tags__source")),
filter=Q(tags__isnull=False),
default=Value([]),
distinct=True, # prevent duplication from the honeypot JOIN
)
)
.values("id", "tags_json")
)
tags_lookup = {row["id"]: row["tags_json"] for row in tag_rows}
else:
iocs = iocs.annotate(
tags_json=ArrayAgg(
JSONObject(key=F("tags__key"), value=F("tags__value"), source=F("tags__source")),
filter=Q(tags__isnull=False),
default=Value([]),
distinct=True, # prevent duplication from the honeypot JOIN
)
)
required_fields = tuple(f if f != "tags" else "tags_json" for f in required_fields)

# Collect values; `honeypots` will contain the list of associated honeypot names
iocs = (ioc_as_dict(ioc, required_fields) for ioc in iocs) if isinstance(iocs, list) else iocs.values(*required_fields)
for ioc in iocs:
# Generate dictionaries iterably to avoid holding both dicts and json_list in memory
iocs_iter: object
if isinstance(iocs, list):
iocs_iter = (ioc_as_dict(ioc, set(required_fields) | {"id"}) for ioc in iocs) # id needed for tags_lookup
else:
iocs_iter = iocs.values(*required_fields).iterator(chunk_size=2000)
for ioc in iocs_iter:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you fetch the tag here and not in get_queryset? There you could just join them like the general honeypots:

    if not is_aggregated:
        iocs = iocs.filter(general_honeypot__active=True)
        iocs = iocs.annotate(honeypots=ArrayAgg("general_honeypot__name"))
        iocs = iocs.annotate(tags_json=ArrayAgg(...)
        iocs = iocs.order_by(feed_params.ordering)
        iocs = iocs[: int(feed_params.feed_size)]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be much simpler as we don't need to cover the case where iocs is no queryset.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!
This approach is much cleaner! Though I had to also add the tags_json annotation to the repository methods (get_scanners_by_pks and get_recent_scanners) since they bypass get_queryset but still call feeds_response for ML scoring workflows.

The benefit is worth it though.

Changes made:

  • Moved tags_json annotation to get_queryset alongside honeypots
  • Removed tags_lookup dictionary logic from feeds_response
  • Added tags_json annotation to IocRepository.get_scanners_by_pks() and IocRepository.get_recent_scanners()

opbot-xd added 6 commits March 4, 2026 07:37
…ndex in 0040

- Rename 0038_tag.py → 0038_add_tag_model.py to match upstream/develop exactly
- Remove db_index=True from 0038 (keep it identical to upstream)
- Add 0040_tag_key_index.py: separate migration for Tag.key btree index
  on top of upstream's 0039 (GeoIP attacker_country/sensor.country)
…ject) for tags

- Remove _prefetch_tags helper (manual reimplementation of prefetch_related)
- Annotate tags directly on the queryset using ArrayAgg(JSONObject(key, value, source))
  with Q(tags__isnull=False) to skip IOCs without tags and distinct=True to prevent
  duplication from the general_honeypot JOIN
- For paginated (list) paths: run a targeted ArrayAgg query on the slice IDs
- Use annotation name tags_json to avoid conflict with the tags reverse FK on IOC
- Remove redundant comments, keep only non-obvious ones

Addresses code review feedback from regulartim.
Pass tag_key and tag_value as explicit kwargs to get_queryset instead of
storing them in FeedRequestParams. They are read directly from
request.query_params only in feeds_advanced, so the standard feeds
endpoint can never trigger tag filtering regardless of what query params
are passed. Removes enable_tag_filtering flag entirely.

Suggested by regulartim.
- Annotate tags_json in get_queryset alongside honeypots annotation
- Simplifies feeds_response by removing dual list/queryset handling
- Add tags_json to repository methods (get_scanners_by_pks, get_recent_scanners)
- Eliminates separate query for paginated IOCs

Addresses regulartim feedback.
Copilot AI review requested due to automatic review settings March 4, 2026 02:12
@opbot-xd opbot-xd force-pushed the feature/522-phase2-tag-feeds branch from cbcc2f1 to e4083e9 Compare March 4, 2026 02:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

- Only annotate tags_json when format is JSON to avoid unnecessary
  JOINs and aggregation for txt/csv downloads
- Add distinct=True to honeypots ArrayAgg in repository methods to
  prevent duplicate names when IOCs have multiple tags

Addresses Copilot feedback.
@opbot-xd opbot-xd requested a review from regulartim March 4, 2026 03:01
Copy link
Copy Markdown
Collaborator

@regulartim regulartim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @opbot-xd ! :) Only minor details left.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand your code correctly, the changes to this file should server the tags to the ML jobs, right? But there, they are thrown away again, because get_features does not know of them.

Also, I am not a 100% sure if it is a good idea to use the Tags for training. This might introduce weird biases. I had the similar doubts when I added the ip_reputation as a feature, but decided to include it after testing the inclusion thoroughly.

So, bottom line: I would prefer not to add the Tags to scoring for now. This would be a separate issue / PR requiring a lot of research and testing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out!
Updated the logic to handle both cases correctly:

if isinstance(iocs, list):
    has_tags_annotation = bool(iocs) and hasattr(iocs[0], "tags_json")
else:
    has_tags_annotation = "tags_json" in getattr(
        iocs, "query", type("", (), {"annotations": {}})()"
    ).annotations

…nd adapt API views to conditionally include it and truncate tag query parameters.
Copilot AI review requested due to automatic review settings March 4, 2026 11:46
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

api/views/utils.py:306

  • In the isinstance(iocs, list) branch, set(required_fields) is rebuilt for every IOC because it’s inside the generator expression. Precomputing the set once (e.g., fields_set = set(required_fields)) and reusing it will avoid repeated allocations when paginating large result sets.
            iocs_iter: object
            if isinstance(iocs, list):
                iocs_iter = (ioc_as_dict(ioc, set(required_fields)) for ioc in iocs)
            else:

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@opbot-xd opbot-xd requested a review from regulartim March 4, 2026 12:03
Copy link
Copy Markdown
Collaborator

@regulartim regulartim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, thanks for your work! :)

@regulartim regulartim merged commit 6e5df77 into intelowlproject:develop Mar 4, 2026
4 checks passed
cclts pushed a commit to cclts/GreedyBear that referenced this pull request Mar 11, 2026
…lowlproject#522 (intelowlproject#899)

* feat: expose tags in API responses and add tag-based filtering

* feat(api): expose tags in API responses and add tag-based filtering (intelowlproject#522)

* refactor(migrations): rename 0038 to match upstream and add Tag.key index in 0040

- Rename 0038_tag.py → 0038_add_tag_model.py to match upstream/develop exactly
- Remove db_index=True from 0038 (keep it identical to upstream)
- Add 0040_tag_key_index.py: separate migration for Tag.key btree index
  on top of upstream's 0039 (GeoIP attacker_country/sensor.country)

* refactor(feeds): replace _prefetch_tags with DB-level ArrayAgg(JSONObject) for tags

- Remove _prefetch_tags helper (manual reimplementation of prefetch_related)
- Annotate tags directly on the queryset using ArrayAgg(JSONObject(key, value, source))
  with Q(tags__isnull=False) to skip IOCs without tags and distinct=True to prevent
  duplication from the general_honeypot JOIN
- For paginated (list) paths: run a targeted ArrayAgg query on the slice IDs
- Use annotation name tags_json to avoid conflict with the tags reverse FK on IOC
- Remove redundant comments, keep only non-obvious ones

Addresses code review feedback from regulartim.

* refactor(feeds): move tag_key/tag_value out of FeedRequestParams

Pass tag_key and tag_value as explicit kwargs to get_queryset instead of
storing them in FeedRequestParams. They are read directly from
request.query_params only in feeds_advanced, so the standard feeds
endpoint can never trigger tag filtering regardless of what query params
are passed. Removes enable_tag_filtering flag entirely.

Suggested by regulartim.

* refactor(feeds): move tags_json annotation to get_queryset

- Annotate tags_json in get_queryset alongside honeypots annotation
- Simplifies feeds_response by removing dual list/queryset handling
- Add tags_json to repository methods (get_scanners_by_pks, get_recent_scanners)
- Eliminates separate query for paginated IOCs

Addresses regulartim feedback.

* perf: optimize tags_json annotation and fix honeypots deduplication

- Only annotate tags_json when format is JSON to avoid unnecessary
  JOINs and aggregation for txt/csv downloads
- Add distinct=True to honeypots ArrayAgg in repository methods to
  prevent duplicate names when IOCs have multiple tags

Addresses Copilot feedback.

* Refactor: Remove `tags_json` annotation from IOC repository queries and adapt API views to conditionally include it and truncate tag query parameters.

* refactor: remove redundant assignment of verbose and paginate parameters in advanced feed view.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants