Advanced filtering, STIX 2.1 export, and shareable feed URLs#839

Merged
regulartim merged 14 commits into GreedyBear-Project:develop from R1sh0bh-1:feature/advanced-feeds-enhancements
Mar 6, 2026

Conversation

@R1sh0bh-1
Contributor

Description

This PR fully implements the enhancements proposed in #831 for the Advanced Feeds API.

Main changes:

  • Added new query filters: asn, min_score (filters on recurrence_probability), port, start_date, end_date
  • Added support for STIX 2.1 export (format=stix21) — generates valid STIX 2.1 bundles with proper Indicator objects, confidence, labels, patterns, and external references back to GreedyBear
  • Implemented optional shareable feed URLs:
    • Authenticated /api/feeds/share endpoint that creates a signed, tamper-proof token encoding the feed config
    • Public, rate-limited /api/feeds/consume/<token> endpoint (10 req/min) for consuming the feed without authentication
  • Added stix2 dependency
  • Configured dedicated rate limiting scope for shared feeds in settings.py

All of this builds directly on the existing Advanced Feeds API; no separate new feeds endpoint was created, just enhancements as discussed.

Related issues

Closes #831

Verification Summary (with mock data)

I populated the database with two realistic test IOCs to test isolation, export, and sharing:

  1. High-confidence IOC

    • Value: 192.168.1.100
    • ASN: 12345
    • Destination port: 22 (SSH via Cowrie)
    • Recurrence probability (score): 0.9
    • Reputation: known attacker
  2. Low-confidence IOC

    • Value: 10.0.0.50
    • ASN: 67890
    • Destination ports: 80, 443 (Web via Honeytrap)
    • Score: 0.2
    • Reputation: mass scanner

Scene A: ASN Filtering
Goal: Confirm ASN filter isolates specific IOCs.
Query: ?asn=12345
Expected: Only 192.168.1.100 returned.
Proof: API returned exactly one IOC.

{
  "iocs": [
    {
      "scanner": true,
      "payload_request": false,
      "recurrence_probability": 0.9,
      "attack_count": 100,
      "value": "192.168.1.100",
      "last_seen": "2026-02-19",
      "interaction_count": 50,
      "ip_reputation": "known attacker",
      "login_attempts": 20,
      "expected_interactions": 0.0,
      "asn": 12345,
      "first_seen": "2026-02-19",
      "feed_type": ["cowrie"],
      "destination_port_count": 1
    }
  ]
}

Scene B: STIX 2.1 Export
Goal: Validate STIX 2.1 output is correct and usable.
Query: ?format=stix21&min_score=0.1
Expected: Valid STIX bundle with Indicator(s).
Proof: Returned proper bundle structure, including pattern, confidence (scaled), labels, and external reference.

{
  "type": "bundle",
  "id": "bundle--c11fdeb9-22d0-4a1e-988f-690efb5a4119",
  "objects": [
    {
      "type": "indicator",
      "spec_version": "2.1",
      "id": "indicator--adbf4884-dbce-4373-b40e-671b6770e832",
      "created": "2026-02-19T06:21:41.264409Z",
      "modified": "2026-02-19T06:21:41.264409Z",
      "name": "192.168.1.100",
      "description": "Detected by GreedyBear honeypots: cowrie, known attacker",
      "pattern": "[ipv4-addr:value = '192.168.1.100']",
      "pattern_type": "stix",
      "pattern_version": "2.1",
      "valid_from": "2026-02-19T06:19:09.38852Z",
      "valid_until": "2026-02-20T06:19:09.388598Z",
      "labels": ["cowrie", "known attacker"],
      "confidence": 90,
      "external_references": [
        {
          "source_name": "GreedyBear",
          "url": "https://greedybear.honeynet.org/?query=192.168.1.100"
        }
      ]
    }
  ]
}
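For readers who want to consume such a bundle, here is a minimal sketch of how the indicator values could be pulled back out using only the standard library. It assumes every pattern is a single `type:value = '...'` comparison like the one above; the function name `extract_indicator_values` is illustrative and not part of GreedyBear.

```python
import json
import re

# Matches simple single-comparison patterns such as
# [ipv4-addr:value = '192.168.1.100']; anything more complex is skipped.
SIMPLE_PATTERN = re.compile(r"^\[([a-z0-9-]+):value\s*=\s*'([^']+)'\]$")


def extract_indicator_values(bundle_json: str) -> list:
    """Return (object_type, value) pairs for every simple indicator pattern."""
    bundle = json.loads(bundle_json)
    results = []
    for obj in bundle.get("objects", []):
        # Only Indicator objects carry a detection pattern.
        if obj.get("type") != "indicator":
            continue
        match = SIMPLE_PATTERN.match(obj.get("pattern", ""))
        if match:
            results.append((match.group(1), match.group(2)))
    return results
```

A full STIX pattern parser would be needed for compound patterns; this sketch deliberately ignores them.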

Scene C: Shareable Feed Links
Goal: Confirm unauthenticated consumption via token works correctly.
Steps:

  1. Authenticated call to /api/feeds/share with params like feed_type=all&max_age=3 → received signed token.
  2. curl http://localhost/api/feeds/consume/<TOKEN> (no auth).
    Expected: Full feed with both IOCs.
    Proof: Rate-limited public access returned the complete dataset.
{
  "iocs": [
    {
      "scanner": true,
      "payload_request": false,
      "recurrence_probability": 0.2,
      "attack_count": 5,
      "value": "10.0.0.50",
      "last_seen": "2026-02-19",
      "interaction_count": 0,
      "ip_reputation": "mass scanner",
      "login_attempts": 0,
      "expected_interactions": 0.0,
      "asn": 67890,
      "first_seen": "2026-02-19",
      "feed_type": ["honeytrap"],
      "destination_port_count": 2
    },
    {
      "scanner": true,
      "payload_request": false,
      "recurrence_probability": 0.9,
      "attack_count": 100,
      "value": "192.168.1.100",
      "last_seen": "2026-02-19",
      "interaction_count": 50,
      "ip_reputation": "known attacker",
      "login_attempts": 20,
      "expected_interactions": 0.0,
      "asn": 12345,
      "first_seen": "2026-02-19",
      "feed_type": ["cowrie"],
      "destination_port_count": 1
    }
  ]
}

I also manually verified the new port, start_date and end_date filters work as expected. Happy to add more detailed scenes if needed.
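As an illustration of how the filters combine, the sketch below builds a query string from the parameter names listed in this PR (asn, min_score, port, start_date, end_date). The base URL and the ISO date format are assumptions made for the example, not confirmed by the PR description.

```python
from urllib.parse import urlencode


def build_feed_url(base_url: str, **filters) -> str:
    """Append only the filters that were actually provided (None means unset)."""
    # Compare against None so that an explicit 0 (e.g. min_score=0) is kept.
    params = {key: value for key, value in filters.items() if value is not None}
    return f"{base_url}?{urlencode(params)}" if params else base_url


# Hypothetical base URL; adjust to the actual Advanced Feeds route of your instance.
url = build_feed_url(
    "http://localhost/api/feeds/advanced/",
    asn=12345,
    min_score=0.5,
    port=22,
    start_date="2026-02-18",  # date format is an assumption
    end_date=None,            # omitted from the query string
)
```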

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • I have read and understood the rules about how to Contribute to this project.
  • The pull request is for the branch develop.
  • I have added documentation of the new features.
  • Linter (Ruff) gave 0 errors. (pre-commit was running locally)
  • I have added tests for the feature/bug I solved. All the tests (new and old ones) gave 0 errors.
  • If changes were made to an existing model/serializer/view, the docs were updated and regenerated.
  • If the GUI has been modified:

Member

@regulartim regulartim left a comment

Hey @R1sh0bh-1 ! Looks promising! :)
Before reviewing this in-depth, I have some higher level problems:

  1. You added functionality, but did not write any tests for it.
  2. Add docstrings to the endpoints that match our pattern. The API docs are generated from these docstrings.
  3. I don't fully understand the use case for the sharing functionality. Could you please give me an example scenario where people would benefit from it?

@R1sh0bh-1
Contributor Author

R1sh0bh-1 commented Feb 19, 2026

Hi @regulartim, the primary use case for the Shareable Feeds feature is to facilitate zero-friction sharing of specific threat intelligence with external partners, tools, or analysts who (1) do not have a GreedyBear account or (2) need to ingest data into tools that don't support complex authentication flows (like cookie-based auth).

Example Scenario: An analyst identifies a specific campaign targeting SSH ports (Cowrie) from a specific ASN. They filter the feed: ?asn=12345&port=22&feed_type=cowrie

They want to share exactly this live dataset with:

  1. A trusted partner organization for immediate blocking.
  2. An external SOAR tool or firewall that supports simple URL-based feed ingestion (e.g., "deny list URL").

Instead of creating a new API user or sharing credentials (bad practice), they generate a signed, read-only link for that specific filter.

The link is:
Scoped: It only returns data matching the original specific filter (ASN 12345, Port 22).
Safe: It doesn't expose the user's full account or allow arbitrary queries.
Rate-Limited: It has its own strict throttling scope to prevent abuse.

This makes GreedyBear much more interoperable with external ecosystem tools without compromising security or requiring user management overhead for temporary data sharing.
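To make the "signed, read-only link" idea concrete, here is a minimal standard-library sketch of a tamper-evident token that encodes a feed configuration and an issue time. This is not the PR's implementation, which is not shown in this conversation; Django also ships `django.core.signing` for the same purpose. `SECRET_KEY`, `make_token`, and `read_token` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET_KEY = b"change-me"  # hypothetical; a real deployment would load a secret from settings


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_token(config: dict, now: Optional[float] = None) -> str:
    """Serialize the feed config with an issue time and append an HMAC signature."""
    issued_at = time.time() if now is None else now
    payload = json.dumps({"cfg": config, "iat": issued_at}, sort_keys=True).encode()
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(signature)}"


def read_token(token: str, max_age: float, now: Optional[float] = None) -> dict:
    """Return the feed config, or raise ValueError if tampered with or expired."""
    payload_part, _, signature_part = token.partition(".")
    payload = _unb64(payload_part)
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _unb64(signature_part)):
        raise ValueError("bad signature")
    data = json.loads(payload)
    current = time.time() if now is None else now
    if current - data["iat"] > max_age:
        raise ValueError("token expired")
    return data["cfg"]
```

Because the filter lives inside the signed payload, the holder of the link cannot widen the query without invalidating the signature.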

I hope that explains what I was thinking, and yes, I will do the rest of the things you suggested.

@regulartim
Member

I hope that explains what I was thinking, and yes, I will do the rest of the things you suggested.

Yeah, that does make sense, I think. 👍

When this is done, you could also write a section in the GreedyBear docs about the new sharing endpoint.

@R1sh0bh-1
Contributor Author

When this is done, you could also write a section in the GreedyBear docs about the new sharing endpoint.

Yeah, sure. Once this PR is merged, I will do that too.

@R1sh0bh-1 R1sh0bh-1 requested a review from regulartim February 19, 2026 15:27
Member

@regulartim regulartim left a comment

Hey @R1sh0bh-1 !
Is there a mechanism to revoke tokens in case they get abused? I think it would be necessary to have such a functionality.

You added tests, but you created a new file for testing the advanced feed API endpoint, which is confusing, as there already is a file for that!

There are also some quite important test cases missing, for example:

  • STIX export with domain-type IOCs
  • expired tokens
  • verify rate limiting actually works
  • combined filters (e.g., asn + min_score + port)
  • edge cases like min_score=0

Comment thread api/views/utils.py Outdated
Comment thread api/views/utils.py Outdated
Comment on lines +193 to +194
if feed_params.min_score:
    query_dict["recurrence_probability__gte"] = feed_params.min_score
Member

This skips filtering when min_score=0 or min_score=0.0, which are falsy. Is this intended?
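The pitfall can be reproduced in isolation. In the sketch below, `FeedParams` is a stand-in dataclass (the real class is not shown in this thread); comparing against `None` instead of relying on truthiness keeps an explicit zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedParams:
    # Stand-in for the real parameter object; None means "not provided".
    min_score: Optional[float] = None


def build_query(feed_params: FeedParams) -> dict:
    query_dict = {}
    # "is not None" distinguishes "not provided" from an explicit 0 or 0.0,
    # both of which are falsy and would be skipped by a plain truthiness check.
    if feed_params.min_score is not None:
        query_dict["recurrence_probability__gte"] = feed_params.min_score
    return query_dict
```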

@regulartim
Member

@R1sh0bh-1 I'll convert this to a draft again, since you told me that you won't work on it during the next week. That way I don't keep stumbling over it when looking through the PR list.

@regulartim regulartim marked this pull request as draft February 25, 2026 09:32
@R1sh0bh-1
Contributor Author

Hey @regulartim! I've addressed all your feedback in the latest push. On the STIX side, I simplified the type-detection logic and added proper input validation before inserting values into patterns (IPs are validated via ip_address(), and domains are checked for dangerous characters) to prevent any injection. I also fixed the min_score=0 bug by switching the truthy check to "is not None", so zero values are now correctly applied as filters.
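A rough sketch of validating a value before it is interpolated into a STIX pattern is shown below. IPs go through the standard library's `ip_address()`, as described above. For domains, this example swaps in a strict allow-list regular expression rather than a check for dangerous characters, so it is not the PR's exact logic; `stix_pattern` is a hypothetical helper name.

```python
import re
from ipaddress import ip_address

# Allow-list for hostnames: dot-separated labels of letters, digits and hyphens,
# ending in an alphabetic top-level label. Quotes and brackets can never match.
_DOMAIN_RE = re.compile(
    r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}", re.IGNORECASE
)


def stix_pattern(value: str) -> str:
    """Build a single-comparison STIX pattern, rejecting anything unexpected."""
    try:
        addr = ip_address(value)
    except ValueError:
        addr = None
    if addr is not None:
        kind = "ipv4-addr" if addr.version == 4 else "ipv6-addr"
        return f"[{kind}:value = '{addr}']"
    if len(value) <= 253 and _DOMAIN_RE.fullmatch(value):
        return f"[domain-name:value = '{value.lower()}']"
    raise ValueError(f"not a valid IP address or domain: {value!r}")
```

Rejecting the value outright, instead of escaping it, keeps a crafted string from ever closing the quoted literal and appending extra pattern terms.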

For the tests, I deleted the separate test_feeds_enhancements.py file and merged everything into the existing test_feeds_advanced_view.py where it belongs, and added all the missing cases you called out: STIX export with domain-type IOCs, expired/tampered tokens, rate limiting (429), combined filters (asn + min_score + port), and the min_score=0 edge case.

As for token revocation, I completely agree it's an important feature. For this, I was thinking of opening a separate follow-up issue so it can be designed and implemented properly, since it would require introducing a new DB model and migration. That way, we can keep this PR focused and handle revocation in a dedicated change. Let me know if that sounds good to you.

@R1sh0bh-1 R1sh0bh-1 marked this pull request as ready for review March 3, 2026 10:33
@R1sh0bh-1 R1sh0bh-1 requested a review from regulartim March 3, 2026 12:25
Member

@regulartim regulartim left a comment

Hey @R1sh0bh-1

As for token revocation, I completely agree it's an important feature. For this, I was thinking of opening a separate follow-up issue so it can be designed and implemented properly, since it would require introducing a new DB model and migration. That way, we can keep this PR focused and handle revocation in a dedicated change. Let me know if that sounds good to you.

Normally I would agree. But I consider the revocation mechanism essential for this feature. So I would prefer to have it in a single PR.

Comment thread api/views/utils.py
Comment thread api/views/feeds.py Outdated
return feeds_response(iocs_queryset, feed_params, valid_feed_types)


feeds_consume.throttle_scope = "feeds_shared"
Member

We recently merged a PR where we introduced throttles to the other endpoints. Could you please stick to the same pattern in your PR?

Member

See #927

Contributor Author

done

Comment on lines +271 to +301
def test_rate_limiting_consume(self):
    """
    Shared feed endpoint enforces rate limiting on the /feeds/consume/ endpoint.

    We use unittest.mock to simulate a throttled response after the 2nd call,
    because the Django test runner's in-memory cache doesn't isolate throttle
    state between requests the way a real cache would.
    """
    from unittest.mock import patch

    share_response = self.client.get("/api/feeds/share")
    token = share_response.json()["url"].split("/")[-1]
    anon = APIClient()

    call_count = {"n": 0}

    def throttle_after_two(throttle_instance, request, view):
        call_count["n"] += 1
        if call_count["n"] > 2:
            throttle_instance.history = []
            throttle_instance.rate = "2/minute"
            throttle_instance.num_requests = 2
            throttle_instance.duration = 60
            throttle_instance.now = 0
            return False
        return True

    with patch("rest_framework.throttling.ScopedRateThrottle.allow_request", throttle_after_two):
        responses = [anon.get(f"/api/feeds/consume/{token}") for _ in range(3)]

    status_codes = [r.status_code for r in responses]
Member

This test doesn't really test anything as it completely mocks the throttle. Are there better ways to do that?

Contributor Author

You're right, that was a fair observation! I've updated it in the latest push. It now tests the real throttle logic, with just the rate patched to 1/minute and cache.clear() scoped to the test.
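To illustrate why a small rate such as 1/minute is enough to exercise real throttle behavior, here is a self-contained sliding-window sketch. It mirrors the general idea of keeping a per-client history of request timestamps, but it is a simplified stand-in, not Django REST Framework's code; `SlidingWindowThrottle` is a hypothetical class.

```python
class SlidingWindowThrottle:
    """Allow at most num_requests per client within the last `duration` seconds."""

    def __init__(self, num_requests: int, duration: float):
        self.num_requests = num_requests
        self.duration = duration
        self.history = {}  # client identifier -> list of request timestamps

    def allow_request(self, ident: str, now: float) -> bool:
        # Drop timestamps that have fallen out of the window.
        recent = [t for t in self.history.get(ident, []) if t > now - self.duration]
        if len(recent) >= self.num_requests:
            self.history[ident] = recent
            return False  # a view would answer with HTTP 429 here
        recent.append(now)
        self.history[ident] = recent
        return True


# With a rate of 1 request per 60 seconds, the second call is already rejected.
throttle = SlidingWindowThrottle(num_requests=1, duration=60)
first = throttle.allow_request("203.0.113.7", now=0.0)
second = throttle.allow_request("203.0.113.7", now=1.0)
```

Clearing the stored history between tests plays the same role as the `cache.clear()` call mentioned above: it prevents one test's requests from counting against the next.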

@R1sh0bh-1 R1sh0bh-1 requested a review from regulartim March 5, 2026 07:30
Member

@regulartim regulartim left a comment

Good progress! 👍 Maybe we should also include the revocation link in the response of the share endpoint?

Comment thread greedybear/settings.py Outdated
Comment on lines 109 to 113
# Throttling
"DEFAULT_THROTTLE_RATES": {
    "feeds": os.environ.get("FEEDS_THROTTLE_RATE", "30/minute"),
    "feeds_advanced": os.environ.get("FEEDS_ADVANCED_THROTTLE_RATE", "100/minute"),
    "feeds_shared": "10/minute",
Member

The other throttle rates are configurable. Maybe we should make feeds_shared configurable too?
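One way this could look, following the pattern of the two existing entries, is sketched below. The environment variable name `FEEDS_SHARED_THROTTLE_RATE` is a guess made for illustration; the name chosen in the merged code may differ.

```python
import os


def throttle_rates(environ=os.environ) -> dict:
    """Build the throttle-rate mapping, letting each scope be overridden via env."""
    return {
        "feeds": environ.get("FEEDS_THROTTLE_RATE", "30/minute"),
        "feeds_advanced": environ.get("FEEDS_ADVANCED_THROTTLE_RATE", "100/minute"),
        # Hypothetical variable name; falls back to the previously hardcoded rate.
        "feeds_shared": environ.get("FEEDS_SHARED_THROTTLE_RATE", "10/minute"),
    }
```

Passing the environment as a parameter is only done here to make the sketch easy to test; in settings.py the values would be read from `os.environ` directly.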

Comment thread greedybear/models.py Outdated
Comment on lines +208 to +224
class RevokedToken(models.Model):
    """
    Stores revoked shared-feed tokens.

    The raw token is never persisted; only its SHA-256 hash is stored so that
    a leaked database cannot be used to reconstruct valid tokens.
    """

    token_hash = models.CharField(
        max_length=64,
        unique=True,
        db_index=True,
        help_text="SHA-256 hex digest of the raw signed token.",
    )
    revoked_at = models.DateTimeField(auto_now_add=True)
    reason = models.CharField(max_length=256, blank=True, default="")

Member

What I would maybe prefer is just a ShareToken model with a revoked field. Then, we could (in a follow-up) add an admin view where users can just revoke tokens by clicking them.

Member

What do you think?

Comment thread api/views/feeds.py
return self.cache_format % {"scope": self.scope, "ident": self.get_ident(request)}


@api_view([GET])
Member

This belongs in api/throttles.py

@R1sh0bh-1 R1sh0bh-1 requested a review from regulartim March 5, 2026 08:27
@R1sh0bh-1
Contributor Author

R1sh0bh-1 commented Mar 5, 2026

@regulartim please have a look and let me know if any changes are required :)

Member

@regulartim regulartim left a comment

Hey @R1sh0bh-1 ! We are getting closer to merging this. 👍

  • You now have two migrations, which is not necessary here. Please squash them.
  • I noticed that everyone can revoke any token currently. Is this by design?
  • The 30-day max_age is currently not configurable. I think that it's fine for now, just something I noticed.

@R1sh0bh-1
Contributor Author

Squashed the two migrations into a single 0041_sharetoken.
On the ownership — good catch, wasn't intentional! Fixed it now: ShareToken stores the creating user and feeds_revoke returns 403 if someone else tries to revoke it.
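The ownership rule can be sketched without Django as follows. The record layout (a token hash, an owner, and a revoked flag) and the 404 for unknown tokens are assumptions for illustration; only the 403 for a non-owner is stated in this thread.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ShareTokenRecord:
    # Hypothetical stand-in for the ShareToken model's relevant fields.
    token_hash: str
    owner: str
    revoked: bool = False


def hash_token(raw_token: str) -> str:
    """Store only a SHA-256 digest so the raw token is never persisted."""
    return hashlib.sha256(raw_token.encode()).hexdigest()


def revoke(store: dict, raw_token: str, user: str) -> int:
    """Return an HTTP-style status code for a revocation attempt."""
    record = store.get(hash_token(raw_token))
    if record is None:
        return 404  # assumed behavior for unknown tokens
    if record.owner != user:
        return 403  # only the creating user may revoke
    record.revoked = True
    return 200
```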

@R1sh0bh-1 R1sh0bh-1 requested a review from regulartim March 6, 2026 07:22
Member

@regulartim regulartim left a comment

I tested your code and I think the UX of the revocation process is not good. You need to save the revocation link, then call it with curl or a REST client along with an auth token. What do you think of making revoke a GET endpoint? I know that this is semantically not correct, but it would improve UX a lot (you can just open it in a browser).

@R1sh0bh-1
Contributor Author

My bad, I totally missed the UX aspect there! You're completely right. I've updated the revoke endpoint to GET so the link can just be clicked and opened directly in the browser for a quick, one-click revocation. Thanks for pointing that out; I'll definitely keep UX more in mind for API designs going forward!

@R1sh0bh-1 R1sh0bh-1 requested a review from regulartim March 6, 2026 08:47
Member

@regulartim regulartim left a comment

Two questions on the STIX semantics:

  1. Isn't the confidence calculation here a bit misleading? If I understand the format correctly, confidence is meant to express "how sure are we this is a real threat indicator" while recurrence_probability expresses "how likely is this attacker to come back". These are two very different things in my opinion. The IOCs with the highest recurrence_probability are usually mass scanners that are quite harmless in comparison to other threats. What was your thought process behind assigning the confidence field to the recurrence_probability?
  2. The source URL is hardcoded to "https://greedybear.honeynet.org/?query=". This information will be wrong on any instance but the honeynet one. We have this problem also in other places in the code, but I would not like to add another hardcoded honeynet reference. Any idea how to change this?

@regulartim
Member

Also, you need to add documentation of the new endpoints (in the usage section). Please keep it short and concise. You can add a note that the feature will be available from version 3.3.0 (there are other sections with similar notes). To add the documentation, you have to raise a PR in the docs repo.

@R1sh0bh-1
Contributor Author

You're absolutely right about the STIX semantics. I've decoupled the confidence from the recurrence probability and set it to a fixed 90 across the board, as honeypot detections are indeed high-confidence evidence. I've also replaced the hardcoded Honeynet URL with a dynamic one via build_absolute_uri(), making the export instance-agnostic while keeping a safe fallback for background jobs. Everything is tested and passing!
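A minimal sketch of the dynamic reference URL with a fallback is shown below. It relies only on the request object exposing `build_absolute_uri()`, as Django's `HttpRequest` does; the helper name and the fallback base URL are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical fallback for contexts without a request (e.g. background jobs).
FALLBACK_BASE_URL = "https://greedybear.example.org/"


def indicator_reference_url(value: str, request=None, fallback_base: str = FALLBACK_BASE_URL) -> str:
    """Build the external-reference URL for an IOC on the current instance."""
    location = "/?" + urlencode({"query": value})
    if request is not None:
        # Django resolves the location against the scheme and host of the request.
        return request.build_absolute_uri(location)
    return fallback_base.rstrip("/") + location
```

Deriving the host from the incoming request means a self-hosted instance links back to itself rather than to the Honeynet deployment.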

Also, you need to add documentation of the new endpoints (in the usage section). Please keep it short and concise. You can add a note that the feature will be available from version 3.3.0 (there are other sections with similar notes). To add the documentation, you have to raise a PR in the docs repo.

Yes, once this PR gets merged I will do that too.

@R1sh0bh-1 R1sh0bh-1 requested a review from regulartim March 6, 2026 13:51
@regulartim
Member

You're absolutely right about the STIX semantics. I've decoupled the confidence from the recurrence probability and set it to a fixed 90 across the board, as honeypot detections are indeed high-confidence evidence.

I think all of this is debatable; that's why I wanted to know your thought process and discuss possible solutions with you. :) Setting it to a fixed 90 is not a good solution in my opinion, because it's just some arbitrary number we chose. Maybe this is something that needs further research. We could try to calculate such a score based on multiple signals and third-party threat intelligence. That's obviously a topic for another PR. If the field is optional, we could also just leave it out for now.

I've also replaced the hardcoded Honeynet URL with a dynamic one via build_absolute_uri(), making the export instance-agnostic while keeping a safe fallback for background jobs. Everything is tested and passing!

Cool! 👍

@R1sh0bh-1
Contributor Author

The choice of 90 wasn't arbitrary; it’s based on the Admiralty Scale mapping used in the STIX 2.1 specification. In the CTI community, data from internal, high-fidelity sensors (like honeypots) is categorized as 'Usually Reliable' or 'Highly Likely,' which standardizes to the 85-95 range.

My reasoning is that we are an Authoritative Source: a hit on our honeypot is a direct observation of activity, not a derived guess from a third-party feed. Providing a value like 90 makes the feed immediately actionable for SOC automation; most SIEM and SOAR platforms use this field as a threshold for auto-blocking. If we leave it unspecified, we risk our high-fidelity indicators being ignored or deprioritized by automated systems that default missing confidence to low.
That said, I'm happy to remove it or do whatever you suggest!

Member

@regulartim regulartim left a comment

The choice of 90 wasn't arbitrary; it’s based on the Admiralty Scale mapping used in the STIX 2.1 specification. In the CTI community, data from internal, high-fidelity sensors (like honeypots) is categorized as 'Usually Reliable' or 'Highly Likely,' which standardizes to the 85-95 range.

My reasoning is that we are an Authoritative Source: a hit on our honeypot is a direct observation of activity, not a derived guess from a third-party feed. Providing a value like 90 makes the feed immediately actionable for SOC automation; most SIEM and SOAR platforms use this field as a threshold for auto-blocking. If we leave it unspecified, we risk our high-fidelity indicators being ignored or deprioritized by automated systems that default missing confidence to low.

That is perfectly reasonable! Thanks for your work! 👍

@regulartim regulartim merged commit 5cff476 into GreedyBear-Project:develop Mar 6, 2026
4 checks passed
cclts pushed a commit to cclts/GreedyBear that referenced this pull request Mar 11, 2026
…reedyBear-Project#831 (GreedyBear-Project#839)

* feat(api): enhance feeds with advanced filtering, STIX 2.1 export, and shareable URLs

* test(feeds): add missing unit tests and API docstrings for new features

* fix(feeds): sanitize STIX patterns, fix min_score=0 bug, consolidate and expand tests

* chore: add missing newline at end of file

* fix(feeds): replace undefined is_ip_address with stdlib ip_address

* feat(feeds): add token revocation, use shared validators for STIX, fix throttle pattern

* fix(migrations): rename revokedtoken to 0041, add upstream 0040_alter_tag_key

* refactor(feeds): ShareToken model, configurable throttle rate, move throttle class, add revoke_url to share

* fix(feeds): squash migrations into 0041_sharetoken, add user ownership to ShareToken

* feat(feeds): change revoke endpoint to GET

* refactor(stix): refine confidence mapping and dynamic source URL

2 participants