Advanced filtering, STIX 2.1 export, and shareable feed URLs#839
Conversation
regulartim
left a comment
Hey @R1sh0bh-1 ! Looks promising! :)
Before reviewing this in-depth, I have some higher level problems:
- You added functionality, but did not write any tests for it.
- Add docs strings to the endpoints that match our pattern. The API docs are generated from these doc strings.
- I don't fully understand the use case for the sharing functionality. Could you please give me an example scenario where people would benefit from it?
Hi @regulartim, the primary use case for the Shareable Feeds feature is to facilitate zero-friction sharing of specific threat intelligence with external partners, tools, or analysts who (1) do not have a GreedyBear account or (2) need to ingest data into tools that don't support complex authentication flows (like cookie-based auth). Example scenario: an analyst identifies a specific campaign targeting SSH ports (Cowrie) from a specific ASN. They filter the feed with `?asn=12345&port=22&feed_type=cowrie` and want to share exactly this live dataset with external partners or tools.
Instead of creating a new API user or sharing credentials (bad practice), they generate a signed, read-only link for that specific filter. This makes GreedyBear much more interoperable with external ecosystem tools without compromising security or requiring user-management overhead for temporary data sharing. I hope I could explain to you what I was thinking, and yes, I will do the rest of the things you suggested.
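For illustration, a minimal, self-contained sketch of how such a signed, read-only share token could work, using only the standard library. This is not the PR's actual implementation (the project presumably uses Django's signing utilities); the secret, function names, and token format here are all illustrative.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"server-side-secret"  # illustrative; a real app would use its SECRET_KEY


def make_share_token(filters: dict, max_age_days: int = 30) -> str:
    """Encode the feed filter config plus an expiry into a signed token."""
    payload = json.dumps(
        {"f": filters, "exp": int(time.time()) + max_age_days * 86400},
        separators=(",", ":"),
    ).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + sig


def read_share_token(token: str) -> dict:
    """Verify signature and expiry; raise ValueError on tampering or expiry."""
    data, _, sig = token.rpartition(".")
    payload = base64.urlsafe_b64decode(data.encode())
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("tampered token")
    obj = json.loads(payload)
    if obj["exp"] < time.time():
        raise ValueError("expired token")
    return obj["f"]


token = make_share_token({"asn": 12345, "port": 22, "feed_type": "cowrie"})
print(read_share_token(token))
```

Because the filter config is baked into the signed payload, the link grants access to exactly one filtered view of the data and nothing else, with no server-side credential to manage.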
Yeah, that does make sense, I think. 👍 When this is done, you could also write a section in the GreedyBear docs about the new sharing endpoint.
Yeah, sure. Once this PR is merged, I will do that too.
regulartim
left a comment
Hey @R1sh0bh-1 !
Is there a mechanism to revoke tokens in case they get abused? I think it would be necessary to have such a functionality.
You added tests, but you created a new file for testing the advanced feed API endpoint, which is confusing, as there already is a file for that!
There are also some quite important test cases missing, for example:
- STIX export with domain-type IOCs
- expired tokens
- verify rate limiting actually works
- combined filters (e.g., asn + min_score + port)
- edge cases like min_score=0
```python
if feed_params.min_score:
    query_dict["recurrence_probability__gte"] = feed_params.min_score
```

This skips filtering when min_score=0 or min_score=0.0, which are falsy. Is this intended?
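A standalone snippet demonstrating the falsy-zero pitfall flagged here (a plain float stands in for the project's `feed_params` object):

```python
query_dict = {}
min_score = 0.0  # a legitimate filter value: "recurrence probability >= 0.0"

# Truthy check: 0.0 is falsy, so the filter is silently dropped.
if min_score:
    query_dict["recurrence_probability__gte"] = min_score
assert "recurrence_probability__gte" not in query_dict

# Explicit None check: zero is kept as a valid filter value.
if min_score is not None:
    query_dict["recurrence_probability__gte"] = min_score
assert query_dict["recurrence_probability__gte"] == 0.0
print("min_score=0.0 is now applied as a filter")
```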
@R1sh0bh-1 I'll convert this to a draft again, since you told me that you won't be working on it during the next week. Then I don't always stumble over it when looking through the PR list.
Hey @regulartim! I've addressed all your feedback in the latest push. On the STIX side, I simplified the type-detection logic and added proper input validation before inserting values into patterns (IPs are validated via ip_address(), and domains are checked for dangerous characters) to prevent any injection. I also fixed the min_score=0 bug by switching the truthy check to "is not None", so zero values are now correctly applied as filters. For the tests, I deleted the separate test_feeds_enhancements.py file and merged everything into the existing test_feeds_advanced_view.py where it belongs, and added all the missing cases you called out: STIX export with domain-type IOCs, expired/tampered tokens, rate limiting (429), combined filters (asn + min_score + port), and the min_score=0 edge case. As for token revocation, I completely agree it's an important feature. I was thinking of opening a separate follow-up issue so it can be designed and implemented properly, since it would require introducing a new DB model and migration. That way, we can keep this PR focused and handle revocation in a dedicated change. Let me know if that sounds good to you.
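A sketch of the kind of pattern validation described above: IPs go through `ip_address()` and domains are screened for pattern-breaking characters before being interpolated. The function name, blacklist, and exact checks are illustrative, not copied from the PR.

```python
from ipaddress import IPv6Address, ip_address

# Illustrative blacklist: characters that could break out of a STIX pattern string
FORBIDDEN = set("'\\[]")


def stix_pattern(value: str) -> str:
    """Build a STIX 2.1 pattern, validating input before interpolating it."""
    try:
        addr = ip_address(value)
    except ValueError:
        # Not an IP: treat as a domain, but reject injection-capable characters.
        if any(c in FORBIDDEN for c in value):
            raise ValueError(f"unsafe characters in domain: {value!r}") from None
        return f"[domain-name:value = '{value}']"
    kind = "ipv6-addr" if isinstance(addr, IPv6Address) else "ipv4-addr"
    return f"[{kind}:value = '{addr}']"


print(stix_pattern("192.168.1.100"))    # [ipv4-addr:value = '192.168.1.100']
print(stix_pattern("evil.example.com")) # [domain-name:value = 'evil.example.com']
```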
regulartim
left a comment
Hey @R1sh0bh-1
As for token revocation, I completely agree it's an important feature. For this, I was thinking of opening a separate follow-up issue so it can be designed and implemented properly, since it would require introducing a new DB model and migration. That way, we can keep this PR focused and handle revocation in a dedicated change. Let me know if that sounds good to you.
Normally I would agree. But I consider the revocation mechanism essential for this feature. So I would prefer to have it in a single PR.
```python
return feeds_response(iocs_queryset, feed_params, valid_feed_types)
```

```python
feeds_consume.throttle_scope = "feeds_shared"
```
We recently merged a PR where we introduced throttles to the other endpoints. Could you please stick to the same pattern in your PR?
```python
def test_rate_limiting_consume(self):
    """
    Shared feed endpoint enforces rate limiting on the /feeds/consume/ endpoint.

    We use unittest.mock to simulate a throttled response after the 2nd call,
    because the Django test runner's in-memory cache doesn't isolate throttle
    state between requests the way a real cache would.
    """
    from unittest.mock import patch

    share_response = self.client.get("/api/feeds/share")
    token = share_response.json()["url"].split("/")[-1]
    anon = APIClient()

    call_count = {"n": 0}

    def throttle_after_two(throttle_instance, request, view):
        call_count["n"] += 1
        if call_count["n"] > 2:
            throttle_instance.history = []
            throttle_instance.rate = "2/minute"
            throttle_instance.num_requests = 2
            throttle_instance.duration = 60
            throttle_instance.now = 0
            return False
        return True

    with patch("rest_framework.throttling.ScopedRateThrottle.allow_request", throttle_after_two):
        responses = [anon.get(f"/api/feeds/consume/{token}") for _ in range(3)]

    status_codes = [r.status_code for r in responses]
```
This test doesn't really test anything as it completely mocks the throttle. Are there better ways to do that?
You're right, that was a fair observation! I've updated it in the latest push: it now tests the real throttle logic with just the rate patched to 1/minute and cache.clear() scoped to the test.
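The idea of exercising real throttle logic (rather than mocking `allow_request` away) can be shown with a toy fixed-window throttle. This stand-in only mimics the shape of DRF's rate limiting; it is not the project's code, and all names are illustrative.

```python
import time


class WindowThrottle:
    """Minimal fixed-window throttle, standing in for DRF's SimpleRateThrottle."""

    def __init__(self, num_requests: int, duration_s: float):
        self.num_requests = num_requests
        self.duration_s = duration_s
        self.history: list[float] = []  # per-client request timestamps ("cache")

    def allow_request(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that fell outside the current window.
        self.history = [t for t in self.history if now - t < self.duration_s]
        if len(self.history) >= self.num_requests:
            return False
        self.history.append(now)
        return True


# Patch the rate to "1/minute" equivalent, then check real counting behavior.
throttle = WindowThrottle(num_requests=1, duration_s=60)
results = [throttle.allow_request() for _ in range(3)]
print(results)  # → [True, False, False]
```

Clearing the throttle's state between tests (the analogue of `cache.clear()`) is what keeps such a test isolated without mocking the decision logic itself.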
regulartim
left a comment
Good progress! 👍 Maybe we should also include the revocation link in the response of the share endpoint?
```python
# Throttling
"DEFAULT_THROTTLE_RATES": {
    "feeds": os.environ.get("FEEDS_THROTTLE_RATE", "30/minute"),
    "feeds_advanced": os.environ.get("FEEDS_ADVANCED_THROTTLE_RATE", "100/minute"),
    "feeds_shared": "10/minute",
```
The other throttle rates are configurable. Maybe we should make feeds_shared configurable too?
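A configurable version could simply mirror the pattern of the neighboring entries; the environment variable name below is a guess, not taken from the PR.

```python
import os

# Hypothetical env var, following the FEEDS_*_THROTTLE_RATE naming used above
FEEDS_SHARED_RATE = os.environ.get("FEEDS_SHARED_THROTTLE_RATE", "10/minute")
print(FEEDS_SHARED_RATE)
```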
```python
class RevokedToken(models.Model):
    """
    Stores revoked shared-feed tokens.

    The raw token is never persisted; only its SHA-256 hash is stored so that
    a leaked database cannot be used to reconstruct valid tokens.
    """

    token_hash = models.CharField(
        max_length=64,
        unique=True,
        db_index=True,
        help_text="SHA-256 hex digest of the raw signed token.",
    )
    revoked_at = models.DateTimeField(auto_now_add=True)
    reason = models.CharField(max_length=256, blank=True, default="")
```
What I would maybe prefer is just a ShareToken model with a revoked field. Then, we could (in a follow-up) add an admin view where users can just revoke tokens by clicking them.
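The suggested design can be sketched in plain Python. This is a stand-in for a Django `models.Model` (an in-memory dict plays the role of the DB table), and all names are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ShareToken:
    """One row per issued token, with a revoked flag an admin view could flip."""

    token_hash: str  # SHA-256 of the raw token; the raw value is never stored
    revoked: bool = False
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# In-memory stand-in for the DB table.
tokens = {"abc123": ShareToken(token_hash="abc123")}


def is_token_valid(token_hash: str) -> bool:
    row = tokens.get(token_hash)
    return row is not None and not row.revoked


assert is_token_valid("abc123")
tokens["abc123"].revoked = True  # one-click revocation just flips the flag
assert not is_token_valid("abc123")
```

Compared with a separate RevokedToken table, this keeps one row per issued token, which is what makes a "click to revoke" admin list possible later.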
```python
return self.cache_format % {"scope": self.scope, "ident": self.get_ident(request)}
```

```python
@api_view([GET])
```
This belongs in api/throttles.py
…hrottle class, add revoke_url to share
@regulartim please have a look and let me know if any changes are required :)
regulartim
left a comment
Hey @R1sh0bh-1 ! We are getting closer to merge this. 👍
- You now have two migrations, which is not necessary here. Please squash them.
- I noticed that everyone can revoke any token currently. Is this by design?
- The 30-day max_age is currently not configurable. I think that it's fine for now, just something I noticed.
Squashed the two migrations into a single `0041_sharetoken`.
regulartim
left a comment
I tested your code and I think the UX of the revocation process is not good. You need to save the revocation link, then call it with curl or a REST client along with an auth token. What do you think of making revoke a GET endpoint? I know that this is semantically not correct, but it would improve the UX a lot (you can just open it in a browser).
My bad, I totally missed the UX aspect there! You're completely right. I've updated the revoke endpoint to GET so the link can just be clicked and opened directly in the browser for a quick, one-click revocation. Thanks for pointing that out; I'll definitely keep UX more in mind for API designs going forward!
regulartim
left a comment
Two questions on the STIX semantics:
- Isn't the confidence calculation here a bit misleading? If I understand the format correctly, confidence is meant to express "how sure are we this is a real threat indicator", while recurrence_probability expresses "how likely is this attacker to come back". These are two very different things, in my opinion. The IOCs with the highest recurrence_probability are usually mass scanners that are quite harmless in comparison to other threats. What was your thought process behind assigning the recurrence_probability to the confidence field?
- The source URL is hardcoded to "https://greedybear.honeynet.org/?query=" . This information will be wrong on any instance but the honeynet one. We have this problem also in other places in the code, but I would not like to add another hardcoded honeynet reference. Any idea how to change this?
Also, you need to add documentation of the new endpoints (in the usage section). Please keep it short and concise. You can add a note that the feature will be available from version 3.3.0 (there are other sections with similar notes). To add the documentation, you have to raise a PR in the docs repo.
You're absolutely right about the STIX semantics. I've decoupled the confidence from the recurrence probability and set it to a fixed 90 across the board, as honeypot detections are indeed high-confidence evidence. I've also replaced the hardcoded Honeynet URL with a dynamic one via build_absolute_uri(), making the export instance-agnostic while keeping a safe fallback for background jobs. Everything is tested and passing!
Yes, once this PR gets merged, I will do that too.
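A sketch of the instance-agnostic URL construction described in the comment above: derive the base URL from the request via `build_absolute_uri()` and fall back to a configured default when no request is available (e.g. background jobs). The helper name and the fake request class are illustrative, not the PR's code.

```python
# Fallback base, taken from the hardcoded value in the original code.
DEFAULT_BASE = "https://greedybear.honeynet.org"


def external_reference_url(value: str, request=None) -> str:
    """Build the STIX external-reference URL for the current instance."""
    if request is not None:
        # Django's HttpRequest.build_absolute_uri("/") yields "scheme://host/".
        base = request.build_absolute_uri("/").rstrip("/")
    else:
        base = DEFAULT_BASE  # background jobs have no request object
    return f"{base}/?query={value}"


class FakeRequest:
    """Minimal stand-in for a Django request, for demonstration only."""

    def build_absolute_uri(self, path):
        return "https://my-instance.example.org" + path


print(external_reference_url("192.168.1.100", FakeRequest()))
print(external_reference_url("192.168.1.100"))
```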
I think all of this is debatable, that's why I wanted to know your thought process and want to discuss possible solutions with you. :) Setting it to 90 fixed is not a good solution in my opinion, because it's just some arbitrary number we chose. Maybe this is something that needs further research. We could try to calculate such a score based on multiple signals and third party threat intelligence. That's obviously a topic for another PR. If the field is optional we could also just leave it out for now.
Cool! 👍
The choice of 90 wasn't arbitrary; it's based on the Admiralty Scale mapping used in the STIX 2.1 specification. In the CTI community, data from internal, high-fidelity sensors (like honeypots) is categorized as 'Usually Reliable' or 'Highly Likely', which standardizes to the 85-95 range. My reasoning is that we are an Authoritative Source: a hit on our honeypot is a direct observation of activity, not a derived guess from a third-party feed. Providing a value like 90 makes the feed immediately actionable for SOC automation; most SIEM and SOAR platforms use this field as a threshold for auto-blocking. If we leave it unspecified, we risk our high-fidelity indicators being ignored or deprioritized by automated systems that default missing confidence to low.
regulartim
left a comment
The choice of 90 wasn't arbitrary; it’s based on the Admiralty Scale mapping used in the STIX 2.1 specification. In the CTI community, data from internal, high-fidelity sensors (like honeypots) is categorized as 'Usually Reliable' or 'Highly Likely,' which standardizes to the 85-95 range.
My reasoning is that we are an Authoritative Source: a hit on our honeypot is a direct observation of activity, not a derived guess from a third-party feed. Providing a value like 90 makes the feed immediately actionable for SOC automation; most SIEM and SOAR platforms use this field as a threshold for auto-blocking. If we leave it unspecified, we risk our high-fidelity indicators being ignored or deprioritized by automated systems that default missing confidence to low.
That is perfectly reasonable! Thanks for your work! 👍
…reedyBear-Project#831 (GreedyBear-Project#839)

* feat(api): enhance feeds with advanced filtering, STIX 2.1 export, and shareable URLs
* test(feeds): add missing unit tests and API docstrings for new features
* fix(feeds): sanitize STIX patterns, fix min_score=0 bug, consolidate and expand tests
* chore: add missing newline at end of file
* fix(feeds): replace undefined is_ip_address with stdlib ip_address
* feat(feeds): add token revocation, use shared validators for STIX, fix throttle pattern
* fix(migrations): rename revokedtoken to 0041, add upstream 0040_alter_tag_key
* refactor(feeds): ShareToken model, configurable throttle rate, move throttle class, add revoke_url to share
* fix(feeds): squash migrations into 0041_sharetoken, add user ownership to ShareToken
* feat(feeds): change revoke endpoint to GET
* refactor(stix): refine confidence mapping and dynamic source URL
Description
This PR fully implements the enhancements proposed in #831 for the Advanced Feeds API.
Main changes:
- New filters: `asn`, `min_score` (filters on `recurrence_probability`), `port`, `start_date`, `end_date`
- STIX 2.1 export (`format=stix21`): generates valid STIX 2.1 bundles with proper Indicator objects, confidence, labels, patterns, and external references back to GreedyBear
- `/api/feeds/share` endpoint that creates a signed, tamper-proof token encoding the feed config
- `/api/feeds/consume/<token>` endpoint (10 req/min) for consuming the feed without authentication
- New `stix2` dependency
- Updated `settings.py` (throttle rates)

All of this builds directly on the existing Advanced Feeds API; no new separate endpoint was created, just enhancements as discussed.
Related issues
Closes #831
Verification Summary (with mock data)
I populated the database with two realistic test IOCs to test isolation, export, and sharing:
High-confidence IOC
Low-confidence IOC
Scene A: ASN Filtering
Goal: Confirm ASN filter isolates specific IOCs.
Query: `?asn=12345`

Expected: Only 192.168.1.100 returned.
Proof: API returned exactly one IOC.
```json
{
  "iocs": [
    {
      "scanner": true,
      "payload_request": false,
      "recurrence_probability": 0.9,
      "attack_count": 100,
      "value": "192.168.1.100",
      "last_seen": "2026-02-19",
      "interaction_count": 50,
      "ip_reputation": "known attacker",
      "login_attempts": 20,
      "expected_interactions": 0.0,
      "asn": 12345,
      "first_seen": "2026-02-19",
      "feed_type": ["cowrie"],
      "destination_port_count": 1
    }
  ]
}
```

Scene B: STIX 2.1 Export
Goal: Validate STIX 2.1 output is correct and usable.
Query: `?format=stix21&min_score=0.1`

Expected: Valid STIX bundle with Indicator(s).
Proof: Returned proper bundle structure, including pattern, confidence (scaled), labels, and external reference.
```json
{
  "type": "bundle",
  "id": "bundle--c11fdeb9-22d0-4a1e-988f-690efb5a4119",
  "objects": [
    {
      "type": "indicator",
      "spec_version": "2.1",
      "id": "indicator--adbf4884-dbce-4373-b40e-671b6770e832",
      "created": "2026-02-19T06:21:41.264409Z",
      "modified": "2026-02-19T06:21:41.264409Z",
      "name": "192.168.1.100",
      "description": "Detected by GreedyBear honeypots: cowrie, known attacker",
      "pattern": "[ipv4-addr:value = '192.168.1.100']",
      "pattern_type": "stix",
      "pattern_version": "2.1",
      "valid_from": "2026-02-19T06:19:09.38852Z",
      "valid_until": "2026-02-20T06:19:09.388598Z",
      "labels": ["cowrie", "known attacker"],
      "confidence": 90,
      "external_references": [
        {
          "source_name": "GreedyBear",
          "url": "https://greedybear.honeynet.org/?query=192.168.1.100"
        }
      ]
    }
  ]
}
```

Scene C: Shareable Feed Links
Goal: Confirm unauthenticated consumption via token works correctly.
Steps:
1. Called `/api/feeds/share` with params like `feed_type=all&max_age=3` → received signed token.
2. Ran `curl http://localhost/api/feeds/consume/<TOKEN>` (no auth).

Expected: Full feed with both IOCs.
Proof: Rate-limited public access returned the complete dataset.
```json
{
  "iocs": [
    {
      "scanner": true,
      "payload_request": false,
      "recurrence_probability": 0.2,
      "attack_count": 5,
      "value": "10.0.0.50",
      "last_seen": "2026-02-19",
      "interaction_count": 0,
      "ip_reputation": "mass scanner",
      "login_attempts": 0,
      "expected_interactions": 0.0,
      "asn": 67890,
      "first_seen": "2026-02-19",
      "feed_type": ["honeytrap"],
      "destination_port_count": 2
    },
    {
      "scanner": true,
      "payload_request": false,
      "recurrence_probability": 0.9,
      "attack_count": 100,
      "value": "192.168.1.100",
      "last_seen": "2026-02-19",
      "interaction_count": 50,
      "ip_reputation": "known attacker",
      "login_attempts": 20,
      "expected_interactions": 0.0,
      "asn": 12345,
      "first_seen": "2026-02-19",
      "feed_type": ["cowrie"],
      "destination_port_count": 1
    }
  ]
}
```

I also manually verified the new `port`, `start_date` and `end_date` filters work as expected. Happy to add more detailed scenes if needed.

Type of change
Checklist

- The pull request targets the `develop` branch.
- Linter (`Ruff`) gave 0 errors (pre-commit was running locally).