Add auto extraction of FireHol lists. Closes #548 #642
mlodic merged 7 commits into intelowlproject:develop
This PR follows the existing |
greedybear/cronjobs/firehol.py
Outdated
    self._update_ioc(line, source)

except requests.RequestException as e:
    self.log.error(f"Network error fetching {source}: {e}")
The RequestException handler needs to wrap only the requests.get call; there's no need to wrap all the logic.
Will move the RequestException to only wrap the network call.
greedybear/cronjobs/firehol.py
Outdated
self.log.info(f"Processing {source} from {url}")
try:
    response = requests.get(url, timeout=60)
    if response.status_code != 200:
A response.raise_for_status() call is enough and more comprehensive than this check.
Makes sense! Will use raise_for_status() for cleaner error handling.
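Taken together, the two review points above (narrowing the try block to the network call and replacing the manual status check with raise_for_status()) might look like the following. This is a minimal sketch, not the code in the PR: fetch_list_lines, its get parameter, and the rule for skipping comment lines are assumptions made for illustration. Only requests.get(url, timeout=60), raise_for_status(), and requests.RequestException come from the discussion.

```python
import logging

import requests

log = logging.getLogger(__name__)


def fetch_list_lines(url, get=None, timeout=60):
    """Download a blocklist and return its non-empty, non-comment lines.

    `get` is a hypothetical injection point (defaults to requests.get) so the
    function can be exercised without network access.
    """
    get = get or requests.get
    try:
        # Only the network call and the status check live inside the try block.
        response = get(url, timeout=timeout)
        # Raises requests.HTTPError (a RequestException subclass) on 4xx/5xx.
        response.raise_for_status()
    except requests.RequestException as e:
        log.error(f"Network error fetching {url}: {e}")
        return []

    # Parsing happens outside the handler, so a bug here is not misreported
    # as a network error.
    lines = []
    for raw in response.text.splitlines():
        line = raw.strip()
        # Assumption: lines starting with '#' are comments in the list files.
        if line and not line.startswith("#"):
            lines.append(line)
    return lines
```

Because raise_for_status() raises an HTTPError, which is itself a RequestException, a single except clause covers both transport failures and HTTP error responses.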
greedybear/cronjobs/firehol.py
Outdated
except Exception as e:
    self.log.exception(f"Unexpected error processing {source}: {e}")

def _update_ioc(self, ip_address, source):
I wouldn't update the already existing IOCs, because the information extracted from FireHol is to be considered "new" intelligence. Instead, I would touch the place where a new IOC is usually added by the existing routines. There I expect a query for that newly found IP address in the FireHolList, and ONLY for the recently added IP addresses (the "added" parameter should be queried). If it's present, then populate the firehol_categories.
Considering this, we should also add a routine at the end of this cron that deletes the old FireHol items, because they are no longer useful in the database. That database could become really big, so it is important to keep it clean.
Makes sense to only enrich newly discovered IOCs rather than retroactively updating all existing ones. Will:
- Remove immediate IOC updates during extraction
- Add a batch enrichment step for recently added IOCs
- Add cleanup routine for old FireHolList entries
Thanks for the guidance!
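The cleanup routine from the plan above can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not the PR's implementation: FireHolEntry and drop_old_entries are hypothetical names, the field names are guessed from the discussion (the "added" parameter, entry.source), and the 30-day window is taken from the commit message in this PR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class FireHolEntry:
    """In-memory stand-in for a FireHolList row (field names are assumed)."""

    ip_address: str
    source: str
    added: datetime


def drop_old_entries(entries, now=None, max_age_days=30):
    """Return only the entries added within the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    # With the Django ORM this would presumably be a single query along the
    # lines of FireHolList.objects.filter(added__lt=cutoff).delete();
    # that exact call is an assumption, not taken from the PR.
    return [entry for entry in entries if entry.added >= cutoff]
```

Passing now explicitly keeps the cutoff deterministic, which makes the retention rule easy to test.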
greedybear/cronjobs/firehol.py
Outdated
    self._cleanup_old_entries()

def _enrich_recent_iocs(self):
    """
My idea was to leverage the FireHol data for fresh new data: to do that, we need to touch the "iocs_from_hits" method where we actually save newly found IOCs. This is important to keep the data consistent.
In that case, once a version is deployed, the new IOCs will be populated with the new data.
The old ones, even if collected in the last day, should not be touched, because the information collected from FireHol should be considered fresh only at the time of extraction. So this method must be moved to where the IOC object is populated.
You're absolutely right, enriching recent IOCs retroactively could lead to inconsistencies.
I'll move the FireHol enrichment logic to the iocs_from_hits function where IOC objects are initially created. Now FireHol categories are populated at IOC creation time.
The _enrich_recent_iocs method has been removed from the FireHolCron class as it's no longer needed.
Changes made:
- greedybear/cronjobs/extraction/utils.py: Added FireHol enrichment at IOC creation
- greedybear/cronjobs/firehol.py: Removed _enrich_recent_iocs method
if extracted_ip.is_loopback or extracted_ip.is_private or extracted_ip.is_multicast or extracted_ip.is_link_local or extracted_ip.is_reserved:
    continue

# Get FireHol categories for this IP at creation time
I like this, just one thing: this adds complexity to the current function. Could you please move it outside of the main function? I expect a single line in iocs_from_hits.
Done! I've extracted the FireHol enrichment logic into a separate get_firehol_categories() function. Now iocs_from_hits has a clean single-line call as you suggested:
def iocs_from_hits(hits: list[dict]) -> list[IOC]:
    """Convert Elasticsearch hits into IOC objects..."""
    hits_by_ip = defaultdict(list)
    for hit in hits:
        hits_by_ip[hit["src_ip"]].append(hit)
    iocs = []
    for ip, hits in hits_by_ip.items():
        dest_ports = [hit["dest_port"] for hit in hits if "dest_port" in hit]
        extracted_ip = ip_address(ip)
        if extracted_ip.is_loopback or extracted_ip.is_private or extracted_ip.is_multicast or extracted_ip.is_link_local or extracted_ip.is_reserved:
            continue
        firehol_categories = get_firehol_categories(ip, extracted_ip)  # Single line call
        ioc = IOC(
            name=ip,
            type=get_ioc_type(ip),
            interaction_count=len(hits),
            ip_reputation=correct_ip_reputation(ip, hits[0].get("ip_rep", "")),
            asn=hits[0].get("geoip", {}).get("asn"),
            destination_ports=sorted(set(dest_ports)),
            login_attempts=len(hits) if hits[0].get("type", "") == "Heralding" else 0,
            firehol_categories=firehol_categories,
        )
        # ... rest of the code

    network_range = ip_network(entry.ip_address, strict=False)
    if extracted_ip in network_range and entry.source not in firehol_categories:
        firehol_categories.append(entry.source)
except (ValueError, IndexError):
could you also add a test for this inside the IocsFromHitsTestCase class?
Nice catch! I missed this in tests. I've added comprehensive tests for the FireHol enrichment functionality in the IocsFromHitsTestCase class. The tests cover:
- Exact IP match enrichment (.ipset files)
- CIDR network range match enrichment (.netset files)
- No match scenarios
- Mixed exact and network range matches
- Source deduplication
All tests passing.
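The matching behaviour that these tests exercise can be illustrated with a small, database-free sketch. It is not the function from the PR: the entries parameter and the Entry tuple are hypothetical stand-ins for the FireHolList query, and treating any value containing "/" as a netset range is an assumption. The use of ip_network(..., strict=False), the membership check, and the source deduplication follow the diff fragment shown earlier.

```python
from collections import namedtuple
from ipaddress import ip_address, ip_network

# Stand-in for a FireHolList row; the real model is a Django model.
Entry = namedtuple("Entry", ["ip_address", "source"])


def get_firehol_categories(ip, extracted_ip, entries):
    """Return the FireHol sources whose entries contain the given IP.

    `entries` is a hypothetical parameter replacing the database query.
    """
    firehol_categories = []
    for entry in entries:
        try:
            if "/" in entry.ip_address:
                # .netset entry: CIDR range membership.
                network_range = ip_network(entry.ip_address, strict=False)
                matched = extracted_ip in network_range
            else:
                # .ipset entry: exact string match.
                matched = entry.ip_address == ip
        except ValueError:
            # Skip malformed list entries instead of failing the extraction.
            continue
        # Deduplicate: each source is reported at most once.
        if matched and entry.source not in firehol_categories:
            firehol_categories.append(entry.source)
    return firehol_categories
```

Called with an address that appears both as an exact entry and inside a listed range, the sketch returns each matching source once, in the order the entries are seen.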
mlodic left a comment
@opbot-xd great job, there's one last thing: another PR has been merged before this one, so we need to align the migration numbers. See: https://github.com/intelowlproject/GreedyBear/tree/develop/greedybear/migrations. There's already a migration n.23, so your new migrations must be 24 and 25 and should depend on that new one.
To do that, please pull the changes from develop and adjust the migrations. Then we are finally done.
Please @regulartim don't merge other PRs with new migrations in the meantime, to avoid additional conflicts.
- Add Celery beat schedule for weekly FireHol extraction
- Register FireHolList model in Django admin
- Expose firehol_categories field in Feeds API responses
- Add firehol_categories to IOC admin list display
- Improve error handling with specific exception types
- Simplify verbose comments in firehol.py
- Merge conflicting migrations from develop branch
- Update serializer tests for new firehol_categories field
All 267 tests passing

- Extract base_path variable for FireHol URLs
- Narrow RequestException scope to only wrap network call
- Use raise_for_status() for cleaner HTTP error handling
- Only enrich recently added IOCs (within 24h) instead of all existing ones
- Add cleanup routine to delete FireHolList entries older than 30 days
- Update tests to match new enrichment behavior
All 267 tests passing

- Move FireHol category enrichment from separate job step to iocs_from_hits() where IOCs are created, ensuring only fresh data is applied at extraction time
- Add support for CIDR network ranges (netsets) using ipaddress library
- Remove _enrich_recent_iocs() method as enrichment now happens at IOC creation
- Update enrichment logic to handle both exact IP matches (.ipset) and network range membership (.netset) for proper dshield.netset support
- Update test to reflect new behavior where FireHolCron only downloads data, enrichment happens automatically during IOC creation

- Test exact IP match enrichment (for .ipset files)
- Test CIDR network range match enrichment (for .netset files)
- Test no match scenario returns empty categories
- Test mixed exact and network range matches
- Test deduplication of FireHol sources

- Move FireHol category lookup logic to dedicated get_firehol_categories() function
- Simplify iocs_from_hits() to single-line call as requested
- Improves code readability and reduces function complexity

- Renamed 0023_ioc_firehol_categories to 0024 (after upstream 0023_rename_massscanners)
- Renamed 0024_merge to 0025
- Updated dependencies to point to correct parent migrations
- All tests passing (277/277)
ad65441 to 42741c8
Done! I've synced my fork with the upstream develop branch and fixed the migration conflicts. Since PR #643 was merged with migration 0023_rename_massscanners_massscanner_and_more, I've renumbered our migrations accordingly: the FireHol categories migration is now 0024_ioc_firehol_categories_alter_statistics_view_and_more (depending on the new 0023), and the merge migration is now 0025_merge_20251223_2100. I've also updated all the dependencies to point to the correct parent migrations. Migrations have been tested and all 277 tests are passing.
Thanks for the fast and thorough feedback, it is really appreciated.
Description
Implements auto-extraction of FireHol lists to enhance IOC classification and improve threat intelligence.
This feature enables GreedyBear to:
Changes
- firehol_categories field to IOC model for classification metadata
- blocklist_de: IP addresses involved in attacks
- greensnow: Known scanning IPs
- bruteforceblocker: Brute force attack sources
- dshield: DShield top attackers (CIDR blocks)
- firehol_categories in Feeds API responses
Related issues
Closes #548
Type of change
Checklist
- … develop.
- … (Black, Flake, Isort) gave 0 errors.