fix: eliminate N+1 queries in IocRepository.add_honeypot_to_ioc(). Closes #1012#1024
Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes the IOC extraction pipeline by eliminating avoidable per-IOC database queries when associating GeneralHoneypot records to IOC records.
Changes:
- Store
GeneralHoneypotmodel instances (not booleans) inIocRepository._honeypot_cacheand adjustis_enabled()accordingly. - Reduce M2M query overhead by prefetching
general_honeypotinget_ioc_by_name()and adding anis_newfast-path to skip membership checks for newly created IOCs. - Add/extend unit tests asserting query counts and cache behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
greedybear/cronjobs/repositories/ioc.py |
Implements honeypot object caching, adds is_new optimization, and prefetches M2M relations. |
greedybear/cronjobs/extraction/ioc_processor.py |
Tracks whether an IOC was newly created and passes is_new into the repository method. |
tests/test_ioc_repository.py |
Adds test coverage for cache contents, prefetch effectiveness, and query-count expectations. |
tests/test_ioc_processor.py |
Updates mock assertion to validate the new is_new argument is passed. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
fc1e334 to
434458f
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Hey @Abhijeet17o ! I really appreciate your effort, but I would prefer if we could keep this PR focused on the two improvements that you proposed in the issue. Right now, it is hard for me to see the purpose of the single code changes. I am sure that all your optimizations benefit performance, but they also add a lot of additional code (=complexity) and I am not sure if that is worth it. |
|
Hey @regulartim, you are right to point it out that I have kind of went beyond my scope. The PR grew beyond its original scope because I incorporated several automated code review suggestions that each seemed reasonable in isolation, but together added complexity that wasn't necessary for the two fixes I proposed. |
1bc7bcc to
bf7195d
Compare
|
Hey @regulartim! |
|
Hey @regulartim, honeypot_set = {hp.name for hp in ioc.general_honeypot.all()}
if honeypot_name not in honeypot_set:
honeypot = self._honeypot_cache.get(self._normalize_name(honeypot_name))
if honeypot is not None:
ioc.general_honeypot.add(honeypot) # this fires an extra queryif Can you validate this? if yes I'll implement a solution for it in this PR itself EDIT: Implemented the requited changes aswell 👍 |
Two targeted fixes: 1. Add prefetch_related('general_honeypot') to get_ioc_by_name() so subsequent .all() calls in add_honeypot_to_ioc() read from Django's prefetch cache instead of hitting the DB per IOC. 2. Change _honeypot_cache to store GeneralHoneypot objects instead of booleans, so add_honeypot_to_ioc() can call .add() using the cached object directly without an extra get_hp_by_name() DB hit.
bf7195d to
49daaac
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
527709c to
7331793
Compare
regulartim
left a comment
There was a problem hiding this comment.
Hey @Abhijeet17o ! I'll just copy/paste my comment from above:
Hey @Abhijeet17o ! I really appreciate your effort, but I would prefer if we could keep this PR focused on the two improvements that you proposed in the issue. Right now, it is hard for me to see the purpose of the single code changes. I am sure that all your optimizations benefit performance, but they also add a lot of additional code (=complexity) and I am not sure if that is worth it.
|
Hey @regulartim ! I've already stripped things back down, the current diff is only 20 lines of production code that is in
The major code addition was in the |
@Abhijeet17o You added this additional logic, which is not one of the fixes you proposed in the issue. Or am I wrong? |
|
@regulartim I'll do the needful now, thanks! 👍 |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
|
Hey @regulartim, |
regulartim
left a comment
There was a problem hiding this comment.
Looks good, thanks! 👍
Description
IocRepository.add_honeypot_to_ioc()was firing two avoidable DB queries per IOC on every extraction run.Problem 1 - cache stored booleans, not objects.
_honeypot_cachewas keyed by normalized name but held only the booleanhp.activeflag. Whenadd_honeypot_to_ioc()needed the actualGeneralHoneypotobject to callioc.general_honeypot.add(), the cache was useless and it fell back toget_hp_by_name(), which is a liveSELECT, for every single IOC in the batch.Fix: store the complete
GeneralHoneypotinstance in_honeypot_cacheat__init__and increate_honeypot(). Updateis_enabled()to read.activefrom the cached object instead of the old boolean.Problem 2 - no prefetch on
get_ioc_by_name().get_ioc_by_name()fetched IOCs withoutprefetch_related. Every call toioc.general_honeypot.all()insideadd_honeypot_to_ioc()therefore fired a separate lazySELECTper existing IOC.Fix: add
prefetch_related("general_honeypot")toget_ioc_by_name(). For IOCs retrieved this way,.all()reads the already-loaded prefetch cache at zero cost.Combined effect: for a 5,000-IP batch, the two original queries (honeypot object lookup + lazy M2M SELECT) drop to zero for every IOC whose honeypot is already associated, and to one query (the M2M
INSERT) for new associations. This saves up to ~10,000 round trips per run on top of the savings from #1008.Related issues
IocProcessor.add_ioc():add_honeypot_to_ioc()Fires Two Avoidable DB Queries Per IOC #1012Type of change
Checklist
Formalities
fix: eliminate N+1 queries in IocRepository.add_honeypot_to_ioc(). Closes #1012develop. (Branchfix/ioc-repository-n-plus-1-queries-cleanwas created directly fromorigin/developand contains exactly one commit on top of it.)develop.Docs and tests
Ruff) gave 0 errors. (Verified locally withruff checkandruff format --checkacross all changed files.)test_honeypot_cache_stores_generalhoneypot_objects- asserts that every value in_honeypot_cacheis aGeneralHoneypotinstance, not a boolean.test_get_ioc_by_name_prefetches_general_honeypot- asserts that accessinggeneral_honeypot.all()on a repo-fetched IOC fires 0 queries.test_add_honeypot_to_ioc_uses_cache_not_db- asserts that when the honeypot is in cache and the IOC was fetched with prefetch, associated IOCs cost 0 queries and new associations cost exactly 1 query (M2MINSERTonly).test_create_honeypot_stores_object_in_cache- asserts the cache holds aGeneralHoneypotinstance aftercreate_honeypot().GUI changes
Not applicable - no frontend changes.