Skip to content

fix: eliminate N+1 queries in IocRepository.add_honeypot_to_ioc(). Closes #1012#1024

Merged
regulartim merged 3 commits intointelowlproject:developfrom
Abhijeet17o:fix/ioc-repository-n-plus-1-queries-clean
Mar 16, 2026
Merged

fix: eliminate N+1 queries in IocRepository.add_honeypot_to_ioc(). Closes #1012#1024
regulartim merged 3 commits intointelowlproject:developfrom
Abhijeet17o:fix/ioc-repository-n-plus-1-queries-clean

Conversation

@Abhijeet17o
Copy link
Copy Markdown
Contributor

@Abhijeet17o Abhijeet17o commented Mar 11, 2026

Description

IocRepository.add_honeypot_to_ioc() was firing two avoidable DB queries per IOC on every extraction run.

Problem 1 - cache stored booleans, not objects.
_honeypot_cache was keyed by normalized name but held only the boolean hp.active flag. When add_honeypot_to_ioc() needed the actual GeneralHoneypot object to call ioc.general_honeypot.add(), the cache was useless and it fell back to get_hp_by_name(), which is a live SELECT, for every single IOC in the batch.

Fix: store the complete GeneralHoneypot instance in _honeypot_cache at __init__ and in create_honeypot(). Update is_enabled() to read .active from the cached object instead of the old boolean.

Problem 2 - no prefetch on get_ioc_by_name().
get_ioc_by_name() fetched IOCs without prefetch_related. Every call to ioc.general_honeypot.all() inside add_honeypot_to_ioc() therefore fired a separate lazy SELECT per existing IOC.

Fix: add prefetch_related("general_honeypot") to get_ioc_by_name(). For IOCs retrieved this way, .all() reads the already-loaded prefetch cache at zero cost.

Combined effect: for a 5,000-IP batch, the two original queries (honeypot object lookup + lazy M2M SELECT) drop to zero for every IOC whose honeypot is already associated, and to one query (the M2M INSERT) for new associations. This saves up to ~10,000 round trips per run on top of the savings from #1008.

Related issues

Type of change

  • Bug fix (non-breaking change which fixes an issue).
  • New feature (non-breaking change which adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).
  • Chore (refactoring, dependency updates, CI/CD changes, code cleanup, docs-only changes).

Checklist

Formalities

  • I have read and understood the rules about how to Contribute to this project.
  • I chose an appropriate title for the pull request in the form: fix: eliminate N+1 queries in IocRepository.add_honeypot_to_ioc(). Closes #1012
  • My branch is based on develop. (Branch fix/ioc-repository-n-plus-1-queries-clean was created directly from origin/develop and contains exactly one commit on top of it.)
  • The pull request is for the branch develop.
  • I have reviewed and verified any LLM-generated code included in this PR.

Docs and tests

  • I documented my code changes with docstrings and/or comments.
  • I have checked if my changes affect user-facing behavior that is described in the docs. (These are internal query optimizations with no effect on API responses, feed output, or any user-visible behavior. No docs PR is needed.)
  • Linter (Ruff) gave 0 errors. (Verified locally with ruff check and ruff format --check across all changed files.)
  • I have added tests for the feature/bug I solved. Specifically:
    • test_honeypot_cache_stores_generalhoneypot_objects - asserts that every value in _honeypot_cache is a GeneralHoneypot instance, not a boolean.
    • test_get_ioc_by_name_prefetches_general_honeypot - asserts that accessing general_honeypot.all() on a repo-fetched IOC fires 0 queries.
    • test_add_honeypot_to_ioc_uses_cache_not_db - asserts that when the honeypot is in cache and the IOC was fetched with prefetch, associated IOCs cost 0 queries and new associations cost exactly 1 query (M2M INSERT only).
    • test_create_honeypot_stores_object_in_cache - asserts the cache holds a GeneralHoneypot instance after create_honeypot().
  • All the tests gave 0 errors. (750/750 tests pass locally.)

GUI changes

Not applicable - no frontend changes.

Copilot AI review requested due to automatic review settings March 11, 2026 13:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the IOC extraction pipeline by eliminating avoidable per-IOC database queries when associating GeneralHoneypot records to IOC records.

Changes:

  • Store GeneralHoneypot model instances (not booleans) in IocRepository._honeypot_cache and adjust is_enabled() accordingly.
  • Reduce M2M query overhead by prefetching general_honeypot in get_ioc_by_name() and adding an is_new fast-path to skip membership checks for newly created IOCs.
  • Add/extend unit tests asserting query counts and cache behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
greedybear/cronjobs/repositories/ioc.py Implements honeypot object caching, adds is_new optimization, and prefetches M2M relations.
greedybear/cronjobs/extraction/ioc_processor.py Tracks whether an IOC was newly created and passes is_new into the repository method.
tests/test_ioc_repository.py Adds test coverage for cache contents, prefetch effectiveness, and query-count expectations.
tests/test_ioc_processor.py Updates mock assertion to validate the new is_new argument is passed.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Abhijeet17o Abhijeet17o marked this pull request as draft March 11, 2026 14:44
@Abhijeet17o Abhijeet17o force-pushed the fix/ioc-repository-n-plus-1-queries-clean branch from fc1e334 to 434458f Compare March 11, 2026 14:54
@Abhijeet17o Abhijeet17o marked this pull request as ready for review March 11, 2026 15:22
Copilot AI review requested due to automatic review settings March 11, 2026 15:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings March 11, 2026 15:29
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@regulartim
Copy link
Copy Markdown
Collaborator

Hey @Abhijeet17o ! I really appreciate your effort, but I would prefer if we could keep this PR focused on the two improvements that you proposed in the issue. Right now, it is hard for me to see the purpose of the single code changes. I am sure that all your optimizations benefit performance, but they also add a lot of additional code (=complexity) and I am not sure if that is worth it.

@Abhijeet17o
Copy link
Copy Markdown
Contributor Author

Hey @regulartim, you are right to point it out that I have kind of went beyond my scope. The PR grew beyond its original scope because I incorporated several automated code review suggestions that each seemed reasonable in isolation, but together added complexity that wasn't necessary for the two fixes I proposed.
The extra optimizations were real improvements, but they belong in separate focused PRs
So, right now I'm stripping it back to just the two changes from the issue.

@Abhijeet17o Abhijeet17o force-pushed the fix/ioc-repository-n-plus-1-queries-clean branch from 1bc7bcc to bf7195d Compare March 12, 2026 05:14
@Abhijeet17o
Copy link
Copy Markdown
Contributor Author

Hey @regulartim!
Can you please review the changes made.
I've narrowed the focus on problems that I had mentioned in the issue.

@Abhijeet17o
Copy link
Copy Markdown
Contributor Author

Abhijeet17o commented Mar 12, 2026

Hey @regulartim,
I wanted to solve one question that I had. So in the following code:

honeypot_set = {hp.name for hp in ioc.general_honeypot.all()}
if honeypot_name not in honeypot_set: 
    honeypot = self._honeypot_cache.get(self._normalize_name(honeypot_name))
    if honeypot is not None:
        ioc.general_honeypot.add(honeypot)  # this fires an extra query

if honeypot_name comes in as "cowrie" but the DB stores "Cowrie", the guard (not in honeypot_set) doesn't recognize it as already associated, so it falls through to the [add()]. The cache lookup finds the object (because it uses _normalize_name), and Django's M2M add() is idempotent so no duplicate is created but an unnecessary DB query fires

Can you validate this? if yes I'll implement a solution for it in this PR itself

EDIT: Implemented the requited changes aswell 👍



Two targeted fixes:

1. Add prefetch_related('general_honeypot') to get_ioc_by_name() so
   subsequent .all() calls in add_honeypot_to_ioc() read from
   Django's prefetch cache instead of hitting the DB per IOC.

2. Change _honeypot_cache to store GeneralHoneypot objects instead
   of booleans, so add_honeypot_to_ioc() can call .add() using the
   cached object directly without an extra get_hp_by_name() DB hit.
Copilot AI review requested due to automatic review settings March 12, 2026 11:54
@Abhijeet17o Abhijeet17o force-pushed the fix/ioc-repository-n-plus-1-queries-clean branch from bf7195d to 49daaac Compare March 12, 2026 11:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copilot AI review requested due to automatic review settings March 12, 2026 14:40
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Abhijeet17o Abhijeet17o force-pushed the fix/ioc-repository-n-plus-1-queries-clean branch from 527709c to 7331793 Compare March 12, 2026 14:51
@Abhijeet17o Abhijeet17o requested a review from regulartim March 12, 2026 14:56
Copy link
Copy Markdown
Collaborator

@regulartim regulartim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Abhijeet17o ! I'll just copy/paste my comment from above:

Hey @Abhijeet17o ! I really appreciate your effort, but I would prefer if we could keep this PR focused on the two improvements that you proposed in the issue. Right now, it is hard for me to see the purpose of the single code changes. I am sure that all your optimizations benefit performance, but they also add a lot of additional code (=complexity) and I am not sure if that is worth it.

@Abhijeet17o
Copy link
Copy Markdown
Contributor Author

Hey @regulartim ! I've already stripped things back down, the current diff is only 20 lines of production code that is in ioc.py, it is working as intended. This is all tied to two fixes:

  • Fix 1: one line adding prefetch_related("general_honeypot") to get_ioc_by_name()
  • Fix 2: three lines changing _honeypot_cache to store objects instead of booleans, and updating is_enabled() and create_honeypot() accordingly
  • The error log in add_honeypot_to_ioc() was added at the suggestion that you made ("I guess we should log an error then")

The major code addition was in the test_ioc_repository.py‎ where tests for the modifications were added and documented.
So, actually I would want some guidance on would you like me to remove any specific tests you feel are excessive? Or how would you like me to approach it?

@regulartim
Copy link
Copy Markdown
Collaborator

Hey @regulartim,
I wanted to solve one question that I had. So in the following code:
honeypot_set = {hp.name for hp in ioc.general_honeypot.all()}
if honeypot_name not in honeypot_set:
honeypot = self._honeypot_cache.get(self._normalize_name(honeypot_name))
if honeypot is not None:
ioc.general_honeypot.add(honeypot) # this fires an extra query
if honeypot_name comes in as "cowrie" but the DB stores "Cowrie", the guard (not in honeypot_set) doesn't recognize it as already associated, so it falls through to the [add()]. The cache lookup finds the object (because it uses _normalize_name), and Django's M2M add() is idempotent so no duplicate is created but an unnecessary DB query fires
Can you validate this? if yes I'll implement a solution for it in this PR itself
EDIT: Implemented the requited changes aswell 👍

@Abhijeet17o You added this additional logic, which is not one of the fixes you proposed in the issue. Or am I wrong?

@Abhijeet17o
Copy link
Copy Markdown
Contributor Author

@regulartim
Oh, I think now I actually get your point. So test cases addition is fine but the modifications that are made to the non-test scripts should strictly be related to what has been defined in the issue.
Correct me if I'm wrong

I'll do the needful now, thanks! 👍

Copilot AI review requested due to automatic review settings March 14, 2026 05:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@Abhijeet17o Abhijeet17o requested a review from regulartim March 14, 2026 05:42
@Abhijeet17o
Copy link
Copy Markdown
Contributor Author

Hey @regulartim,
I have stripped things down now, and I don't think so there is anything that is out of scope for this issue. Can you review it once, thanks!

Copy link
Copy Markdown
Collaborator

@regulartim regulartim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks! 👍

@regulartim regulartim merged commit 143f6ff into intelowlproject:develop Mar 16, 2026
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants