Skip to content

Improve API performance. Closes #394#397

Merged
regulartim merged 18 commits intoGreedyBear-Project:developfrom
regulartim:improve_api_performance
Dec 13, 2024
Merged

Improve API performance. Closes #394#397
regulartim merged 18 commits intoGreedyBear-Project:developfrom
regulartim:improve_api_performance

Conversation

@regulartim
Copy link
Copy Markdown
Member

@regulartim regulartim commented Dec 9, 2024

Description

Two changes are introduced here:

  1. By using prefetch_related I could reduce the time the mentioned API request takes from ~6 seconds to ~2 seconds. This change is completly contained in eda16cd
  2. By implementing custom serialization / validation and removing the serializers from djangos rest_framework, the request time could be reduced further to ~200ms. The downside is that some IOC properties (e.g. value, first_seen, ...) are not checked as carefully as before. But since these values come from our DB anyway, I personally think that the performance improvement is worth the inferior validation.

Related issues

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue).

Checklist

  • I have read and understood the rules about how to Contribute to this project.
  • The pull request is for the branch develop.
  • I have added documentation of the new features.
  • Linters (Black, Flake, Isort) gave 0 errors. If you have correctly installed pre-commit, it does these checks and adjustments on your behalf.
  • I have added tests for the feature/bug I solved. All the tests (new and old ones) gave 0 errors.
  • If changes were made to an existing model/serializer/view, the docs were updated and regenerated (check CONTRIBUTE.md).
  • If the GUI has been modified:
    • I have a provided a screenshot of the result in the PR.
    • I have created new frontend tests for the new component or updated existing ones.

Important Rules

  • If you miss to compile the Checklist properly, your PR won't be reviewed by the maintainers.
  • If your changes decrease the overall tests coverage (you will know after the Codecov CI job is done), you should add the required tests to fix the problem
  • Everytime you make changes to the PR and you think the work is done, you should explicitly ask for a review. After being reviewed and received a "change request", you should explicitly ask for a review again once you have made the requested changes.

@regulartim regulartim force-pushed the improve_api_performance branch from 888b400 to 6461156 Compare December 9, 2024 19:15
@regulartim regulartim marked this pull request as ready for review December 10, 2024 10:18
@regulartim regulartim requested a review from mlodic December 10, 2024 10:18
Comment thread api/views.py
Comment thread api/serializers.py Outdated
Comment on lines +10 to +12
VALID_ATTACK_TYPES = {"scanner", "payload_request", "all"}
VALID_AGES = {"persistent", "recent"}
VALID_FORMATS = {"csv", "json", "txt"}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally it would be better to have these values separated in a consts.py file so other modules can leverage it without importing all the stuff here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed them completely as they are not needed anymore.

Comment thread api/serializers.py Outdated
Comment on lines +54 to +58
@cache
def feed_type_validation(feed_type: str) -> bool:
general_honeypots = GeneralHoneypot.objects.all().filter(active=True)
valid_types = set(["log4j", "cowrie", "all"] + [hp.name.lower() for hp in general_honeypots])
return feed_type in valid_types
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

caching the query is a great idea, thank you!

However, this would make the output of the GeneralHoneypot.objects.all().filter(active=True) always the same, even if an admin changed something in that table. So this should not be cached in this way.

In my opinion, the better solution would be to calculate the items from GeneralHoneypots only once at the start of the available API views (feeds and feeds_pagination) and then pass the result as a parameter to the get_queryset and the feeds_response function and then, utlimately, to the Serializer for the usage.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem: I am not able to pass the set of valid feed types through the Serializer to the validation function. If I understand the documentation correctly, the validation functions are build to only validate the data that was serialized by the serializer. But I do not serialize the set of valid feed types, so it ca not be passed to the validation function.
Suggestion: Instead of calling the feed_type_validation function inside the serializer class, we call it after the instantiation of the serializer in get_queryset and feeds_response. Are you happy with that?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, I mean, I don't want to force any kind of specific pattern, like the DRF, because I think it is better to find a solution fast than to focus to code perfectionism because that can be always opinionable.

Another option can be using the object-level validation and extra-context. In this way you can pass all the data that you want through the serializers and with the more generic validator you can do all the stuff that you want

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra-context hint was what i needed! Thanks a lot! :)

Comment thread api/serializers.py Outdated
Comment on lines +61 to +95
def feed_request_validation(data: dict) -> bool:
if not feed_type_validation(data["feed_type"]):
raise serializers.ValidationError(f"Invalid feed_type: {data['feed_type']}")
if data["attack_type"] not in VALID_ATTACK_TYPES:
raise serializers.ValidationError(f"Invalid attack_type: {data['attack_type']}")
if data["age"] not in VALID_AGES:
raise serializers.ValidationError(f"Invalid age: {data['age']}")
if data["format"] not in VALID_FORMATS:
raise serializers.ValidationError(f"Invalid format: {data['format']}")
return True


def ioc_validation(data: dict) -> bool:
if not feed_type_validation(data["feed_type"]):
raise serializers.ValidationError(f"Invalid feed_type: {data['feed_type']}")
if data["times_seen"] < 1:
raise serializers.ValidationError(f"Invalid value for 'times_seen'': {data['times_seen']}")
if not (data[SCANNER] or data[PAYLOAD_REQUEST]):
raise serializers.ValidationError(f"Invalid data: IOC is neither {SCANNER} nor {PAYLOAD_REQUEST}")
return True


def serialize_ioc(ioc, feed_type: str) -> dict:
data = {
"value": ioc.name,
SCANNER: ioc.scanner,
PAYLOAD_REQUEST: ioc.payload_request,
"first_seen": ioc.first_seen.strftime("%Y-%m-%d"),
"last_seen": ioc.last_seen.strftime("%Y-%m-%d"),
"times_seen": ioc.times_seen,
"feed_type": feed_type,
}
if not ioc_validation(data):
raise serializers.ValidationError(f"Unknown error while validating {data}")
return data
Copy link
Copy Markdown
Collaborator

@mlodic mlodic Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even if I am not an admirer of the Django Rest Framework at all, the custom Serializers are way more easy to read and to maintain. I would not change that stuff to simple functions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll look into it again. Maybe we can find a way to use the Django Serializers without such a massive performance loss.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I have not found a possibility to do that. Since we do not pass the IOC object itself, but a dictionary into the Serializer, validation is mandatory. This seems to be expensive. I would like to add a constant that allows us to skip the serialisation via the Serializer and thus bypass the validation.

@regulartim regulartim requested a review from mlodic December 12, 2024 11:04
Comment thread greedybear/consts.py Outdated
POST = "POST"

FEEDS_LICENSE = "https://github.com/honeynet/GreedyBear/blob/main/FEEDS_LICENSE.md"
SKIP_FEED_VALIDATION = False
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • if you want to dynamically change this without having to re-build the code, you can add it as an environment variable in env_file_template and then extract it as a Django constant in settings.py

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, thanks! :) Will do that and merge.

@regulartim regulartim merged commit 5f0e7e6 into GreedyBear-Project:develop Dec 13, 2024
@regulartim regulartim deleted the improve_api_performance branch December 13, 2024 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants