Refactor extraction process. Closes #622. #624
Merged
regulartim merged 23 commits into develop on Dec 20, 2025
(this is now tested together with the elastic repository)
with configurable initial extraction interval
Sorry @mlodic for the huge amount of changes in a single PR. But don't be scared, most of the lines are doc strings and tests anyway. :D
mlodic approved these changes on Dec 19, 2025
Great job! Can you also add the schema and its explanation in a separate .md file in the root of the greedybear folder? That way it is easier to find as a reference; otherwise it would easily get lost among all the PRs.
Description
This PR introduces a complete rework of the extraction process. The idea is to improve testability, extensibility and maintainability by following some best practices:
The new process flow looks like this:
sequenceDiagram
    participant Job as ExtractionJob
    participant Pipeline as ExtractionPipeline
    participant Elastic as ElasticRepository
    participant Factory as StrategyFactory
    participant Strategy as ExtractionStrategy
    participant Processor as IocProcessor
    participant Repo as IocRepository
    Job->>Pipeline: execute()
    Pipeline->>Elastic: search(minutes_back)
    Elastic-->>Pipeline: hits[]
    loop Each honeypot
        Pipeline->>Factory: get_strategy(honeypot)
        Factory-->>Pipeline: strategy
        Pipeline->>Strategy: extract_from_hits(hits)
        Strategy->>Strategy: iocs_from_hits(hits)
        loop Each IOC
            Strategy->>Processor: add_ioc(ioc)
            Processor->>Repo: get_ioc_by_name(name)
            alt IOC exists
                Processor->>Processor: merge_iocs()
                Processor->>Repo: save(ioc)
            else New IOC
                Processor->>Repo: save(ioc)
            end
        end
    end
    Pipeline->>Pipeline: UpdateScores()

A single ExtractionPipeline instance orchestrates the extraction for all available honeypots. It uses the ElasticRepository to fetch a list of all honeypot hits from a certain time window. For each honeypot it gets the corresponding ExtractionStrategy, which contains all the extraction logic that is specific to a certain type of honeypot (e.g. Cowrie). The ExtractionStrategy uses this logic to create IOC objects and hands them to the IocProcessor, which is responsible for processing them so they can be written to the database via the IocRepository.
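To make the flow above more concrete, here is a minimal, self-contained sketch of how the pieces could fit together. It is not the code from this PR: only the class and method names that appear in the diagram are taken from it. The `Ioc` fields, the in-memory repositories, the `SourceIpStrategy` class, the hit keys `type` and `src_ip`, and the merge rule are all assumptions made for illustration, and the score update step is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Ioc:
    # Hypothetical, simplified IOC; the real model has more fields.
    name: str
    honeypots: set = field(default_factory=set)
    attack_count: int = 1


class InMemoryIocRepository:
    """Stand-in for IocRepository: keeps IOCs in a dict instead of a database."""

    def __init__(self):
        self._iocs = {}

    def get_ioc_by_name(self, name):
        return self._iocs.get(name)

    def save(self, ioc):
        self._iocs[ioc.name] = ioc
        return ioc


class FakeElasticRepository:
    """Stand-in for ElasticRepository: returns canned hits, ignores the window."""

    def __init__(self, hits):
        self._hits = hits

    def search(self, minutes_back):
        return list(self._hits)


class IocProcessor:
    def __init__(self, repo):
        self.repo = repo

    def add_ioc(self, ioc):
        # Merge with an existing record if there is one, then save.
        existing = self.repo.get_ioc_by_name(ioc.name)
        if existing is not None:
            ioc = self.merge_iocs(existing, ioc)
        return self.repo.save(ioc)

    @staticmethod
    def merge_iocs(existing, new):
        # Assumed merge rule: union the honeypots, sum the counters.
        existing.honeypots |= new.honeypots
        existing.attack_count += new.attack_count
        return existing


class SourceIpStrategy:
    """Hypothetical generic strategy: each hit's source IP becomes one IOC."""

    def __init__(self, honeypot, processor):
        self.honeypot = honeypot
        self.processor = processor

    def iocs_from_hits(self, hits):
        return [Ioc(name=hit["src_ip"], honeypots={self.honeypot}) for hit in hits]

    def extract_from_hits(self, hits):
        for ioc in self.iocs_from_hits(hits):
            self.processor.add_ioc(ioc)


class StrategyFactory:
    def __init__(self, processor):
        self.processor = processor

    def get_strategy(self, honeypot):
        # A real factory would pick a honeypot-specific strategy here.
        return SourceIpStrategy(honeypot, self.processor)


class ExtractionPipeline:
    def __init__(self, elastic, factory):
        self.elastic = elastic
        self.factory = factory

    def execute(self, minutes_back=10):
        # Group the hits by honeypot type, then run one strategy per honeypot.
        hits_by_honeypot = {}
        for hit in self.elastic.search(minutes_back):
            hits_by_honeypot.setdefault(hit["type"], []).append(hit)
        for honeypot, hits in hits_by_honeypot.items():
            self.factory.get_strategy(honeypot).extract_from_hits(hits)
        return len(hits_by_honeypot)  # number of honeypots processed
```

Because every collaborator is passed in through the constructor, a test can swap the Elasticsearch and database access for in-memory fakes like the two above, which is the kind of testability gain the description aims for.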
Key changes (functional)
Next steps
main.
(I will open separate issues / PRs for them.)
Related issues
Type of change
Checklist
develop.
Black, Flake, Isort) gave 0 errors. If you have correctly installed pre-commit, it does these checks and adjustments on your behalf.
Important Rules