Strategy-specific E2E tests and edge cases for ExtractionPipeline. Closes #636#740
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds comprehensive end-to-end tests for the ExtractionPipeline class, building on the foundation laid in PR #735. The tests verify the complete extraction workflow from Elasticsearch hits through strategy selection to IOC persistence and scoring.
Changes:
- Added 24 new tests (~795 lines) covering strategy-specific E2E scenarios, edge cases, and factory integration
- Tests validate Cowrie, Log4pot, and Generic extraction strategies through the full pipeline
- Edge case tests verify error handling, validation, and boundary conditions
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…mocks - Replace weak assertGreaterEqual(result, 0) with specific mock.call_count assertions - Fix E2E tests to use proper ExtractionStrategyFactory mocking pattern - Remove unnecessary UpdateScores patch decorators from factory tests - Remove unused mock_scores parameters
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Coverage
All 446 tests passing. Hello @regulartim the PR is ready for review. |
regulartim
left a comment
There was a problem hiding this comment.
Nice! 👍 Two things might need to be changed though:
- Yes, the file is too long. I thing you can split it up by test category (for example factory tests, edge cases, strategy E2E) and name them something like
test_extraction_pipeline_factory.py. - In the strategy specific tests, you are mocking the ExtractionStrategyFactory and the strategies themselves. Is there a good reason to do this? Isn't it best to just mock the repository, such that we are testing the whole extraction process as it runs in production?
|
Hi @regulartim However, you're right that for true E2E testing, we should let the real components run and only mock at the repository boundary. This gives us confidence that the actual integration works. I'll refactor to use the real factory and strategies. |
- Split monolithic test file into 4 focused files - E2E tests now use real ExtractionStrategyFactory and strategies - Only mock repositories at the boundary - Tests actual integration path as it runs in production
- test_honeypot_skipped_when_not_ready (grouping file) - test_strategy_returns_empty_ioc_records (E2E file) - test_partial_strategy_success (E2E file) - test_large_batch_of_hits (E2E file)
regulartim
left a comment
There was a problem hiding this comment.
Sorry, I just realized that none of the tests verifies that the extracted IOCs are actually populated. Could you add at least one E2E test that inspects the actual IOC content (IP, honeypot type) rather than just the count?
- Add TestIocContentVerification class with 3 tests for IOC content verification - Move E2ETestCase class to tests/__init__.py for shared usage (reviewer feedback) - Split edge cases into test_extraction_pipeline_edge_cases.py Edge cases now clearly document when mocking is required: - test_partial_strategy_success: Mocks factory (needs to force exception) - test_large_batch_of_hits_with_real_strategy: Uses REAL strategy Tests added: - test_cowrie_ioc_content_verified: Verifies IOC has correct IP - test_multiple_honeypots_ioc_content_verified: Verifies multiple IOCs - test_ioc_scanner_field_contains_honeypot_type: Verifies scanner field Addresses reviewer feedback to: 1. Verify actual IOC content, not just count 2. Move shared test infrastructure to tests/__init__.py 3. Keep test files focused and manageable in size 4. Use real strategies where possible in tests
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Hi @regulartim Regarding the We encountered a trade-off between using real strategies (as requested) and having unconditional deterministic assertions.
Decision:
I believe this adheres best to the goal of "testing the actual integration path" while still adding the requested content verification. |
|
Hey @opbot-xd ! Thank you for your work. Are you done with working on this? To me it looks good now. Would like to approve and merge. |
|
Yes I have finished my work over this PR. Should I update the PR desc? |
|
No, all good! :) |
Description
This PR adds comprehensive strategy-specific E2E tests and edge cases for the
ExtractionPipelineclass, completing the test coverage requested in issue #636.Building on PR #735 (pipeline infrastructure + core flow tests), this PR adds:
New Test Classes (24 tests total, ~795 lines):
TestCowrieStrategyE2E(3 tests)TestLog4potStrategyE2E(3 tests)TestGenericStrategyE2E(3 tests)TestMixedHoneypotE2E(1 test)TestEdgeCasesE2E(10 tests)is_ready_for_extractionreturning False)TestFactoryIntegration(4 tests)Related issues
Closes #636
Type of change
Checklist
develop.Ruff) gave 0 errors. If you have correctly installed pre-commit, it does these checks and adjustments on your behalf.