Replace regex IP validation with ipaddress stdlib in EnrichmentSerializer. Closes #881#885
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes IP validation in EnrichmentSerializer by replacing regex-based validation (which incorrectly accepted invalid IPs like "999.999.999.999") with Python's ipaddress.ip_address() from the stdlib. The change also adds IPv6 support and properly rejects IPs with octets outside the 0-255 range.
Changes:
- Replaced
REGEX_IPpattern matching withipaddress.ip_address()for proper IP validation (IPv4 and IPv6) - Enhanced domain validation to require at least one alphabetic character
- Added comprehensive test coverage for invalid IPs, IPv6 addresses, and whitespace handling
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| api/serializers.py | Replaced regex IP validation with ipaddress.ip_address(), added .strip() for whitespace handling, enhanced domain validation logic, removed unused REGEX_IP import |
| tests/api/views/test_enrichment_view.py | Added 4 new test cases covering invalid IPs with out-of-range octets (999.999.999.999, 256.1.1.1), valid IPv6 addresses, and whitespace handling |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| observable = data["query"].strip() | ||
|
|
||
| try: | ||
| ipaddress.ip_address(observable) | ||
| is_ip = True | ||
| except ValueError: | ||
| is_ip = False | ||
|
|
||
| # A valid domain must match the domain regex AND contain at least one | ||
| is_domain = bool(re.match(REGEX_DOMAIN, observable)) and any(c.isalpha() for c in observable) | ||
|
|
||
| if not is_ip and not is_domain: | ||
| raise serializers.ValidationError("Observable is not a valid IP address or domain") | ||
|
|
||
| try: | ||
| required_object = IOC.objects.get(name=observable) |
There was a problem hiding this comment.
The stripped observable is validated but not written back to data["query"]. This means whitespace in the query parameter will be preserved in the database lookup and potentially stored. You should add data["query"] = observable after line 42 to ensure the stripped value is used consistently throughout the validation and database operations.
api/serializers.py
Outdated
| except ValueError: | ||
| is_ip = False | ||
|
|
||
| # A valid domain must match the domain regex AND contain at least one |
There was a problem hiding this comment.
The comment on line 50 is incomplete: "A valid domain must match the domain regex AND contain at least one". It should complete the thought, e.g., "A valid domain must match the domain regex AND contain at least one alphabetic character".
| # A valid domain must match the domain regex AND contain at least one | |
| # A valid domain must match the domain regex AND contain at least one alphabetic character |
There was a problem hiding this comment.
Copilot is right here, no?
| def test_query_with_whitespace(self): | ||
| """Query with leading/trailing whitespace should be handled""" | ||
| response = self.client.get("/api/enrichment?query= 192.168.0.1 ") | ||
| self.assertEqual(response.status_code, 200) |
There was a problem hiding this comment.
The new validation logic adds a requirement that domains must contain at least one alphabetic character (line 51 in api/serializers.py). However, there are no new tests verifying that valid domains still work correctly after this change. Consider adding a test case that queries for a valid domain (e.g., "malicious.example.com" from the test fixtures) to ensure domain validation hasn't regressed.
regulartim
left a comment
There was a problem hiding this comment.
Hey @tanmayjoddar ! We need to make some changes here.
I'll use the existing is_ip_address() from api/views/utils.py directly instead of reimplementing.
You did not do that, why?
Also, you did not use our PR template and you did not follow the convention for PR titles. Please make sure you use the PR template and maybe read the contribution guidelines again.
|
Thanks for the review @regulartim! I initially tried importing is_ip_address() directly from api.views.utils as you suggested. However, it causes a circular import: api/serializers.py → from api.views.utils import is_ip_address api/views/utils.py (line 19) → from api.serializers import FeedsRequestSerializer This raises: ImportError: cannot import name 'is_ip_address' from partially initialized module 'api.views.utils' I used ipaddress.ip_address() directly as a workaround since it's the same underlying call that is_ip_address() makes. I have also updated the PR title and body to follow the contribution guidelines and PR template. Sorry for missing that initially. |
regulartim
left a comment
There was a problem hiding this comment.
Thanks for the clarification! 👍
I initially tried importing is_ip_address() directly from api.views.utils as you suggested. However, it causes a circular import
Then we should move the function to a different file I guess. What do you think about creating a greedybear/utils.py file and populate it with this function? We might find other shared functions that also should live there.
api/serializers.py
Outdated
| except ValueError: | ||
| is_ip = False | ||
|
|
||
| # A valid domain must match the domain regex AND contain at least one |
There was a problem hiding this comment.
Copilot is right here, no?
e9bc9a6 to
fa36d1b
Compare
|
Agreed, --Also fixed the incomplete comment flagged by Copilot in the latest push. |
regulartim
left a comment
There was a problem hiding this comment.
I'll move both to utils.py and update all import sites accordingly.
You did not do that yet. Do you want to do it here or in a separate PR?
|
I'll do it in this PR. Working on it now. |
fa36d1b to
863009e
Compare
|
Done — created greedybear/utils.py with both is_ip_address() and is_sha256hash(), updated all import sites, and removed the moved functions from api/views/utils.py. Ready for re-review. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| data["query"] = observable | ||
|
|
||
| # A valid domain must match the domain regex AND contain at least one alphabetic character | ||
| is_domain = bool(re.match(REGEX_DOMAIN, observable)) and any(c.isalpha() for c in observable) |
There was a problem hiding this comment.
The domain validation logic with the alphabetic character requirement is not covered by tests in the enrichment endpoint. While the IP validation tests are comprehensive, consider adding a test for valid domain queries (e.g., querying "malicious.example.com" which exists in test data) to ensure this validation logic works correctly.
…izer. Closes intelowlproject#881 - Create greedybear/utils.py with shared validation helpers (is_ip_address, is_sha256hash) - Replace REGEX_IP with is_ip_address() for proper IPv4/IPv6 validation - Add .strip() with write-back for whitespace handling - Improve domain validation to require at least one alphabetic character - Update all import sites to use greedybear.utils - Remove moved functions and unused imports from api/views/utils.py - Add regression tests for invalid IPs, valid IPv6, whitespace, and domains
863009e to
69d2575
Compare
regulartim
left a comment
There was a problem hiding this comment.
Good job, thanks! :)
…izer. Closes intelowlproject#881 (intelowlproject#885) - Create greedybear/utils.py with shared validation helpers (is_ip_address, is_sha256hash) - Replace REGEX_IP with is_ip_address() for proper IPv4/IPv6 validation - Add .strip() with write-back for whitespace handling - Improve domain validation to require at least one alphabetic character - Update all import sites to use greedybear.utils - Remove moved functions and unused imports from api/views/utils.py - Add regression tests for invalid IPs, valid IPv6, whitespace, and domains
Description
EnrichmentSerializer.validate()usedREGEX_IPwhich only checked digit count per octet (\d{1,3}), not the valid 0-255 range. This caused invalid IPs like999.999.999.999to be accepted. Additionally, valid IPv6 addresses like2001:db8::1were incorrectly rejected.Fixed by replacing the regex with
ipaddress.ip_address()from Python stdlib.Note: attempted to use the existing
is_ip_address()fromapi/views/utils.pyas suggested, but this caused a circular import sinceapi/views/utils.pyimports fromapi/serializers.py. Usedipaddress.ip_address()directly instead, which is the same underlying implementation. Happy to refactor if there is a preferred way to break the circular dependency.Related issues
Closes #881
Type of change
Checklist
Formalities
<feature name>. Closes #999develop.develop.Docs and tests
Ruff) gave 0 errors.GUI changes
No GUI changes were made.
Review process
PR was opened as ready for review after all 468 tests passed locally.