Skip to content

Replace regex IP validation with ipaddress stdlib in EnrichmentSerializer. Closes #881#885

Merged
regulartim merged 1 commit intointelowlproject:developfrom
tanmayjoddar:fix/enrichment-ip-validation-881-v2
Feb 27, 2026
Merged

Replace regex IP validation with ipaddress stdlib in EnrichmentSerializer. Closes #881#885
regulartim merged 1 commit intointelowlproject:developfrom
tanmayjoddar:fix/enrichment-ip-validation-881-v2

Conversation

@tanmayjoddar
Copy link
Copy Markdown
Contributor

@tanmayjoddar tanmayjoddar commented Feb 25, 2026

Description

EnrichmentSerializer.validate() used REGEX_IP which only checked digit count per octet (\d{1,3}), not the valid 0-255 range. This caused invalid IPs like 999.999.999.999 to be accepted. Additionally, valid IPv6 addresses like 2001:db8::1 were incorrectly rejected.

Fixed by replacing the regex with ipaddress.ip_address() from Python stdlib.

Note: attempted to use the existing is_ip_address() from api/views/utils.py as suggested, but this caused a circular import since api/views/utils.py imports from api/serializers.py. Used ipaddress.ip_address() directly instead, which is the same underlying implementation. Happy to refactor if there is a preferred way to break the circular dependency.

Related issues

Closes #881

Type of change

  • Bug fix (non-breaking change which fixes an issue).
  • New feature (non-breaking change which adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).
  • Chore (refactoring, dependency updates, CI/CD changes, code cleanup, docs-only changes).

Checklist

Formalities

  • I have read and understood the rules about how to Contribute to this project.
  • I chose an appropriate title for the pull request in the form: <feature name>. Closes #999
  • My branch is based on develop.
  • The pull request is for the branch develop.
  • I have reviewed and verified any LLM-generated code included in this PR.

Docs and tests

  • I documented my code changes with docstrings and/or comments.
  • I have checked if my changes affect user-facing behavior that is described in the docs. No user-facing behavior is affected.
  • Linter (Ruff) gave 0 errors.
  • I have added tests for the feature/bug I solved.
  • All the tests gave 0 errors.

GUI changes

No GUI changes were made.

Review process

PR was opened as ready for review after all 468 tests passed locally.

Copilot AI review requested due to automatic review settings February 25, 2026 22:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes IP validation in EnrichmentSerializer by replacing regex-based validation (which incorrectly accepted invalid IPs like "999.999.999.999") with Python's ipaddress.ip_address() from the stdlib. The change also adds IPv6 support and properly rejects IPs with octets outside the 0-255 range.

Changes:

  • Replaced REGEX_IP pattern matching with ipaddress.ip_address() for proper IP validation (IPv4 and IPv6)
  • Enhanced domain validation to require at least one alphabetic character
  • Added comprehensive test coverage for invalid IPs, IPv6 addresses, and whitespace handling

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
api/serializers.py Replaced regex IP validation with ipaddress.ip_address(), added .strip() for whitespace handling, enhanced domain validation logic, removed unused REGEX_IP import
tests/api/views/test_enrichment_view.py Added 4 new test cases covering invalid IPs with out-of-range octets (999.999.999.999, 256.1.1.1), valid IPv6 addresses, and whitespace handling

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +42 to 57
observable = data["query"].strip()

try:
ipaddress.ip_address(observable)
is_ip = True
except ValueError:
is_ip = False

# A valid domain must match the domain regex AND contain at least one
is_domain = bool(re.match(REGEX_DOMAIN, observable)) and any(c.isalpha() for c in observable)

if not is_ip and not is_domain:
raise serializers.ValidationError("Observable is not a valid IP address or domain")

try:
required_object = IOC.objects.get(name=observable)
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stripped observable is validated but not written back to data["query"]. This means whitespace in the query parameter will be preserved in the database lookup and potentially stored. You should add data["query"] = observable after line 42 to ensure the stripped value is used consistently throughout the validation and database operations.

Copilot uses AI. Check for mistakes.
except ValueError:
is_ip = False

# A valid domain must match the domain regex AND contain at least one
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on line 50 is incomplete: "A valid domain must match the domain regex AND contain at least one". It should complete the thought, e.g., "A valid domain must match the domain regex AND contain at least one alphabetic character".

Suggested change
# A valid domain must match the domain regex AND contain at least one
# A valid domain must match the domain regex AND contain at least one alphabetic character

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot is right here, no?

Comment on lines +79 to +82
def test_query_with_whitespace(self):
"""Query with leading/trailing whitespace should be handled"""
response = self.client.get("/api/enrichment?query= 192.168.0.1 ")
self.assertEqual(response.status_code, 200)
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new validation logic adds a requirement that domains must contain at least one alphabetic character (line 51 in api/serializers.py). However, there are no new tests verifying that valid domains still work correctly after this change. Consider adding a test case that queries for a valid domain (e.g., "malicious.example.com" from the test fixtures) to ensure domain validation hasn't regressed.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

@regulartim regulartim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @tanmayjoddar ! We need to make some changes here.

I'll use the existing is_ip_address() from api/views/utils.py directly instead of reimplementing.

You did not do that, why?

Also, you did not use our PR template and you did not follow the convention for PR titles. Please make sure you use the PR template and maybe read the contribution guidelines again.

@tanmayjoddar tanmayjoddar changed the title fix(api): replace regex IP validation with ipaddress stdlib in EnrichmentSerializer fix(api): use ipaddress stdlib for IP validation in EnrichmentSerializer. Closes #881 Feb 26, 2026
@tanmayjoddar
Copy link
Copy Markdown
Contributor Author

Thanks for the review @regulartim!

I initially tried importing is_ip_address() directly from api.views.utils as you suggested. However, it causes a circular import:

api/serializers.py → from api.views.utils import is_ip_address

api/views/utils.py (line 19) → from api.serializers import FeedsRequestSerializer

This raises:

ImportError: cannot import name 'is_ip_address' from partially initialized module 'api.views.utils'

I used ipaddress.ip_address() directly as a workaround since it's the same underlying call that is_ip_address() makes.

I have also updated the PR title and body to follow the contribution guidelines and PR template. Sorry for missing that initially.

@tanmayjoddar tanmayjoddar changed the title fix(api): use ipaddress stdlib for IP validation in EnrichmentSerializer. Closes #881 Replace regex IP validation with ipaddress stdlib in EnrichmentSerializer. Closes #881 Feb 26, 2026
Copy link
Copy Markdown
Collaborator

@regulartim regulartim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification! 👍

I initially tried importing is_ip_address() directly from api.views.utils as you suggested. However, it causes a circular import

Then we should move the function to a different file I guess. What do you think about creating a greedybear/utils.py file and populate it with this function? We might find other shared functions that also should live there.

except ValueError:
is_ip = False

# A valid domain must match the domain regex AND contain at least one
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot is right here, no?

@tanmayjoddar tanmayjoddar force-pushed the fix/enrichment-ip-validation-881-v2 branch from e9bc9a6 to fa36d1b Compare February 27, 2026 07:05
@tanmayjoddar
Copy link
Copy Markdown
Contributor Author

Agreed, utils.py makes sense as a dependency-free home for shared validation helpers.
Looking at the codebase, is_sha256hash() is in the same situation as is_ip_address() — both are pure stdlib wrappers with no dependency on Django views, serializers, or models, but used across multiple layers:
-command_sequence.py— uses both
-cowrie_session.py — uses both
-erializers.py — needs is_ip_address() (this PR)
-test_validation_helpers.py — tests both
I'll move both to utils.py and update all import sites accordingly.

--Also fixed the incomplete comment flagged by Copilot in the latest push.

Copy link
Copy Markdown
Collaborator

@regulartim regulartim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move both to utils.py and update all import sites accordingly.

You did not do that yet. Do you want to do it here or in a separate PR?

@tanmayjoddar
Copy link
Copy Markdown
Contributor Author

I'll do it in this PR. Working on it now.

Copilot AI review requested due to automatic review settings February 27, 2026 07:39
@tanmayjoddar tanmayjoddar force-pushed the fix/enrichment-ip-validation-881-v2 branch from fa36d1b to 863009e Compare February 27, 2026 07:39
@tanmayjoddar
Copy link
Copy Markdown
Contributor Author

tanmayjoddar commented Feb 27, 2026

Done — created greedybear/utils.py with both is_ip_address() and is_sha256hash(), updated all import sites, and removed the moved functions from api/views/utils.py. Ready for re-review.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

data["query"] = observable

# A valid domain must match the domain regex AND contain at least one alphabetic character
is_domain = bool(re.match(REGEX_DOMAIN, observable)) and any(c.isalpha() for c in observable)
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The domain validation logic with the alphabetic character requirement is not covered by tests in the enrichment endpoint. While the IP validation tests are comprehensive, consider adding a test for valid domain queries (e.g., querying "malicious.example.com" which exists in test data) to ensure this validation logic works correctly.

Copilot uses AI. Check for mistakes.
…izer. Closes intelowlproject#881

- Create greedybear/utils.py with shared validation helpers (is_ip_address, is_sha256hash)
- Replace REGEX_IP with is_ip_address() for proper IPv4/IPv6 validation
- Add .strip() with write-back for whitespace handling
- Improve domain validation to require at least one alphabetic character
- Update all import sites to use greedybear.utils
- Remove moved functions and unused imports from api/views/utils.py
- Add regression tests for invalid IPs, valid IPv6, whitespace, and domains
@tanmayjoddar tanmayjoddar force-pushed the fix/enrichment-ip-validation-881-v2 branch from 863009e to 69d2575 Compare February 27, 2026 07:50
Copy link
Copy Markdown
Collaborator

@regulartim regulartim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job, thanks! :)

@regulartim regulartim merged commit d68c85c into intelowlproject:develop Feb 27, 2026
5 checks passed
cclts pushed a commit to cclts/GreedyBear that referenced this pull request Mar 11, 2026
…izer. Closes intelowlproject#881 (intelowlproject#885)

- Create greedybear/utils.py with shared validation helpers (is_ip_address, is_sha256hash)
- Replace REGEX_IP with is_ip_address() for proper IPv4/IPv6 validation
- Add .strip() with write-back for whitespace handling
- Improve domain validation to require at least one alphabetic character
- Update all import sites to use greedybear.utils
- Remove moved functions and unused imports from api/views/utils.py
- Add regression tests for invalid IPs, valid IPv6, whitespace, and domains
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants