feat(tools): add SSRF prevention for git clone URLs#505
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the security of Git clone operations by introducing robust Server-Side Request Forgery (SSRF) prevention. It ensures that Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (11)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds SSRF-resistant git clone validation: new Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client/User
participant Tool as GitCloneTool
participant Scheme as is_allowed_clone_scheme
participant HostVal as validate_clone_url_host
participant DNS as DNS Resolver
participant GitExec as git executor
Client->>Tool: clone_repo(url, ...)
Tool->>Scheme: validate scheme
alt scheme invalid
Scheme-->>Tool: blocked
Tool-->>Client: Error (invalid scheme)
else scheme allowed
Scheme-->>Tool: allowed
Tool->>HostVal: validate_clone_url_host(url, policy)
HostVal->>HostVal: extract hostname
alt literal IP
HostVal->>HostVal: check IP blocklist
else domain name
HostVal->>DNS: resolve hostname (timeout)
DNS-->>HostVal: IPs / error
HostVal->>HostVal: check resolved IPs vs blocklist
end
alt blocked / DNS failure
HostVal-->>Tool: validation failed (SSRF/DNS)
Tool-->>Client: Error (SSRF/DNS)
else allowed
HostVal-->>Tool: validation passed
Tool->>GitExec: execute git clone
GitExec-->>Client: Clone complete
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
✨ Simplify code
📝 Coding Plan
Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces robust SSRF prevention for git clone URLs. The core logic is encapsulated in a new git_url_validator.py module, which performs scheme validation, hostname extraction, and asynchronous DNS resolution to ensure clone targets do not resolve to private or reserved IP addresses. The implementation is comprehensive, handling edge cases like IPv6-mapped IPv4 addresses and failing closed on all error paths. The new functionality is configurable through a GitCloneNetworkPolicy model and is thoroughly covered by an extensive suite of unit, integration, and property-based tests. The changes are well-structured and significantly improve the security posture of the git tooling.
| if normalized != self.hostname_allowlist: | ||
| object.__setattr__(self, "hostname_allowlist", normalized) | ||
| return self |
There was a problem hiding this comment.
While using object.__setattr__ on a frozen Pydantic model is a known pattern, a more idiomatic approach in Pydantic v2 is to return a new model instance with the updated value. This aligns better with the principles of immutability by creating a new object instead of mutating one in place.
You can achieve this using self.model_copy(update={...}) and returning the new instance when a change is needed.
| if normalized != self.hostname_allowlist: | |
| object.__setattr__(self, "hostname_allowlist", normalized) | |
| return self | |
| if normalized != self.hostname_allowlist: | |
| return self.model_copy(update={"hostname_allowlist": normalized}) | |
| return self |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/config/schema.py`:
- Around line 537-540: The RootConfig class docstring's Attributes section is
missing the new git_clone field; update the docstring for RootConfig to include
an entry for git_clone describing its type (GitCloneNetworkPolicy) and purpose
(Git clone SSRF prevention network policy), similar to how trust, promotion, and
coordination are documented so readers see the new Field defined as git_clone:
GitCloneNetworkPolicy = Field(default_factory=GitCloneNetworkPolicy,
description="Git clone SSRF prevention network policy").
In `@src/synthorg/tools/git_tools.py`:
- Around line 679-683: The rejection path currently logs the raw clone URL (in
the block using is_allowed_clone_scheme and GIT_CLONE_URL_REJECTED via
logger.warning), which can leak credentials; replace the raw url in the log with
a redacted value that contains only scheme+host (no userinfo or path). Reuse the
same sanitizer used in git_url_validator.py (the function that produces the
scheme+host redaction used around line 322) to produce the redacted URL, and
pass that redacted string to logger.warning instead of the original url.
In `@src/synthorg/tools/git_url_validator.py`:
- Around line 60-68: The _BLOCKED_NETWORKS denylist is missing the IPv6
multicast block ff00::/8; update the _BLOCKED_NETWORKS definition in
git_url_validator.py to include ipaddress.IPv6Network("ff00::/8") alongside the
other IPv6 networks (e.g., near the lines with ipaddress.IPv6Network("fc00::/7")
and "fe80::/10") so IPv6 multicast literals/responses are correctly treated as
denied.
- Around line 214-217: The current SCP-like check rejects any URL containing
"::", which inadvertently blocks bracketed IPv6 hosts like
git@[2001:db8::1]:repo.git; update the boolean expression (the return statement
that currently reads "@" in url and ":" in url and "::" not in url and "://" not
in url) to allow "::" when it appears inside a bracketed IPv6 literal (e.g.
detect '@[' and a closing ']' before the path colon) — e.g. permit URLs where
'@[' is present and a matching ']' occurs prior to the first ':' — while still
rejecting unbracketed "::" and "://"; reference the same SCP-like check and keep
behavior consistent with _extract_hostname which already supports bracketed
IPv6.
In `@tests/unit/tools/git/test_git_clone.py`:
- Around line 134-142: Remove the redundant parentheses around the URL literal
passed to tool.execute in the arguments dict: change the value for the "url" key
from ("https://internal-git.example.com/repo.git") to
"https://internal-git.example.com/repo.git" so the test remains the same but
without unnecessary grouping.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8be66d2c-3eb0-4874-9cc8-d8d470d6d744
📒 Files selected for processing (11)
CLAUDE.mdsrc/synthorg/config/defaults.pysrc/synthorg/config/schema.pysrc/synthorg/observability/events/git.pysrc/synthorg/tools/__init__.pysrc/synthorg/tools/git_tools.pysrc/synthorg/tools/git_url_validator.pytests/unit/tools/git/conftest.pytests/unit/tools/git/test_git_clone.pytests/unit/tools/git/test_git_tools.pytests/unit/tools/git/test_git_url_validator.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Sandbox
- GitHub Check: Build Web
- GitHub Check: Build Backend
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotationsin Python code. Python 3.14 has PEP 649 native lazy annotations.
Useexcept A, B:(no parentheses) for exception syntax on Python 3.14. Do not useexcept (A, B):.
Add type hints to all public functions and classes. Use mypy strict mode.
Use Google-style docstrings on all public classes and functions. Docstrings are enforced by ruff D rules.
Create new objects for immutability. Never mutate existing objects. For non-Pydantic internal collections, usecopy.deepcopy()at construction and wrap withMappingProxyTypefor read-only enforcement.
Use frozen Pydantic models for config and identity. Use separate mutable-via-copy models withmodel_copy(update=...)for runtime state that evolves.
Do not mix static config fields with mutable runtime fields in one Pydantic model.
Use Pydantic v2 conventions:@computed_fieldfor derived values,NotBlankStrfor identifier/name fields (including optional and tuple variants), andmodel_validator/ConfigDict.
Useasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code. Prefer structured concurrency over barecreate_task.
Keep function bodies under 50 lines and files under 800 lines.
Handle errors explicitly. Never silently swallow exceptions.
Validate user input, external API responses, and config files at system boundaries.
Useruff checkandruff formatfor Python linting and formatting (88-character line length).
Usemypywith strict mode for type-checking all Python code.
Python version must be 3.14 or higher. PEP 649 provides native lazy annotations.
Files:
tests/unit/tools/git/test_git_tools.pysrc/synthorg/observability/events/git.pysrc/synthorg/config/defaults.pysrc/synthorg/config/schema.pysrc/synthorg/tools/__init__.pytests/unit/tools/git/test_git_url_validator.pytests/unit/tools/git/test_git_clone.pytests/unit/tools/git/conftest.pysrc/synthorg/tools/git_url_validator.pysrc/synthorg/tools/git_tools.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, and@pytest.mark.slowto mark test cases.
Prefer@pytest.mark.parametrizefor testing similar cases with multiple inputs.
Use Hypothesis for property-based testing in Python with@given+@settings. Control viaHYPOTHESIS_PROFILEenv var (dev: 1000 examples, ci: 200 examples).
Files:
tests/unit/tools/git/test_git_tools.pytests/unit/tools/git/test_git_url_validator.pytests/unit/tools/git/test_git_clone.pytests/unit/tools/git/conftest.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic must import logger withfrom synthorg.observability import get_loggerthenlogger = get_logger(__name__). Never useimport logging,logging.getLogger(), orprint()in application code.
Always useloggeras the variable name for loggers. Never use_loggerorlog.
Use event name constants from domain-specific modules undersynthorg.observability.eventsfor all logging calls. Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Use structured logging withlogger.info(EVENT, key=value)syntax. Never use format string logging likelogger.info('msg %s', val).
All error paths must log at WARNING or ERROR level with context before raising.
All state transitions must log at INFO level.
Use DEBUG level for object creation, internal flow, and entry/exit of key functions.
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Usetest-provider,test-small-001, etc. in tests.
Library API reference is auto-generated from docstrings via mkdocstrings + Griffe indocs/api/. Use Google-style docstrings for public APIs.
Use Pydantic BaseModel for all data models. Frozen models for config/identity. Mutable-via-copy models for runtime state.
Files:
src/synthorg/observability/events/git.pysrc/synthorg/config/defaults.pysrc/synthorg/config/schema.pysrc/synthorg/tools/__init__.pysrc/synthorg/tools/git_url_validator.pysrc/synthorg/tools/git_tools.py
src/synthorg/observability/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Observability includes structured logging via
get_logger(__name__), correlation tracking, and log sinks.
Files:
src/synthorg/observability/events/git.py
src/synthorg/tools/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Tool registry supports built-in tools (file_system/, git, sandbox/, code_runner), MCP bridge, role-based access, approval tool (request_human_approval).
Files:
src/synthorg/tools/__init__.pysrc/synthorg/tools/git_url_validator.pysrc/synthorg/tools/git_tools.py
🧠 Learnings (17)
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to src/synthorg/tools/**/*.py : Tool registry supports built-in tools (file_system/, git, sandbox/, code_runner), MCP bridge, role-based access, approval tool (request_human_approval).
Applied to files:
tests/unit/tools/git/test_git_tools.pysrc/synthorg/config/schema.pysrc/synthorg/tools/__init__.pyCLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` for all logging calls. Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to web/src/components/** : Vue components organized by feature (agents/, approvals/, budget/, common/, dashboard/, layout/, messages/, org-chart/, tasks/).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to web/src/{components,views,styles}/**/* : Vue dashboard uses PrimeVue components + Tailwind CSS styling. Global CSS in styles/, theme configuration in styles/.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to src/synthorg/security/**/*.py : Security module includes SecOps agent, rule engine (soft-allow/hard-deny), audit log, output scanner, risk classifier, autonomy levels (4 strategies), timeout policies.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to web/src/components/**/*.vue : Vue dashboard uses TypeScript. Organize components by feature (agents/, approvals/, budget/, common/, dashboard/, layout/, messages/, org-chart/, tasks/).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.651Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.651Z
Learning: Security architecture is documented in `docs/security.md` (comprehensive guide covering SecOps agent, rule engine, audit log, output scanning, risk classification, autonomy levels, timeout policies).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to docker/{Dockerfile*,compose.yml} : Docker: Backend uses 3-stage build (builder → setup → distroless runtime), Chainguard Python, non-root (UID 65532), CIS-hardened. Web uses nginxinc/nginx-unprivileged, Vue 3 SPA with PrimeVue + Tailwind CSS, SPA routing, API/WebSocket proxy to backend.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to src/**/*.py : Every module with business logic must import logger with `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging`, `logging.getLogger()`, or `print()` in application code.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to src/**/*.py : Use structured logging with `logger.info(EVENT, key=value)` syntax. Never use format string logging like `logger.info('msg %s', val)`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability includes structured logging via `get_logger(__name__)`, correlation tracking, and log sinks.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level.
Applied to files:
CLAUDE.md
🧬 Code graph analysis (4)
src/synthorg/config/schema.py (1)
src/synthorg/tools/git_url_validator.py (1)
GitCloneNetworkPolicy(74-119)
src/synthorg/tools/__init__.py (1)
src/synthorg/tools/git_url_validator.py (1)
GitCloneNetworkPolicy(74-119)
tests/unit/tools/git/test_git_url_validator.py (1)
src/synthorg/tools/git_url_validator.py (5)
GitCloneNetworkPolicy(74-119)_extract_hostname(154-194)_is_blocked_ip(125-151)is_allowed_clone_scheme(197-217)validate_clone_url_host(300-352)
src/synthorg/tools/git_tools.py (1)
src/synthorg/tools/git_url_validator.py (3)
GitCloneNetworkPolicy(74-119)is_allowed_clone_scheme(197-217)validate_clone_url_host(300-352)
🔇 Additional comments (14)
src/synthorg/observability/events/git.py (1)
11-12: LGTM!New event constants follow the established naming convention (
git.clone.*) and are properly typed asFinal[str].CLAUDE.md (2)
135-135: LGTM!Documentation accurately reflects the new
git_url_validatormodule added for SSRF prevention.
199-199: LGTM!Event constants list correctly updated with the new
GIT_CLONE_SSRF_BLOCKEDandGIT_CLONE_DNS_FAILEDevents fromevents.git.src/synthorg/config/schema.py (1)
35-35: LGTM!Import correctly placed with other tools-related imports.
src/synthorg/config/defaults.py (1)
43-43: LGTM!Default entry follows the established pattern. The empty dict will be merged with
GitCloneNetworkPolicy's defaults via the schema'sdefault_factory.src/synthorg/tools/__init__.py (2)
31-31: LGTM!Import correctly placed and follows the established module organization.
61-61: LGTM!Export added in correct alphabetical position within
__all__.tests/unit/tools/git/test_git_clone.py (3)
1-12: LGTM!Clean module setup with appropriate imports and timeout configuration.
18-91: LGTM!Comprehensive test coverage for
GitCloneToolbasic operations: local clone, shallow clone with depth, directory traversal prevention, invalid URL rejection, and branch-specific cloning.
97-153: LGTM!SSRF prevention tests effectively cover key scenarios: loopback blocking, private IP blocking, allowlist bypass verification, and file:// scheme rejection. The allowlist test correctly validates that SSRF checks pass (no "blocked"/"ssrf" in output) while allowing the clone to fail for other reasons.
tests/unit/tools/git/test_git_tools.py (1)
11-12: LGTM!Good documentation explaining why
GitCloneToolis still imported despite clone-specific tests being moved totest_git_clone.py. The tool correctly remains in_ALL_GIT_TOOL_CLASSESfor property-based tests.tests/unit/tools/git/conftest.py (1)
176-200: LGTM!Fixture correctly updated to bypass both the scheme check (
is_allowed_clone_scheme) and the new SSRF host validation (validate_clone_url_host) for localfile://clone tests. The async signature of_allow_all_hostscorrectly matches the production function's signature.src/synthorg/tools/git_tools.py (1)
712-717: SSRF validation ordering looks correct.Running host/IP validation right before clone execution (after cheap local checks) keeps the flow fail-closed without unnecessary DNS work.
tests/unit/tools/git/test_git_url_validator.py (1)
617-617: Remove hardcoded@settings(max_examples=200)to respect Hypothesis profiles.Per coding guidelines,
HYPOTHESIS_PROFILEenvironment variable should control example counts (dev: 1000, ci: 200), not hardcoded decorators. Remove@settings(max_examples=200)from lines 617, 631, and 641.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/synthorg/tools/git_url_validator.py (1)
45-47:⚠️ Potential issue | 🟠 MajorCredential redaction is incomplete for non-HTTP URL forms.
Line [46] only redacts
https?://...@. If hostname extraction fails later (Line [344]), inputs likessh://token@/repo.gitcan still be logged with userinfo intact.🔒 Proposed hardening
-_CREDENTIAL_RE: Final[re.Pattern[str]] = re.compile(r"(https?://)[^@/]+@") +_URL_USERINFO_RE: Final[re.Pattern[str]] = re.compile( + r"((?:https?|ssh)://)[^@/]+@" +) +_SCP_USERINFO_RE: Final[re.Pattern[str]] = re.compile(r"^([^@/:]+)@") + +def _redact_clone_url(url: str) -> str: + """Redact userinfo in clone URL logs.""" + redacted = _URL_USERINFO_RE.sub(r"\1***@", url) + if "://" not in redacted: + redacted = _SCP_USERINFO_RE.sub("***@", redacted) + return redacted- redacted = _CREDENTIAL_RE.sub(r"\1***@", url) + redacted = _redact_clone_url(url)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/tools/git_url_validator.py` around lines 45 - 47, The current credential redaction regex _CREDENTIAL_RE only matches http(s) schemes; broaden it to detect userinfo for any URL scheme (e.g., match any scheme like ssh://user@ or token@) by updating _CREDENTIAL_RE to accept arbitrary scheme prefixes before the userinfo, and ensure the sanitization path used as a fallback when hostname extraction fails (the code around the hostname extraction at ~line 344) applies this regex so inputs like ssh://token@/repo.git are redacted before logging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/tools/git/test_git_url_validator.py`:
- Around line 602-619: The test duplicates production IPv4 blocklist
(_ALL_BLOCKED_V4) causing drift; instead import and reuse the canonical
_BLOCKED_NETWORKS from the production module and replace the local
_ALL_BLOCKED_V4 tuple with a reference or derived tuple from that imported
constant (e.g., tuple(_BLOCKED_NETWORKS) or similar) so tests always reflect
current production ranges; update any tests that iterate over _ALL_BLOCKED_V4 to
use the imported _BLOCKED_NETWORKS symbol (or a frozen copy) and remove the
duplicated literal list.
---
Duplicate comments:
In `@src/synthorg/tools/git_url_validator.py`:
- Around line 45-47: The current credential redaction regex _CREDENTIAL_RE only
matches http(s) schemes; broaden it to detect userinfo for any URL scheme (e.g.,
match any scheme like ssh://user@ or token@) by updating _CREDENTIAL_RE to
accept arbitrary scheme prefixes before the userinfo, and ensure the
sanitization path used as a fallback when hostname extraction fails (the code
around the hostname extraction at ~line 344) applies this regex so inputs like
ssh://token@/repo.git are redacted before logging.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d683ebf7-46d2-4d99-9c97-67c7820414b3
📒 Files selected for processing (7)
CLAUDE.mdsrc/synthorg/config/schema.pysrc/synthorg/observability/events/git.pysrc/synthorg/tools/git_tools.pysrc/synthorg/tools/git_url_validator.pytests/unit/tools/git/test_git_clone.pytests/unit/tools/git/test_git_url_validator.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Backend
- GitHub Check: Build Sandbox
- GitHub Check: Build Web
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotationsin Python code. Python 3.14 has PEP 649 native lazy annotations.
Useexcept A, B:(no parentheses) for exception syntax on Python 3.14. Do not useexcept (A, B):.
Add type hints to all public functions and classes. Use mypy strict mode.
Use Google-style docstrings on all public classes and functions. Docstrings are enforced by ruff D rules.
Create new objects for immutability. Never mutate existing objects. For non-Pydantic internal collections, usecopy.deepcopy()at construction and wrap withMappingProxyTypefor read-only enforcement.
Use frozen Pydantic models for config and identity. Use separate mutable-via-copy models withmodel_copy(update=...)for runtime state that evolves.
Do not mix static config fields with mutable runtime fields in one Pydantic model.
Use Pydantic v2 conventions:@computed_fieldfor derived values,NotBlankStrfor identifier/name fields (including optional and tuple variants), andmodel_validator/ConfigDict.
Useasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code. Prefer structured concurrency over barecreate_task.
Keep function bodies under 50 lines and files under 800 lines.
Handle errors explicitly. Never silently swallow exceptions.
Validate user input, external API responses, and config files at system boundaries.
Useruff checkandruff formatfor Python linting and formatting (88-character line length).
Usemypywith strict mode for type-checking all Python code.
Python version must be 3.14 or higher. PEP 649 provides native lazy annotations.
Files:
src/synthorg/config/schema.pytests/unit/tools/git/test_git_url_validator.pysrc/synthorg/tools/git_tools.pytests/unit/tools/git/test_git_clone.pysrc/synthorg/tools/git_url_validator.pysrc/synthorg/observability/events/git.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic must import logger withfrom synthorg.observability import get_loggerthenlogger = get_logger(__name__). Never useimport logging,logging.getLogger(), orprint()in application code.
Always useloggeras the variable name for loggers. Never use_loggerorlog.
Use event name constants from domain-specific modules undersynthorg.observability.eventsfor all logging calls. Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Use structured logging withlogger.info(EVENT, key=value)syntax. Never use format string logging likelogger.info('msg %s', val).
All error paths must log at WARNING or ERROR level with context before raising.
All state transitions must log at INFO level.
Use DEBUG level for object creation, internal flow, and entry/exit of key functions.
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Usetest-provider,test-small-001, etc. in tests.
Library API reference is auto-generated from docstrings via mkdocstrings + Griffe indocs/api/. Use Google-style docstrings for public APIs.
Use Pydantic BaseModel for all data models. Frozen models for config/identity. Mutable-via-copy models for runtime state.
Files:
src/synthorg/config/schema.pysrc/synthorg/tools/git_tools.pysrc/synthorg/tools/git_url_validator.pysrc/synthorg/observability/events/git.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, and@pytest.mark.slowto mark test cases.
Prefer@pytest.mark.parametrizefor testing similar cases with multiple inputs.
Use Hypothesis for property-based testing in Python with@given+@settings. Control viaHYPOTHESIS_PROFILEenv var (dev: 1000 examples, ci: 200 examples).
Files:
tests/unit/tools/git/test_git_url_validator.pytests/unit/tools/git/test_git_clone.py
src/synthorg/tools/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Tool registry supports built-in tools (file_system/, git, sandbox/, code_runner), MCP bridge, role-based access, approval tool (request_human_approval).
Files:
src/synthorg/tools/git_tools.pysrc/synthorg/tools/git_url_validator.py
src/synthorg/observability/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Observability includes structured logging via
get_logger(__name__), correlation tracking, and log sinks.
Files:
src/synthorg/observability/events/git.py
🧠 Learnings (22)
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to src/synthorg/tools/**/*.py : Tool registry supports built-in tools (file_system/, git, sandbox/, code_runner), MCP bridge, role-based access, approval tool (request_human_approval).
Applied to files:
src/synthorg/config/schema.pyCLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to **/*.py : Use frozen Pydantic models for config and identity. Use separate mutable-via-copy models with `model_copy(update=...)` for runtime state that evolves.
Applied to files:
src/synthorg/tools/git_url_validator.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Applied to files:
src/synthorg/tools/git_url_validator.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/tools/git_url_validator.py
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to src/**/*.py : Use Pydantic BaseModel for all data models. Frozen models for config/identity. Mutable-via-copy models for runtime state.
Applied to files:
src/synthorg/tools/git_url_validator.py
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to **/*.py : Create new objects for immutability. Never mutate existing objects. For non-Pydantic internal collections, use `copy.deepcopy()` at construction and wrap with `MappingProxyType` for read-only enforcement.
Applied to files:
src/synthorg/tools/git_url_validator.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to web/src/components/** : Vue components organized by feature (agents/, approvals/, budget/, common/, dashboard/, layout/, messages/, org-chart/, tasks/).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to web/src/{components,views,styles}/**/* : Vue dashboard uses PrimeVue components + Tailwind CSS styling. Global CSS in styles/, theme configuration in styles/.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to src/synthorg/security/**/*.py : Security module includes SecOps agent, rule engine (soft-allow/hard-deny), audit log, output scanner, risk classifier, autonomy levels (4 strategies), timeout policies.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to web/src/components/**/*.vue : Vue dashboard uses TypeScript. Organize components by feature (agents/, approvals/, budget/, common/, dashboard/, layout/, messages/, org-chart/, tasks/).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.651Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.651Z
Learning: Security architecture is documented in `docs/security.md` (comprehensive guide covering SecOps agent, rule engine, audit log, output scanning, risk classification, autonomy levels, timeout policies).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to docker/{Dockerfile*,compose.yml} : Docker: Backend uses 3-stage build (builder → setup → distroless runtime), Chainguard Python, non-root (UID 65532), CIS-hardened. Web uses nginxinc/nginx-unprivileged, Vue 3 SPA with PrimeVue + Tailwind CSS, SPA routing, API/WebSocket proxy to backend.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to src/**/*.py : Every module with business logic must import logger with `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging`, `logging.getLogger()`, or `print()` in application code.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` for all logging calls. Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to src/**/*.py : Use structured logging with `logger.info(EVENT, key=value)` syntax. Never use format string logging like `logger.info('msg %s', val)`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability includes structured logging via `get_logger(__name__)`, correlation tracking, and log sinks.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level.
Applied to files:
CLAUDE.md
🧬 Code graph analysis (4)
src/synthorg/config/schema.py (1)
src/synthorg/tools/git_url_validator.py (1)
GitCloneNetworkPolicy(82-127)
tests/unit/tools/git/test_git_url_validator.py (1)
src/synthorg/tools/git_url_validator.py (5)
GitCloneNetworkPolicy(82-127)_extract_hostname(162-202)_is_blocked_ip(133-159)is_allowed_clone_scheme(205-233)validate_clone_url_host(318-376)
src/synthorg/tools/git_tools.py (1)
src/synthorg/tools/git_url_validator.py (3)
GitCloneNetworkPolicy(82-127)is_allowed_clone_scheme(205-233)validate_clone_url_host(318-376)
tests/unit/tools/git/test_git_clone.py (2)
src/synthorg/tools/git_tools.py (7)
GitCloneTool(596-720)execute(86-104)execute(205-248)execute(314-360)execute(454-498)execute(556-590)execute(661-720)src/synthorg/tools/git_url_validator.py (1)
GitCloneNetworkPolicy(82-127)
🔇 Additional comments (7)
src/synthorg/observability/events/git.py (1)
11-13: Looks good: SSRF/DNS event constants are clear and consistent.These constants fit the existing git event naming scheme and support the new observability paths cleanly.
src/synthorg/tools/git_url_validator.py (1)
133-376: Strong fail-closed SSRF validation flow.Blocked-range checks, IPv6-mapped IPv4 handling, DNS timeout/error handling, and “all resolved IPs must be public” logic are implemented defensively.
tests/unit/tools/git/test_git_clone.py (1)
18-153: Good integration coverage for clone + SSRF paths.The added cases exercise both normal clone flows and SSRF policy behaviors, including blocked/private literals and allowlist bypass.
src/synthorg/config/schema.py (1)
422-423: Config/schema integration is clean.
RootConfig.git_cloneis properly documented and wired withdefault_factory=GitCloneNetworkPolicy, matching the new network policy surface.Also applies to: 538-541
tests/unit/tools/git/test_git_url_validator.py (1)
26-598: Great test depth for SSRF and DNS edge cases.The suite covers literal IPs, DNS failures, mixed DNS answers, IPv6 paths, and property-based guarantees well.
Also applies to: 622-662
src/synthorg/tools/git_tools.py (1)
612-613: SSRF enforcement is integrated in the right execution point.Running cheap local validations first and DNS-based host checks before
git cloneis a solid ordering for both performance and safety.Also applies to: 657-659, 668-670, 680-720
CLAUDE.md (1)
135-135: Docs update is aligned with the feature changes.The tool-registry and logging guidance now reflect the new git clone SSRF validation/event model.
Also applies to: 199-199
| @model_validator(mode="after") | ||
| def _normalize_allowlist(self) -> Self: | ||
| """Lowercase and deduplicate allowlist entries.""" | ||
| normalized = tuple(dict.fromkeys(h.lower() for h in self.hostname_allowlist)) | ||
| if normalized != self.hostname_allowlist: | ||
| object.__setattr__(self, "hostname_allowlist", normalized) | ||
| return self |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify frozen-model mutation pattern in validator implementation.
rg -n --type py 'object\.__setattr__\(|model_validator\(mode="after"\)|field_validator\(' src/synthorg/tools/git_url_validator.pyRepository: Aureliolo/synthorg
Length of output: 175
🏁 Script executed:
cat -n src/synthorg/tools/git_url_validator.py | head -150Repository: Aureliolo/synthorg
Length of output: 6729
Use field_validator for field normalization instead of mutating a frozen model.
Line 126 mutates the frozen model via object.__setattr__, which violates the immutability principle. A field_validator normalizes the field during construction without post-hoc mutation.
♻️ Proposed refactor
-from typing import Final, Self
+from typing import Final
...
-from pydantic import BaseModel, ConfigDict, Field, model_validator
+from pydantic import BaseModel, ConfigDict, Field, field_validator
...
- `@model_validator`(mode="after")
- def _normalize_allowlist(self) -> Self:
- """Lowercase and deduplicate allowlist entries."""
- normalized = tuple(dict.fromkeys(h.lower() for h in self.hostname_allowlist))
- if normalized != self.hostname_allowlist:
- object.__setattr__(self, "hostname_allowlist", normalized)
- return self
+ `@field_validator`("hostname_allowlist")
+ `@classmethod`
+ def _normalize_allowlist(
+ cls,
+ value: tuple[NotBlankStr, ...],
+ ) -> tuple[NotBlankStr, ...]:
+ """Lowercase and deduplicate allowlist entries."""
+ return tuple(dict.fromkeys(h.lower() for h in value))There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/tools/git/test_git_url_validator.py`:
- Around line 601-647: Add a symmetric property test for IPv6 non-blocked
addresses: create a new test method named test_non_blocked_ipv6_never_flagged
under TestValidateCloneUrlHostProperties that uses Hypothesis with
st.ip_addresses(v=6).filter(lambda ip: not any(ip in net for net in
_ALL_BLOCKED_V6)), calls assert _is_blocked_ip(str(ip)) is False, and reduce
examples (e.g. `@settings`(max_examples=50)) to avoid heavy filtering; this
mirrors test_non_blocked_ipv4_never_flagged and references the existing
_ALL_BLOCKED_V6 and _is_blocked_ip symbols.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: fea07877-6e0a-4225-9cf5-766c7d3cd09b
📒 Files selected for processing (2)
src/synthorg/tools/git_url_validator.pytests/unit/tools/git/test_git_url_validator.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Sandbox
- GitHub Check: Build Web
- GitHub Check: Build Backend
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotationsin Python code. Python 3.14 has PEP 649 native lazy annotations.
Useexcept A, B:(no parentheses) for exception syntax on Python 3.14. Do not useexcept (A, B):.
Add type hints to all public functions and classes. Use mypy strict mode.
Use Google-style docstrings on all public classes and functions. Docstrings are enforced by ruff D rules.
Create new objects for immutability. Never mutate existing objects. For non-Pydantic internal collections, usecopy.deepcopy()at construction and wrap withMappingProxyTypefor read-only enforcement.
Use frozen Pydantic models for config and identity. Use separate mutable-via-copy models withmodel_copy(update=...)for runtime state that evolves.
Do not mix static config fields with mutable runtime fields in one Pydantic model.
Use Pydantic v2 conventions:@computed_fieldfor derived values,NotBlankStrfor identifier/name fields (including optional and tuple variants), andmodel_validator/ConfigDict.
Useasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code. Prefer structured concurrency over barecreate_task.
Keep function bodies under 50 lines and files under 800 lines.
Handle errors explicitly. Never silently swallow exceptions.
Validate user input, external API responses, and config files at system boundaries.
Useruff checkandruff formatfor Python linting and formatting (88-character line length).
Usemypywith strict mode for type-checking all Python code.
Python version must be 3.14 or higher. PEP 649 provides native lazy annotations.
Files:
src/synthorg/tools/git_url_validator.pytests/unit/tools/git/test_git_url_validator.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic must import logger withfrom synthorg.observability import get_loggerthenlogger = get_logger(__name__). Never useimport logging,logging.getLogger(), orprint()in application code.
Always useloggeras the variable name for loggers. Never use_loggerorlog.
Use event name constants from domain-specific modules undersynthorg.observability.eventsfor all logging calls. Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Use structured logging withlogger.info(EVENT, key=value)syntax. Never use format string logging likelogger.info('msg %s', val).
All error paths must log at WARNING or ERROR level with context before raising.
All state transitions must log at INFO level.
Use DEBUG level for object creation, internal flow, and entry/exit of key functions.
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Usetest-provider,test-small-001, etc. in tests.
Library API reference is auto-generated from docstrings via mkdocstrings + Griffe indocs/api/. Use Google-style docstrings for public APIs.
Use Pydantic BaseModel for all data models. Frozen models for config/identity. Mutable-via-copy models for runtime state.
Files:
src/synthorg/tools/git_url_validator.py
src/synthorg/tools/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Tool registry supports built-in tools (file_system/, git, sandbox/, code_runner), MCP bridge, role-based access, approval tool (request_human_approval).
Files:
src/synthorg/tools/git_url_validator.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, and@pytest.mark.slowto mark test cases.
Prefer@pytest.mark.parametrizefor testing similar cases with multiple inputs.
Use Hypothesis for property-based testing in Python with@given+@settings. Control viaHYPOTHESIS_PROFILEenv var (dev: 1000 examples, ci: 200 examples).
Files:
tests/unit/tools/git/test_git_url_validator.py
🧠 Learnings (6)
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to **/*.py : Use frozen Pydantic models for config and identity. Use separate mutable-via-copy models with `model_copy(update=...)` for runtime state that evolves.
Applied to files:
src/synthorg/tools/git_url_validator.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Applied to files:
src/synthorg/tools/git_url_validator.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/tools/git_url_validator.py
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to src/**/*.py : Use Pydantic BaseModel for all data models. Frozen models for config/identity. Mutable-via-copy models for runtime state.
Applied to files:
src/synthorg/tools/git_url_validator.py
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to **/*.py : Create new objects for immutability. Never mutate existing objects. For non-Pydantic internal collections, use `copy.deepcopy()` at construction and wrap with `MappingProxyType` for read-only enforcement.
Applied to files:
src/synthorg/tools/git_url_validator.py
📚 Learning: 2026-03-16T20:59:42.650Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:59:42.650Z
Learning: Applies to **/*.py : Use Pydantic v2 conventions: `computed_field` for derived values, `NotBlankStr` for identifier/name fields (including optional and tuple variants), and `model_validator`/`ConfigDict`.
Applied to files:
src/synthorg/tools/git_url_validator.py
🧬 Code graph analysis (1)
tests/unit/tools/git/test_git_url_validator.py (1)
src/synthorg/tools/git_url_validator.py (5)
GitCloneNetworkPolicy(82-127)_extract_hostname(162-202)_is_blocked_ip(133-159)is_allowed_clone_scheme(205-233)validate_clone_url_host(318-376)
🔇 Additional comments (14)
tests/unit/tools/git/test_git_url_validator.py (6)
1-21: LGTM!Imports are well-organized, and the module-level timeout marker is appropriate for async DNS tests. Importing
_BLOCKED_NETWORKSfrom the production module ensures test data stays in sync with production blocklists.
27-91: LGTM!Comprehensive hostname extraction tests covering standard URLs, SCP-like syntax, IPv6 literals, and edge cases. Parametrization with descriptive IDs aids readability.
97-217: LGTM!Excellent coverage of blocked IP ranges including loopback, private, CGNAT, link-local, multicast, reserved IPv4/IPv6 ranges, and the critical IPv6-mapped IPv4 bypass vectors. The fail-closed test for unparseable inputs verifies secure default behavior.
223-275: LGTM!Comprehensive scheme validation covering allowed remote protocols, SCP-like syntax with IPv6 literals, and blocked patterns including flag injection, ext:: protocol, file:// scheme, and insecure http.
281-348: LGTM!Thorough coverage of the
GitCloneNetworkPolicymodel including defaults, immutability enforcement, timeout bounds, inf/nan rejection, and allowlist normalization with deduplication. TheNotBlankStrvalidation tests ensure empty/whitespace entries are properly rejected.
354-598: LGTM!Comprehensive async validation tests covering:
- Public/private IP resolution
- DNS rebinding with mixed results (one private blocks all)
- Literal IP handling (IPv4, IPv6, IPv6-mapped)
- Allowlist bypass with case-insensitive matching
- Fail-closed behavior for all DNS error paths (timeout, NXDOMAIN, empty results, unexpected exceptions)
- SCP-like URL validation
- Master switch (
block_private_ips=False) behaviorThe mock pattern using
monkeypatch.setattronloop.getaddrinfois appropriate for async DNS testing.src/synthorg/tools/git_url_validator.py (8)
1-36: LGTM!The module docstring appropriately documents the TOCTOU limitation with a note on network-level egress controls for high-security deployments. Logger setup follows project conventions with
get_logger(__name__)and event constants from the observability module.
38-76: LGTM!Comprehensive IP blocklist covering all RFC-reserved ranges:
- IPv4: loopback, private, CGNAT, link-local, TEST-NETs, multicast, reserved, broadcast
- IPv6: loopback, unspecified, NAT64, discard, Teredo, documentation, 6to4, ULA, link-local, multicast
The IPv6 multicast block (
ff00::/8) is now included, addressing the prior review concern.
121-127: Consider usingfield_validatorfor cleaner normalization.The
object.__setattr__pattern inmodel_validator(mode="after")works but circumvents the frozen model's immutability semantics during construction. Afield_validatornormalizes the field before the model is frozen, avoiding the workaround.♻️ Proposed refactor
-from typing import Final, Self +from typing import Final ... -from pydantic import BaseModel, ConfigDict, Field, model_validator +from pydantic import BaseModel, ConfigDict, Field, field_validator ... - `@model_validator`(mode="after") - def _normalize_allowlist(self) -> Self: - """Lowercase and deduplicate allowlist entries.""" - normalized = tuple(dict.fromkeys(h.lower() for h in self.hostname_allowlist)) - if normalized != self.hostname_allowlist: - object.__setattr__(self, "hostname_allowlist", normalized) - return self + `@field_validator`("hostname_allowlist", mode="after") + `@classmethod` + def _normalize_allowlist( + cls, + value: tuple[NotBlankStr, ...], + ) -> tuple[NotBlankStr, ...]: + """Lowercase and deduplicate allowlist entries.""" + return tuple(dict.fromkeys(h.lower() for h in value))
133-159: LGTM!Correct fail-closed behavior for unparseable IPs with appropriate logging. The IPv6-mapped IPv4 unwrapping at line 156-157 prevents
::ffff:127.0.0.1bypass attacks.
162-202: LGTM!Robust hostname extraction supporting standard URLs, SCP-like syntax, and IPv6 literals with proper bracket handling. Returns
Nonefor malformed inputs enabling fail-closed validation upstream.
205-233: LGTM!Scheme validation correctly:
- Blocks flag injection (
-cfoo=bar)- Allows
https://andssh://schemes- Supports SCP-like syntax with bracketed IPv6 literals (
git@[2001:db8::1]:repo.git)- Rejects
ext::protocol by detecting unbracketed::The prior review concern about SCP-like IPv6 rejection is now addressed.
239-312: LGTM!DNS resolution is correctly implemented with:
- Timeout handling via
asyncio.wait_for- Fail-closed behavior for all error paths (timeout, NXDOMAIN, unexpected exceptions, empty results)
- All-must-pass validation preventing DNS rebinding with mixed public/private results
- Appropriate logging levels (WARNING for expected failures, ERROR with
exc_infofor unexpected)
318-376: LGTM!Main validator correctly implements the validation flow:
- Hostname extraction with credential redaction for safe logging
- Allowlist bypass (early return for internal Git servers)
- Master switch audit logging when
block_private_ips=False- Literal IP handling without DNS overhead
- DNS resolution fallback for hostnames
The credential redaction pattern at line 338 ensures sensitive tokens don't leak into logs.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #505 +/- ##
==========================================
+ Coverage 93.08% 93.11% +0.02%
==========================================
Files 502 503 +1
Lines 24202 24312 +110
Branches 2313 2330 +17
==========================================
+ Hits 22529 22637 +108
- Misses 1327 1328 +1
- Partials 346 347 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/tools/git/test_git_url_validator.py`:
- Around line 354-366: The helper functions _dns_result and _dns_result_v6 use
numeric socket family/type/protocol magic numbers; replace these with the socket
module constants (e.g., socket.AF_INET, socket.SOCK_STREAM or socket.SOCK_DGRAM
as appropriate, and socket.IPPROTO_TCP/IPPROTO_UDP or the protocol used in
tests) so the returned tuples read [(socket.AF_INET, socket.SOCK_STREAM,
socket.IPPROTO_TCP, "", (addr, 0))] and [(socket.AF_INET6, socket.SOCK_STREAM,
socket.IPPROTO_TCP, "", (addr, 0, 0, 0))] to improve readability while
preserving behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3dfc12ac-97b6-4672-824c-f7340c60e882
📒 Files selected for processing (1)
tests/unit/tools/git/test_git_url_validator.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Sandbox
- GitHub Check: Build Web
- GitHub Check: Build Backend
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotationsin Python code. Python 3.14 has PEP 649 native lazy annotations.
Useexcept A, B:(no parentheses) for exception syntax on Python 3.14. Do not useexcept (A, B):.
Add type hints to all public functions and classes. Use mypy strict mode.
Use Google-style docstrings on all public classes and functions. Docstrings are enforced by ruff D rules.
Create new objects for immutability. Never mutate existing objects. For non-Pydantic internal collections, usecopy.deepcopy()at construction and wrap withMappingProxyTypefor read-only enforcement.
Use frozen Pydantic models for config and identity. Use separate mutable-via-copy models withmodel_copy(update=...)for runtime state that evolves.
Do not mix static config fields with mutable runtime fields in one Pydantic model.
Use Pydantic v2 conventions:@computed_fieldfor derived values,NotBlankStrfor identifier/name fields (including optional and tuple variants), andmodel_validator/ConfigDict.
Useasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code. Prefer structured concurrency over barecreate_task.
Keep function bodies under 50 lines and files under 800 lines.
Handle errors explicitly. Never silently swallow exceptions.
Validate user input, external API responses, and config files at system boundaries.
Useruff checkandruff formatfor Python linting and formatting (88-character line length).
Usemypywith strict mode for type-checking all Python code.
Python version must be 3.14 or higher. PEP 649 provides native lazy annotations.
Files:
tests/unit/tools/git/test_git_url_validator.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, and@pytest.mark.slowto mark test cases.
Prefer@pytest.mark.parametrizefor testing similar cases with multiple inputs.
Use Hypothesis for property-based testing in Python with@given+@settings. Control viaHYPOTHESIS_PROFILEenv var (dev: 1000 examples, ci: 200 examples).
Files:
tests/unit/tools/git/test_git_url_validator.py
🧬 Code graph analysis (1)
tests/unit/tools/git/test_git_url_validator.py (1)
src/synthorg/tools/git_url_validator.py (5)
GitCloneNetworkPolicy(82-127)_extract_hostname(162-202)_is_blocked_ip(133-159)is_allowed_clone_scheme(205-233)validate_clone_url_host(318-376)
🔇 Additional comments (8)
tests/unit/tools/git/test_git_url_validator.py (8)
1-22: LGTM!Imports are well-organized. The module-level timeout marker and importing
_BLOCKED_NETWORKSfrom production to derive test data (preventing drift) are good practices.
27-92: LGTM!Comprehensive parametrized tests for hostname extraction covering standard URLs, SCP-like syntax, and edge cases with descriptive test IDs.
97-218: LGTM!Thorough coverage of blocked IP ranges including loopback, private, CGNAT, link-local, multicast, and reserved ranges for both IPv4 and IPv6. The IPv6-mapped IPv4 unwrapping and fail-closed behavior for unparseable IPs are properly tested.
223-276: LGTM!Good coverage of scheme validation including security-sensitive patterns like flag injection (
-cfoo=bar) andext::protocol blocking.
281-349: LGTM!Comprehensive tests for the
GitCloneNetworkPolicymodel including defaults, bounds validation, immutability,inf/nanrejection, and allowlist normalization/deduplication behavior.
372-599: Comprehensive async test coverage for SSRF prevention.Excellent coverage of:
- DNS rebinding attacks with mocked resolution
- Literal IP handling (IPv4, IPv6, IPv6-mapped IPv4)
- AWS metadata endpoint (
169.254.169.254)- Fail-closed behavior for all DNS error paths (timeout, NXDOMAIN, empty results, unexpected exceptions)
- Mixed DNS results requiring all resolved IPs to be public
- Allowlist bypass and case-insensitive matching
601-670: LGTM!Property-based tests correctly derive blocked ranges from the production
_BLOCKED_NETWORKSconstant, preventing drift. The IPv6 non-blocked test properly excludes IPv6-mapped IPv4 addresses that map to blocked ranges, with a reduced example count (max_examples=50) to account for filter overhead.
368-371: The async test methods in this class are correctly implemented without explicit@pytest.mark.asynciodecorators. The project is properly configured withasyncio_mode = "auto"inpyproject.toml, which automatically detects and runs async test functions without requiring manual markers. This approach aligns with the project's coding guidelines.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/tools/git/test_git_url_validator.py`:
- Around line 9-10: Remove the hardcoded max_examples values from the Hypothesis
settings decorators in tests/unit/tools/git/test_git_url_validator.py so the
HYPOTHESIS_PROFILE environment variable controls example counts; locate the four
occurrences of `@settings`(max_examples=...) (used on the test functions
referenced around lines 628, 642, 652, 669), and either remove the max_examples
argument or replace the decorator with `@settings`() (or omit the decorator if not
needed) so Hypothesis profiles (dev/ci) determine the number of examples; keep
the existing hypothesis imports (given, settings, strategies as st) intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ed35e578-aedf-4740-bc29-f99619228c88
📒 Files selected for processing (1)
tests/unit/tools/git/test_git_url_validator.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Backend
- GitHub Check: Build Sandbox
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotationsin Python code. Python 3.14 has PEP 649 native lazy annotations.
Useexcept A, B:(no parentheses) for exception syntax on Python 3.14. Do not useexcept (A, B):.
Add type hints to all public functions and classes. Use mypy strict mode.
Use Google-style docstrings on all public classes and functions. Docstrings are enforced by ruff D rules.
Create new objects for immutability. Never mutate existing objects. For non-Pydantic internal collections, usecopy.deepcopy()at construction and wrap withMappingProxyTypefor read-only enforcement.
Use frozen Pydantic models for config and identity. Use separate mutable-via-copy models withmodel_copy(update=...)for runtime state that evolves.
Do not mix static config fields with mutable runtime fields in one Pydantic model.
Use Pydantic v2 conventions:@computed_fieldfor derived values,NotBlankStrfor identifier/name fields (including optional and tuple variants), andmodel_validator/ConfigDict.
Useasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code. Prefer structured concurrency over barecreate_task.
Keep function bodies under 50 lines and files under 800 lines.
Handle errors explicitly. Never silently swallow exceptions.
Validate user input, external API responses, and config files at system boundaries.
Useruff checkandruff formatfor Python linting and formatting (88-character line length).
Usemypywith strict mode for type-checking all Python code.
Python version must be 3.14 or higher. PEP 649 provides native lazy annotations.
Files:
tests/unit/tools/git/test_git_url_validator.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, and@pytest.mark.slowto mark test cases.
Prefer@pytest.mark.parametrizefor testing similar cases with multiple inputs.
Use Hypothesis for property-based testing in Python with@given+@settings. Control viaHYPOTHESIS_PROFILEenv var (dev: 1000 examples, ci: 200 examples).
Files:
tests/unit/tools/git/test_git_url_validator.py
🧬 Code graph analysis (1)
tests/unit/tools/git/test_git_url_validator.py (1)
src/synthorg/tools/git_url_validator.py (5)
GitCloneNetworkPolicy(82-127)_extract_hostname(162-202)_is_blocked_ip(133-159)is_allowed_clone_scheme(205-233)validate_clone_url_host(318-376)
🔇 Additional comments (1)
tests/unit/tools/git/test_git_url_validator.py (1)
610-616: Good drift-proof test data setup.Deriving
_ALL_BLOCKED_V4/_ALL_BLOCKED_V6from production_BLOCKED_NETWORKSkeeps property tests aligned with runtime policy.
| from hypothesis import given, settings | ||
| from hypothesis import strategies as st |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify this file no longer hardcodes per-test max_examples.
rg -n --type=py '@settings\(max_examples=' tests/unit/tools/git/test_git_url_validator.pyRepository: Aureliolo/synthorg
Length of output: 204
Remove hardcoded max_examples and rely on HYPOTHESIS_PROFILE environment variable.
Lines 628, 642, 652, and 669 use hardcoded @settings(max_examples=...) decorators, which bypass the HYPOTHESIS_PROFILE environment variable. Per coding guidelines, Hypothesis profile control should determine example counts (dev: 1000 examples, ci: 200 examples).
♻️ Proposed change
-from hypothesis import given, settings
+from hypothesis import given
from hypothesis import strategies as st
@@
- `@settings`(max_examples=200)
def test_blocked_ipv4_always_detected(self, ip: ipaddress.IPv4Address) -> None:
@@
- `@settings`(max_examples=200)
def test_blocked_ipv6_always_detected(self, ip: ipaddress.IPv6Address) -> None:
@@
- `@settings`(max_examples=200)
def test_non_blocked_ipv4_never_flagged(self, ip: ipaddress.IPv4Address) -> None:
@@
- `@settings`(max_examples=50)
def test_non_blocked_ipv6_never_flagged(self, ip: ipaddress.IPv6Address) -> None:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/tools/git/test_git_url_validator.py` around lines 9 - 10, Remove
the hardcoded max_examples values from the Hypothesis settings decorators in
tests/unit/tools/git/test_git_url_validator.py so the HYPOTHESIS_PROFILE
environment variable controls example counts; locate the four occurrences of
`@settings`(max_examples=...) (used on the test functions referenced around lines
628, 642, 652, 669), and either remove the max_examples argument or replace the
decorator with `@settings`() (or omit the decorator if not needed) so Hypothesis
profiles (dev/ci) determine the number of examples; keep the existing hypothesis
imports (given, settings, strategies as st) intact.
Block git clone to private/loopback/link-local IPs with async DNS resolution to prevent SSRF via agent-controlled URLs. Adds GitCloneNetworkPolicy config model with hostname allowlist for legitimate internal Git servers. All resolved IPs must be public; fails closed on DNS errors. - Extract git_url_validator.py from git_tools.py (scheme check, hostname extraction, IP blocklist, async DNS validation) - Add GIT_CLONE_SSRF_BLOCKED / GIT_CLONE_DNS_FAILED event constants - GitCloneTool accepts network_policy param, runs SSRF check after all cheap local validations (scheme, branch, directory) - Add git_clone field to RootConfig for YAML-level configuration - 76 new tests including Hypothesis property-based IP range coverage
Pre-reviewed by 11 agents, 19 findings addressed: - Expand _BLOCKED_NETWORKS with CGNAT, TEST-NETs, multicast, reserved, broadcast, NAT64, discard, and documentation ranges - Rename private API to public: is_allowed_clone_scheme, ALLOWED_CLONE_SCHEMES (cross-module contract) - Use NotBlankStr for hostname_allowlist entries, normalize to lowercase and deduplicate at construction via model_validator - Add logging on fail-closed paths: unparseable IP, hostname extraction failure, unexpected DNS exceptions (catch-all) - Document TOCTOU gap in module docstring and at call site - Extract _dns_failure helper to keep _resolve_and_check < 50 lines - Move clone tool tests to test_git_clone.py (file size < 800) - Add tests: IPv6 literal IPs, SCP literal IPs, bracket edge cases, inf/nan timeout rejection, allowlist normalization, scheme wiring, AF_INET6 DNS results, unexpected DNS exceptions - Update CLAUDE.md: package structure, fix phantom GIT_OPERATION_START, add new SSRF event constants to logging section
- Redact credentials from clone URLs before logging (git_tools, git_url_validator) to prevent leaking userinfo in structured logs - Add 6to4 (2002::/16), Teredo (2001::/32), IPv6 multicast (ff00::/8) to blocked networks — closes SSRF bypass via IPv6 tunneling ranges - Fix SCP-like scheme check to allow bracketed IPv6 literals (git@[2001:db8::1]:repo.git) while still blocking ext:: protocol - Log WARNING when block_private_ips=False is active (new GIT_CLONE_SSRF_DISABLED event) for audit trail on security bypasses - Log unexpected DNS exceptions at ERROR with exc_info traceback instead of WARNING through the generic _dns_failure helper - Add git_clone to RootConfig docstring Attributes section - Remove redundant parentheses in test_git_clone.py - Extend parametrized + Hypothesis property tests for new blocked ranges and SCP IPv6 scheme acceptance
…ts (#221) - Broaden _CREDENTIAL_RE to redact userinfo from any scheme:// URL (not just http/https) — covers ssh://token@host leakage - Derive _ALL_BLOCKED_V4 and _ALL_BLOCKED_V6 from production _BLOCKED_NETWORKS constant to prevent test/production drift
Add isinstance guard for IPv6Address.ipv4_mapped access to satisfy mypy union-attr check (Hypothesis ip_addresses returns IPv4 | IPv6).
…est helpers (#221) Use socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP instead of raw 2, 1, 6 in _dns_result and _dns_result_v6 test helpers.
de1a6ac to
6862a67
Compare
🤖 I have created a release *beep* *boop* --- ## [0.3.1](v0.3.0...v0.3.1) (2026-03-17) ### Features * **api:** RFC 9457 Phase 2 — ProblemDetail and content negotiation ([#496](#496)) ([30f7c49](30f7c49)) * **cli:** verify container image signatures and SLSA provenance on pull ([#492](#492)) ([bef272d](bef272d)), closes [#491](#491) * **engine:** implement context budget management in execution loops ([#520](#520)) ([181eb8a](181eb8a)), closes [#416](#416) * implement settings persistence layer (DB-backed config) ([#495](#495)) ([4bd99f7](4bd99f7)), closes [#450](#450) * **memory:** implement dual-mode archival in memory consolidation ([#524](#524)) ([4603c9e](4603c9e)), closes [#418](#418) * migrate config consumers to read through SettingsService ([#510](#510)) ([32f553d](32f553d)) * **settings:** implement settings change subscriptions for service hot-reload ([#526](#526)) ([53f908e](53f908e)), closes [#503](#503) * **settings:** register API config in SettingsService with 2-phase init ([#518](#518)) ([29f7481](29f7481)) * **tools:** add SSRF prevention for git clone URLs ([#505](#505)) ([492dd0d](492dd0d)) * **tools:** wire RootConfig.git_clone to GitCloneTool instantiation ([#519](#519)) ([b7d8172](b7d8172)) ### Bug Fixes * **api:** replace JWT query parameter with one-time ticket for WebSocket auth ([#493](#493)) ([22a25f6](22a25f6)), closes [#343](#343) ### Documentation * add uv cache lock contention handling to worktree skill ([#500](#500)) ([bd85a8d](bd85a8d)) * document RFC 9457 dual response formats in OpenAPI schema ([#506](#506)) ([8dd2524](8dd2524)) ### Maintenance * upgrade jsdom from 28 to 29 ([#499](#499)) ([1ea2249](1ea2249)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
GitCloneToolto prevent SSRF via agent-controlled clone URLsGitCloneNetworkPolicyfrozen Pydantic model with configurablehostname_allowlist(case-insensitive, normalized, deduplicated),block_private_ipsmaster switch, anddns_resolution_timeoutgit_url_validator.pyfromgit_tools.py— scheme check, hostname extraction, IP blocklist, DNS validation::ffff:127.0.0.1bypassgit_clonefield toRootConfigfor YAML-level configurationCloses #221
Review coverage
Pre-reviewed by 11 agents (code-reviewer, python-reviewer, security-reviewer, test-analyzer, silent-failure-hunter, async-concurrency-reviewer, conventions-enforcer, logging-audit, type-design-analyzer, issue-resolution-verifier, docs-consistency). 19 findings addressed in second commit.
Test plan
test_git_url_validator.pycovering all validator functionstest_git_clone.py(local clone, SSRF blocking, allowlist bypass, scheme wiring)🤖 Generated with Claude Code