fix(llm-client): harden URL validation against zone IDs, trailing dots, CGNAT#90
Conversation
…s, CGNAT Closes the llm-client URL-validation hardening item from the session-start audit. SSRF defense was solid for common cases but had gaps the audit called out: IPv6 zone IDs, hostname normalization, underrepresented IPv6 unspecified address. Hardening additions: 1. normalizeHostname() — single source of truth for hostname matching. Strips brackets, lowercases, drops trailing dot, drops IPv6 zone id. Both validateBaseUrl and validateBaseUrlWithDns route through it so bypasses like "localhost." and "[fe80::1%eth0]" can't slip past. 2. IPv6 unspecified address — both [::] and the long form [0:0:0:0:0:0:0:0] now block as private (equivalent to 0.0.0.0). 3. CGNAT range — 100.64.0.0/10 (RFC 6598) blocked. Provider-internal carrier NAT range, no legitimate LLM endpoint will live there. Tests: +10 (31 → 41 in url-validation, 78 total in llm-client). Coverage: - Trailing-dot bypass (localhost., 127.0.0.1.) - IPv6 zone id (rejected by URL parser as invalid — defense in depth) - Bracketed [fe80::1] without zone id - Uppercase LOCALHOST - IPv6 [::] and [0:0:0:0:0:0:0:0] - CGNAT 100.64.x.x and 100.127.x.x boundaries - 100.63.x.x and 100.128.x.x as non-private (boundary-correct) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Hardens @skytwin/llm-client SSRF URL validation by normalizing hostnames consistently and expanding private-range blocking to cover additional bypass vectors.
Changes:
- Centralizes hostname normalization (case, IPv6 brackets, trailing dot, IPv6 zone IDs) and reuses it across validation paths.
- Blocks IPv6 unspecified (
::) and long-form unspecified (0:0:0:0:0:0:0:0) as private/internal. - Blocks CGNAT range
100.64.0.0/10(RFC 6598) and adds regression tests for the new hardening cases.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/llm-client/src/url-validation.ts | Adds normalizeHostname() and expands isPrivateHost() to block IPv6 unspecified + CGNAT; applies normalization to DNS-aware validation. |
| packages/llm-client/src/tests/url-validation.test.ts | Adds test coverage for hostname normalization hardening and CGNAT boundaries. |
Comments suppressed due to low confidence (1)
packages/llm-client/src/url-validation.ts:59
- In
validateBaseUrlWithDns,normalizeHostname()strips a trailing dot before callingdns.lookup(). That changes DNS semantics:example.(absolute, no search domains) is not always equivalent toexample(may be subject to search domains / ndots behavior). This can make the DNS rebinding check validate a different name than the one the runtime request will actually resolve, potentially allowing ahost.that resolves to private IPs to slip through ifhostresolves publicly (or vice versa). Consider preserving the trailing dot for the DNS lookup (while still normalizing case / brackets / zone ID for matching) so the lookup reflects the original URL hostname exactly.
const hostname = normalizeHostname(new URL(baseUrl).hostname);
// Skip literal IPs and localhost — already validated above
if (hostname === 'localhost' || /^\d{1,3}(\.\d{1,3}){3}$/.test(hostname) || hostname.includes(':')) {
return;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| export async function validateBaseUrlWithDns(baseUrl: string, provider: string): Promise<void> { | ||
| // Run all synchronous checks first | ||
| validateBaseUrl(baseUrl, provider); | ||
|
|
||
| const { lookup } = await import('node:dns/promises'); | ||
| const hostname = new URL(baseUrl).hostname.toLowerCase().replace(/^\[|\]$/g, ''); | ||
| const hostname = normalizeHostname(new URL(baseUrl).hostname); | ||
|
|
There was a problem hiding this comment.
validateBaseUrlWithDns() behavior is updated via normalizeHostname(), but there are no tests covering the DNS-path at all (no references in src/__tests__). Since this function is specifically meant to defend against DNS rebinding, it would be good to add a unit test that mocks node:dns/promises and asserts the lookup is performed on the intended hostname (e.g., preserving any trailing dot if supported) and that private-resolving names are rejected.
Captures the second half of the audit cleanup work that landed after #87: - #88 config/core test coverage (was 0/missing helpers) - #89 connectors gmail + calendar pure-logic tests (8 → 43) - #90 llm-client URL validation hardening (zone IDs, trailing dots, CGNAT) - #91 adapter-discovery factory shape check + manifest defaultConfig - #92 preference-archaeologist coverage (action keys, multi-group, expiry) - #93 worker SignalDeduper extraction + 11 tests Pure docs. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
SSRF defense in `@skytwin/llm-client` was solid for common cases (RFC 1918, link-local, cloud metadata, octal/hex encodings, IPv6 mapped IPv4). The session-start audit flagged three gaps; this PR closes them.
1. Centralized hostname normalization. New `normalizeHostname()` is the single source of truth for hostname matching. Strips brackets, lowercases, drops trailing dot, drops IPv6 zone id. Both `validateBaseUrl` and `validateBaseUrlWithDns` route through it so bypasses like `localhost.` (DNS root marker) and `[fe80::1%eth0]` (link-local with zone) can't slip past.
2. IPv6 unspecified address. Both `[::]` and the long form `[0:0:0:0:0:0:0:0]` now block as private (equivalent to `0.0.0.0`).
3. CGNAT (RFC 6598). 100.64.0.0/10 — provider-internal carrier NAT range — now blocks. No legitimate LLM endpoint lives there. Boundary-correct: 100.63.x.x and 100.128.x.x still pass.
Test plan
🤖 Generated with Claude Code