Skip to content

fix(llm-client): harden URL validation against zone IDs, trailing dots, CGNAT#90

Merged
jayzalowitz merged 1 commit into
mainfrom
jayzalowitz/url-validation-hardening
Apr 27, 2026
Merged

fix(llm-client): harden URL validation against zone IDs, trailing dots, CGNAT#90
jayzalowitz merged 1 commit into
mainfrom
jayzalowitz/url-validation-hardening

Conversation

@jayzalowitz

Copy link
Copy Markdown
Owner

Summary

SSRF defense in `@skytwin/llm-client` was solid for common cases (RFC 1918, link-local, cloud metadata, octal/hex encodings, IPv6 mapped IPv4). The session-start audit flagged three gaps; this PR closes them.

1. Centralized hostname normalization. New `normalizeHostname()` is the single source of truth for hostname matching. Strips brackets, lowercases, drops trailing dot, drops IPv6 zone id. Both `validateBaseUrl` and `validateBaseUrlWithDns` route through it so bypasses like `localhost.` (DNS root marker) and `[fe80::1%eth0]` (link-local with zone) can't slip past.

2. IPv6 unspecified address. Both `[::]` and the long form `[0:0:0:0:0:0:0:0]` now block as private (equivalent to `0.0.0.0`).

3. CGNAT (RFC 6598). 100.64.0.0/10 — provider-internal carrier NAT range — now blocks. No legitimate LLM endpoint lives there. Boundary-correct: 100.63.x.x and 100.128.x.x still pass.

Test plan

  • `pnpm --filter @skytwin/llm-client test` — 78/78 (was 68, +10 hardening tests)
  • `pnpm --filter @skytwin/llm-client lint` — clean
  • `pnpm test` — 38/38 turbo tasks
  • `pnpm lint` — 30/30 turbo tasks

🤖 Generated with Claude Code

…s, CGNAT

Closes the llm-client URL-validation hardening item from the
session-start audit. SSRF defense was solid for common cases but had
gaps the audit called out: IPv6 zone IDs, hostname normalization,
underrepresented IPv6 unspecified address.

Hardening additions:

1. normalizeHostname() — single source of truth for hostname matching.
   Strips brackets, lowercases, drops trailing dot, drops IPv6 zone id.
   Both validateBaseUrl and validateBaseUrlWithDns route through it so
   bypasses like "localhost." and "[fe80::1%eth0]" can't slip past.

2. IPv6 unspecified address — both [::] and the long form
   [0:0:0:0:0:0:0:0] now block as private (equivalent to 0.0.0.0).

3. CGNAT range — 100.64.0.0/10 (RFC 6598) blocked. Provider-internal
   carrier NAT range, no legitimate LLM endpoint will live there.

Tests: +10 (31 → 41 in url-validation, 78 total in llm-client).
Coverage:
- Trailing-dot bypass (localhost., 127.0.0.1.)
- IPv6 zone id (rejected by URL parser as invalid — defense in depth)
- Bracketed [fe80::1] without zone id
- Uppercase LOCALHOST
- IPv6 [::] and [0:0:0:0:0:0:0:0]
- CGNAT 100.64.x.x and 100.127.x.x boundaries
- 100.63.x.x and 100.128.x.x as non-private (boundary-correct)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 27, 2026 02:57

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Hardens @skytwin/llm-client SSRF URL validation by normalizing hostnames consistently and expanding private-range blocking to cover additional bypass vectors.

Changes:

  • Centralizes hostname normalization (case, IPv6 brackets, trailing dot, IPv6 zone IDs) and reuses it across validation paths.
  • Blocks IPv6 unspecified (::) and long-form unspecified (0:0:0:0:0:0:0:0) as private/internal.
  • Blocks CGNAT range 100.64.0.0/10 (RFC 6598) and adds regression tests for the new hardening cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
packages/llm-client/src/url-validation.ts Adds normalizeHostname() and expands isPrivateHost() to block IPv6 unspecified + CGNAT; applies normalization to DNS-aware validation.
packages/llm-client/src/tests/url-validation.test.ts Adds test coverage for hostname normalization hardening and CGNAT boundaries.
Comments suppressed due to low confidence (1)

packages/llm-client/src/url-validation.ts:59

  • In validateBaseUrlWithDns, normalizeHostname() strips a trailing dot before calling dns.lookup(). That changes DNS semantics: example. (absolute, no search domains) is not always equivalent to example (may be subject to search domains / ndots behavior). This can make the DNS rebinding check validate a different name than the one the runtime request will actually resolve, potentially allowing a host. that resolves to private IPs to slip through if host resolves publicly (or vice versa). Consider preserving the trailing dot for the DNS lookup (while still normalizing case / brackets / zone ID for matching) so the lookup reflects the original URL hostname exactly.
  const hostname = normalizeHostname(new URL(baseUrl).hostname);

  // Skip literal IPs and localhost — already validated above
  if (hostname === 'localhost' || /^\d{1,3}(\.\d{1,3}){3}$/.test(hostname) || hostname.includes(':')) {
    return;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 50 to 56
export async function validateBaseUrlWithDns(baseUrl: string, provider: string): Promise<void> {
// Run all synchronous checks first
validateBaseUrl(baseUrl, provider);

const { lookup } = await import('node:dns/promises');
const hostname = new URL(baseUrl).hostname.toLowerCase().replace(/^\[|\]$/g, '');
const hostname = normalizeHostname(new URL(baseUrl).hostname);

Copilot AI Apr 27, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validateBaseUrlWithDns() behavior is updated via normalizeHostname(), but there are no tests covering the DNS-path at all (no references in src/__tests__). Since this function is specifically meant to defend against DNS rebinding, it would be good to add a unit test that mocks node:dns/promises and asserts the lookup is performed on the intended hostname (e.g., preserving any trailing dot if supported) and that private-resolving names are rejected.

Copilot uses AI. Check for mistakes.
@jayzalowitz jayzalowitz merged commit 3e16689 into main Apr 27, 2026
12 checks passed
@jayzalowitz jayzalowitz deleted the jayzalowitz/url-validation-hardening branch April 27, 2026 03:18
jayzalowitz added a commit that referenced this pull request Apr 27, 2026
Captures the second half of the audit cleanup work that landed after
#87:

- #88 config/core test coverage (was 0/missing helpers)
- #89 connectors gmail + calendar pure-logic tests (8 → 43)
- #90 llm-client URL validation hardening (zone IDs, trailing dots, CGNAT)
- #91 adapter-discovery factory shape check + manifest defaultConfig
- #92 preference-archaeologist coverage (action keys, multi-group, expiry)
- #93 worker SignalDeduper extraction + 11 tests

Pure docs.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants