「松子」Host Collision 101: Finding Hidden Assets Behind a Single IP

This post reviews my notes on host collision (virtual host enumeration): what it is, how it works, and why it still matters today.
It also doubles as a “design doc” for my tool HostCollision.

0x00 Motivation: When ports are open but the site is “missing”

Typical recon story:

You do IP/port scanning, find lots of 80/443/8080/8443.
You open them in a browser full of hope.
You get 403, 404, “Welcome to nginx”, Tomcat default page, random WAF splash screens…

Clearly, something is running there, but not necessarily the app you’re after.

In modern environments, this is normal:

Fronted by load balancers / reverse proxies / CDNs / WAFs.
Multiple virtual hosts (vhosts) on the same IP.
Internal or “hidden” apps routed only when the right Host header appears.

This is where host collision comes in:
we abuse the way front-ends route HTTP/1.1 requests by Host to discover additional sites behind a single IP.

0x01 Quick recap: Host header and virtual hosts

1.1 `Host` header in HTTP/1.1

In HTTP/1.1, Host is a mandatory header:

GET / HTTP/1.1
Host: example.com

The TCP connection (IP + port) says “which machine did I connect to”; the Host header says “which website on this machine do I want”.
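To make the split concrete, here is a minimal sketch in which the connection target (`ip`, `port`) and the `Host` value are two independent parameters. The helper names `build_request` and `send_raw` are hypothetical, and the request is plain HTTP on port 80 by default.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request carrying the given Host value."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",        # "which website do I want"
        "Connection: close",    # ask the server to close after one response
        "",
        "",                     # blank line terminates the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def send_raw(ip: str, host: str, port: int = 80, timeout: float = 5.0) -> bytes:
    """Connect to ip:port ("which machine") and send a request for `host`."""
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        sock.sendall(build_request(host))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:        # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, `send_raw("10.0.0.5", "intranet.example.com")` talks to 10.0.0.5 regardless of what DNS says about `intranet.example.com`.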

1.2 How web servers use `Host`

Web servers (Nginx/Apache/etc.) commonly use name-based virtual hosts:

server {
    listen 80;
    server_name www.aaa.com;
    # ...
}

server {
    listen 80;
    server_name www.bbb.com;
    # ...
}

server {
    listen 80 default_server;
    server_name _;
    # default / fallback vhost
}

Routing logic is roughly:

Accept connection on IP:80.
Parse HTTP request → read Host: <something>.
Match server_name / vhost definition.
If no match → send traffic to a default vhost (often a boring page).
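The routing steps above can be modelled as a lookup table with a fallback. This is a simplified sketch: the `VHOSTS` table and backend labels are hypothetical stand-ins for the nginx example, and only exact `server_name` matches are handled (real nginx also supports wildcard and regex names).

```python
from typing import Dict, Optional

# Hypothetical vhost table mirroring the nginx config above:
# server_name -> backend label. Exact matches only.
VHOSTS: Dict[str, str] = {
    "www.aaa.com": "aaa-site",
    "www.bbb.com": "bbb-site",
}
DEFAULT_VHOST = "default-page"  # the `default_server` block


def route(host_header: Optional[str]) -> str:
    """Pick a backend the way a name-based vhost front-end roughly does."""
    if not host_header:
        return DEFAULT_VHOST
    # Host may carry a port ("www.aaa.com:80") and is case-insensitive.
    # Note: IPv6 literal hosts ("[::1]:80") are not handled by this sketch.
    name = host_header.split(":", 1)[0].strip().lower()
    return VHOSTS.get(name, DEFAULT_VHOST)
```

Anything not in the table lands on the fallback page, which is exactly why probing the IP alone only ever shows the “boring” site.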

If admins deploy internal apps (e.g. intranet.example.com, admin.example.com) on the same front-end but don’t expose them via public DNS, they may still be reachable as long as the reverse proxy sees the right Host header.

That’s the attack surface host collision abuses.

0x02 So what exactly is “host collision”?

2.1 One-sentence definition

Host collision / virtual host fuzzing is sending HTTP requests to a fixed IP while fuzzing the Host header, in order to discover additional vhosts routed through the same front-end.

Concretely:

URL: http://<IP>/
Header: Host: <some-domain>

Instead of doing “DNS brute force” (ask DNS for foo.example.com, bar.example.com…), you:

Talk directly to the web server / reverse proxy by IP.
Change only the Host header over HTTP.
Observe which combinations produce meaningful responses.

You maintain two buckets:

IP bucket: ip.txt
Host bucket (domain/subdomain dictionary): host.txt

Process:

for each ip in ip_list:
  for each host in host_list:
    send HTTP request:
      URL  = http://ip/
      Host = host
    record status / length / body fingerprint / similarity
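A runnable version of the loop above might look like the following sketch. The `fetch(ip, host) -> (status, body)` callable is an assumption introduced so the loop stays independent of transport (plain HTTP, HTTPS, proxies); `Probe` and `collide` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Probe:
    ip: str
    host: str
    status: int
    body: bytes


# fetch(ip, host) -> (status, body); injected so the loop is easy to test.
Fetch = Callable[[str, str], Tuple[int, bytes]]


def collide(ips: Iterable[str], hosts: Iterable[str], fetch: Fetch) -> List[Probe]:
    """Try every (ip, host) pair and record what came back."""
    host_list = list(hosts)  # reuse the same host dictionary for every IP
    results: List[Probe] = []
    for ip in ips:
        for host in host_list:
            try:
                status, body = fetch(ip, host)
            except OSError:
                continue  # unreachable or timed out: skip this pair
            results.append(Probe(ip, host, status, body))
    return results
```

Fingerprinting and similarity scoring are left to later stages, so this loop only records raw status and body.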

If 10.0.0.5 + Host: intranet.example.com suddenly returns a valid app while all other combos return error/default pages, then:

10.0.0.5 likely fronts the vhost intranet.example.com.
This vhost may be “internal-only” from a DNS perspective, but the HTTP gateway still routes to it.

0x03 Normal flow vs. host collision flow

3.1 Normal user flow

When a normal user visits https://app.example.com/:

Browser resolves app.example.com via DNS.
Gets IP, say 1.2.3.4.
Connects to 1.2.3.4:443, does TLS handshake (SNI=app.example.com).
Sends HTTP request with Host: app.example.com.
Load balancer / reverse proxy routes to the correct backend based on SNI / Host.

3.2 What host collision changes

Host collision decouples DNS from HTTP routing:

We no longer care what DNS says.
We only need an IP that accepts HTTP/HTTPS.
We send Host values that the operator did not intend to expose externally.

Example:

GET / HTTP/1.1
Host: admin.internal.example.com

sent directly to 203.0.113.10 (a public IP). If the front-end is misconfigured, it might route this to the internal admin app even though admin.internal.example.com doesn’t resolve in public DNS.

In other words:

DNS says “no such host”. HTTP routing says “sure, come in”.

That gap is exactly what we exploit.
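Over HTTPS the same decoupling applies, but as noted in 3.1 the front-end may route on SNI, on Host, or on both, so a reasonable choice is to set both to the candidate name while connecting to the raw IP. A minimal sketch, assuming certificate validation is deliberately disabled because internal vhosts often present certificates that won't match; `recon_context` and `fetch_https` are hypothetical helpers.

```python
import socket
import ssl


def recon_context() -> ssl.SSLContext:
    """TLS context for recon only: certificate checks are disabled."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False        # must be turned off before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_https(ip: str, host: str, port: int = 443, timeout: float = 5.0) -> bytes:
    """Connect to ip:port, send SNI=host and Host: host; DNS is never consulted."""
    request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode("ascii")
    with socket.create_connection((ip, port), timeout=timeout) as raw:
        # server_hostname sets the SNI value in the TLS ClientHello.
        with recon_context().wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(request)
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)  # raw response: status line, headers, body
```

Usage, for instance: `fetch_https("203.0.113.10", "admin.internal.example.com")`.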

0x04 Why this is a real security issue

In many real environments:

A single IP / load balancer fronts dozens or hundreds of apps.
Some apps are meant to be public; some are “internal” or “restricted”.
“Internal” is often implemented by:
- Only putting the hostname in internal DNS.
- Maybe firewalling some sources, but not always consistently.

If all of these apps are still routed based on Host alone, then:

Anyone who can reach the IP and guess the hostname can hit the app.
No public DNS record ≠ no exposure.
Certificate enumeration and DNS scraping may miss those hosts.
Host collision can reveal a large number of extra targets in a single IP range (wya.pl).

For a pentester, missing this means:

You see only one boring site behind an IP.
Meanwhile there might be tens or hundreds of APIs, admin panels, and debug instances behind the same IP, all accessible with the right Host header.

0x05 A practical workflow: from raw IPs to usable hits

5.1 Collect candidate IPs

Typical sources:

Asset inventory (if you’re internal).
External: Shodan, FOFA, Censys, etc.
Your own masscan / nmap sweeps.

From these, keep IPs where:

80/443/8080/8443/etc. are open.
Direct IP access returns:
- Default pages (Welcome to nginx, Apache test page, etc.).
- WAF/403/404.
- Very generic responses.

These are strong candidates for “reverse proxies with multiple vhosts”.
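One way to automate this pre-filter is a small marker check on the direct-IP response. The marker strings below are illustrative assumptions based on common default pages, not an exhaustive list; WAF splash pages would need their own markers.

```python
from typing import Dict, List, Tuple

# Illustrative markers for generic "front door" pages; extend with what you see.
DEFAULT_MARKERS = (
    "welcome to nginx",
    "apache2 ubuntu default page",
    "it works!",
    "apache tomcat",
)


def looks_generic(status: int, body: str) -> bool:
    """True if a direct-IP response looks like a default/error page."""
    if status in (403, 404):
        return True
    text = body.lower()
    return any(marker in text for marker in DEFAULT_MARKERS)


def pick_candidates(probes: Dict[str, Tuple[int, str]]) -> List[str]:
    """probes maps ip -> (status, body) from a plain request to http://ip/."""
    return sorted(ip for ip, (status, body) in probes.items() if looks_generic(status, body))
```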

5.2 Build IP and Host dictionaries

ip.txt – one IP per line (filtered candidates).
host.txt – hostnames to try, from:
- Subdomain enumeration (passive + brute-force).
- Wordlists (SecLists etc.).
- Historical data, internal naming conventions, leaked configs (thehacker.recipes).
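Loading the two buckets is mostly housekeeping. A minimal sketch, assuming `#` marks comment lines and that bare labels such as `admin` should be expanded under one or more root domains you supply; `load_list` and `expand_hosts` are hypothetical helpers.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def load_list(path: str) -> List[str]:
    """One entry per line; skip blanks and '#' comments; de-duplicate, keep order."""
    seen: Dict[str, None] = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        entry = line.strip().lower()
        if entry and not entry.startswith("#"):
            seen.setdefault(entry, None)
    return list(seen)


def expand_hosts(words: Iterable[str], domains: Iterable[str]) -> List[str]:
    """Turn bare labels ('admin') into FQDNs under each root domain; keep FQDNs as-is."""
    domain_list = list(domains)
    out: Dict[str, None] = {}
    for word in words:
        if "." in word:
            out.setdefault(word, None)
        else:
            for domain in domain_list:
                out.setdefault(f"{word}.{domain}", None)
    return list(out)
```

Usage, for example: `hosts = expand_hosts(load_list("host.txt"), ["example.com"])`.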

5.3 Run the host collision

Tool-agnostic logic:

For each (ip, host) pair:
- URL: http://ip/
- Header: Host: host
Collect:
- Status code
- Response size
- Response body (for hashing / similarity)
- Duration (optional, for debugging)

This is essentially what gobuster does in its vhost mode, and what ffuf or wfuzz do when you fuzz the Host header (thehacker.recipes).
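A single probe that collects those fields could be sketched with the standard library's `http.client`. Passing an explicit `Host` header makes `http.client` skip its own, so the request goes to `ip` but asks for `host`. Redirects are not followed, since a 3xx is itself a useful signal. `Result` and `fetch_http` are hypothetical names.

```python
import hashlib
import http.client
import time
from dataclasses import dataclass


@dataclass
class Result:
    ip: str
    host: str
    status: int
    size: int
    body_sha256: str  # exact-match fingerprint of the body
    body: bytes       # kept for similarity scoring later
    duration: float   # seconds, for debugging


def fetch_http(ip: str, host: str, port: int = 80, timeout: float = 5.0) -> Result:
    start = time.monotonic()
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # An explicit Host header stops http.client from adding "Host: <ip>".
        conn.request("GET", "/", headers={"Host": host})
        resp = conn.getresponse()
        body = resp.read()
    finally:
        conn.close()
    return Result(ip, host, resp.status, len(body),
                  hashlib.sha256(body).hexdigest(), body, time.monotonic() - start)
```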

5.4 Similarity filtering: kill the noise

Raw results are noisy:

Default error pages
WAF block pages
Generic “site not configured” responses

Common trick:

For each IP, pick a baseline response (often the first valid-looking 2xx/3xx).
For every other (ip, host) response:
- Compute similarity score vs. baseline (e.g. shingle/Jaccard, fuzzy hash…).
- If similarity is too high, treat it as “same generic page”.
- If similarity is low, mark it as interesting.

This is the core idea behind tools like VhostFinder: virtual hosts with distinct content will diverge from the baseline.
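As one concrete choice among the options listed, here is a sketch of word-shingle Jaccard similarity against a per-IP baseline. The shingle size `k=3` and the `0.8` threshold are arbitrary example values, not tuned recommendations.

```python
import re
from typing import Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Set of overlapping k-word sequences from the lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """|A ∩ B| / |A ∪ B| over word shingles; 1.0 means identical shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty bodies count as identical
    return len(sa & sb) / len(sa | sb)


def is_interesting(baseline: str, candidate: str, threshold: float = 0.8) -> bool:
    """Low similarity to the baseline means 'probably a different vhost'."""
    return jaccard(baseline, candidate) < threshold
```

Shingles (rather than raw length or exact hashes) tolerate small per-request differences such as timestamps or request IDs in otherwise identical error pages.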

5.5 Triage and follow-up

What you end up with after filtering:

A relatively small set of (ip, host) combos:
- 2xx/3xx responses
- Content significantly different from baseline
- Titles that look like login pages, admin consoles, dashboards, APIs, etc.
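The triage criteria can be applied in a few lines. This sketch assumes each row already carries its similarity-to-baseline score from the previous step; the row layout and the `0.8` cut-off are assumptions, and the regex title extraction is a quick heuristic rather than a real HTML parser.

```python
import re
from typing import Iterable, List, Optional, Tuple

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

Row = Tuple[str, str, int, float, str]  # (ip, host, status, similarity, body)


def extract_title(html: str) -> Optional[str]:
    """First <title> text with whitespace collapsed, or None."""
    m = TITLE_RE.search(html)
    return " ".join(m.group(1).split()) if m else None


def triage(rows: Iterable[Row], max_similarity: float = 0.8) -> List[Tuple[str, str, int, Optional[str]]]:
    """Keep 2xx/3xx responses that diverge from the baseline, with their titles."""
    keep = []
    for ip, host, status, similarity, body in rows:
        if 200 <= status < 400 and similarity < max_similarity:
            keep.append((ip, host, status, extract_title(body)))
    return keep
```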

Next steps:

Add the IP-to-hostname mappings to /etc/hosts (for convenience).
Browse these hosts normally.
Combine with directory brute forcing, tech fingerprinting, and standard web testing.
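Turning the filtered hits into /etc/hosts lines is straightforward; this sketch groups hostnames per IP in the standard `ip name1 name2` layout. `hosts_entries` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Tuple


def hosts_entries(hits: Iterable[Tuple[str, str]]) -> List[str]:
    """Group (ip, host) hits into /etc/hosts-style lines: 'ip host1 host2'."""
    by_ip: Dict[str, List[str]] = {}
    for ip, host in hits:
        names = by_ip.setdefault(ip, [])
        if host not in names:  # avoid duplicate names on a line
            names.append(host)
    return [f"{ip} {' '.join(names)}" for ip, names in by_ip.items()]
```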

0x06 Host collision vs. DNS brute-forcing

They’re related, but not the same thing:

DNS brute force

Ask DNS for foo.example.com, bar.example.com, …
If there’s a record, you get an IP.
No record? DNS says “NXDOMAIN”.

Host collision / vhost fuzzing

You already have an IP.
You send HTTP requests to that IP with different Host headers.
You observe differences in HTTP responses.

Key difference:

DNS brute forcing enumerates published names (what DNS wants you to know).
Host collision enumerates routable names (what the HTTP stack will actually route).

Sometimes there’s a perfect overlap. In interesting cases, there isn’t – which is exactly why host collision is valuable.
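The “published vs. routable” distinction can be checked mechanically for each hit. A minimal sketch using the system resolver via `socket.getaddrinfo`, so results reflect whatever DNS your machine sees; the category labels and the injectable `resolve` parameter are assumptions made for illustration and testing.

```python
import socket
from typing import Callable, Dict, Iterable, Set, Tuple


def dns_lookup(name: str) -> Set[str]:
    """Addresses DNS publishes for `name` (empty set on NXDOMAIN/failure)."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


def classify(hits: Iterable[Tuple[str, str]],
             resolve: Callable[[str], Set[str]] = dns_lookup) -> Dict[Tuple[str, str], str]:
    """Label each routable (ip, host) hit by what DNS says about the host."""
    out: Dict[Tuple[str, str], str] = {}
    for ip, host in hits:
        addrs = resolve(host)
        if not addrs:
            out[(ip, host)] = "routable, unpublished"   # the interesting gap
        elif ip in addrs:
            out[(ip, host)] = "published here"
        else:
            out[(ip, host)] = "published elsewhere"
    return out
```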

0x07 Defensive notes: how not to get “collided”

From the blue-team perspective, host collision points to two underlying issues:

Over-trusting Host without proper scoping.
Letting internal vhosts ride on public-facing front-ends.

Some practical mitigations:

Separate public and internal vhosts
- Don’t put admin/dev/internal vhosts on the same public IP / listener as external apps.
- At least restrict them at the network level (VPN-only, office IPs, etc.).
Tight default / fallback behavior
- For unknown Host, return a minimal error (or drop).
- Don’t route to real apps as fallback.
- Avoid verbose default pages leaking server info.
Host header whitelisting
- Front-end/WAF only allows known, intended hostnames.
- Everything else → fixed error / drop.
Regular self-scanning
- From the Internet, run your own vhost fuzzing against your IP ranges.
- Compare the vhosts you discover against the intended DNS records.
- If something is routable but not in DNS, decide if it really should be reachable.
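Host whitelisting normally lives in the front-end or WAF, but the idea is easy to show at the application layer too, similar in spirit to Django's `ALLOWED_HOSTS` setting. This WSGI middleware is a hypothetical sketch: unknown Host values get a fixed, minimal 400 response and never reach the app.

```python
from typing import Callable, Iterable


def host_allowlist(app: Callable, allowed: Iterable[str]) -> Callable:
    """WSGI middleware: only requests whose Host is in `allowed` reach `app`."""
    allowed_set = {h.lower() for h in allowed}

    def middleware(environ, start_response):
        # HTTP_HOST carries the raw Host header; drop an optional ":port".
        # IPv6 literals are not parsed, so they fail closed (rejected).
        name = environ.get("HTTP_HOST", "").split(":", 1)[0].lower()
        if name not in allowed_set:
            body = b"Bad Request"  # fixed, minimal error; no server details
            start_response("400 Bad Request", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)

    return middleware
```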

0x08 Summary

Host collision is one of those techniques that:

Is conceptually simple;
Leverages a very old piece of the web stack (HTTP/1.1 Host);
Still reveals surprising amounts of attack surface in modern, virtual-host-heavy environments.

The core ideas to remember:

IP decides who you talk to; Host decides who you *ask* for.
DNS is one way to map names to IPs, but HTTP routing doesn’t depend on public DNS being “truthful”.
A single IP / load balancer can hide hundreds of apps; if you don’t check vhosts, you might miss most of your scope.

Whether you use my HostCollision or any other tool, having vhost enumeration in your recon playbook is absolutely worth it – both for offense (finding hidden assets) and for defense (discovering accidental exposures before someone else does).
