Well Known Bots

This repository contains a list of Well Known Bots, including robots, crawlers, validators, monitors, and spiders, in a single JSON file. Each bot is identified by a unique id and provided with a RegExp pattern to match against an HTTP User-Agent header. Additional metadata is available on each entry.

Install

Direct download

Download the well-known-bots.json file directly.
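Once downloaded (or fetched at runtime), identifying a bot is a matter of testing a User-Agent against each entry's accepted and forbidden patterns (documented below). A minimal Node.js sketch, assuming Node 18+ for the global fetch; the raw.githubusercontent.com URL and main branch are assumptions, not official endpoints:

// Minimal sketch: fetch the list and test a User-Agent against it.
// The raw URL and branch name below are assumptions.
type BotEntry = {
  id: string;
  categories: string[];
  pattern: { accepted: string[]; forbidden: string[] };
};

const LIST_URL =
  "https://raw.githubusercontent.com/arcjet/well-known-bots/main/well-known-bots.json";

async function identify(userAgent: string): Promise<string | null> {
  const bots: BotEntry[] = await (await fetch(LIST_URL)).json();
  for (const bot of bots) {
    const accepted = bot.pattern.accepted.some((p) => new RegExp(p).test(userAgent));
    const forbidden = bot.pattern.forbidden.some((p) => new RegExp(p).test(userAgent));
    if (accepted && !forbidden) return bot.id;
  }
  return null;
}

identify("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
  .then((id) => console.log(id ?? "no match"));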

Realities

It's impossible to create a system that can detect all bots. Well-behaved bots identify themselves in a consistent manner, usually via the User-Agent patterns this project provides. Identifying these well-behaved bots is straightforward, but misbehaving bots pretend to be real clients and use various mechanisms to evade detection.

For more details, see Non-Technical Notes in the browser-fingerprinting project.

Custom bots

To block a particular bot that is not on this list, you can use an Arcjet filter. See the Malicious traffic blueprint for how to block custom bots.

Adding a New Bot

To add a new bot to the list, you need to edit the well-known-bots.json file and add a new entry. Follow these steps:

  1. Create a new bot entry with the required fields (see structure below and the skeleton after these steps)
  2. Add User-Agent pattern(s) that identify the bot
  3. Add verification method(s) if the bot provider supports verification
  4. Add example instances to validate your patterns work correctly
  5. Run validation to ensure your entry is correct: node validate.js --check
  6. Submit a pull request with your changes
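For step 1, a hypothetical skeleton containing only the required fields might look like this (every value is a placeholder):

{
  "id": "my-new-bot",
  "categories": ["tool"],
  "pattern": {
    "accepted": ["MyNewBot\\/"],
    "forbidden": []
  },
  "verification": []
}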

Bot Entry Structure

Each entry in the JSON represents a specific bot or crawler and includes the following fields:

Required Fields

  • id (string): A unique identifier for the bot in kebab-case (e.g., "google-crawler")
  • categories (array): One or more categories the bot belongs to (see available categories)
  • pattern (object): Regular expression patterns to match the bot's User-Agent
    • accepted (array): Regex patterns that must match for bot identification
    • forbidden (array): Regex patterns that, if matched, disqualify the User-Agent
  • verification (array): Methods for verifying the bot's authenticity (can be empty [] if not supported)

Optional Fields

  • url (string): Documentation URL for the bot
  • instances (object): Example User-Agent strings for testing
    • accepted (array): User-Agent strings that should match the pattern
    • rejected (array): User-Agent strings that should not match
  • aliases (array): Alternative identifiers for the bot used in other data sources
  • addition_date (string): Date the bot was added in YYYY/MM/DD format
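Putting it together, a hypothetical entry using every optional field might look like this (all values are made up for illustration):

{
  "id": "example-fetcher",
  "categories": ["feedfetcher"],
  "pattern": {
    "accepted": ["ExampleFetcher\\/"],
    "forbidden": []
  },
  "verification": [],
  "url": "https://example.com/fetcher-docs",
  "instances": {
    "accepted": [
      "Mozilla/5.0 (compatible; ExampleFetcher/1.0; +https://example.com/fetcher-docs)"
    ],
    "rejected": [
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    ]
  },
  "aliases": ["example-fetcher-bot"],
  "addition_date": "2024/01/15"
}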

Available Categories

academic, advertising, ai, amazon, apple, archive, feedfetcher, google,
meta, microsoft, monitor, optimizer, preview, programmatic, search-engine,
slack, social, tool, unknown, vercel, webhook, yahoo

Verification Methods

Bot verification allows you to confirm that a request claiming to be from a specific bot actually originates from that bot's infrastructure. Three verification methods are supported: DNS, CIDR, and IP.

DNS Verification

DNS verification uses reverse DNS lookups to verify a bot's identity. The bot's IP address is resolved to a hostname, which is then checked against known patterns.

When to use: When the bot provider publishes DNS patterns for their crawlers (e.g., Google, Bing).

Example:

{
  "type": "dns",
  "masks": [
    "crawl-***-***-***-***.googlebot.com",
    "geo-crawl-***-***-***-***.geo.googlebot.com"
  ]
}

Mask pattern syntax:

  • * - Matches zero or one occurrence of any character
  • @ - Matches any number of characters (wildcard)
  • All other characters require an exact match
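In code, these masks translate directly to anchored regular expressions. A sketch of reverse-DNS verification using Node's dns/promises (the semantics of * and @ follow the syntax above; the sample IP is illustrative):

// Sketch: interpret a DNS mask and verify an IP via reverse (PTR) lookup.
import { reverse } from "node:dns/promises";

function maskToRegExp(mask: string): RegExp {
  const source = mask
    .split("")
    .map((ch) => {
      if (ch === "*") return ".?"; // zero or one of any character
      if (ch === "@") return ".*"; // any number of characters
      return ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // exact match
    })
    .join("");
  return new RegExp(`^${source}$`);
}

async function verifyByDns(ip: string, masks: string[]): Promise<boolean> {
  const hostnames = await reverse(ip); // reverse DNS lookup
  return hostnames.some((host) => masks.some((m) => maskToRegExp(m).test(host)));
}

// e.g. verifyByDns("66.249.66.1", ["crawl-***-***-***-***.googlebot.com"])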

Full example in context:

{
  "id": "example-search-bot",
  "categories": ["search-engine"],
  "pattern": {
    "accepted": ["ExampleBot\\/"],
    "forbidden": []
  },
  "verification": [
    {
      "type": "dns",
      "masks": [
        "crawler-***.example.com"
      ]
    }
  ]
}

CIDR Verification

CIDR verification checks if the request originates from IP address ranges (CIDR blocks) published by the bot provider.

When to use: When the bot provider publishes IP ranges in CIDR notation (e.g., Google, Stripe).

Supported source types:

  • http-json - JSON file with CIDR ranges (or a mix of individual IPs and CIDR ranges)
  • http-csv - CSV file with CIDR ranges in the first column (or a mix of individual IPs and CIDR ranges)
  • http-text - Plain text file with one CIDR range per line (or a mix of individual IPs and CIDR ranges)

Example with JSON source:

{
  "type": "cidr",
  "sources": [
    {
      "type": "http-json",
      "url": "https://developers.google.com/static/search/apis/ipranges/googlebot.json",
      "selector": "$.prefixes[*][\"ipv6Prefix\",\"ipv4Prefix\"]"
    }
  ]
}

Example with CSV source:

{
  "type": "cidr",
  "sources": [
    {
      "type": "http-csv",
      "url": "https://example.com/ip-ranges.csv"
    }
  ]
}

JSONPath selector examples:

  • $.prefixes[*].cidr - Array of objects with a cidr field
  • $[*] - Simple array of CIDR strings
  • $.ranges[*] - Nested array under ranges key
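After the ranges are fetched and extracted with a selector like the ones above, the remaining work is a containment check. A minimal IPv4-only sketch (IPv6 handling omitted for brevity; the example range is illustrative):

// Sketch: check whether an IPv4 address falls inside any published CIDR range.
function ipv4ToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) | parseInt(octet, 10)) >>> 0, 0);
}

function inCidr(ip: string, cidr: string): boolean {
  const [base, bits] = cidr.split("/");
  const mask = bits === "0" ? 0 : (~0 << (32 - Number(bits))) >>> 0;
  return (ipv4ToInt(ip) & mask) === (ipv4ToInt(base) & mask);
}

function verifyByCidr(ip: string, ranges: string[]): boolean {
  return ranges.some((cidr) => inCidr(ip, cidr));
}

// e.g. verifyByCidr("66.249.66.1", ["66.249.64.0/19"]) // true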

IP Verification

IP verification checks if the request originates from specific IP addresses. This method supports both static IP lists and remote sources.

When to use:

  • Static IPs: When the bot uses a small, fixed set of IP addresses
  • Remote sources: When the bot provider publishes a dynamic list of IPs

Static IP Addresses

For a small, fixed list of IP addresses:

{
  "type": "ip",
  "ips": [
    "35.204.201.174",
    "34.125.202.46"
  ]
}

Full example in context:

{
  "id": "small-monitoring-bot",
  "categories": ["monitor"],
  "pattern": {
    "accepted": ["MonitorBot"],
    "forbidden": []
  },
  "verification": [
    {
      "type": "ip",
      "ips": [
        "1.2.3.4",
        "5.6.7.8"
      ]
    }
  ]
}
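In code, static IP verification reduces to a set-membership test against the connection's remote address, as in this sketch (IPs taken from the hypothetical entry above):

// Sketch: static IP verification is a set-membership test.
const knownIps = new Set(["1.2.3.4", "5.6.7.8"]); // the entry's "ips" array

function verifyByIp(remoteAddress: string): boolean {
  return knownIps.has(remoteAddress);
}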

Remote IP Sources

For dynamic or large lists, use remote sources:

Supported source types:

  • http-json - JSON file with IP addresses (or a mix of individual IPs and CIDR ranges)
  • http-text - Plain text file with one IP per line (or a mix of individual IPs and CIDR ranges)

Example with JSON source:

{
  "type": "ip",
  "sources": [
    {
      "type": "http-json",
      "url": "https://stripe.com/files/ips/ips_webhooks.json",
      "selector": "$.WEBHOOKS[*]"
    }
  ]
}

Example with text source:

{
  "type": "ip",
  "sources": [
    {
      "type": "http-text",
      "url": "https://my.pingdom.com/probes/ipv4"
    }
  ]
}

JSONPath selector examples:

  • $[*] - Simple array of IP strings
  • $[*].ip - Array of objects with an ip field
  • $.WEBHOOKS[*] - Array of IPs under WEBHOOKS key
  • $.*[*] - Object with arrays of IPs as values
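As a sketch of consuming a remote source, the Stripe example above can be fetched and flattened by hand; the $.WEBHOOKS[*] selector is hand-rolled here rather than taken from a JSONPath library, and Node 18+ is assumed for the global fetch:

// Sketch: fetch a remote JSON IP list and check membership.
async function fetchStripeWebhookIps(): Promise<Set<string>> {
  const res = await fetch("https://stripe.com/files/ips/ips_webhooks.json");
  const body = (await res.json()) as { WEBHOOKS: string[] };
  return new Set(body.WEBHOOKS); // equivalent of selecting $.WEBHOOKS[*]
}

async function verifyStripeWebhook(remoteAddress: string): Promise<boolean> {
  return (await fetchStripeWebhookIps()).has(remoteAddress);
}

A real consumer would cache the fetched list rather than re-downloading it on every request.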

Multiple Verification Methods

You can specify multiple verification methods for a single bot. Each listed method should be a valid way to verify the bot's identity on its own:

{
  "id": "google-crawler",
  "verification": [
    {
      "type": "cidr",
      "sources": [
        {
          "type": "http-json",
          "url": "https://developers.google.com/static/search/apis/ipranges/googlebot.json",
          "selector": "$.prefixes[*][\"ipv6Prefix\",\"ipv4Prefix\"]"
        }
      ]
    },
    {
      "type": "dns",
      "masks": [
        "crawl-***-***-***-***.googlebot.com"
      ]
    }
  ]
}

License

The project is a hard-fork of crawler-user-agents at commit 46831767324e10c69c9ac6e538c9847853a0feb9, which is distributed under the MIT License.
