This repository contains a list of Well Known Bots, including robots, crawlers,
validators, monitors, and spiders, in a single JSON file. Each bot is identified
and provided with a RegExp pattern to match against an HTTP User-Agent header.
Additional metadata is available on each item.
Download the well-known-bots.json file directly.
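Once downloaded, identifying a bot is a matter of testing the User-Agent header against each entry's patterns. Here is a minimal sketch in Node.js; the `exampleBot` entry is hypothetical, shaped like the items in the file (see the entry structure documented below), not a real item from the list.

```javascript
// Minimal sketch: a User-Agent identifies a bot when it matches at least
// one "accepted" pattern and no "forbidden" pattern.
function matchesBot(bot, userAgent) {
  const hit = (patterns) => patterns.some((p) => new RegExp(p).test(userAgent));
  return hit(bot.pattern.accepted) && !hit(bot.pattern.forbidden);
}

// Hypothetical entry with the same shape as items in well-known-bots.json:
const exampleBot = {
  id: "example-search-bot",
  pattern: { accepted: ["ExampleBot\\/"], forbidden: [] },
};

matchesBot(exampleBot, "Mozilla/5.0 (compatible; ExampleBot/1.0)"); // → true
```

In practice you would load the full JSON file and filter it with `matchesBot` to find every known bot a given User-Agent claims to be.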
It's impossible to create a system that can detect all bots. Well-behaving bots identify themselves in a consistent manner, usually via the User-Agent patterns this project provides. It is straightforward to identify these well-behaving bots, but misbehaving bots pretend to be real clients and use various mechanisms to evade detection.
For more details, see Non-Technical Notes in the browser-fingerprinting project.
To block a particular bot that is not on this list, you can use an Arcjet filter. See the Malicious traffic blueprint for how to block custom bots.
To add a new bot to the list, you need to edit the well-known-bots.json file and add a new entry. Follow these steps:
- Create a new bot entry with the required fields (see structure below)
- Add User-Agent pattern(s) that identify the bot
- Add verification method(s) if the bot provider supports verification
- Add example instances to validate your patterns work correctly
- Run validation to ensure your entry is correct: `node validate.js --check`
- Submit a pull request with your changes
Each entry in the JSON represents a specific bot or crawler and includes the following fields:
- `id` (string): A unique identifier for the bot in kebab-case (e.g., `"google-crawler"`)
- `categories` (array): One or more categories the bot belongs to (see available categories)
- `pattern` (object): Regular expression patterns to match the bot's User-Agent
  - `accepted` (array): Regex patterns that must match for bot identification
  - `forbidden` (array): Regex patterns that, if matched, disqualify the User-Agent
- `verification` (array): Methods for verifying the bot's authenticity (can be empty `[]` if not supported)
- `url` (string): Documentation URL for the bot
- `instances` (object): Example User-Agent strings for testing
  - `accepted` (array): User-Agent strings that should match the pattern
  - `rejected` (array): User-Agent strings that should not match
- `aliases` (array): Alternative identifiers for the bot used in other data sources
- `addition_date` (string): Date the bot was added in `YYYY/MM/DD` format
The available categories are: academic, advertising, ai, amazon, apple, archive,
feedfetcher, google, meta, microsoft, monitor, optimizer, preview, programmatic,
search-engine, slack, social, tool, unknown, vercel, webhook, yahoo
Bot verification allows you to confirm that a request claiming to be from a specific bot actually originates from that bot's infrastructure. Three verification methods are supported: DNS, CIDR, and IP.
DNS verification uses reverse DNS lookups to verify a bot's identity. The bot's IP address is resolved to a hostname, which is then checked against known patterns.
When to use: When the bot provider publishes DNS patterns for their crawlers (e.g., Google, Bing).
Example:

```json
{
  "type": "dns",
  "masks": [
    "crawl-***-***-***-***.googlebot.com",
    "geo-crawl-***-***-***-***.geo.googlebot.com"
  ]
}
```

Mask pattern syntax:

- `*`: Matches zero or one occurrence of any character
- `@`: Matches any number of characters (wildcard)
- All other characters require an exact match
Full example in context:

```json
{
  "id": "example-search-bot",
  "categories": ["search-engine"],
  "pattern": {
    "accepted": ["ExampleBot\\/"],
    "forbidden": []
  },
  "verification": [
    {
      "type": "dns",
      "masks": [
        "crawler-***.example.com"
      ]
    }
  ]
}
```

CIDR verification checks whether the request originates from IP address ranges (CIDR blocks) published by the bot provider.
When to use: When the bot provider publishes IP ranges in CIDR notation (e.g., Google, Stripe).
Supported source types:
- `http-json`: JSON file with CIDR ranges (or a mix of individual IPs and CIDR ranges)
- `http-csv`: CSV file with CIDR ranges in the first column (or a mix of individual IPs and CIDR ranges)
- `http-text`: Plain text file with one CIDR range per line (or a mix of individual IPs and CIDR ranges)
Example with JSON source:

```json
{
  "type": "cidr",
  "sources": [
    {
      "type": "http-json",
      "url": "https://developers.google.com/static/search/apis/ipranges/googlebot.json",
      "selector": "$.prefixes[*][\"ipv6Prefix\",\"ipv4Prefix\"]"
    }
  ]
}
```

Example with CSV source:
```json
{
  "type": "cidr",
  "sources": [
    {
      "type": "http-csv",
      "url": "https://example.com/ip-ranges.csv"
    }
  ]
}
```

JSONPath selector examples:

- `$.prefixes[*].cidr`: Array of objects with a `cidr` field
- `$[*]`: Simple array of CIDR strings
- `$.ranges[*]`: Nested array under a `ranges` key
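Once the CIDR ranges have been fetched, checking whether a request's IP falls inside one is a bitwise comparison. A dependency-free IPv4 sketch (a real implementation would also need to handle the IPv6 prefixes that sources like Google's publish):

```javascript
// Pack a dotted-quad IPv4 address into a 32-bit integer.
function ipv4ToInt(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// True when `ip` falls inside `cidr`, e.g. inCidr("10.0.0.5", "10.0.0.0/8").
function inCidr(ip, cidr) {
  const [network, bits] = cidr.split("/");
  // "/0" needs a special case because JS shifts are taken modulo 32.
  const mask = bits === "0" ? 0 : (-1 << (32 - Number(bits))) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(network) & mask) >>> 0);
}

inCidr("66.249.66.1", "66.249.64.0/19"); // → true
```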
IP verification checks if the request originates from specific IP addresses. This method supports both static IP lists and remote sources.
When to use:
- Static IPs: When the bot uses a small, fixed set of IP addresses
- Remote sources: When the bot provider publishes a dynamic list of IPs
For a small, fixed list of IP addresses:

```json
{
  "type": "ip",
  "ips": [
    "35.204.201.174",
    "34.125.202.46"
  ]
}
```

Full example in context:
```json
{
  "id": "small-monitoring-bot",
  "categories": ["monitor"],
  "pattern": {
    "accepted": ["MonitorBot"],
    "forbidden": []
  },
  "verification": [
    {
      "type": "ip",
      "ips": [
        "1.2.3.4",
        "5.6.7.8"
      ]
    }
  ]
}
```

For dynamic or large lists, use remote sources:
Supported source types:
- `http-json`: JSON file with IP addresses (or a mix of individual IPs and CIDR ranges)
- `http-text`: Plain text file with one IP per line (or a mix of individual IPs and CIDR ranges)
Example with JSON source:

```json
{
  "type": "ip",
  "sources": [
    {
      "type": "http-json",
      "url": "https://stripe.com/files/ips/ips_webhooks.json",
      "selector": "$.WEBHOOKS[*]"
    }
  ]
}
```

Example with text source:
```json
{
  "type": "ip",
  "sources": [
    {
      "type": "http-text",
      "url": "https://my.pingdom.com/probes/ipv4"
    }
  ]
}
```

JSONPath selector examples:

- `$[*]`: Simple array of IP strings
- `$[*].ip`: Array of objects with an `ip` field
- `$.WEBHOOKS[*]`: Array of IPs under a `WEBHOOKS` key
- `$.*[*]`: Object with arrays of IPs as values
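The selectors used here are simple enough that they map onto plain property access. The sketch below covers only the four shapes listed above (it is not a general JSONPath implementation), with made-up documentation-range IPs as sample data:

```javascript
// Map each of the simple selector shapes onto plain JS property access.
function select(selector, data) {
  switch (selector) {
    case "$[*]":
      return data; // already a simple array of strings
    case "$[*].ip":
      return data.map((item) => item.ip);
    case "$.WEBHOOKS[*]":
      return data.WEBHOOKS;
    case "$.*[*]":
      return Object.values(data).flat(); // concatenate every value array
    default:
      throw new Error("unsupported selector: " + selector);
  }
}

select("$.WEBHOOKS[*]", { WEBHOOKS: ["203.0.113.10", "203.0.113.11"] });
// → ["203.0.113.10", "203.0.113.11"]
```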
You can specify multiple verification methods for a single bot. Each listed method must independently be a valid way of verifying the bot's identity:
```json
{
  "id": "google-crawler",
  "verification": [
    {
      "type": "cidr",
      "sources": [
        {
          "type": "http-json",
          "url": "https://developers.google.com/static/search/apis/ipranges/googlebot.json",
          "selector": "$.prefixes[*][\"ipv6Prefix\",\"ipv4Prefix\"]"
        }
      ]
    },
    {
      "type": "dns",
      "masks": [
        "crawl-***-***-***-***.googlebot.com"
      ]
    }
  ]
}
```

The project is a hard fork of crawler-user-agents at commit
46831767324e10c69c9ac6e538c9847853a0feb9, which is distributed under the MIT
License.