[Feature]: Expose limit parameter and document query operators in web_search tool #16696

@RoboRiley

Description

Problem or Use Case

I worked with my Hermes Agent to make some improvements to the web_search tool and wanted to share the changes. I think they would make the current web_search more functional. For context, I am running a local Firecrawl instance with SearXNG as its search backend.

Motivation
The current web_search tool only accepts a query string, hardcoding limit=5 and offering no documented way to use search query operators. This means:

  • The LLM cannot request more than 5 results without workarounds, which is limiting for research-heavy tasks.
  • Query operators like site:, filetype:pdf, intitle:, and -exclude work with most backends but are undocumented, so the LLM never discovers them.
  • The tool description is minimal ("Search the web for information on any topic"), giving the LLM no guidance on advanced usage.

Proposed Solution

Verified changes (tested on self-hosted Firecrawl + SearXNG backend)

All changes are backend-agnostic: they only expose existing functionality that Firecrawl, Tavily, Exa, and Parallel already support.

File: tools/web_tools.py

1. Update WEB_SEARCH_SCHEMA

# Before
WEB_SEARCH_SCHEMA = {
    "name": "web_search",
    "description": "Search the web for information on any topic. Returns up to 5 relevant results with titles, URLs, and descriptions.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query to look up on the web"
            }
        },
        "required": ["query"]
    }
}

# After
WEB_SEARCH_SCHEMA = {
    "name": "web_search",
    "description": "Search the web for information. Returns results with titles, URLs, and descriptions. Use query operators for targeted filtering: site:domain (restrict to a domain), intitle:word (title must contain), allintitle:word (all words in title), filetype:ext (file type, e.g. filetype:pdf), inurl:word, allinurl:word, related:domain (find similar sites), -term (exclude), \"exact phrase\" (exact match). For large content extraction, use web_extract on returned URLs.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query. Supports operators: site:domain (restrict to domain), intitle:word, allintitle:word, filetype:ext (e.g. filetype:pdf), inurl:word, allinurl:word, related:domain (similar sites), -term (exclude results containing term), \"exact phrase\" (exact match). Example: 'site:arxiv.org LLM fine-tuning' or 'filetype:pdf machine learning survey'"
            },
            "limit": {
                "type": "integer",
                "description": "Maximum number of results to return (default: 10, max: 100). Use higher limits when you need more candidates for downstream extraction.",
                "default": 10
            }
        },
        "required": ["query"]
    }
}
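
To illustrate the kind of arguments the updated schema would accept, here is a minimal sketch that composes an operator-based query and pairs it with a limit. The `build_query` helper and its parameter names are hypothetical and not part of tools/web_tools.py; the operators are just concatenated into the query string, which is how the proposal expects them to reach the backend.

```python
from typing import Iterable, Optional


def build_query(
    terms: str,
    site: Optional[str] = None,
    filetype: Optional[str] = None,
    exact: Optional[str] = None,
    exclude: Iterable[str] = (),
) -> str:
    """Compose a search string from plain terms plus optional operators.

    Hypothetical helper for illustration only; the tool itself just
    receives the final string in its "query" argument.
    """
    parts = []
    if site:
        parts.append(f"site:{site}")          # restrict to a domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to a file type
    if exact:
        parts.append(f'"{exact}"')            # exact phrase match
    parts.append(terms)
    parts.extend(f"-{term}" for term in exclude)  # excluded terms
    return " ".join(parts)


# Arguments in the shape the proposed schema accepts.
example_args = {
    "query": build_query("LLM fine-tuning", site="arxiv.org", exclude=["survey"]),
    "limit": 20,
}
print(example_args)
```

Whether a given operator is honored still depends on the backend, so a caller should treat the result set as best effort.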

2. Update function signature and docstring

# Before
def web_search_tool(query: str, limit: int = 5) -> str:

# After
def web_search_tool(query: str, limit: int = 10) -> str:

3. Update the handler lambda

# Before
handler=lambda args, **kw: web_search_tool(args.get("query", ""), limit=5),

# After
handler=lambda args, **kw: web_search_tool(args.get("query", ""), limit=args.get("limit", 10)),
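
One thing the handler above does not do is enforce the "max: 100" stated in the schema description, and an LLM may also pass the limit as a string or omit it. A possible hardening, sketched under the assumption that clamping is the desired behavior: `coerce_limit`, `MAX_LIMIT`, and `DEFAULT_LIMIT` are hypothetical names, and `web_search_tool` here is a stand-in stub so the example runs on its own.

```python
MAX_LIMIT = 100      # upper bound stated in the proposed schema description
DEFAULT_LIMIT = 10   # default from the proposed schema


def coerce_limit(raw, default=DEFAULT_LIMIT, maximum=MAX_LIMIT):
    """Return an int in [1, maximum]; fall back to default on bad input."""
    if isinstance(raw, bool):  # bool is a subclass of int; reject it
        return default
    try:
        value = int(raw)
    except (TypeError, ValueError, OverflowError):
        return default
    return max(1, min(value, maximum))


def web_search_tool(query: str, limit: int = DEFAULT_LIMIT) -> str:
    """Stand-in for the real function so the sketch is self-contained."""
    return f"query={query!r} limit={limit}"


# Same wiring as the proposed handler, with the limit sanitized first.
handler = lambda args, **kw: web_search_tool(
    args.get("query", ""), limit=coerce_limit(args.get("limit"))
)

print(handler({"query": "site:arxiv.org LLM fine-tuning", "limit": 500}))
```

If maintainers would rather reject out-of-range values than clamp them, the same helper could raise instead; the sketch only shows one option.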

A few notes:

  • Default limit 5 → 10: This doubles results for cloud-tier users (higher credit/token cost per search). If maintainers prefer cost conservatism, keeping limit=5 as default while still exposing the parameter is a reasonable alternative.
  • Query operators are backend-agnostic: Operators pass through the query string. Backends that support them (Firecrawl, Tavily) honor them; those that don't simply ignore them. No harm either way.
  • No backend-specific code changes: All four backends (Firecrawl, Tavily, Exa, Parallel) already support limit. This change only wires an existing parameter through to the tool schema and handler.

Alternatives Considered

No response

Feature Type

Performance / reliability

Scope

Small (single file, < 50 lines)

Contribution

  • I'd like to implement this myself and submit a PR

Labels

P3 (Low: cosmetic, nice to have), tool/web (Web search and extraction), type/feature (New feature or request)