MCP Go Colly Crawler

Overview

MCP Go Colly is a web crawling framework that integrates the Model Context Protocol (MCP) with the Colly web scraping library. It aims to provide a flexible, extensible way to extract web content for large language model (LLM) applications.

Features

Concurrent web crawling with configurable depth and domain restrictions
MCP server integration for tool-based crawling
Graceful shutdown handling
Robust error handling and result formatting
Support for both single URL and batch URL crawling
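
To illustrate the first feature in the list, here is a minimal sketch of a depth-limited, concurrent crawl written with the standard library only. It is not the project's implementation, which relies on Colly. The names crawl and linkFunc are hypothetical, the link lookup is a stub so the sketch runs without network access, and the depth convention (seeds count as depth 0) is an assumption that may differ from how the project interprets max_depth.

```go
package main

import (
	"fmt"
	"sync"
)

// linkFunc returns the links found on a page. It stands in for a real
// HTTP fetch so the sketch runs without network access (hypothetical type).
type linkFunc func(url string) []string

// crawl visits pages level by level, processing each level concurrently.
// Seeds are depth 0 (an assumption); pages deeper than maxDepth are skipped.
func crawl(seeds []string, maxDepth int, links linkFunc) []string {
	visited := make(map[string]bool)
	var order []string
	frontier := seeds

	for depth := 0; depth <= maxDepth && len(frontier) > 0; depth++ {
		// Drop pages that were already visited at an earlier level.
		var level []string
		for _, u := range frontier {
			if !visited[u] {
				visited[u] = true
				level = append(level, u)
			}
		}
		order = append(order, level...)

		// Process every page of this level in its own goroutine. Each
		// goroutine writes to a distinct slice index, so no lock is needed.
		found := make([][]string, len(level))
		var wg sync.WaitGroup
		for i, u := range level {
			wg.Add(1)
			go func(i int, u string) {
				defer wg.Done()
				found[i] = links(u)
			}(i, u)
		}
		wg.Wait()

		// The links discovered on this level form the next frontier.
		frontier = nil
		for _, l := range found {
			frontier = append(frontier, l...)
		}
	}
	return order
}

func main() {
	// A tiny in-memory link graph used instead of real pages.
	graph := map[string][]string{
		"a": {"b", "c"},
		"b": {"d"},
		"c": {"a"},
		"d": {"e"},
	}
	lookup := func(u string) []string { return graph[u] }
	fmt.Println(crawl([]string{"a"}, 2, lookup))
}
```

Collecting results by slice index keeps the visit order deterministic even though the pages of a level are processed in parallel.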

Building from Source

Prerequisites

Go 1.21 or later
Make (for using Makefile commands)

Installation

Clone the repository:

git clone https://github.com/yourusername/mcp-go-colly.git
cd mcp-go-colly

Install dependencies:

make deps

Building

The project includes a Makefile with several useful commands:

# Build the binary (outputs to bin/mcp-go-colly)
make build

# Build for all platforms (Linux, Windows, macOS)
make build-all

# Run tests
make test

# Clean build artifacts
make clean

# Format code
make fmt

# Run linter
make lint

All binaries will be generated in the bin/ directory.

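To use the server with Claude Desktop, add the following configuration to the claude_desktop_config.json file, replacing <add path here> with the directory that contains your clone of the repository: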

{
  "mcpServers": {
    "web-scraper": {
      "command": "<add path here>/mcp-go-colly/bin/mcp-go-colly"
    }
  }
}

Usage

As an MCP Tool

The crawler is implemented as an MCP tool that accepts the following parameters. urls may be a single URL string or an array of URLs; max_depth is optional and defaults to 2:

{
    "urls": ["https://example.com"],
    "max_depth": 2
}
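
Because urls may arrive as either a string or an array, and max_depth may be omitted, a handler has to normalize its arguments before crawling. The following is a minimal sketch of one way to do that. parseCrawlArgs and defaultMaxDepth are hypothetical names, not taken from the project source, and the accepted Go types are assumptions based on how encoding/json decodes values.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultMaxDepth mirrors the documented default for max_depth.
const defaultMaxDepth = 2

// parseCrawlArgs is a hypothetical helper that normalizes raw tool
// arguments into a list of URLs and a crawl depth.
func parseCrawlArgs(args map[string]interface{}) ([]string, int, error) {
	var urls []string
	switch v := args["urls"].(type) {
	case string: // a single URL
		urls = []string{v}
	case []string: // a Go caller passing a typed slice
		urls = v
	case []interface{}: // the shape encoding/json produces for arrays
		for _, item := range v {
			s, ok := item.(string)
			if !ok {
				return nil, 0, fmt.Errorf("urls: expected string, got %T", item)
			}
			urls = append(urls, s)
		}
	default:
		return nil, 0, errors.New("urls: expected a string or an array of strings")
	}
	if len(urls) == 0 {
		return nil, 0, errors.New("urls: at least one URL is required")
	}

	depth := defaultMaxDepth
	switch v := args["max_depth"].(type) {
	case nil: // not supplied, keep the default
	case int:
		depth = v
	case float64: // JSON numbers decode to float64
		depth = int(v)
	default:
		return nil, 0, fmt.Errorf("max_depth: expected a number, got %T", v)
	}
	return urls, depth, nil
}

func main() {
	urls, depth, err := parseCrawlArgs(map[string]interface{}{
		"urls": "https://example.com",
	})
	fmt.Println(urls, depth, err)
}
```

Handling float64 explicitly matters when the arguments come from decoded JSON, since encoding/json represents every number as float64 when decoding into interface{}.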

Example MCP Tool Call

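// Build the request first, then set the arguments on its Params field.
// Assigning the field avoids restating the Params struct type, which
// carries more fields than Arguments.
req := mcp.CallToolRequest{}
req.Params.Arguments = map[string]interface{}{
    "urls":      []string{"https://example.com"},
    "max_depth": 2,
}

// crawlerTool is the crawler tool instance; ctx is a context.Context.
result, err := crawlerTool.Call(ctx, req)
if err != nil {
    // Handle the crawl error here.
}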

Configuration Options

max_depth: Set maximum crawl depth (default: 2)
urls: Single URL string or array of URLs to crawl
Domain restrictions are automatically applied based on the provided URLs
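
As an illustration of how domain restrictions could be derived from the provided URLs, the sketch below collects the unique hostnames of the seed URLs using only the standard library. allowedDomains is a hypothetical helper, not the project's code. A list like this could then be passed to Colly's colly.AllowedDomains collector option, though whether the project does exactly that is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// allowedDomains is a hypothetical helper: it returns the unique hostnames
// of the seed URLs, in first-seen order.
func allowedDomains(seeds []string) ([]string, error) {
	seen := make(map[string]bool)
	var domains []string
	for _, raw := range seeds {
		u, err := url.Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", raw, err)
		}
		// Hostname strips any port, so example.com:8080 yields example.com.
		host := u.Hostname()
		if host == "" {
			return nil, fmt.Errorf("no host in %q", raw)
		}
		if !seen[host] {
			seen[host] = true
			domains = append(domains, host)
		}
	}
	return domains, nil
}

func main() {
	domains, err := allowedDomains([]string{
		"https://example.com/docs",
		"https://example.com/blog",
		"https://go.dev",
	})
	fmt.Println(domains, err)
}
```

Rejecting seeds without a host catches inputs such as example.com/path, which url.Parse accepts as a relative path rather than as a hostname.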

Contributing

Fork the repository
Create your feature branch
Commit your changes
Push to the branch
Create a Pull Request

License

MIT

Acknowledgments

Colly Web Scraping Framework
Mark3 Labs MCP Project
