A robust MCP (Model Context Protocol) server for web scraping operations, deployed on Smithery - the orchestration layer for AI agents. This extension converts any website into clean, structured markdown format with automatic ChromeDriver management.
This MCP server is part of Smithery's marketplace with 7953+ skills and extensions built by the community. Deploy instantly to integrate web scraping capabilities into your AI agents.
- High Performance: Direct function integration with uv package manager for optimal speed
- Zero Configuration: Automatic ChromeDriver management with version compatibility
- Smart URL Processing: Auto-adds HTTPS protocol and validates URLs
- Markdown Conversion: Converts web content to clean, structured markdown
- Async Operations: Non-blocking web scraping with proper async/await
- Production Ready: Comprehensive error handling and graceful fallbacks
- Smithery Optimized: Containerized deployment with security best practices
- Smithery Account - Sign up at smithery.ai
- Python 3.12+ (for local development)
- UV package manager
- Google Chrome (automatically managed in deployment)
- Visit Smithery Web Scraper MCP
- Click "Deploy Server" to add to your agent
- Configure with your preferred settings
- Start scraping websites instantly!
# Clone the repository
git clone https://github.com/rockerritesh/scraper-mcp-smithery.git
cd scraper-mcp-smithery
# Install dependencies with uv
uv sync
# Run the MCP development server
uv run mcp dev server.py

from scraper_doc import scrape_website
# Scrape a website
content = scrape_website("https://example.com")
print(content) # Returns markdown formatted content

- Supported: https://example.com, http://example.com
- Auto-fixed: example.com → https://example.com
- Invalid: Malformed URLs return descriptive error messages
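
The URL rules above can be illustrated with a small standalone sketch. It is not the project's actual code: the function name `normalize_url` and the exact validation checks are assumptions chosen for illustration, using only the standard library.

```python
from urllib.parse import urlparse


def normalize_url(raw: str) -> str:
    """Return a usable URL, adding https:// when no scheme is given.

    Hypothetical helper: raises ValueError with a descriptive message
    when the input cannot be treated as an http(s) URL.
    """
    candidate = raw.strip()
    if not candidate:
        raise ValueError("URL is empty")
    # Bare hosts such as "example.com" get the HTTPS scheme prepended.
    if "://" not in candidate:
        candidate = "https://" + candidate
    parsed = urlparse(candidate)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported scheme: {parsed.scheme!r}")
    # Require a dotted host name (or localhost) and no embedded spaces.
    host = parsed.hostname or ""
    if not host or " " in candidate or ("." not in host and host != "localhost"):
        raise ValueError(f"Malformed URL: {raw!r}")
    return candidate
```

Under these assumptions, `normalize_url("example.com")` yields `https://example.com`, an explicit `http://` URL passes through unchanged, and input such as `not a url` raises `ValueError`.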
Smithery Agent → MCP Protocol → search_web_tool → Chrome/Selenium → Markdown Output
- Zero Setup: Deploy instantly without infrastructure management
- Monitoring: Built-in health checks and performance metrics
- Agent Integration: Seamless connection to Smithery's AI orchestration
- Scalability: Automatic scaling based on usage patterns
- Old: Subprocess calls with performance overhead
- New: Direct function imports with async execution
- Result: ~3x faster performance on Smithery platform
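
One way to call a blocking scraper directly from async code, rather than spawning a subprocess, is to hand it to a worker thread. The sketch below assumes this pattern; whether the project uses `asyncio.to_thread` is not stated in this README. The `scrape_website` here is a stand-in that only simulates work, and `search_web` is a hypothetical wrapper name.

```python
import asyncio
import time


def scrape_website(url: str) -> str:
    """Stand-in for the blocking scraper (the real one drives Chrome)."""
    time.sleep(0.1)  # simulate page load and conversion
    return f"# Content of {url}"


async def search_web(url: str) -> str:
    """Run the blocking call in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(scrape_website, url)


if __name__ == "__main__":
    print(asyncio.run(search_web("https://example.com")))
```

Calling the function in-process avoids the cost of starting a new interpreter for every request, which is the overhead the subprocess approach incurs.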
# Test the scraper directly
uv run python scraper_doc.py https://example.com
# Test with output directory
uv run python scraper_doc.py https://example.com ./output
# Run MCP development server
uv run mcp dev server.py

# Run the development server with debug output enabled
MCP_DEBUG=1 uv run mcp dev server.py

- mcp[cli] - Model Context Protocol framework
- selenium - Web browser automation
- webdriver-manager - Automatic ChromeDriver management
- requests - HTTP client for image downloads
- python-dotenv - Environment variable management
- Deployment Timeout: Usually resolves automatically; check Smithery status
- Tool Not Found: Ensure proper MCP tool registration in server.py
- Memory Limits: Large pages may require optimization (handled automatically)
ChromeDriver issues are resolved automatically by webdriver-manager. For local development, the following commands may help:
# Clear webdriver cache if needed
rm -rf ~/.wdm/
# Verify Chrome installation
google-chrome --version

- Scraping Speed: 2-5 seconds per page
- Memory Usage: ~50-100MB per operation
- Concurrent Support: Multiple async operations
- Auto-scaling: Handled by Smithery platform
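
Since each operation uses a noticeable amount of memory, a caller running several scrapes at once may want to cap how many run in parallel. The following is a minimal sketch of that idea with `asyncio.Semaphore`. The names `fake_scrape` and `scrape_many` and the limit of 3 are illustrative assumptions, not part of this project.

```python
import asyncio


async def fake_scrape(url: str) -> str:
    """Placeholder for an async scrape call; swap in the real tool."""
    await asyncio.sleep(0.05)
    return f"# {url}"


async def scrape_many(urls: list[str], max_concurrent: int = 3) -> list[str]:
    """Scrape URLs concurrently, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fake_scrape(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


if __name__ == "__main__":
    pages = asyncio.run(scrape_many(["https://example.com", "https://example.org"]))
    print(len(pages))
```

With the memory figure quoted above, a cap of 3 would keep peak usage to roughly 150-300MB, though the right limit depends on the deployment.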
- Sandboxed Execution: Chrome runs with security flags
- Non-root User: Enhanced container security
- URL Validation: Prevents malicious URL processing
- Audit Logging: Smithery platform monitoring
Agent: "Can you scrape the latest news from example.com?"
Web Scraper MCP: *Scrapes and returns structured content*
Agent: "Here's the latest news in markdown format..."
Trigger → Smithery Agent → Web Scraper MCP → Content Analysis → Action
- Smithery Platform - Deploy and manage MCP servers
- Smithery Documentation - Platform guides and API reference
- MCP Specification - Protocol documentation
- Community Discord - Get help and share ideas
MIT License - see LICENSE file for details.
- Fork this repository
- Create a feature branch
- Test on Smithery platform
- Submit a pull request
- Share in Smithery community
Deployed on Smithery | Built with FastMCP, Selenium, and UV | Part of 7953+ community extensions
---