# LinkForge — Documentation MCP Server

Have you ever pasted a documentation link into ChatGPT, Claude, or Copilot, only to watch it fail to find the function or example you needed? We've all been there. Even the smartest AI assistants struggle with sprawling documentation sites full of hidden pages, nested links, and outdated structures. LinkForge fixes that: paste any URL into our website, point your AI agent at the result, and the documentation becomes understandable, searchable, and genuinely useful.
- 🕷️ Web Crawling: Automatically crawls documentation websites
- 🤖 AI-Powered Conversion: Converts HTML to clean Markdown using Groq LLM
- 🔍 Semantic Search: Vector embeddings with ChromaDB for intelligent search
- ✨ Claude Enhancement: Enhances search results with Claude-generated examples and explanations
- ⚡ Async Processing: Non-blocking background processing for long-running tasks
- 🌐 HTTP Transport: FastMCP with streamable HTTP for easy integration
- 📦 Persistent Storage: ChromaDB for reliable vector storage
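The semantic-search feature above boils down to nearest-neighbor lookup over embedded chunks. A toy sketch of cosine-similarity ranking (the hand-made 3-d vectors and chunk names are invented for illustration; the real pipeline gets embeddings from a model and stores them in ChromaDB):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embedded" documentation chunks (vectors are made up for illustration).
chunks = {
    "auth guide":  [0.9, 0.1, 0.0],
    "rate limits": [0.1, 0.9, 0.2],
    "webhooks":    [0.0, 0.3, 0.9],
}

def search(query_vec, top_k=2):
    """Rank chunks by cosine similarity to the query vector."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in chunks.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]
```

With a query vector pointing at the first axis, the "auth guide" chunk ranks first; ChromaDB does the same ranking at scale with approximate nearest-neighbor indexes.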
The server exposes four MCP tools:

- `process_documentation_url` - Crawl and embed a documentation website
- `get_processing_status` - Check the status of processing jobs
- `query_documentation` - Semantic search across embedded documentation
- `list_collections` - View all available documentation collections
- Python 3.12+
- Groq API key (required - get one at console.groq.com)
- Claude API key (required - get one at console.anthropic.com)
- Bright Data API key (optional - get one at brightdata.com)
- Clone the repository

  ```bash
  git clone <your-repo-url>
  cd linkforge
  ```

- Create and activate a virtual environment

  ```bash
  # Using conda
  conda create -n mcp-server python=3.12
  conda activate mcp-server

  # Or using venv
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Set up environment variables

  ```bash
  # Copy the example file
  cp env.example .env

  # Edit .env and add your API keys:
  # GROQ_API_KEY=your_groq_api_key_here
  # CLAUDE_API_KEY=your_claude_api_key_here
  # BRIGHTDATA_API_KEY=your_brightdata_api_key_here (optional)
  ```
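Because the server needs `GROQ_API_KEY` and `CLAUDE_API_KEY` to do anything useful, it helps to fail fast when they are missing. A small sketch of such a check (the `missing_keys` helper is illustrative, not part of the repo):

```python
import os

# Required by the pipeline; BRIGHTDATA_API_KEY is optional and not checked here.
REQUIRED_KEYS = ("GROQ_API_KEY", "CLAUDE_API_KEY")

def missing_keys(env=os.environ):
    """Return the required API keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

# Example: fail fast at startup
missing = missing_keys()
if missing:
    print(f"Missing required environment variables: {', '.join(missing)}")
```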
Lightweight MCP server for crawling, processing, and searching documentation. This README is intentionally concise - the repo contains a Python MCP/REST server and a small frontend under `website/`.
- REST API (processing/status): http://localhost:8000
- MCP endpoint (streamable HTTP): http://localhost:8001/mcp
- Local run: use the bundled `startup.bash` to start the Python server and the frontend
- Deploy: you can run a public MCP server by uploading `server.py` (and required files) to any host/provider that runs Python
The repository includes a small Bash script, `startup.bash`, that starts the Python server and the frontend for local development.
Make it executable and run it:

```bash
chmod +x startup.bash
./startup.bash
```

You can deploy this project as a public MCP server by copying `server.py` and the required modules (`full_doc_pipeline.py`, `vector_db.py`, `chroma_db/` if needed, and `requirements.txt`) to any provider that can run Python (VM, container, serverless framework, etc.).
Minimal checklist for deployment:

- Ensure `requirements.txt` is installed
- Provide required environment variables (e.g. `GROQ_API_KEY`, `CLAUDE_API_KEY` if used)
- Expose ports 8000 and 8001 (or update `server.py` to use provider ports)
- Start the Python process (e.g., `python3 server.py` or via a process manager)
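Many hosts inject the listen port through a `PORT` environment variable rather than letting you bind 8000/8001 directly. A sketch of the usual pattern for honoring it (generic; `server.py` may handle ports differently):

```python
def resolve_port(env, default=8000):
    """Prefer a provider-injected PORT, falling back to the local default."""
    try:
        return int(env.get("PORT", default))
    except (TypeError, ValueError):
        # Malformed value: fall back rather than crash at startup.
        return default
```

Usage: `resolve_port(os.environ)` at startup, then pass the result to whatever serves the app.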
- Fork this repository
- Click the "Deploy to Render" button above
- Set your `GROQ_API_KEY` environment variable in the Render dashboard
- Deploy!
Your server will be available at https://your-service-name.onrender.com/mcp
- Fork this repository
- Sign up/login to Render
- Create a new Web Service
- Connect your forked repository
- Render will automatically detect the `render.yaml` configuration
- Add environment variables in the Render dashboard:
  - `GROQ_API_KEY`: Your Groq API key (required)
  - `CLAUDE_API_KEY`: Your Claude API key (required)
  - `BRIGHTDATA_API_KEY`: Your Bright Data API key (optional)
- Deploy!
Add to your Cursor MCP configuration (`~/.cursor/mcp.json` or `C:\Users\<username>\.cursor\mcp.json`):
```json
{
  "mcpServers": {
    "documentation": {
      "url": "http://localhost:8001/mcp",
      "transport": "http"
    }
  }
}
```

For the deployed version:
```json
{
  "mcpServers": {
    "documentation": {
      "url": "https://your-service-name.onrender.com/mcp",
      "transport": "http"
    }
  }
}
```

Connect your MCP server to Poke at poke.com/settings/connections.
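If you script your editor setup, the entry above can be generated instead of hand-edited. A small sketch that builds and serializes the same config shape (the `mcp_config` helper is illustrative, not part of the repo):

```python
import json

def mcp_config(url, name="documentation", transport="http"):
    """Build a Cursor-style mcpServers entry for this server."""
    return {"mcpServers": {name: {"url": url, "transport": transport}}}

# Serialize for writing into ~/.cursor/mcp.json
config = mcp_config("https://your-service-name.onrender.com/mcp")
print(json.dumps(config, indent=2))
```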
To test the connection explicitly:
Tell the subagent to use the "documentation" integration's "process_documentation_url" tool
If you run into persistent issues (e.g., after renaming the connection), send `clearhistory` to Poke to delete all message history and start fresh.
- Process documentation:

  Use `process_documentation_url` with:
  - url: "https://docs.example.com"
  - max_urls: 20
  - crawler_workers: 50

- Check status:

  Use `get_processing_status` with the `job_id` from step 1

- Query documentation (with Claude enhancement):

  Use `query_documentation` with:
  - query: "how to authenticate users"
  - collection_name: "docs_example_com"
  - max_results: 5
  - enhance_with_claude: true (default)

  This will return:
  1. Claude-enhanced explanation with code examples
  2. Original documentation chunks with similarity scores

- List available collections:

  Use `list_collections` to see all processed documentation
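The steps above form a process → poll → query loop. A sketch of that control flow using stand-in stubs (the stubs fake the MCP tools so the flow is visible; a real client would invoke the tools over MCP, and job statuses are assumed to be strings like "processing"/"completed"):

```python
import time

# Stand-in stubs for the MCP tools (a real client calls them over MCP).
_JOBS = {}

def process_documentation_url(url, max_urls=20, crawler_workers=50):
    job_id = f"job-{len(_JOBS) + 1}"
    _JOBS[job_id] = "completed"   # a real job would start as "processing"
    return job_id

def get_processing_status(job_id):
    return _JOBS.get(job_id, "unknown")

def query_documentation(query, collection_name, max_results=5,
                        enhance_with_claude=True):
    return [f"chunk about {query!r}"][:max_results]

# The workflow from the steps above:
job = process_documentation_url("https://docs.example.com")
while get_processing_status(job) == "processing":
    time.sleep(1)                 # poll until the crawl finishes
results = query_documentation("how to authenticate users", "docs_example_com")
```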
Add more tools by decorating functions with `@mcp.tool()` in `src/server.py`:

```python
@mcp.tool()
async def your_custom_tool(param: str) -> str:
    """Tool description for AI agents."""
    # Your implementation
    return result
```

Edit default values in `src/server.py`:
```python
async def process_documentation_url(
    url: str,
    max_urls: int = 20,          # Change default max URLs
    crawler_workers: int = 50,   # Change default workers
    collection_name: str = None
) -> str:
```

```
linkforge/
├── src/
│   └── server.py            # FastMCP server implementation
├── full_doc_pipeline.py     # Documentation processing pipeline
├── vector_db.py             # ChromaDB vector database wrapper
├── requirements.txt         # Python dependencies
├── render.yaml              # Render deployment configuration
├── .env                     # Environment variables (local only)
└── README.md                # This file
```
- `GROQ_API_KEY` (required): Your Groq API key for HTML to Markdown conversion
- `CLAUDE_API_KEY` (required): Your Claude API key for enhanced documentation queries
- `BRIGHTDATA_API_KEY` (optional): Your Bright Data API key for advanced web scraping
- `PYTHON_VERSION` (optional): Python version for deployment (default: 3.12)
- Check that all dependencies are installed: `pip install -r requirements.txt`
- Verify your `GROQ_API_KEY` and `CLAUDE_API_KEY` are set in `.env`
- Ensure port 8000 is not already in use
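To check the "port already in use" item programmatically, a quick probe works (standard `socket` pattern, nothing LinkForge-specific):

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 on a successful connection, an errno otherwise.
        return s.connect_ex((host, port)) == 0
```

For example, `port_in_use(8000)` before starting the server tells you whether the default REST port is free.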
- Verify the documentation URL is accessible
- Check Groq API key is valid and has credits
- Review server logs for specific error messages
- Delete the `chroma_db/` directory and restart to reset the database
- Ensure sufficient disk space for vector storage
- Verify the server URL includes the `/mcp` path
- Check that transport is set to "http" or "streamable-http"
- Restart Cursor/Poke after updating `mcp.json`
MIT
Contributions are welcome! Please feel free to submit a Pull Request.
For issues and questions:
- Open an issue on GitHub
- Check existing issues for solutions
- Review the FastMCP documentation