Production-ready Claude Code proxy supporting 9+ LLM providers with 60-80% cost reduction through token optimization.
Lynkr is a self-hosted proxy server that unlocks Claude Code CLI and Cursor IDE by enabling:
- 🚀 Any LLM Provider - Databricks, AWS Bedrock (100+ models), OpenRouter (100+ models), Ollama (local), llama.cpp, Azure OpenAI, Azure Anthropic, OpenAI, LM Studio
- 💰 60-80% Cost Reduction - Built-in token optimization with smart tool selection, prompt caching, and memory deduplication
- 🔒 100% Local/Private - Run completely offline with Ollama or llama.cpp
- 🎯 Zero Code Changes - Drop-in replacement for Anthropic's backend
- 🏢 Enterprise-Ready - Circuit breakers, load shedding, Prometheus metrics, health checks
Perfect for:
- Developers who want provider flexibility and cost control
- Enterprises needing self-hosted AI with observability
- Privacy-focused teams requiring local model execution
- Teams seeking 60-80% cost reduction through optimization
Lynkr reduces AI costs by 60-80% through intelligent token optimization:
Scenario: 100,000 API requests/month, 50k input tokens, 2k output tokens per request
| Provider | Without Lynkr | With Lynkr | Monthly Savings | Annual Savings |
|---|---|---|---|---|
| Claude Sonnet 4.5 (Databricks) | $16,000 | $6,400 | $9,600 | $115,200 |
| GPT-4o (OpenRouter) | $12,000 | $4,800 | $7,200 | $86,400 |
| Ollama (Local) | $12,000+ (cloud API) | $0 | $12,000+ | $144,000+ |
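The table follows directly from the headline reduction: at a 60% token reduction, the with-Lynkr bill is roughly 40% of the original. A quick sketch of that arithmetic for the Databricks row (illustrative only):

```typescript
// Worked example for the Databricks row above (illustrative arithmetic only).
const monthlyWithout = 16_000;   // $/month without Lynkr
const reduction = 0.60;          // 60% token reduction (lower bound of 60-80%)

const monthlyWith = monthlyWithout * (1 - reduction);  // $6,400
const monthlySavings = monthlyWithout - monthlyWith;   // $9,600
const annualSavings = monthlySavings * 12;             // $115,200

console.log({ monthlyWith, monthlySavings, annualSavings });
```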
6 Token Optimization Phases:

1. Smart Tool Selection (50-70% reduction)
   - Filters tools based on request type
   - Chat queries don't get file/git tools
   - Only sends relevant tools to the model
2. Prompt Caching (30-45% reduction)
   - Caches repeated prompts and system messages
   - Reuses context across conversations
   - Reduces redundant token usage
3. Memory Deduplication (20-30% reduction)
   - Removes duplicate conversation context
   - Compresses historical messages
   - Eliminates redundant information
4. Tool Response Truncation (15-25% reduction)
   - Truncates long tool outputs intelligently
   - Keeps only relevant portions
   - Reduces tool result tokens
5. Dynamic System Prompts (10-20% reduction)
   - Adapts prompts to request complexity
   - Shorter prompts for simple queries
   - Full prompts only when needed
6. Conversation Compression (15-25% reduction)
   - Summarizes old conversation turns
   - Keeps recent context detailed
   - Archives historical context
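As an illustration of phase 1, a request classifier can strip file/git tools from plain chat queries before the payload goes upstream. This is a minimal sketch under assumed names and heuristics, not Lynkr's actual implementation:

```typescript
// Minimal sketch of smart tool selection (phase 1). Names and heuristics
// are illustrative assumptions, not Lynkr's real code.
type Tool = { name: string; description: string };

const CHAT_ONLY_EXCLUDES = [/^file_/, /^git_/, /^bash/];

function classifyRequest(prompt: string): "chat" | "code" {
  // Naive heuristic: code-editing intent usually references files or diffs.
  return /\b(file|diff|refactor|edit|\.ts|\.py)\b/i.test(prompt) ? "code" : "chat";
}

function selectTools(prompt: string, allTools: Tool[]): Tool[] {
  if (classifyRequest(prompt) === "chat") {
    // Plain chat: drop file/git/shell tools so their schemas never hit the context.
    return allTools.filter(t => !CHAT_ONLY_EXCLUDES.some(rx => rx.test(t.name)));
  }
  return allTools; // Code requests keep the full toolset.
}
```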
📖 Detailed Token Optimization Guide
- ✅ Cloud Providers: Databricks, AWS Bedrock (100+ models), OpenRouter (100+ models), Azure OpenAI, Azure Anthropic, OpenAI
- ✅ Local Providers: Ollama (free), llama.cpp (free), LM Studio (free)
- ✅ Hybrid Routing: Automatically route between local (fast/free) and cloud (powerful) models based on request complexity (see the routing sketch below)
- ✅ Automatic Fallback: Transparent failover if primary provider is unavailable
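A minimal sketch of what complexity-based hybrid routing with fallback could look like; the scoring, threshold, and provider interface are illustrative assumptions, not Lynkr's actual policy:

```typescript
// Illustrative hybrid router: cheap local model for simple requests,
// cloud model for complex ones, with transparent failover if the primary fails.
// All names and thresholds here are assumptions for the sketch.
type Provider = { name: string; complete(prompt: string): Promise<string> };

function complexityScore(prompt: string): number {
  // Crude proxy: long prompts and multi-step asks lean "complex".
  return prompt.length / 2000 + (prompt.match(/\bthen\b|\bstep\b/gi)?.length ?? 0) * 0.2;
}

async function route(prompt: string, local: Provider, cloud: Provider): Promise<string> {
  const primary = complexityScore(prompt) < 1 ? local : cloud;
  const fallback = primary === local ? cloud : local;
  try {
    return await primary.complete(prompt);
  } catch {
    return await fallback.complete(prompt); // automatic fallback
  }
}
```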
- 💰 60-80% Token Reduction - 6-phase optimization pipeline
- 💰 $77k-$115k Annual Savings - For typical enterprise usage (100k requests/month)
- 💰 100% FREE Option - Run completely locally with Ollama or llama.cpp
- 💰 Hybrid Routing - 65-100% cost savings by using local models for simple requests
- 🔒 100% Local Operation - Run completely offline with Ollama/llama.cpp
- 🔒 Air-Gapped Deployments - No internet required for local providers
- 🔒 Self-Hosted - Full control over your data and infrastructure
- 🔒 Local Embeddings - Private @Codebase search with Ollama/llama.cpp
- 🔐 Policy Enforcement - Git restrictions, test requirements, web fetch controls
- 🔐 Sandboxing - Optional Docker isolation for MCP tools
- 🏢 Production-Ready - Circuit breakers, load shedding, graceful shutdown
- 🏢 Observability - Prometheus metrics, structured logging, health checks
- 🏢 Kubernetes-Ready - Liveness, readiness, startup probes
- 🏢 High Performance - ~7μs overhead, 140K req/sec throughput
- 🏢 Reliability - Exponential backoff, automatic retries, error resilience
- 🏢 Scalability - Horizontal scaling, connection pooling, load balancing
- ✅ Claude Code CLI - Drop-in replacement for Anthropic backend
- ✅ Cursor IDE - Full OpenAI API compatibility (Requires Cursor Pro)
- ✅ Continue.dev - Works with any OpenAI-compatible client
- ✅ Cline + VS Code - Configure it like Cursor, using the OpenAI-compatible settings
- 🧠 Long-Term Memory - Titans-inspired memory system with surprise-based filtering
- 🧠 Semantic Memory - FTS5 search with multi-signal retrieval (recency, importance, relevance; see the ranking sketch below)
- 🧠 Automatic Extraction - Zero-latency memory updates (<50ms retrieval, <100ms async extraction)
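One way to picture multi-signal retrieval is a weighted blend of the three signals named above. The weights, decay constant, and field names below are purely illustrative assumptions, not Lynkr's actual scoring:

```typescript
// Illustrative memory ranking: blend recency, importance, and relevance.
// Weights, decay constant, and field names are assumptions for this sketch.
type Memory = { text: string; importance: number; ageHours: number; ftsScore: number };

function rankMemories(memories: Memory[], topK = 5): Memory[] {
  const score = (m: Memory) =>
    0.3 * Math.exp(-m.ageHours / 72) +  // recency: decays over ~3 days
    0.3 * m.importance +                // importance: stored at write time (0..1)
    0.4 * m.ftsScore;                   // relevance: normalized FTS5 match score (0..1)
  return [...memories].sort((a, b) => score(b) - score(a)).slice(0, topK);
}
```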
- 🔧 MCP Integration - Automatic Model Context Protocol server discovery
- 🔧 Tool Calling - Full tool support with server and client execution modes
- 🔧 Custom Tools - Easy integration of custom tool implementations
- 🔍 Embeddings Support - 4 options: Ollama (local), llama.cpp (local), OpenRouter, OpenAI
- 📊 Token Tracking - Real-time usage monitoring and cost attribution
- 🎯 Zero Code Changes - Works with existing Claude Code CLI/Cursor setups
- 🎯 Hot Reload - Development mode with auto-restart
- 🎯 Comprehensive Logging - Structured logs with request ID correlation
- 🎯 Easy Configuration - Environment variables or .env file
- 🎯 Docker Support - docker-compose with GPU support
- 🎯 400+ Tests - Comprehensive test coverage for reliability
- ⚡ Real-Time Streaming - Token-by-token streaming for all providers
- ⚡ Low Latency - Minimal overhead (~7μs per request)
- ⚡ High Throughput - 140K requests/second capacity
- ⚡ Connection Pooling - Efficient connection reuse
- ⚡ Prompt Caching - LRU cache with SHA-256 keying (see the cache sketch below)
📖 Complete Feature Documentation
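The prompt cache described above (LRU with SHA-256 keying) can be pictured roughly like this. A minimal sketch with an assumed capacity, not the actual implementation:

```typescript
import { createHash } from "node:crypto";

// Minimal sketch of an LRU prompt cache keyed by SHA-256 (capacity is an assumption).
class PromptCache {
  private cache = new Map<string, string>();
  constructor(private capacity = 500) {}

  private key(prompt: string): string {
    return createHash("sha256").update(prompt).digest("hex");
  }

  get(prompt: string): string | undefined {
    const k = this.key(prompt);
    const hit = this.cache.get(k);
    if (hit !== undefined) {
      // Refresh recency: re-insert so this entry becomes most recently used.
      this.cache.delete(k);
      this.cache.set(k, hit);
    }
    return hit;
  }

  set(prompt: string, response: string): void {
    const k = this.key(prompt);
    if (this.cache.size >= this.capacity && !this.cache.has(k)) {
      // Evict least recently used (Map preserves insertion order).
      const oldest = this.cache.keys().next().value;
      if (oldest !== undefined) this.cache.delete(oldest);
    }
    this.cache.set(k, response);
  }
}
```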
Option 1: NPM Package (Recommended)
# Install globally
npm install -g lynkr
# Or run directly with npx
npx lynkr

Option 2: Git Clone
# Clone repository
git clone https://github.com/vishalveerareddy123/Lynkr.git
cd Lynkr
# Install dependencies
npm install
# Create .env from example
cp .env.example .env
# Edit .env with your provider credentials
nano .env
# Start server
npm start

Option 3: Homebrew (macOS/Linux)
brew tap vishalveerareddy123/lynkr
brew install lynkr
lynkr start

Option 4: Docker
docker-compose up -d

Lynkr supports 9+ LLM providers:
| Provider | Type | Models | Cost | Privacy |
|---|---|---|---|---|
| AWS Bedrock | Cloud | 100+ (Claude, Titan, Llama, Mistral, etc.) | Varies | Cloud |
| Databricks | Cloud | Claude Sonnet 4.5, Opus 4.5 | $$$ | Cloud |
| OpenRouter | Cloud | 100+ (GPT, Claude, Llama, Gemini, etc.) | Varies | Cloud |
| Ollama | Local | Unlimited (free, offline) | FREE | 🔒 100% Local |
| llama.cpp | Local | GGUF models | FREE | 🔒 100% Local |
| Azure OpenAI | Cloud | GPT-4o, GPT-5, o1, o3 | $$$ | Cloud |
| Azure Anthropic | Cloud | Claude models | $$$ | Cloud |
| OpenAI | Cloud | GPT-4o, o1, o3 | $$$ | Cloud |
| LM Studio | Local | Local models with GUI | FREE | 🔒 100% Local |
📖 Full Provider Configuration Guide
Configure Claude Code CLI to use Lynkr:
# Set Lynkr as backend
export ANTHROPIC_BASE_URL=http://localhost:8081
export ANTHROPIC_API_KEY=dummy
# Run Claude Code
claude "Your prompt here"That's it! Claude Code now uses your configured provider.
Configure Cursor IDE to use Lynkr:
1. Open Cursor Settings
   - Mac: `Cmd+,` | Windows/Linux: `Ctrl+,`
   - Navigate to: Features → Models
2. Configure OpenAI API Settings
   - API Key: `sk-lynkr` (any non-empty value)
   - Base URL: `http://localhost:8081/v1`
   - Model: `claude-3.5-sonnet` (or your provider's model)
3. Test It
   - Chat: `Cmd+L` / `Ctrl+L`
   - Inline edits: `Cmd+K` / `Ctrl+K`
   - @Codebase search: Requires embeddings setup
📖 Full Cursor Setup Guide | Embeddings Configuration
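To sanity-check the connection outside Cursor, you can hit Lynkr's OpenAI-compatible API directly. A quick Node 18+ snippet, assuming the standard `/v1/chat/completions` path and the dummy key from step 2:

```typescript
// Quick connectivity check against Lynkr's OpenAI-compatible endpoint.
// Assumes the standard /v1/chat/completions path and any non-empty API key.
const res = await fetch("http://localhost:8081/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: "Bearer sk-lynkr",
  },
  body: JSON.stringify({
    model: "claude-3.5-sonnet",
    messages: [{ role: "user", content: "Say hello in one word." }],
  }),
});

console.log(res.status, await res.json()); // 200 plus a chat completion if the proxy is up
```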
- 📦 Installation Guide - Detailed installation for all methods
- ⚙️ Provider Configuration - Complete setup for all 9+ providers
- 🎯 Quick Start Examples - Copy-paste configs
- 🖥️ Claude Code CLI Setup - Connect Claude Code CLI
- 🎨 Cursor IDE Setup - Full Cursor integration with troubleshooting
- 🔍 Embeddings Guide - Enable @Codebase semantic search (4 options: Ollama, llama.cpp, OpenRouter, OpenAI)
- ✨ Core Features - Architecture, request flow, format conversion
- 🧠 Memory System - Titans-inspired long-term memory
- 💰 Token Optimization - 60-80% cost reduction strategies
- 🔧 Tools & Execution - Tool calling, execution modes, custom tools
- 🐳 Docker Deployment - docker-compose setup with GPU support
- 🏭 Production Hardening - Circuit breakers, load shedding, metrics
- 📊 API Reference - All endpoints and formats
- 🔧 Troubleshooting - Common issues and solutions
- ❓ FAQ - Frequently asked questions
- 🧪 Testing Guide - Running tests and validation
- 📚 DeepWiki Documentation - AI-powered documentation search
- 💬 GitHub Discussions - Community Q&A
- 🐛 Report Issues - Bug reports and feature requests
- 📦 NPM Package - Official npm package
- ✅ Multi-Provider Support - 9+ providers including local (Ollama, llama.cpp) and cloud (Bedrock, Databricks, OpenRouter)
- ✅ 60-80% Cost Reduction - Token optimization with smart tool selection, prompt caching, memory deduplication
- ✅ 100% Local Option - Run completely offline with Ollama/llama.cpp (zero cloud dependencies)
- ✅ OpenAI Compatible - Works with Cursor IDE, Continue.dev, and any OpenAI-compatible client
- ✅ Embeddings Support - 4 options for @Codebase search: Ollama (local), llama.cpp (local), OpenRouter, OpenAI
- ✅ MCP Integration - Automatic Model Context Protocol server discovery and orchestration
- ✅ Enterprise Features - Circuit breakers, load shedding, Prometheus metrics, K8s health checks
- ✅ Streaming Support - Real-time token streaming for all providers
- ✅ Memory System - Titans-inspired long-term memory with surprise-based filtering
- ✅ Tool Calling - Full tool support with server and passthrough execution modes
- ✅ Production Ready - Battle-tested with 400+ tests, observability, and error resilience
┌─────────────────┐
│ Claude Code CLI │ or Cursor IDE
└────────┬────────┘
│ Anthropic/OpenAI Format
↓
┌─────────────────┐
│ Lynkr Proxy │
│ Port: 8081 │
│ │
│ • Format Conv. │
│ • Token Optim. │
│ • Provider Route│
│ • Tool Calling │
│ • Caching │
└────────┬────────┘
│
├──→ Databricks (Claude 4.5)
├──→ AWS Bedrock (100+ models)
├──→ OpenRouter (100+ models)
├──→ Ollama (local, free)
├──→ llama.cpp (local, free)
├──→ Azure OpenAI (GPT-4o, o1)
├──→ OpenAI (GPT-4o, o3)
└──→ Azure Anthropic (Claude)
100% Local (FREE)
export MODEL_PROVIDER=ollama
export OLLAMA_MODEL=qwen2.5-coder:latest
export OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
npm start

AWS Bedrock (100+ models)
export MODEL_PROVIDER=bedrock
export AWS_BEDROCK_API_KEY=your-key
export AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
npm start

OpenRouter (simplest cloud)
export MODEL_PROVIDER=openrouter
export OPENROUTER_API_KEY=sk-or-v1-your-key
npm start

We welcome contributions! Please see:
- Contributing Guide - How to contribute
- Testing Guide - Running tests
Apache 2.0 - See LICENSE file for details.
- ⭐ Star this repo if Lynkr helps you!
- 💬 Join Discussions - Ask questions, share tips
- 🐛 Report Issues - Bug reports welcome
- 📖 Read the Docs - Comprehensive guides
Made with ❤️ by developers, for developers.