AI Model & Provider Discovery Registry
Kosha (कोश — treasury) discovers AI models across providers, resolves credentials, enriches with pricing, and exposes the catalog through a library, CLI, HTTP API, and a built-in OpenAI-compatible proxy. One source of truth for model identity, pricing, and routing — so your app doesn't break when providers ship new SKUs or change rates.
npm install @sriinnu/kosha-discovery # library / server
npm install -g @sriinnu/kosha-discovery # global `kosha` CLI

import { createKosha } from "@sriinnu/kosha-discovery";
const kosha = await createKosha();
const models = kosha.models(); // all
const cheapest = kosha.cheapestModels({ role: "image" }); // ranked
const sonnet = kosha.model("sonnet"); // alias resolves
console.log(sonnet.pricing); // { inputPerMillion: 3, outputPerMillion: 15, ... }

kosha discover # discover all providers (writes cache + manifest)
kosha list --provider anthropic # filter from local cache
kosha model sonnet # details for one model (alias-aware)
kosha cheapest --role embeddings # rank cheapest for a role
kosha update # force a fresh fetch
kosha serve --port 3000 # HTTP API

After each discovery, a stable v1 manifest lands at ~/.kosha/registry.json; any tool that reads JSON can consume it:
jq '.models[] | select(.pricing.inputPerMillion < 0.1) | .modelId' ~/.kosha/registry.json

GET /api/models[?provider=…&role=…] GET /api/models/:idOrAlias
GET /api/models/:idOrAlias/routes GET /api/models/cheapest?role=…
GET /api/providers GET /api/roles
POST /api/refresh GET /health
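As a rough illustration of calling the endpoints above from code, the sketch below only builds request URLs. The paths and the `provider` and `role` query parameters come from the endpoint list; the helper names `modelsUrl` and `cheapestUrl` and the base URL are assumptions made for this example, and response schemas are deliberately left out because they are not described here.

```typescript
// Hypothetical helpers (not part of kosha): build URLs for the listed endpoints.
interface ModelQuery {
  provider?: string; // e.g. "anthropic"
  role?: string; // e.g. "embeddings"
}

// GET /api/models[?provider=…&role=…]
function modelsUrl(base: string, query: ModelQuery = {}): string {
  const url = new URL("/api/models", base);
  if (query.provider) url.searchParams.set("provider", query.provider);
  if (query.role) url.searchParams.set("role", query.role);
  return url.toString();
}

// GET /api/models/cheapest?role=…
function cheapestUrl(base: string, role: string): string {
  const url = new URL("/api/models/cheapest", base);
  url.searchParams.set("role", role);
  return url.toString();
}

// Assumed base URL: the default port used by `kosha serve`.
const apiBase = "http://localhost:3000";
console.log(modelsUrl(apiBase, { provider: "anthropic", role: "embeddings" }));
console.log(cheapestUrl(apiBase, "image"));
```

The resulting strings can be passed to any HTTP client, for example `fetch(modelsUrl(apiBase, { role: "image" }))`.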
Kosha runs as an OpenAI-compatible proxy. Point your SDK at http://localhost:3000/proxy/v1 and it resolves the model, picks the right provider, injects credentials, and forwards — streaming included.
kosha serve # start on :3000

import OpenAI from "openai";
const client = new OpenAI({
baseURL: "http://localhost:3000/proxy/v1",
apiKey: "not-used", // kosha resolves credentials from env
});
// Use any canonical model ID or alias
const res = await client.chat.completions.create({
model: "sonnet",
messages: [{ role: "user", content: "hello" }],
});
// Let kosha pick the cheapest model you have a key for
const cheap = await client.chat.completions.create({
model: "kosha:cheapest",
messages: [{ role: "user", content: "hello" }],
});
// Cheapest model with tool_use and at least 128k context
const routed = await client.chat.completions.create({
model: "kosha:cheapest[tool_use,128k]",
messages: [{ role: "user", content: "hello" }],
});

kosha:cheapest filter syntax (comma-separated, combinable):
| Filter | Example | Meaning |
|---|---|---|
| capability | `tool_use`, `vision` | model must have this tag |
| `<N>k` | `128k`, `200k` | minimum context window |
| `provider:<id>` | `provider:groq` | pin to a specific provider |
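To make the filter grammar in the table concrete, here is a minimal sketch of how such a hint could be parsed. It is not kosha's own parser: the `parseCheapestHint` function and `CheapestHint` type are hypothetical, and the sketch assumes `128k` means 128 × 1000 tokens, which the table does not specify.

```typescript
// Hypothetical parser for "kosha:cheapest[...]" hints; illustrative only.
interface CheapestHint {
  capabilities: string[]; // tags such as "tool_use" or "vision"
  minContext?: number; // tokens; ASSUMES "<N>k" means N * 1000
  provider?: string; // from "provider:<id>"
}

function parseCheapestHint(model: string): CheapestHint | null {
  // Accept "kosha:cheapest" with an optional bracketed filter list.
  const match = /^kosha:cheapest(?:\[([^\]]*)\])?$/.exec(model);
  if (!match) return null; // an ordinary model ID or alias

  const hint: CheapestHint = { capabilities: [] };
  const body = match[1] ?? "";
  for (const raw of body.split(",")) {
    const token = raw.trim();
    if (token === "") continue;
    const ctx = /^(\d+)k$/.exec(token);
    if (ctx) {
      hint.minContext = Number(ctx[1]) * 1000;
    } else if (token.startsWith("provider:")) {
      hint.provider = token.slice("provider:".length);
    } else {
      hint.capabilities.push(token); // anything else is treated as a capability tag
    }
  }
  return hint;
}

console.log(parseCheapestHint("kosha:cheapest[tool_use,128k,provider:groq]"));
```

Treating every unrecognized token as a capability tag keeps the sketch short; a real implementation would likely validate tags against a known set.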
The response always includes x-kosha-model, x-kosha-provider, and x-kosha-requested headers so the caller knows exactly what ran.
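A caller can surface those headers for logging or cost tracking. The sketch below is an assumption-laden example rather than part of kosha: `readRouting` and `KoshaRouting` are hypothetical names, and only the three header names come from the text above. It accepts anything with a `get(name)` method, which includes the `headers` object of a standard `fetch` response.

```typescript
// Hypothetical helper: collect kosha's routing headers from a response.
interface HeaderSource {
  get(name: string): string | null;
}

interface KoshaRouting {
  model: string | null; // x-kosha-model
  provider: string | null; // x-kosha-provider
  requested: string | null; // x-kosha-requested
}

function readRouting(headers: HeaderSource): KoshaRouting {
  return {
    model: headers.get("x-kosha-model"),
    provider: headers.get("x-kosha-provider"),
    requested: headers.get("x-kosha-requested"),
  };
}

// Stand-in for real response headers; the values are made up for illustration.
const sampleValues: Record<string, string> = {
  "x-kosha-model": "example-model",
  "x-kosha-provider": "groq",
  "x-kosha-requested": "kosha:cheapest",
};
const sampleHeaders: HeaderSource = {
  get: (name) => sampleValues[name] ?? null,
};
console.log(readRouting(sampleHeaders));
```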
Supported transports: openai, openai-compatible-http, ollama. Anthropic, Google, Bedrock, and Vertex require wire-format translation — not yet proxied.
| Provider | Discovery | Credential sources |
|---|---|---|
| Anthropic | /v1/models | ANTHROPIC_API_KEY, Claude CLI, Codex CLI |
| OpenAI | /v1/models | OPENAI_API_KEY, GitHub Copilot tokens |
| Google | /v1beta/models | GOOGLE_API_KEY, GEMINI_API_KEY, Gemini CLI, gcloud |
| AWS Bedrock | SDK → CLI → static | AWS_ACCESS_KEY_ID, ~/.aws/credentials, SSO, IAM |
| Vertex AI | API + gcloud | GOOGLE_APPLICATION_CREDENTIALS, ADC |
| Ollama | local API | — (local) |
| OpenRouter | API | OPENROUTER_API_KEY (optional) |
| Vercel AI Gateway | /v1/models | AI_GATEWAY_API_KEY, VERCEL_OIDC_TOKEN (discovery is public; a credential is required for execution) |
| NVIDIA / Together / Fireworks / Groq / Cerebras / Cohere / DeepInfra / Perplexity | API | provider key env var |
| DeepSeek / Mistral / Moonshot (Kimi) / GLM (Zhipu) / Z.AI / MiniMax | API | provider key env var |
Full credential setup: docs/credentials.md.
Discovery layer talks to provider APIs and local catalogs. Enrichment layer fills pricing and context windows from the LiteLLM catalog and models.dev. Resilience layer (circuit breaker + stale-cache fallback + health tracker) keeps a flaky provider a degraded read, never a crash. Manifest layer writes a v1-stable JSON snapshot so downstream consumers — tokmeter, chitragupta, ayuh — read prices from one source instead of inventing their own. Proxy layer exposes an OpenAI-compatible endpoint that resolves kosha:cheapest[…] hints at request time, injects credentials, and forwards to the winning provider.
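The resilience behaviour described above (circuit breaker plus stale-cache fallback) can be sketched as follows. This is a generic illustration of the pattern, not kosha's implementation: the `GuardedSource` class, the failure threshold, and the cooldown are all assumptions chosen for the example.

```typescript
// Generic circuit breaker with stale-cache fallback; names and thresholds are hypothetical.
type Fetcher<T> = () => Promise<T>;

class GuardedSource<T> {
  private failures = 0;
  private openedAt = 0;
  private cached: T | undefined;
  private readonly fetcher: Fetcher<T>;
  private readonly maxFailures: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(
    fetcher: Fetcher<T>,
    maxFailures = 3,
    cooldownMs = 30_000,
    now: () => number = () => Date.now(),
  ) {
    this.fetcher = fetcher;
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // The breaker is open while the failure threshold is met and the cooldown has not elapsed.
  private isOpen(): boolean {
    return (
      this.failures >= this.maxFailures &&
      this.now() - this.openedAt < this.cooldownMs
    );
  }

  async read(): Promise<{ value: T; stale: boolean }> {
    if (!this.isOpen()) {
      try {
        this.cached = await this.fetcher();
        this.failures = 0; // a success closes the breaker
        return { value: this.cached, stale: false };
      } catch {
        this.failures += 1;
        if (this.failures >= this.maxFailures) this.openedAt = this.now();
      }
    }
    // Degraded read: serve the last good value instead of failing the caller.
    if (this.cached !== undefined) return { value: this.cached, stale: true };
    throw new Error("provider unavailable and no cached value");
  }
}
```

In this sketch a failing provider yields `stale: true` results from the last successful fetch, and the fetcher is skipped entirely until the cooldown passes, at which point one trial call decides whether the breaker closes again.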
| Doc | Covers |
|---|---|
| Credentials | Env vars, CLI tools, and config files for every provider |
| CLI | Commands, flags, examples |
| HTTP API | Endpoints, parameters, response schemas |
| Configuration | Aliases, routing, enrichment, programmatic config |
| Architecture | Discovery flow, module map, adding providers |
| Resilience | Circuit breakers, stale cache, health |
| Security | Threat catalogue, runtime scanning, pre-commit hook |
| Discovery Plane v1 | Stable daemon contract (deltas, SSE watch, binding hints) |
Tag-driven via GitHub Actions:
git tag -s vX.Y.Z -m "vX.Y.Z" && git push origin vX.Y.Z
# → Actions → "Manual Release (Tag + npm)" → run with tag=vX.Y.Z

The workflow checks that the tag matches the version in package.json, then builds, tests, publishes to npm, and creates the GitHub Release. It requires the NPM_TOKEN secret.
litellm (pricing data) · openrouter · ollama · chitragupta (registry patterns) · takumi (routing needs that drove kosha's creation).
MIT
