Semantic code search for AI agents. Taps through your codebase to find exactly where things live.
wdpkr is not a replacement for grep/ripgrep. It's the conceptual layer on top — "where does the commission system live?" rather than "find the string CommissionService."
wdpkr search "release commission payments to individual payees"
{
"query": "release commission payments to individual payees",
"namespace": "my-repo",
"indexed_at": "abc123",
"results": [
{
"path": "src/finance/commission/release.rs",
"score": 0.87,
"summary": "Service for releasing commission payments...",
"symbols": [
{
"name": "release_payment",
"kind": "function",
"lines": [42, 78],
"summary": "Releases commission for a specified payee...",
"score": 0.91
}
]
}
]
}The agent reads the actual files for ground truth. wdpkr's job is to point and describe, not to ship source into the context window.
Indexing (CI, on merge to main) Searching (local, agent-invoked)
───────────────────────────── ──────────────────────────────
repo files natural-language query
│ │
▼ ▼
┌─────────┐ ┌──────────┐
│ Chunker │ tree-sitter AST │ Embedder │ same model as index
└────┬────┘ └────┬─────┘
▼ ▼
┌──────────────┐ ┌──────────────┐
│ Summarizer │ Claude Haiku │ Vector Store │ cosine similarity
└──────┬───────┘ └──────┬───────┘
▼ ▼
┌──────────┐ group by file
│ Embedder │ Voyage code-3 attach top symbols
└────┬─────┘ return tiered JSON
▼
┌──────────────┐
│ Vector Store │ Turbopuffer
└──────────────┘
- Embed summaries, not code. Off-the-shelf embedders are mediocre on conceptual queries against raw code. LLM-generated summaries close that gap.
- AST-driven chunking. Tree-sitter parses files into semantically meaningful symbols (functions, types, traits) rather than arbitrary line splits. Supports Rust, Go, TypeScript/TSX, JavaScript, Python, Java, C/C++, C#.
- Pluggable backends. Traits for VectorStore, Embedder, Summarizer, and Chunker — swap providers without changing the pipeline.
- CLI, not MCP. Any agent that can shell out can use it. JSON to stdout, errors to stderr.
cargo install wdpkrOr from source:
git clone https://github.com/duckedup/wdpkr.git
cd wdpkr
cargo install --path .# Initialize a repo (writes CLAUDE.md section, .wdpkrignore, CI workflow)
wdpkr init
# Configure providers and API keys
wdpkr config init
# Index the codebase
wdpkr index --full
# Search
wdpkr search "release commission payments"
wdpkr search "how is rate limiting implemented" --pretty
wdpkr search "auth flow" --scope src/auth/ -k 10Four-layer resolution: defaults → config file → env vars → CLI flags.
wdpkr config init # Interactive setup — choose providers, enter API keys
wdpkr config list # Show effective values + where each came fromwdpkr uses three external services. Each is trait-swappable — the defaults are production-ready but you can bring your own.
| Role | Default | Alternatives | API key |
|---|---|---|---|
| Summarizer | Anthropic Claude Haiku | — | ANTHROPIC_API_KEY |
| Embedder | Voyage voyage-code-3 |
OpenAI, Ollama (local) | VOYAGE_API_KEY |
| Vector store | Turbopuffer | nidus (local, pure-Rust) | TURBOPUFFER_API_KEY |
ANTHROPIC_API_KEY # summarization (required)
TURBOPUFFER_API_KEY # vector storage (required for the default store)
VOYAGE_API_KEY # embedding (required for default provider)
WDPKR_EMBED_PROVIDER # voyage | ollama | openai
WDPKR_STORE_PROVIDER # turbopuffer | nidus
WDPKR_NIDUS_PATH # nidus store directory (store.provider=nidus)
WDPKR_NAMESPACE # override auto-derived namespace
All settings can also be set in ~/.config/wdpkr/config.yaml via wdpkr config set.
For a fully local setup with no hosted vector database, use the nidus backend — a pure-Rust embeddable store (no API key, no FFI / no C toolchain):
wdpkr config set store.provider nidus
wdpkr config set store.nidus.path ~/.local/share/wdpkr/nidus # optional; this is the defaultPair it with the Ollama embedder (WDPKR_EMBED_PROVIDER=ollama) to keep embeddings
local too. Provider-specific store settings are nested per backend in the config file:
store:
provider: nidus
turbopuffer:
api_key: ... # or TURBOPUFFER_API_KEY
nidus:
path: ~/.local/share/wdpkr/nidus # store directory; or WDPKR_NIDUS_PATHSearch is exact (brute-force cosine), and because nidus is pure Rust, wdpkr builds and links with no C/C++ toolchain — which is what lets it be vendored cleanly.
wdpkr can index more than code. A tap is a data source; the default is files
(your repo). Configure a taps: list to add others — e.g. Linear issues, so an
agent can retrieve why a decision was made, not just the code that resulted:
taps:
- name: files
- name: linear # newest issues + comment threads
settings:
amount: 100
order_by: updatedAtexport LINEAR_API_KEY=lin_api_...
wdpkr index --tap linear # index just Linear
wdpkr search "why did we change the rate table" --provider linearNon-files results carry a source field ("source": "linear") and a
scheme-prefixed path (linear://ENG-123). --provider scopes a search to chosen
sources. See the Taps guide for the
full reference.
| Command | Purpose |
|---|---|
wdpkr search "<query>" |
Semantic search — returns tiered JSON |
wdpkr search "<q>" --provider linear |
Scope search to specific tap sources |
wdpkr index [--full] |
Index the codebase (full or incremental) |
wdpkr index --tap linear |
Index only a configured tap (e.g. Linear) |
wdpkr index --dry-run |
Estimate tokens and cost without API calls |
wdpkr init |
Set up wdpkr for a repo (CLAUDE.md, .wdpkrignore, CI workflow) |
wdpkr config init |
Interactive config setup |
wdpkr config list |
Show all config values and their sources |
wdpkr config get <key> |
Get a single config value |
wdpkr config set <key> <val> |
Set a config value |
wdpkr ships a retrieval eval harness. Cases live in eval/cases/*.json (query +
expected files/symbols); run them against a live index with:
wdpkr eval # default suite
wdpkr eval eval/cases/wdpkr-deep.json --tag keyword # filter by tagEach case reports recall@k, MRR (rank of the first relevant file),
symbol recall, and a compression ratio (result tokens ÷ tokens of the
files you'd otherwise read). A By tag breakdown slices the suite by query style.
The numbers below are the wdpkr-deep suite (32 cases) run against wdpkr's own
codebase, indexed in --docstring mode (embeds code documentation +
signatures, no LLM summaries) with voyage-code-3 into a local
nidus store.
| Metric | Value |
|---|---|
| Recall@5 (file) | 0.95 |
| MRR (file) | 0.81 |
| Symbol recall | 0.60 |
| Symbol MRR | 0.27 |
| Compression ratio | 0.25 |
By query style:
| Style | n | Recall@5 | MRR |
|---|---|---|---|
| keyword | 6 | 1.00 | 0.87 |
| where-is | 2 | 1.00 | 1.00 |
| natural-language | 20 | 0.97 | 0.84 |
| concept | 3 | 1.00 | 0.67 |
Takeaways: file-level retrieval is strong and top-heavy — the right file is returned ~95% of the time and is usually rank 1–2, even with no LLM in the indexing path. Keyword and "where is X" queries rank best; broad conceptual phrasings rank lower. Symbol-level precision is the weakest spot (Symbol MRR 0.27): the correct file often ranks well before the correct symbol surfaces.
The docs site lives in docs/ — Astro + Starlight, published to
wdpkr.duckedup.org on merge to main. It uses
Bun.
just docs # dev server with live reload (http://localhost:4321)
just docs-build # production build → docs/dist/
just docs-preview # preview the production buildOr directly:
cd docs
bun install
bun run dev