Give our backend engine a star on GitHub!
Inspiration
With the rise of AI and language models, the goal is to democratize retrieval‑augmented generation so that everyday users can run local models, get good accuracy, reduce hallucinations, and keep their data private. Using hyperplane‑based locality‑sensitive hashing (LSH) instead of a heavyweight vector database means the “index” is just a lightweight Python library, lowering the compute and ops barrier so users can plug in embeddings and get useful RAG behavior without standing up FAISS or Qdrant infrastructure.
For non‑experts, the system exposes simple retrieval “modes” (for example, lower false‑positive rate, higher recall, more precise) as thresholds on the LSH side, so they can choose safer retrieval policies without tuning approximate‑nearest‑neighbor internals. Because everything runs locally against their own files and keys, nothing leaves their machine, which simultaneously improves grounding and preserves privacy.
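To make the idea concrete, here is a minimal sketch of how user‑facing retrieval “modes” could map onto LSH‑side thresholds. The mode names and numbers are illustrative assumptions, not the package’s actual defaults.

```python
# Illustrative only: map user-facing retrieval "modes" to LSH-side thresholds.
# The mode names and values below are assumptions, not lshrs defaults.
RETRIEVAL_MODES = {
    # stricter cosine cutoff, fewer candidates -> lower false-positive rate
    "precise":     {"min_cosine": 0.75, "max_candidates": 5},
    # balanced default
    "balanced":    {"min_cosine": 0.60, "max_candidates": 10},
    # looser cutoff, more candidates -> higher recall
    "high_recall": {"min_cosine": 0.45, "max_candidates": 25},
}

def retrieval_params(mode: str) -> dict:
    """Resolve a user-facing mode name to concrete retrieval thresholds."""
    return RETRIEVAL_MODES.get(mode, RETRIEVAL_MODES["balanced"])
```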
What it does
The project is built around a hyperplane‑based LSH engine (lshrs) wrapped in a chat interface that delivers fast, efficient RAG responses over a user’s own data. In the demo, roughly 60k randomly selected Wikipedia articles are indexed, and queries run through LSH to retrieve candidate chunks that the model then uses as grounding context.
Compared with dense vector search on this demo scale, the LSH index is noticeably faster and lighter on memory on commodity hardware, while sacrificing a bit of precision versus a well‑tuned ANN index. The tradeoff is biased toward “fast enough on a laptop, good‑enough recall” rather than squeezing out the last few points of accuracy.
The chat interface surfaces retrieved chunks by passing them through an MCP layer into the LLM prompt so every answer is conditioned on concrete snippets from the user’s data. Those snippets can also be shown directly in the UI as supporting context, making it clear why the model responded a certain way and helping users build trust.
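As a rough sketch of that grounding step, the snippet below shows one way retrieved chunks could be folded into the prompt; the template, field names, and function are illustrative, not the project’s exact prompt format.

```python
# Illustrative sketch: fold retrieved chunks into a grounded prompt.
# The template and field names are assumptions, not the project's exact format.
def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    """Format retrieved chunks as numbered context snippets ahead of the question."""
    context = "\n\n".join(
        f"[{i + 1}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is not sufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The same `chunks` list that feeds the prompt is what the UI can render as supporting context next to the answer.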
How we built it
The core package and UI wrapper are written in Python, with some interfaces and frontend elements developed in JavaScript. Random‑hyperplane LSH is used because it is embedding‑agnostic and easy to tune: users can trade off speed against recall by changing the number of hyperplanes and tables, without touching their embedding model.
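For readers unfamiliar with the scheme, here is a compact, NumPy‑only sketch of a random‑hyperplane LSH index. The class name, parameters, and ranking step are illustrative rather than the lshrs internals.

```python
import numpy as np
from collections import defaultdict

class RandomHyperplaneLSH:
    """Minimal random-hyperplane LSH index (illustrative, not the lshrs internals)."""

    def __init__(self, dim: int, n_planes: int = 16, n_tables: int = 4, seed: int = 0):
        rng = np.random.default_rng(seed)
        # One set of random hyperplanes per table: more planes -> finer buckets
        # (fewer collisions), more tables -> higher recall.
        self.planes = [rng.standard_normal((n_planes, dim)) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = []

    def _keys(self, vec: np.ndarray) -> list[int]:
        # Hash = sign pattern of the vector against each table's hyperplanes,
        # packed into an integer bucket key.
        return [int("".join("1" if d > 0 else "0" for d in planes @ vec), 2)
                for planes in self.planes]

    def add(self, vec: np.ndarray) -> int:
        idx = len(self.vectors)
        self.vectors.append(vec)
        for table, key in zip(self.tables, self._keys(vec)):
            table[key].append(idx)
        return idx

    def query(self, vec: np.ndarray, top_k: int = 10) -> list[int]:
        # Union of bucket hits across tables, then rank candidates by cosine similarity.
        candidates = {i for table, key in zip(self.tables, self._keys(vec))
                      for i in table[key]}
        return sorted(
            candidates,
            key=lambda i: -float(vec @ self.vectors[i] /
                                 (np.linalg.norm(vec) * np.linalg.norm(self.vectors[i]) + 1e-9)),
        )[:top_k]
```

In this sketch, adding planes makes buckets narrower (faster lookups, lower recall), while adding tables recovers recall at the cost of memory, which is the speed/recall dial described above.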
The MCP server sits between the UI, the model, and the LSH engine, acting as a retrieval broker. The UI (or a client like Gemini) calls MCP tools to ask for context; the server runs the query against the LSH index and returns the results through structured tool calls, which keeps the system modular and makes it easy to plug in new tools or data sources without rewriting the chat frontend.
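A hedged sketch of such a retrieval broker is shown below, assuming the FastMCP helper from the official MCP Python SDK; the tool name, signature, and placeholder `search_index()` are illustrative, not the project’s actual server.

```python
# Hedged sketch of the retrieval broker as an MCP tool, assuming the FastMCP
# helper from the official MCP Python SDK; the tool name, signature, and the
# placeholder search_index() are illustrative, not the project's actual server.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("lsh-retrieval")

def search_index(query: str, top_k: int) -> list[str]:
    # Placeholder: embed the query and look up candidate chunks in the LSH index here.
    return []

@mcp.tool()
def retrieve_context(query: str, top_k: int = 5) -> str:
    """Return up to top_k grounding chunks for a natural-language query."""
    chunks = search_index(query, top_k)
    return "\n\n".join(chunks) if chunks else "No matching context found."

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, so any MCP client can connect
```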
Challenges and lessons
Major challenges included time constraints, first‑time hacking, and getting the UI to talk to the MCP server reliably. One of the hardest bugs involved dynamic tool calls: unsafe or out‑of‑bounds MCP executions could break the UI, which pushed the team to add guardrails that validate keys and scopes before any MCP call goes through, backed by lots of small, focused manual tests under time pressure.
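A minimal sketch of that kind of pre‑flight guardrail follows; the allowed‑tool table, scope name, and error types are assumptions for illustration.

```python
# Illustrative guardrail: validate the tool name, arguments, and key scope
# before an MCP call is dispatched; the tool table and scope names are assumptions.
ALLOWED_TOOLS = {"retrieve_context": {"query", "top_k"}}

def validate_tool_call(name: str, args: dict, scopes: set[str]) -> None:
    """Raise before dispatch instead of letting a bad call break the UI."""
    if name not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {name}")
    unexpected = set(args) - ALLOWED_TOOLS[name]
    if unexpected:
        raise ValueError(f"unexpected arguments for {name}: {sorted(unexpected)}")
    if "retrieval" not in scopes:
        raise PermissionError("API key is not scoped for retrieval tools")
```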
A big takeaway was the importance of planning: next time, the first few hours would be spent ruthlessly designing the architecture and integration points instead of jumping straight into coding. Front‑loading the “plumbing” decisions (how UI, MCP, LSH, and the model talk to each other) would save time later—and yes, getting coffee early instead of midway through the death march was another lesson.
Accomplishments
The primary accomplishment is simply getting the whole thing working end‑to‑end: building the “darn project” into a functioning RAG stack powered by hyperplane‑based LSH and an MCP‑backed chat UI. Shipping a local‑first retrieval engine that non‑experts can actually run and interact with, within hackathon constraints, is something to be proud of.
Impact and next steps
Right now, the package is published on PyPI, and the team is considering turning it into a full hosted webapp that people can access easily. To preserve the local, private story if it becomes hosted, the core engine will remain open‑source under a permissive license so people can self‑host or fork it, and any hosted demo would be “bring your own API keys” with no long‑term data retention beyond the session.
Beyond Wikipedia, the sweet spot is personal or team‑level corpora: a researcher’s PDF library, a startup’s internal docs, or any on‑prem knowledge base that must stay private. Anywhere users want fast, controllable RAG on messy, sensitive documents—without shipping data to a third‑party service—is where Wurst can really shine.
Built With
- gradio
- huggingface
- javascript
- python