Inspiration
Most AI chats forget everything the moment you start a new thread. That means users keep re-explaining the same context, preferences, and decisions over and over. Cortex was built to fix that with a persistent memory layer and a brain-like visual map that makes old chats feel easy to explore.
What it does
Cortex turns past conversations into useful memory that can be re-used.
- Hybrid search (keyword + semantic): mixes exact matches with meaning-based recall (65/35 keyword-to-semantic) so it finds both facts and related ideas.
- Context selection: users can pick chats from the hybrid-search results, extract their context, and generate custom system prompts within seconds.
- MCP delivery: a custom Model Context Protocol (MCP) server injects retrieved memories directly into Claude.
- Quality checks: Backboard.io helps score memory results (relevance, redundancy, coverage) before they get returned.
- Visual interface: chats appear as nodes; search results light up so you can instantly see where the memory lives.
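The 65/35 keyword-to-semantic weighting above can be sketched as a simple score fusion. This is a hypothetical illustration, not Cortex's actual retrieval code: it assumes each chat already has a keyword score (e.g. from BM25) and a semantic score (e.g. cosine similarity of embeddings), normalizes both to a shared scale, and blends them.

```python
def normalize(scores):
    """Min-max normalize so both signals share a 0..1 scale."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_rank(chats, keyword_scores, semantic_scores,
                kw_weight=0.65, sem_weight=0.35):
    """Blend keyword and semantic scores 65/35 and rank chats."""
    kw = normalize(keyword_scores)
    sem = normalize(semantic_scores)
    fused = [kw_weight * k + sem_weight * s for k, s in zip(kw, sem)]
    # Highest fused score first
    return sorted(zip(chats, fused), key=lambda p: p[1], reverse=True)
```

The keyword-heavy weighting is what makes exact facts (names, dates, decisions) win over merely related chats.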
How we built it
Cortex is four main pieces:
- Backend: stores chats, indexes them, and runs hybrid retrieval.
- Frontend: interactive node-based visualization for browsing and searching memory.
- Local model: Qwen 2.5 (8B) running locally for summarization + prompt generation.
- MCP server: two endpoints: `search` (find the best memories) and `fetch` (return a structured memory payload).
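The search/fetch contract can be illustrated with plain functions over an in-memory store. This is a hypothetical sketch of the two endpoints' shape only; the real server speaks the Model Context Protocol, and the store, field names, and matching logic here are illustrative assumptions.

```python
# Illustrative in-memory store (hypothetical data)
MEMORIES = {
    "chat-42": {"summary": "User prefers FastAPI over Flask",
                "topics": ["backend", "preferences"]},
}

def search(query: str) -> list[str]:
    """Return IDs of the best-matching memories (naive keyword match here)."""
    q = query.lower()
    return [mid for mid, m in MEMORIES.items()
            if q in m["summary"].lower()
            or any(q in t for t in m["topics"])]

def fetch(memory_id: str) -> dict:
    """Return a structured memory payload ready for prompt injection."""
    m = MEMORIES[memory_id]
    return {"id": memory_id, "summary": m["summary"], "topics": m["topics"]}
```

Splitting retrieval into a cheap `search` (IDs only) and an explicit `fetch` (full payload) keeps the client in control of how much context actually gets injected.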
Challenges we ran into
- Speed: PEGASUS (a fine-tuned summarization model) was solid for summarization but too slow for real-time topic extraction, so we moved core extraction to our local Qwen setup.
- Clean context: retrieved memory had to be short, structured, and actually useful, or it would hurt model performance instead of helping. This is what the whole project revolves around.
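The "short, structured" rule can be sketched as a small formatter that caps each retrieved memory to a fixed budget before it reaches the model. The function, field names, and budget below are illustrative assumptions, not Cortex's actual code.

```python
def format_context(memories, max_chars=300):
    """Render retrieved memories as a compact bulleted context block."""
    lines = []
    for m in memories:
        line = f"- [{m['topic']}] {m['summary']}"
        if len(line) > max_chars:
            # Truncate rather than let one verbose memory crowd the prompt
            line = line[:max_chars - 1] + "…"
        lines.append(line)
    return "\n".join(lines)
```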
Accomplishments that we’re proud of
- Built a model-agnostic memory layer that works externally and plugs in only when needed.
- Made hybrid retrieval feel reliable, especially for specific facts, not just “yap.”
- Created a brain-like visualization where relevant chats light up during search and are grouped into clusters.
- Got a custom MCP integration working.
What we learned
- The biggest key to success is planning (and a Claude Pro subscription).
- LLMs don’t just need more context, they need the right context.
- Hybrid retrieval beats pure semantic search in real usage.
- Ranking + formatting matter as much as the retrieval itself.
What’s next for Cortex
- User-controlled memory: edit, pin, tag, or delete what the system remembers.
- Smarter summarization + pruning: keep long-term memory compact without losing important details.
- Better relevance weighting: improve ranking using recency and usage over time.
- Support more AI clients: plug Cortex into multiple chat tools, not just one.
Built With
- 3D visualization
- Backend: FastAPI
- CSS
- HTML
- HTTP
- JavaScript
- Languages: Python
- nomic-embed-text
- Pydantic
- scikit-learn (KMeans)
- SQLAlchemy
- Three.js
- umap-learn
- LLM/embeddings: Ollama (Qwen 2.5 for summaries)
- Database: SQLite
- ML/vector: NumPy
- Uvicorn