**Note on the live demo: it is populated with roughly 500 images captured from my own screen over the past few hours as I built the project. To keep my EC2 volume alive, I have turned off continuous capture along with the workflow feature. If you really want to test those out, you'll need to set the project up locally on a Windows device or wait for me to demo it live ;)**

Inspiration

Ever struggle to remember a video you watched last week? Or that one terminal command you spent hours figuring out only to forget it later?

And what about the repetitive tasks you’ve done dozens of times, like reviewing a full-stack PR: cloning the repo, running the project, checking for UI inconsistencies, running manual test cases, opening a separate terminal to inspect the database… the same tedious workflow all over again.

Memri is built to solve exactly these problems. It captures what you see, remembers what you do, and helps you repeat complex workflows effortlessly.

What it does

Memri captures your screen alongside data from several Windows internal APIs to extract extra information, and uses these pieces to build a context base that powers, as you may have guessed, Claude as a chatbot and, eventually, as an autonomous workflow agent.

Infinite possibilities open up when your agent has full context of your workspace: from understanding which tools you use most, to asking it for forgotten niche bits of information you saw in the past.

Imagine getting instant recall of the things you need, like the videos you've watched today, simply by talking to Memri. Not only will it tell you, it'll show you!

Or what if you wanted to find that sneaky terminal command you ran a while back, or see which tools you use most on your device?

Memri can even build workflows through the Workflow tab: manual tasks are easily documented by Memri, soon to be executed by AI agents.

How we built it

The short answer: Memri captures your screen, indexes the data with an FST (finite state transducer) plus metadata, and uses it to enrich LLM context.
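To make the indexing idea concrete, here is a minimal std-only sketch of the kind of lookup such an index supports: mapping OCR'd tokens to the frames they appeared in, with prefix queries. This is an illustrative stand-in using a `BTreeMap`, not the project's actual FST implementation; a real FST (e.g. via the `fst` crate) stores the same mapping far more compactly.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the FST index: maps each OCR'd token to the
/// IDs of the captured frames it appeared in.
struct ContextIndex {
    postings: BTreeMap<String, Vec<u64>>,
}

impl ContextIndex {
    fn new() -> Self {
        Self { postings: BTreeMap::new() }
    }

    /// Index the OCR text extracted from one screenshot.
    fn add_frame(&mut self, frame_id: u64, ocr_text: &str) {
        for token in ocr_text.split_whitespace() {
            let token = token.to_lowercase();
            let ids = self.postings.entry(token).or_default();
            if ids.last() != Some(&frame_id) {
                ids.push(frame_id);
            }
        }
    }

    /// Prefix lookup, the kind of query an FST answers natively:
    /// which frames contained a token starting with `prefix`?
    fn frames_with_prefix(&self, prefix: &str) -> Vec<u64> {
        let mut hits: Vec<u64> = self
            .postings
            .range(prefix.to_string()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .flat_map(|(_, ids)| ids.iter().copied())
            .collect();
        hits.sort_unstable();
        hits.dedup();
        hits
    }
}

fn main() {
    let mut idx = ContextIndex::new();
    idx.add_frame(1, "git rebase --onto main feature");
    idx.add_frame(2, "docker compose up --build");
    println!("{:?}", idx.frames_with_prefix("reb")); // frames mentioning "rebase"
}
```

Queries like "that sneaky terminal command" then reduce to prefix scans over this token index, with the frame IDs pointing back at the stored screenshots and their metadata.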

The long answer: Memri is a Rust-based multi-agent AI system that uses OCR, the Windows API (e.g., for screen detection), agentic context lookup, and a soft MCP-style decision system, all for optimized real-time context building. This lets agents operate directly on your machine's context, with the added benefit of generating workflows and, soon, executing them autonomously.

Challenges we ran into

  • Optimization: we settled on WebP compression, calculated a similarity score between consecutive screenshots to prevent over-imaging, and evaluated which OCR engine to use
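The similarity check mentioned above can be sketched with a perceptual difference hash ("dHash"). This is a minimal std-only sketch, not Memri's actual scoring code, and it assumes the frame has already been downscaled to a 9x8 grayscale grid:

```rust
/// Difference-hash sketch for skipping near-duplicate screenshots.
/// Sets one bit per pixel that is brighter than its right neighbor.
fn dhash(gray_9x8: &[u8; 72]) -> u64 {
    let mut hash = 0u64;
    for row in 0..8 {
        for col in 0..8 {
            let left = gray_9x8[row * 9 + col];
            let right = gray_9x8[row * 9 + col + 1];
            hash = (hash << 1) | u64::from(left > right);
        }
    }
    hash
}

/// Similarity in [0.0, 1.0]: fraction of matching hash bits.
fn similarity(a: u64, b: u64) -> f64 {
    1.0 - (a ^ b).count_ones() as f64 / 64.0
}

fn main() {
    let frame_a = [0u8; 72]; // flat frame
    let mut frame_b = [0u8; 72];
    frame_b[0] = 255; // one changed pixel
    let (ha, hb) = (dhash(&frame_a), dhash(&frame_b));
    // Skip the capture if the frames are nearly identical.
    if similarity(ha, hb) > 0.95 {
        println!("skip: near-duplicate frame");
    }
}
```

Comparing two 64-bit hashes is essentially free, so a gate like this lets the capture loop run continuously while only paying for WebP encoding and OCR when the screen actually changes.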

Accomplishments that we're proud of

  • The app does not consume a lot of memory despite constant OCR and screenshotting

What we learned

  • Vibe coding and agentic flows are genuinely game changers for building and proving out concepts

What's next for Memri

  • Graph RAG for building a knowledge system on top of the context layer
  • Audio capture to add to the context on top of the OCR'd content and metadata
  • A shared memory space across multiple individuals, creating an organization's context brain
  • Autonomous agent integrations acting on the workflows
