Inspiration

We were frustrated that whenever we hit a problem, we had to describe it to an AI assistant in words instead of just letting it see the screen.

What it does

Blink looks at your screen, answers your questions, and carries out actions.

How we built it

Trigger: a global Ctrl+Option hotkey finalizes the voice transcript and kicks off the pipeline.

Voice: system microphone input is transcribed to text in real time.

Screen capture: ScreenCaptureUtility grabs all active screens in parallel with the memory query.

Long-term memory: ChromaDB with all-MiniLM-L6-v2 (sentence-transformers, running locally) for semantic vector search across sessions. The top 3 results by cosine similarity are retrieved and injected into the system prompt as past context.

Short-term memory: an in-memory conversationHistory array (capped at 10 turns) handles within-session continuity.

LLM: Google Gemini via the Chat API, called with the augmented system prompt.

TTS: the response is played back as speech immediately after the API call.

Async storage: the current exchange is saved to ChromaDB in a non-blocking call after the response plays.
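The memory flow described above can be sketched roughly as follows. This is an illustrative Python sketch, not the app's actual code: the `embed` function is a toy letter-count embedding standing in for all-MiniLM-L6-v2, the `memory_store` list stands in for the ChromaDB collection, and `capture_screens`, `call_llm`, `save_exchange`, `build_system_prompt`, and `handle_turn` are hypothetical names. It also assumes one "turn" means one user/assistant exchange.

```python
import asyncio
import math
from collections import deque

MAX_TURNS = 10  # short-term cap mentioned in the write-up
TOP_K = 3       # number of long-term results injected into the prompt

# Assumption: a plain list of (embedding, text) pairs stands in for ChromaDB.
memory_store = []
# Assumption: one entry per (user, assistant) exchange; the oldest falls off.
conversation_history = deque(maxlen=MAX_TURNS)


def embed(text):
    """Toy embedding (letter counts), NOT the real sentence-transformers model."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


async def query_memory(question):
    """Return the TOP_K stored texts most similar to the question."""
    q = embed(question)
    ranked = sorted(memory_store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:TOP_K]]


async def capture_screens():
    """Hypothetical stub for the screen-capture step."""
    return "<screenshot placeholder>"


async def call_llm(system_prompt, history, question, screens):
    """Hypothetical stub for the LLM call."""
    return f"(stub answer to: {question})"


async def save_exchange(question, answer):
    """Store the finished exchange in long-term memory."""
    text = f"User: {question} | Assistant: {answer}"
    memory_store.append((embed(text), text))


def build_system_prompt(past_context):
    """Inject retrieved past context into the system prompt."""
    lines = ["You are a screen-aware assistant."]
    if past_context:
        lines.append("Relevant past context:")
        lines.extend(f"- {c}" for c in past_context)
    return "\n".join(lines)


async def handle_turn(question):
    # Screen capture and the memory query run concurrently.
    screens, past = await asyncio.gather(capture_screens(), query_memory(question))
    prompt = build_system_prompt(past)
    answer = await call_llm(prompt, list(conversation_history), question, screens)
    conversation_history.append((question, answer))
    # Saving is scheduled without blocking the reply; the caller keeps the
    # task handle so it can be awaited before the event loop shuts down.
    save_task = asyncio.create_task(save_exchange(question, answer))
    return answer, prompt, save_task
```

Returning the save task rather than awaiting it inside `handle_turn` keeps the reply path short while still letting the caller make sure the write finishes.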

Challenges we ran into

We could not get the ChromaDB integration to work reliably.

Accomplishments that we're proud of

Apart from the RAG component, the app worked well, with no bugs that we observed.

What we learned

RAG is hard to implement well.

What's next for Blink

Properly implement RAG and give the AI access to more information.
