CoBrain

Inspiration

Every day, builders and creators waste countless hours fighting their own computers.

When we were working on an AWS project, debugging meant constantly copy-pasting errors and screenshots into ChatGPT. It was clunky, distracting, and broke our flow. At the same time, a friend of ours was burning precious focus on shallow tasks like replying to Discord messages and emails instead of coding.

We tried existing computer controllers, but they all had the same flaw: they needed painfully detailed instructions to do even the simplest thing. They didn’t understand us.

What we needed wasn’t just another tool. We needed an assistant that understands what we mean, not just what we type. Something that can cut through the noise, automate the repetitive, and let us stay in flow.

That’s the vision: a computer that works the way you think.

What it does

Our tool is built on a simple but powerful idea: your computer should understand how you work.

It constantly watches your screen and builds a living context of the way you use your machine. If you always clone your Git repos into your Desktop folder, it learns that and remembers it. Over time, it knows your habits, your preferences, and your workflow.

At the heart of this is our AI voice agent, activated with a simple trigger word — CoBrain. It is not always listening, only when you call on it. When you say, “CoBrain, open the repo I worked on last night,” it already knows you were working on a client’s website, that you always pull from the main branch before coding, and that your favourite IDE is Cursor. Without you having to spell it out, it takes care of everything and gets you straight back into flow. That is our core feature.

But what makes CoBrain truly powerful are the side features that make everyday work seamless:

Intuitive debugging anywhere. Whether on AWS, another cloud platform, or even in a chess match, CoBrain can tell when you are asking a question versus giving a command. If you hit an error, just say, “CoBrain, what is this,” and it instantly explains the issue and how to fix it.

Background agents. Imagine you are coding and Discord is blowing up with messages. Instead of breaking your focus, you can just say, “CoBrain, tell him I’ll talk to him later.” It understands who you mean, sends the reply in the background, and lets you stay in the zone.

Seamless integration with existing tools. We are not trying to build another AI code editor. Instead, CoBrain works with the tools you already use. If you are in Cursor and hit an error, just ask, “CoBrain, what is this error.” CoBrain knows you are inside Cursor, forwards the error into your Cursor chat, and drives the debugging process from there.

General voice-powered Q&A. Instead of wasting time screenshotting and typing into ChatGPT, you can just ask out loud. It is faster, more natural, and keeps you moving.

CoBrain is not just another assistant. It is the bridge between how you think and how your computer acts, letting you stay in flow while it handles the rest.

How we built it

The core runs on the Electron framework, with Python subprocesses doing the heavy lifting. For screen analysis we use Screenpipe, giving CoBrain real-time awareness of what is happening on your desktop. All of that context, from workflows to preferences, is stored securely in a self-hosted instance of Qdrant so the knowledge always stays with you.
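To illustrate the store-and-retrieve idea behind the context layer, here is a minimal sketch. It is not CoBrain's implementation and does not use Qdrant's API: `ContextStore` is a hypothetical in-memory stand-in for the vector database, and `embed` is a toy bag-of-words function standing in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ContextStore:
    """Toy in-memory index (assumption: a real setup would use the vector database)."""

    def __init__(self) -> None:
        self._items = []  # list of (vector, original observation text)

    def add(self, observation: str) -> None:
        """Embed an observation of user activity and keep it for later lookup."""
        self._items.append((embed(observation), observation))

    def search(self, query: str, limit: int = 3) -> list:
        """Return the stored observations most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:limit]]


store = ContextStore()
store.add("User cloned a git repo into the Desktop folder")
store.add("User opened the client website project in Cursor")
store.add("User replied to a Discord message")

# The agent can pull the most relevant habit before acting on a request.
top = store.search("where does the user clone git repos", limit=1)
print(top)
```

In a real setup the observations would come from screen analysis, and the similarity search would be handled by the database rather than a Python loop.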

To interact directly with the computer we leaned on AppleScript, which lets CoBrain click, type, and execute actions just like you would. For voice we use Deepgram for fast and accurate transcription, paired with OpenAI agents for reasoning and execution. Instead of always listening, CoBrain uses openWakeWord to activate only when you say the trigger word. We even trained our own custom wake word, “CoBrain”, using Google Colab.
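As a rough sketch of how a Python subprocess can hand an AppleScript snippet to macOS, the example below shells out to the `osascript` command-line tool. The helper names (`build_osascript_command`, `open_url_script`, `run_applescript`) are hypothetical and chosen for illustration; the actual scripts CoBrain generates are not shown in this writeup.

```python
import subprocess
import sys


def build_osascript_command(script: str) -> list:
    """Return the argv for running an AppleScript snippet with macOS's osascript CLI."""
    return ["osascript", "-e", script]


def open_url_script(url: str) -> str:
    """Hypothetical helper: AppleScript that opens a URL in the default browser."""
    # Escape backslashes and double quotes so the URL stays inside the string literal.
    escaped = url.replace("\\", "\\\\").replace('"', '\\"')
    return f'open location "{escaped}"'


def run_applescript(script: str, timeout: float = 10.0) -> str:
    """Run the script and return its stdout; raises off macOS or on script failure."""
    if sys.platform != "darwin":
        raise RuntimeError("AppleScript is only available on macOS")
    result = subprocess.run(
        build_osascript_command(script),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.strip()


command = build_osascript_command(open_url_script("https://example.com"))
print(command)
```

Keeping command construction separate from execution makes the generated script easy to log or inspect before anything runs on the user's machine.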

Challenges we ran into

• MCP integration: Coordinating multiple moving parts (AI models, background agents, and the MCP framework) was more complex than anticipated.
• Creating custom wake words: Designing a system that reliably triggers without false positives or missed activations required a lot of trial and error.
• Context storage: Building a way for the app to capture user context and store it so the AI agent could retrieve it later was more difficult than expected.
• Script duplication: The AI agent sometimes ran the same script multiple times in a row, repeating actions unnecessarily.
• Transcription issues: Early on, transcription models would output duplicate words or garbled results due to how the audio was being handled.
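One common way to guard against the script duplication problem described above is to refuse an identical script if it was already run within a short time window. The sketch below shows that idea only; it is not necessarily how CoBrain solved it, and `DuplicateRunGuard`, its five-second window, and the injectable clock are all assumptions made for illustration.

```python
import hashlib
import time


class DuplicateRunGuard:
    """Hypothetical guard that blocks identical scripts repeated within a time window."""

    def __init__(self, window_seconds: float = 5.0, clock=time.monotonic) -> None:
        self._window = window_seconds
        self._clock = clock  # injectable so the behaviour can be tested without sleeping
        self._last_run = {}  # script hash -> time of the last accepted run

    def should_run(self, script: str) -> bool:
        """Return True and record the run unless the same script ran too recently."""
        key = hashlib.sha256(script.encode("utf-8")).hexdigest()
        now = self._clock()
        last = self._last_run.get(key)
        if last is not None and now - last < self._window:
            return False
        self._last_run[key] = now
        return True


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
guard = DuplicateRunGuard(window_seconds=5.0, clock=lambda: fake_now[0])
script = 'open location "https://example.com"'

first = guard.should_run(script)   # accepted: never seen before
fake_now[0] = 1.0
second = guard.should_run(script)  # rejected: same script one second later
fake_now[0] = 10.0
third = guard.should_run(script)   # accepted: the window has passed
print(first, second, third)
```

A time window is a blunt tool: it also blocks a user who genuinely wants the same action twice in quick succession, so the window length is a trade-off.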

Accomplishments that we're proud of

• Developed a fully functional desktop AI agent that users command by voice and that activates only when a wake word is spoken.
• Created a custom wake word.
• Created a desktop application that runs quickly and smoothly with a nice UI.
• Built the AI to execute scripts on a user’s computer, allowing it to interact with the desktop, open websites, and automate tasks.
• Created an always-watching feature that embeds user activity into a secure, local vector database to help the AI better understand the user.
• Integrated a database system that allows the agent to access past activity, enabling smarter, context-aware assistance.

What we learned

• How to implement wake word detection, including creating and customizing our own wake words for the agent.
• How to transcribe audio using AI models, and handle the challenges of accuracy, latency, and different voice inputs.
• How to effectively use AI agents to automate tasks on the desktop, including running scripts and interacting with applications.
• How to work with MCP (Model Context Protocol) to coordinate communication between the desktop agent, AI models, and stored data.
• Practical lessons in integrating multiple AI systems, ensuring they work together smoothly while maintaining responsiveness and reliability.

What's next for CoBrain

• Windows compatibility: Make the scripting system fully functional on Windows to broaden accessibility.
• More efficient AI calling: Optimize how and when the AI is called to significantly reduce costs.
• Script templates: Introduce reusable script templates so users can automate common tasks while saving on compute costs.
• App integrations: Expand integration with different desktop and web applications for wider usability.
• Data reliability: Improve how user data and interaction logs are saved for consistency and trustworthiness.
• Background agents: Build more reliable and efficient background agents that can run securely without interfering with user workflows.
• Elastic compute & security: Use elastic compute infrastructure to store user data securely, powering background agents and authenticated logins.
• Web application interaction: Enhance the agent’s ability to navigate and interact directly with web apps.
• Authentication: Integrate authentication into both the desktop app and the website for secure usage and account management.
• Extensive testing: Carry out large-scale testing to ensure stability, performance, and safety.
• Finalize product: Fully polish and stabilize the app before public release.
• Launch & marketing: Release CoBrain as both a desktop and web product, then begin marketing and onboarding users.