Cogni-Flow: The Autonomous AI Apprentice
Submission for the Galuxium Nexus V1 Hackathon
Focus Domains: Autonomous AI Agents, Developer Productivity Tools
Demo Video:
Presentation Deck:
- The Core Challenge
Expert knowledge is "trapped." The most valuable, complex workflows (in coding, design, and operations) are stuck in the heads of senior-level experts. They are difficult to document and impossible to automate with simple scripts because they rely on context and intuition.
- Our Solution: Cogni-Flow
Cogni-Flow is an "AI apprentice" agent that learns to perform complex tasks simply by observation.
It's a simple 3-step process:
- WATCH: The user clicks "Start Recording." The agent observes their screen, clicks, and keyboard actions as they perform a complex task.
- LEARN: The user clicks "Stop & Learn." Cogni-Flow sends the action log to a Generative AI model (Google's Gemini 2.5) to analyze the logic and intent behind the actions. The AI then writes a new, autonomous Python script.
- AUTOMATE: The user can now click "Run Workflow," and the agent will execute that newly generated script, performing the entire complex task on its own.
This redefines human-computer interaction by using passive observation as an input and accelerates productivity by automating the "un-automatable."
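The WATCH and LEARN steps can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the `ActionEvent` and `ActionRecorder` names, the event fields, and the prompt wording are all assumptions. The callbacks take the same arguments pynput passes to its mouse `on_click` and keyboard `on_press` handlers, so they could be handed to `pynput.mouse.Listener(on_click=...)` and `pynput.keyboard.Listener(on_press=...)`. Here they are called directly so the sketch runs without a display.

```python
import json
import time
from dataclasses import dataclass, field, asdict


@dataclass
class ActionEvent:
    """One recorded user action (hypothetical schema)."""
    kind: str      # "click" or "key"
    detail: dict   # event-specific data, e.g. coordinates or key name
    timestamp: float = field(default_factory=time.time)


class ActionRecorder:
    """Collects events into an in-memory action log."""

    def __init__(self):
        self.events = []

    def on_click(self, x, y, button, pressed):
        # Same argument order pynput uses for mouse on_click callbacks.
        if pressed:  # log the button-down event only, ignore the release
            self.events.append(
                ActionEvent("click", {"x": x, "y": y, "button": str(button)})
            )

    def on_press(self, key):
        # pynput passes a Key or KeyCode object; str() keeps the sketch generic.
        self.events.append(ActionEvent("key", {"key": str(key)}))


def build_prompt(events):
    """Turn the action log into a text prompt for the model (assumed wording)."""
    log = json.dumps([asdict(e) for e in events], indent=2)
    return (
        "The following JSON is a log of mouse and keyboard actions.\n"
        "Infer the user's intent and write a Python script using pyautogui "
        "that performs the same task.\n\n" + log
    )


# Simulate a short recording session without any real input devices.
recorder = ActionRecorder()
recorder.on_click(120, 340, "Button.left", True)
recorder.on_click(120, 340, "Button.left", False)  # release, not logged
recorder.on_press("h")
prompt = build_prompt(recorder.events)
```

The resulting `prompt` string is what would be sent to the model. Keeping the log as structured JSON, rather than free text, is one way to give the model unambiguous coordinates and key names to reason over.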
- The Working Prototype
This repository contains the working prototype, app.py. It's a desktop agent built in Python that fully demonstrates the "Watch, Learn, Automate" loop.
Tech Stack
- Core Logic: Python
- Desktop UI: customtkinter
- AI "Brain": Google Gemini (google-generativeai)
- Sensing (Recording): pynput (for mouse/keyboard) and mss (for screen)
- Acting (Automation): pyautogui
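As a rough illustration of how the acting layer might hand off a generated script, the sketch below strips an optional Markdown fence from a model reply, checks that the result parses as Python, and runs it in a separate interpreter. The helper names `extract_code` and `run_generated_script` are hypothetical, and the prototype may do this differently. The sample reply only prints a line, where a real one would call pyautogui.

```python
import ast
import os
import re
import subprocess
import sys
import tempfile

# Built this way so the sketch can itself sit inside a fenced block.
FENCE = "`" * 3


def extract_code(reply: str) -> str:
    """Return the first fenced code block in a model reply, or the whole reply."""
    pattern = re.escape(FENCE) + r"(?:python)?\s*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, re.DOTALL)
    return (match.group(1) if match else reply).strip()


def run_generated_script(code: str, timeout: float = 60.0):
    """Syntax-check the code, then run it in a child Python process."""
    ast.parse(code)  # raises SyntaxError before anything is executed
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(code)
        path = handle.name
    try:
        return subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    finally:
        os.remove(path)  # clean up the temporary script


# A stand-in for a model reply; a real one would contain pyautogui calls.
reply = "Here is the script:\n" + FENCE + "python\nprint('workflow done')\n" + FENCE
result = run_generated_script(extract_code(reply))
```

Running the script in a child process with a timeout means a hung or crashing workflow does not take the desktop UI down with it. It is not a sandbox, so generated code should still be reviewed before it is run.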
- How to Run
- Clone the repository:
```bash
git clone https://github.com/YOUR_USERNAME/cogni-flow.git
cd cogni-flow
```
- Install all required libraries:
```bash
python -m pip install customtkinter pynput mss google-generativeai pyautogui
```
- Set your Google AI API Key: Get your free key from aistudio.google.com and enable the "Generative Language API" in your Google Cloud console. Then set the key as an environment variable in your terminal, which keeps it out of the source code:
```powershell
# For Windows PowerShell
$env:GOOGLE_API_KEY = "your_new_key_here"
```
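Inside the app, the key set above can be read from the environment rather than hard-coded. The following is a minimal sketch; the helper name `load_api_key` is hypothetical, and app.py may handle this differently.

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment, failing early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Set it in your terminal before running app.py."
        )
    return key
```

With the google-generativeai package, the returned value would typically be passed to `genai.configure(api_key=...)`. Failing at startup gives a clearer error than a rejected request later in the "Stop & Learn" step.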
- Run the app:
```bash
python app.py
```
- Test the loop: Click "Start Recording" and perform an action (e.g., open Notepad and type "Hello World"). Click "Stop & Learn" and wait for the "Ready to Run" status. Then click "Run Workflow" and watch the automation.