Desktop/Browser task automator
A local browser and desktop operator for internships, scholarships, Docs, Drive, Slides, D2L, and daily desktop actions.
- Browser sessions are now reused through a stable `session_id`, so repeated commands stay in the same automation browser.
- The default planner profile is now `qwen2_5_7b` for stronger text-first planning.
- The browser worker now supports clicking by visible text or CSS selector, typing, key presses, waiting for text, listing, switching, and closing tabs, and basic challenge detection.
- The desktop worker now supports running PowerShell commands, searching for, opening, and moving files, copying to and pasting from the clipboard, and launching Notepad.
- The engine now routes mixed tasks, not only internship and scholarship tasks.
- The voice frontend now has a real engine bridge that can transcribe a WAV file, submit the resulting task to the engine, and speak a summary of the response.
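This README does not describe how the engine decides where a mixed task goes. As a rough illustration only, a keyword router could split a single request between the browser and desktop workers. The keyword sets, the worker names, and the fallback to the planner model are all assumptions, not this project's code.

```python
import re

# Hypothetical keyword hints; the real engine's routing rules are not documented here.
BROWSER_HINTS = {
    "internship", "internships", "scholarship", "scholarships",
    "docs", "drive", "slides", "d2l", "tab", "tabs", "browser",
}
DESKTOP_HINTS = {
    "powershell", "file", "files", "folder", "clipboard", "paste", "notepad",
}


def route_task(text: str) -> list[str]:
    """Return the workers a task touches, browser first, then desktop."""
    # Compare whole words so "profile" does not match "file".
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    workers = []
    if words & BROWSER_HINTS:
        workers.append("browser")
    if words & DESKTOP_HINTS:
        workers.append("desktop")
    # Assumed fallback: let the planner model decide when no keyword matches.
    return workers or ["planner"]
```

For example, "Find scholarships on D2L and paste the list into Notepad" touches both workers, so it routes to `["browser", "desktop"]`.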
- Create a virtual environment.
- Install requirements.
- Run `playwright install chrome`.
- Start the model with `scripts/start_model.ps1 -Profile qwen2_5_7b`.
- Start the browser service.
- Start the desktop service.
- Start the engine service.
- Optionally start the Telegram bridge.
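The first four steps can be scripted. This sketch assumes a Windows machine (matching the PowerShell model script), a `.venv` directory, and a `requirements.txt` file; those names are assumptions. The commands for starting the browser, desktop, engine, and Telegram services are not documented in this README, so they are left out.

```python
import subprocess
import sys
from pathlib import Path

# Assumed locations; adjust to the repository layout.
VENV_DIR = Path(".venv")
VENV_PYTHON = VENV_DIR / "Scripts" / "python.exe"  # Windows venv layout


def setup_commands(profile: str = "qwen2_5_7b") -> list[list[str]]:
    """Return the setup steps, in order, as argument lists."""
    return [
        [sys.executable, "-m", "venv", str(VENV_DIR)],
        [str(VENV_PYTHON), "-m", "pip", "install", "-r", "requirements.txt"],
        [str(VENV_PYTHON), "-m", "playwright", "install", "chrome"],
        [
            "powershell", "-ExecutionPolicy", "Bypass",
            "-File", "scripts/start_model.ps1", "-Profile", profile,
        ],
    ]


def run_setup(profile: str = "qwen2_5_7b") -> subprocess.Popen:
    """Run the one-shot steps, then start the model without waiting on it."""
    *one_shot, start_model = setup_commands(profile)
    for cmd in one_shot:
        subprocess.run(cmd, check=True)  # stop on the first failing step
    # Assumes the model script keeps running, so it is launched in the background.
    return subprocess.Popen(start_model)
```

Keeping the commands in a plain list makes the order easy to inspect before `run_setup()` executes anything.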
- The browser stays visible on purpose.
- The same `session_id` now maps to the same persistent Playwright profile directory.
- Google Docs and Drive are currently browser-first in this codebase. API-first integration can be layered on later.
- Challenge detection is included, but completing a detected challenge still requires a human handoff.
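One way to make the same `session_id` always land in the same profile directory is to derive the folder name deterministically from the id. In this sketch, `profile_dir_for`, `open_session`, and the `browser_profiles` root are hypothetical names. `launch_persistent_context` with `channel="chrome"` and `headless=False` is the standard Playwright for Python call for a visible Chrome window backed by a persistent profile.

```python
import hashlib
import re
from pathlib import Path

# Hypothetical root folder for all automation browser profiles.
PROFILE_ROOT = Path("browser_profiles")


def profile_dir_for(session_id: str, root: Path = PROFILE_ROOT) -> Path:
    """Map a session_id to the same profile directory every time."""
    key = session_id.strip()
    if not key:
        raise ValueError("session_id must not be empty")
    # Filesystem-safe, human-readable part of the folder name.
    readable = re.sub(r"[^A-Za-z0-9_-]", "_", key)
    # A short hash keeps ids like "a b" and "a_b" from sharing a folder.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:8]
    return root / f"{readable}-{digest}"


def open_session(session_id: str):
    """Open a visible Chrome window backed by the session's persistent profile."""
    # Imported here so profile_dir_for works even without Playwright installed.
    from playwright.sync_api import sync_playwright

    profile = profile_dir_for(session_id)
    profile.mkdir(parents=True, exist_ok=True)
    playwright = sync_playwright().start()
    context = playwright.chromium.launch_persistent_context(
        str(profile), channel="chrome", headless=False
    )
    # The caller is responsible for context.close() and playwright.stop().
    return playwright, context
```

Because the directory depends only on the id, a later command with the same `session_id` reopens the same cookies and logins instead of starting a fresh browser.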