Turing is an AI agent that learns by watching you work. Like an intern that shadows you, learns your workflows, and then executes them autonomously.
Imagine telling your computer:
"Open my DataVis class on Canvas and clone the notebook"
And it just... does it. Because it watched you do it once for your Machine Learning class.
That's Turing.
- Click "record", perform your workflow naturally
- System captures:
- Every click, scroll, and keystroke
- Screenshots before/after each action
- Visual context (what you clicked on)
- OCR of text elements
- AI analyzes your recording to understand:
- What steps you took
- What the workflow accomplishes
- Which values are parameters (e.g., class names)
- Visual signatures of UI elements
- Tell it what you want in natural language
- System:
- Finds matching workflow
- Extracts new parameters from your request
- Executes workflow with visual guidance
- Uses OCR to locate elements dynamically
- Store unlimited workflows
- Search by name, description, tags
- Export/import workflow packages
- Track usage statistics
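The workflow library above can be modeled as a simple searchable index. A minimal sketch, assuming a hypothetical `Workflow` record and `search` helper (these are illustrative, not the project's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class Workflow:
    """Minimal stand-in for a stored workflow's metadata."""
    name: str
    description: str
    tags: list = field(default_factory=list)
    uses: int = 0  # usage statistics

def search(workflows, query):
    """Match query against name, description, or tags (case-insensitive)."""
    q = query.lower()
    return [w for w in workflows
            if q in w.name.lower()
            or q in w.description.lower()
            or any(q in t.lower() for t in w.tags)]

library = [
    Workflow("Open Canvas Class", "Navigate to Canvas and open a class",
             ["canvas", "education"]),
    Workflow("Download Bank Statement", "Log into bank and download PDF",
             ["finance", "banking"]),
]
print([w.name for w in search(library, "canvas")])  # ['Open Canvas Class']
```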
```
┌──────────────────────────────────────────────┐
│                User Interface                │
│      "Open my DataVis class on Canvas"       │
└──────────────────────┬───────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────┐
│           Workflow Matching Engine           │
│   • Find similar learned workflows           │
│   • Extract parameters from user request     │
│   • Calculate confidence score               │
└──────────────────────┬───────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────┐
│                Visual Memory                 │
│   workflows/                                 │
│   └── {uuid}/                                │
│       ├── metadata.json                      │
│       └── steps/                             │
│           ├── step_001.json                  │
│           ├── step_001_before.png            │
│           └── step_001_after.png             │
└──────────────────────┬───────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────┐
│           Visual-Guided Execution            │
│   1. Take screenshot                         │
│   2. Use OCR to find target element          │
│   3. Use Vision LLM to understand UI         │
│   4. Calculate click coordinates             │
│   5. Execute action                          │
│   6. Verify state change                     │
└──────────────────────────────────────────────┘
```
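The six-step execution loop above can be sketched as a pure pipeline with injected dependencies. The helper callables here (`take_screenshot`, `find_text`, `click`, `verify`) are stand-ins for illustration, not the project's real functions:

```python
def execute_step(step, take_screenshot, find_text, click, verify):
    """One iteration of the visual-guided execution loop.

    take_screenshot() -> image, find_text(image, text) -> (x, y) or None,
    click(x, y) -> None, and verify(before, after) -> bool are all
    caller-supplied, so the loop itself stays testable without a screen.
    """
    before = take_screenshot()
    target = step["visual_context"]["clicked_text"]
    coords = find_text(before, target)          # OCR lookup
    if coords is None:
        raise LookupError(f"could not locate {target!r} on screen")
    click(*coords)                              # perform the recorded action
    after = take_screenshot()
    if not verify(before, after):
        raise RuntimeError("screen state did not change as expected")
    return coords

# Exercise the loop with stubs instead of a real screen:
step = {"visual_context": {"clicked_text": "Submit"}}
shots = iter(["before.png", "after.png"])
coords = execute_step(
    step,
    take_screenshot=lambda: next(shots),
    find_text=lambda img, txt: (500, 300),
    click=lambda x, y: None,
    verify=lambda b, a: b != a,
)
print(coords)  # (500, 300)
```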
```bash
# Navigate to backend directory
cd Turing/backend

# Activate virtual environment
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Set your Gemini API key
export GOOGLE_API_KEY='your_gemini_api_key_here'
```
- Start the enhanced agent:

  ```
  python agent_enhanced.py
  ```

- Enter the `record` command

- Provide workflow details:

  ```
  Workflow name: Open Canvas Class
  Description: Navigate to Canvas and open a specific class
  Tags: canvas, education
  ```

- Perform your workflow naturally - the system is watching!
  - Open browser
  - Navigate to canvas.asu.edu
  - Click on your class
  - Do whatever you need to do

- When done, enter the `stop` command

- System analyzes and identifies parameters:

  ```
  📋 Identified Parameters:
  - class_name: Name of the class to open
    Example: Machine Learning

  ✅ Workflow saved!
  ```
Just describe what you want:

```
💬 Open my DataVis class on Canvas

✨ Found matching workflow: Open Canvas Class
   Confidence: 90%

Execute this workflow? [Y/n]: y

💬 Executing learned workflow...
✅ Done!
```
```
💬 list

📋 Learned Workflows:
=================================================================

Open Canvas Class
└─ Navigate to Canvas and open a specific class
   Steps: 3 | Uses: 5
   Parameters: class_name
   Tags: canvas, education

Download Bank Statement
└─ Log into bank and download statement PDF
   Steps: 8 | Uses: 2
   Parameters: month, year
   Tags: finance, banking
```
Stores workflows with complete visual context.

```python
from visual_memory import VisualWorkflowMemory

memory = VisualWorkflowMemory()

# Create workflow
wf_id = memory.create_workflow(
    name="My Workflow",
    description="What it does",
    tags=["tag1", "tag2"]
)

# Add steps
memory.add_step(
    workflow_id=wf_id,
    action_type='click',
    action_data={'x': 500, 'y': 300},
    screenshot_before=screenshot,
    screenshot_after=screenshot,
    visual_context={'clicked_text': 'Submit'}
)

# Finalize
memory.finalize_workflow(wf_id, parameters=[...])
```

Monitors user actions and captures visual context.
```python
from recorder import WorkflowRecorder

recorder = WorkflowRecorder()

# Start recording
wf_id = recorder.start_recording("My Workflow")

# User performs actions...
# System automatically captures everything

# Stop recording
recorder.stop_recording()
```

Extracts meaning from screenshots using OCR and computer vision.
```python
from visual_analyzer import VisualAnalyzer

analyzer = VisualAnalyzer()

# Analyze what was clicked
context = analyzer.analyze_click_context(
    screenshot,
    click_x=500,
    click_y=300
)
print(context['clicked_text'])  # "Submit Button"

# Find text in screenshot
matches = analyzer.find_text_in_screenshot(
    screenshot,
    target_text="Machine Learning"
)
for match in matches:
    print(f"Found at: {match['center']}")
```

Main interface with recording and learned execution.
The system uses Google's Gemini LLM to analyze workflows and identify parameters:
```
Workflow: Open Canvas Class
Steps:
1. Navigate to https://canvas.asu.edu
2. Click on "Machine Learning"
3. Click on "Assignments"

AI identifies:
- "Machine Learning" is a parameter (varies per class)
- "Assignments" is NOT a parameter (always the same)
```
When executing with new parameters, system uses multiple strategies:
- OCR Text Matching: Find text "DataVis" on screen
- Visual Similarity: Compare to recorded element appearance
- Position Heuristics: Similar elements often in same region
- Vision LLM: Ask AI "where is the DataVis class link?"
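These strategies form a fallback chain: try the cheapest locator first and escalate only on a miss. A minimal sketch with stub strategies standing in for the real OCR/vision implementations:

```python
def locate_element(strategies, screenshot, target):
    """Try each locator strategy in priority order; return the first hit.

    Each strategy is a callable (screenshot, target) -> (x, y) or None.
    The names mirror the strategy list above; bodies here are stubs.
    """
    for name, strategy in strategies:
        coords = strategy(screenshot, target)
        if coords is not None:
            return name, coords
    return None, None

strategies = [
    ("ocr", lambda img, t: None),            # OCR misses in this example
    ("visual", lambda img, t: (120, 240)),   # visual similarity hits
    ("position", lambda img, t: (0, 0)),     # never reached
]
print(locate_element(strategies, "screen.png", "DataVis"))
# ('visual', (120, 240))
```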
```python
def decide(confidence):
    if confidence > 0.9:
        return "execute"       # Execute automatically
    elif confidence > 0.7:
        return "confirm"       # Ask for confirmation
    else:
        return "demonstrate"   # Ask user to demonstrate again
```

Workflows are stored as structured directories:
```
workflows/
└── 550e8400-e29b-41d4-a716-446655440000/
    ├── metadata.json
    └── steps/
        ├── step_001.json
        ├── step_001_before.png
        ├── step_001_after.png
        ├── step_002.json
        ├── step_002_before.png
        └── step_002_after.png
```
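Given this layout, loading the store is just a walk over the directory tree. A minimal sketch assuming only the layout above (not the project's real loader):

```python
import json
import tempfile
from pathlib import Path

def load_workflows(root="workflows"):
    """Read every workflow's metadata.json under the store root."""
    store = {}
    for meta_path in Path(root).glob("*/metadata.json"):
        meta = json.loads(meta_path.read_text())
        store[meta["workflow_id"]] = meta
    return store

# Demo against a throwaway store directory:
root = Path(tempfile.mkdtemp())
wf_dir = root / "550e8400-e29b-41d4-a716-446655440000"
(wf_dir / "steps").mkdir(parents=True)
(wf_dir / "metadata.json").write_text(json.dumps(
    {"workflow_id": wf_dir.name, "name": "Open Canvas Class"}))
store = load_workflows(root)
print(store[wf_dir.name]["name"])  # Open Canvas Class
```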
metadata.json:

```json
{
  "workflow_id": "550e8400-...",
  "name": "Open Canvas Class",
  "description": "Navigate to Canvas and open class",
  "tags": ["canvas", "education"],
  "created": "2025-10-25T10:30:00",
  "status": "ready",
  "steps_count": 3,
  "parameters": [
    {
      "name": "class_name",
      "type": "string",
      "example": "Machine Learning",
      "step": 2,
      "description": "Name of class to open"
    }
  ]
}
```

step_001.json:
```json
{
  "step_id": "step_001",
  "step_number": 1,
  "timestamp": 1698234567.89,
  "action_type": "click",
  "action_data": {
    "x": 500,
    "y": 300,
    "normalized_x": 340,
    "normalized_y": 314
  },
  "visual_context": {
    "clicked_text": "Machine Learning",
    "element_type": "link",
    "ocr_confidence": 0.95
  },
  "screenshot_before": "step_001_before.png",
  "screenshot_after": "step_001_after.png"
}
```

Record: "Resolve ticket #1234 for product XYZ"
Execute: "Resolve ticket #5678 for product ABC"
→ System learns the ticket-resolution workflow

Record: "Enter invoice from ACME Corp"
Execute: "Enter invoice from Widget Co"
→ Learns the invoice-entry pattern

Record: "Test login flow with valid credentials"
Execute: "Test login flow with invalid credentials"
→ Learns UI testing patterns

Record: "Download paper from arXiv and save to Papers folder"
Execute: "Download paper [URL] and save to Papers folder"
→ Learns the research-paper workflow
- Visual-guided execution: Core logic implemented, needs refinement
- OCR accuracy: Depends on text clarity and font
- UI variations: Works best with consistent UI layouts
- Cross-application: Currently optimized for web applications
- Advanced visual element matching with ML
- Support for conditional logic in workflows
- Workflow editing and debugging tools
- Multi-monitor support
- Windows/Linux support
- Browser extension for better web automation
This is research-grade software under active development. Contributions welcome!
Areas of focus:
- Improving OCR accuracy
- Better parameter identification
- Visual element matching algorithms
- Cross-platform support
- UI/UX improvements
MIT License - See LICENSE file
Built with:
- Google Gemini AI (computer use & vision)
- PyAutoGUI (screen control)
- EasyOCR (text extraction)
- pynput (action monitoring)
Inspired by:
- Robotic Process Automation (RPA) systems
- Programming by Demonstration research
- The dream of truly intelligent assistants
See additional documentation:
- ARCHITECTURE.md - System architecture and design
- RESEARCH.md - Deep dive into technologies used
- computer_use_simple.py - Core computer control
- agent_interface.py - Original agent interface

Made with ❤️ for CalHacks 2025
Teaching computers to learn by watching, one workflow at a time.