About Artemis

Inspiration

The idea for Artemis came from watching my sister struggle with ADHD and from experiencing similar focus challenges ourselves. Sitting down to code, opening "just one tab" for documentation, and suddenly finding ourselves 30 minutes deep in YouTube videos with 47 open tabs and zero lines of code written became a pattern we couldn't ignore. Traditional productivity apps would block entire websites, but that felt too restrictive. A YouTube tutorial about React hooks is valuable learning content, while YouTube's homepage autoplay is a distraction trap. The difference isn't the domain; it's the intent and the timing.

We wanted to build something smarter: a system that could actually understand what you're doing, what matters right now, and reshape your environment accordingly. Not by forcing rigid rules, but by learning your patterns and adapting in real time. That's when we discovered that the intersection of eye tracking, browser automation, and LLM reasoning could create something truly intelligent. For people with ADHD, or anyone struggling with focus, context-aware assistance could be life-changing.

What It Does

Artemis is an AI-powered focus orchestration system that monitors your cognitive state and automatically optimizes your workspace. It combines:

  • Eye tracking via MediaPipe and EyeTrax to detect attention patterns, blink rates, and fixation stability
  • Browser telemetry through Chrome DevTools Protocol to analyze tab content, usage patterns, and engagement scores
  • Window monitoring to track active applications and infer task context
  • LLM reasoning using Claude 3.5 to semantically analyze content and make intelligent decisions
  • Environment control for smart lights (WiZ/LIFX) and music (Spotify) synchronized to flow phases

The system operates across four cognitive phases:

  1. Calibration - Warming up, high exploration, dispersed attention
  2. Engagement - Focus forming, gaze clusters stabilizing
  3. Flow - Deep focus achieved, minimal distractions
  4. Recovery - Fatigue detected, gradual cooldown
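As a rough sketch, the phase detection above can be reduced to a few threshold rules over attention signals. The signal names and cutoffs below are illustrative assumptions for demonstration, not Artemis's actual tuning:

```python
# Illustrative phase classifier over two attention signals plus session age.
# Thresholds are placeholder assumptions, not the system's real values.

def classify_phase(gaze_dispersion: float, blink_rate: float,
                   session_minutes: float) -> str:
    """Map simple attention signals to one of the four cognitive phases.

    gaze_dispersion: 0.0 (tight fixation clusters) .. 1.0 (scattered)
    blink_rate: blinks per minute (elevated rates suggest fatigue)
    session_minutes: time since the session started
    """
    if blink_rate > 25:              # fatigue signal dominates everything
        return "recovery"
    if session_minutes < 5:          # warm-up window, high exploration
        return "calibration"
    if gaze_dispersion < 0.2:        # stable, clustered gaze
        return "flow"
    return "engagement"              # focus forming, clusters stabilizing
```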

How We Built It

Architecture

Artemis is built as a multi-layered system with distinct components:

1. FlowSync Core (Electron + React + TypeScript)

  • Desktop application with glassmorphic UI using TailwindCSS and Framer Motion
  • IPC bridge between renderer and main process for system-level operations
  • Real-time metrics dashboard showing attention analytics

2. Eye Tracking Service (Python + Node.js Bridge)

  • Python service using EyeTrax library for gaze estimation
  • JSON-RPC communication over stdin/stdout for bidirectional messaging
  • Implemented Kalman filtering and variable scaling for 50% accuracy improvement
  • Calibration quality metrics with statistical analysis
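The JSON-RPC bridge can be sketched roughly as follows: the Python service reads one request per line on stdin and writes one response per line on stdout. The method name and stub result here are illustrative, not the real service's API:

```python
# Minimal sketch of the Python side of a JSON-RPC 2.0 bridge over
# stdin/stdout. "get_gaze" and its stub result are illustrative.
import json
import sys

HANDLERS = {
    "get_gaze": lambda params: {"x": 512, "y": 384},  # stub gaze estimate
}

def handle_line(line: str) -> str:
    """Parse one JSON-RPC request line and return the response line."""
    req = json.loads(line)
    handler = HANDLERS.get(req["method"])
    if handler is None:
        resp = {"jsonrpc": "2.0", "id": req.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    else:
        resp = {"jsonrpc": "2.0", "id": req.get("id"),
                "result": handler(req.get("params", {}))}
    return json.dumps(resp)

def serve() -> None:
    """Blocking loop: one request per line in, one response per line out."""
    for line in sys.stdin:
        if line.strip():
            print(handle_line(line), flush=True)
```

Line-delimited JSON keeps framing trivial on both sides; the Node.js process just writes a line and awaits the matching `id`.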

3. Chrome Monitor (Chrome DevTools Protocol)

  • Connects to Chrome via CDP on port 9222
  • Executes JavaScript in target tabs to extract:
    • Page content (up to 10,000 chars)
    • Semantic information (headings, topics, sentiment)
    • Technical context (frameworks, languages, code blocks)
    • Behavioral metrics (scroll position, time spent, engagement)
  • Smart tab change detection that triggers LLM updates only when needed
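Tab discovery over CDP is plain HTTP before any JavaScript runs inside a tab. A minimal sketch (field names follow Chrome's /json endpoint; the filtering choice is ours):

```python
# Sketch: discovering debuggable tabs via CDP's HTTP endpoint. Chrome
# started with --remote-debugging-port=9222 serves target metadata as
# JSON at /json.
import json
import urllib.request

def parse_tabs(payload: str) -> list:
    """Keep only real page targets (skip extensions, service workers)."""
    return [t for t in json.loads(payload) if t.get("type") == "page"]

def list_tabs(port: int = 9222) -> list:
    """Fetch the live target list from a locally running Chrome."""
    with urllib.request.urlopen(f"http://localhost:{port}/json") as resp:
        return parse_tabs(resp.read().decode())
```

Each page target carries a `webSocketDebuggerUrl`, which is what the monitor attaches to before executing extraction JavaScript.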

4. LLM Reasoning Engine (Claude 3.5 Haiku)

  • Temporal memory system that stores significant moments with semantic tags
  • Computes real metrics from actual data (no hallucinated numbers):
    • Focus stability based on window/tab dwell time
    • Distraction level from tab switching frequency
    • Task coherence from domain consistency
    • Cognitive load from tab count and complexity
  • Conservative tab filtering with explicit decision logic:
    • Preserves educational content for learning tasks
    • Keeps documentation for development tasks
    • Only hides truly irrelevant or long-unused tabs
  • Learns user patterns over time with baseline improvement tracking
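The "no hallucinated numbers" rule means metrics like these are computed in code before the LLM ever sees them. A rough sketch of two of them; the formulas and thresholds are illustrative (the production versions live in TypeScript):

```python
# Illustrative metric computations from raw telemetry. Formulas and
# constants are placeholder assumptions, not the production definitions.

def focus_stability(dwell_seconds: list) -> float:
    """Share of time spent in dwells of 60s or longer (0..1)."""
    total = sum(dwell_seconds)
    if total == 0:
        return 0.0
    return sum(d for d in dwell_seconds if d >= 60) / total

def distraction_level(tab_switches: int, window_minutes: float) -> float:
    """Tab switches per minute, normalized to 0..1 at 6 switches/min."""
    if window_minutes <= 0:
        return 0.0
    return min(1.0, tab_switches / window_minutes / 6.0)
```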

5. Environment Controllers (Python)

  • Spotify Web API integration with OAuth2 authentication
  • Smart lighting control for WiZ-compatible bulbs
  • Neuroergonomic presets based on research (cool white for focus, warm for breaks)
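WiZ bulbs accept local JSON commands over UDP (port 38899, method "setPilot"). A hedged sketch of how presets map onto that protocol; the exact color temperatures and dimming levels here are illustrative choices:

```python
# Sketch: neuroergonomic presets expressed as WiZ "setPilot" UDP payloads.
# Preset values are illustrative, not research-validated settings.
import json
import socket

PRESETS = {
    "focus": {"temp": 6500, "dimming": 100},  # cool white for alertness
    "break": {"temp": 2700, "dimming": 60},   # warm, dimmer light for rest
}

def preset_payload(name: str) -> bytes:
    """Build the JSON command a WiZ bulb expects."""
    return json.dumps({"method": "setPilot",
                       "params": PRESETS[name]}).encode()

def apply_preset(bulb_ip: str, name: str) -> None:
    """Fire-and-forget UDP command to a bulb on the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(preset_payload(name), (bulb_ip, 38899))
```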

Key Technical Decisions

Why Electron? We needed desktop-level access to control Chrome, manage windows, and integrate with system APIs while maintaining a modern UI. Electron provided the perfect bridge.

Why Python for eye tracking? The EyeTrax and MediaPipe libraries are Python-native. We built a clean IPC bridge to Node.js rather than fighting the ecosystem.

Why Claude instead of GPT? Claude 3.5 Haiku offered the best balance of speed, cost, and reasoning quality for our use case. The 30-second analysis interval meant we needed fast, accurate responses.

Why Chrome DevTools Protocol? Unlike browser extensions (which can't access certain pages), CDP gives us unrestricted access to all tabs and can execute arbitrary JavaScript for deep content extraction.

Challenges We Faced

1. Gaze Tracking Accuracy

Problem: Initial calibration accuracy was poor (80-150px error), making it impossible to know what the user was actually looking at.

Solution:

  • Implemented variable scaling to weight features by variance (20-40% error reduction)
  • Added Kalman filtering for smooth trajectories (60-80% jitter reduction)
  • Enhanced sample collection from 15 to 20-30 samples per calibration point
  • Added calibration quality metrics so users know when to recalibrate

Result: Achieved 40-80px mean error with smooth, natural-feeling tracking.
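The Kalman step can be sketched as a scalar filter applied per gaze axis (variance-based feature scaling happens upstream of this stage). The noise parameters below are illustrative; tuning them trades responsiveness against smoothness:

```python
# Scalar Kalman filter (random-walk model) for one gaze axis.
# q and r values are illustrative placeholders.

class Kalman1D:
    def __init__(self, q: float = 1.0, r: float = 50.0):
        self.q = q        # process noise: how fast true gaze can move
        self.r = r        # measurement noise: sensor jitter magnitude
        self.x = None     # current estimate
        self.p = 1.0      # estimate variance

    def update(self, z: float) -> float:
        """Fold one raw measurement into the smoothed estimate."""
        if self.x is None:
            self.x = z                        # initialize on first sample
            return z
        self.p += self.q                      # predict: uncertainty grows
        k = self.p / (self.p + self.r)        # Kalman gain
        self.x += k * (z - self.x)            # correct toward measurement
        self.p *= (1.0 - k)
        return self.x
```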

2. LLM Hallucination of Metrics

Problem: When we asked Claude to analyze context and provide metrics, it would hallucinate numbers rather than computing them from real data.

Solution:

  • Pre-compute all metrics in TypeScript from actual telemetry
  • Pass computed values explicitly in the prompt
  • Constrain the LLM to interpret, not generate, numerical data
  • Added temporal evolution metrics (baseline improvement, learning curve)

Result: Real metrics that accurately reflect user behavior, with LLM providing qualitative interpretation.
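A sketch of the prompt shape that enforces this constraint; the wording is illustrative, not the production prompt:

```python
# Illustrative prompt builder: computed metrics are injected verbatim and
# the model is instructed to interpret them, never to generate numbers.
import json

def build_analysis_prompt(metrics: dict, context: str) -> str:
    return (
        "You are a focus assistant. The metrics below were computed from "
        "real telemetry. Do NOT invent or alter any number; only interpret "
        "them qualitatively.\n\n"
        f"METRICS (authoritative):\n{json.dumps(metrics, indent=2)}\n\n"
        f"CONTEXT:\n{context}\n\n"
        "Respond with a one-paragraph interpretation and one suggestion."
    )
```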

3. Over-Aggressive Tab Closing

Problem: Early versions would close valuable tabs like YouTube tutorials or documentation because they matched "distraction" patterns.

Solution:

  • Implemented semantic content analysis of actual page content
  • Added task-aware filtering that understands context:
    • Educational videos are valuable for learning tasks
    • Documentation is essential for development tasks
    • Reference materials support research tasks
  • Conservative filtering rules: "when in doubt, keep it visible"
  • Detailed evaluation per tab with explicit reasoning

Result: System preserves task-relevant content while filtering genuine distractions.
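The decision logic can be sketched as explicit rules with a conservative default. The category labels here are illustrative; the real system derives them from semantic content analysis:

```python
# Illustrative per-tab evaluation with an explicit reason for every
# decision and a "keep" default. Categories are placeholder labels.

def evaluate_tab(category: str, task: str, minutes_idle: float):
    """Return (action, reason); defaults to keeping the tab visible."""
    if category == "educational" and task == "learning":
        return "keep", "educational content supports the learning task"
    if category == "documentation" and task == "development":
        return "keep", "documentation is essential for development"
    if category == "distraction" and minutes_idle > 30:
        return "hide", "irrelevant and unused for over 30 minutes"
    return "keep", "when in doubt, keep it visible"
```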

4. Chrome DevTools Protocol Connection Issues

Problem: CDP requires Chrome to be launched with --remote-debugging-port=9222, which users often forget or configure incorrectly.

Solution:

  • Created comprehensive setup documentation with platform-specific instructions
  • Added connection testing and clear error messages
  • Implemented automatic reconnection logic
  • Built verification tools (curl http://localhost:9222/json)

Result: Smooth onboarding with clear troubleshooting steps.
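A sketch of the connection test with actionable error messages. The opener parameter is injectable so the check can be exercised without a running Chrome; CDP's /json/version endpoint reports the browser build:

```python
# Connection probe against CDP's /json/version endpoint, with a clear
# remediation hint when Chrome isn't listening.
import json
import urllib.error
import urllib.request

def check_cdp(port: int = 9222, opener=urllib.request.urlopen) -> str:
    url = f"http://localhost:{port}/json/version"
    try:
        with opener(url) as resp:
            info = json.loads(resp.read().decode())
        return f"Connected: {info.get('Browser', 'unknown browser')}"
    except OSError:
        return (f"Cannot reach Chrome on port {port}. Start it with "
                f"--remote-debugging-port={port} and retry.")
```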

5. Rate Limiting and Performance

Problem: Making an LLM API call on every tab switch or window change would be slow and expensive.

Solution:

  • Implemented intelligent caching (1-minute validity)
  • Tab change detection with hashing to avoid redundant calls
  • 10-second minimum interval between LLM updates
  • 15-second initialization delay to prevent startup spam
  • Pre-computed metrics reduce LLM workload

Result: Responsive system with minimal API costs (typically 2-4 calls per hour).
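The gating logic above can be sketched as a small state machine: a content hash skips calls when nothing meaningful changed, and a minimum interval bounds call frequency regardless:

```python
# Illustrative gate deciding whether a tab-state change warrants a new
# LLM call. Intervals mirror the ones described above.
import hashlib

class LLMGate:
    MIN_INTERVAL = 10.0   # seconds between LLM updates
    CACHE_TTL = 60.0      # cached analysis stays valid for one minute

    def __init__(self):
        self.last_call = -float("inf")
        self.last_hash = None

    def should_call(self, tab_state: str, now: float) -> bool:
        digest = hashlib.sha256(tab_state.encode()).hexdigest()
        if digest == self.last_hash and now - self.last_call < self.CACHE_TTL:
            return False                      # unchanged and still cached
        if now - self.last_call < self.MIN_INTERVAL:
            return False                      # respect the rate limit
        self.last_call, self.last_hash = now, digest
        return True
```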

6. Cross-Platform Window Monitoring

Problem: Window tracking APIs differ significantly across macOS, Windows, and Linux.

Solution:

  • Used active-win library which abstracts platform differences
  • Fallback to basic tracking when detailed info isn't available
  • Graceful degradation on permission errors

Result: Consistent experience across all platforms with platform-specific optimizations.
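The fallback chain can be sketched as a wrapper that never raises; the probe functions below are injectable stand-ins for the platform APIs (active-win on the Node side):

```python
# Illustrative graceful-degradation wrapper for window tracking: try the
# detailed probe, fall back to basic info, then to a placeholder.

def get_active_window(detailed_probe, basic_probe) -> dict:
    """Return the richest window info available without ever raising."""
    try:
        return detailed_probe()           # title, app, bounds, owner pid
    except (OSError, PermissionError):
        pass                              # e.g. macOS screen-recording perms
    try:
        return basic_probe()              # app name only
    except (OSError, PermissionError):
        return {"app": "unknown"}         # last-resort placeholder
```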

7. Temporal Memory and Learning

Problem: Each analysis was stateless, missing patterns that emerge over time.

Solution:

  • Built a temporal memory system with significance scoring
  • Track focus sessions, task transitions, and flow state changes
  • Retrieve relevant memories based on recency and importance
  • Calculate baseline improvement and learning curves
  • Semantic tagging for efficient memory retrieval

Result: System learns from past behavior and adapts recommendations over time.
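A sketch of the memory store: moments carry a significance score, and retrieval blends exponentially decayed significance with tag overlap. The decay constant and weights are illustrative:

```python
# Illustrative temporal memory with significance scoring and
# recency-plus-relevance retrieval. Weights are placeholder choices.
import math

class TemporalMemory:
    def __init__(self):
        self.moments = []   # (timestamp, significance, tags, note)

    def record(self, ts: float, significance: float, tags: set, note: str):
        self.moments.append((ts, significance, tags, note))

    def recall(self, now: float, query_tags: set, k: int = 3) -> list:
        """Top-k moments by significance decayed over elapsed hours."""
        def score(m):
            ts, sig, tags, _ = m
            hours = (now - ts) / 3600.0
            return sig * math.exp(-hours / 24.0) + len(tags & query_tags)
        return sorted(self.moments, key=score, reverse=True)[:k]
```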

What We Learned

Technical Skills

  • Advanced IPC patterns: Building robust bridges between Python and Node.js with JSON-RPC
  • Chrome automation: Deep dive into Chrome DevTools Protocol beyond basic Puppeteer usage
  • Computer vision: Implementing and optimizing gaze tracking with Kalman filters and feature scaling
  • LLM prompt engineering: Crafting prompts that produce structured, accurate outputs without hallucination
  • Real-time system design: Building responsive UIs that react to multiple async data streams
  • Semantic analysis: Extracting meaning from unstructured web content

System Design

  • Multi-modal sensor fusion: Combining data from eye tracking, browser telemetry, window monitoring, and content analysis into a coherent cognitive state model
  • Conservative automation: Building intelligent systems that preserve user agency rather than forcing rigid rules
  • Temporal awareness: Designing systems that learn and adapt based on historical patterns
  • Graceful degradation: Ensuring core functionality works even when components fail

Product Insights

  • Context is everything: The same content (YouTube, documentation, social media) can be valuable or distracting depending on the user's current task
  • Trust through transparency: Users need to understand why tabs are being closed, not just have it happen mysteriously
  • Incremental adaptation: Gentle, gradual changes are more effective than sudden dramatic interventions
  • Learning from behavior: Observing what users actually do is more valuable than what they say they want

Challenges of AI Systems

  • Hallucination mitigation: LLMs will confidently generate plausible-sounding but incorrect data if not constrained
  • Explainability: Users need to understand the system's reasoning, especially for actions that affect their work
  • Performance vs. accuracy trade-offs: Real-time responsiveness requires caching and rate limiting, which can delay perfect accuracy
  • Cold start problem: System needs time to learn user patterns before making confident recommendations

What's Next

Immediate Improvements

  • Multi-user profiles: Save calibrations and learned patterns per user
  • Enhanced calibration: Adaptive calibration that adds points in low-accuracy regions
  • Online learning: Update the gaze model continuously during tracking
  • Better visualization: Show gaze heatmaps and attention patterns over time

Advanced Features

  • Predictive focus state modeling: Anticipate when user is about to lose focus and intervene proactively
  • Calendar integration: Prepare environment for upcoming tasks based on schedule
  • Collaborative focus sessions: Shared focus time with friends or team members
  • Mobile integration: Extend environment control to phone notifications and apps
  • Voice control: Hands-free commands for common actions

Research Directions

  • Personalized flow triggers: Learn individual patterns that lead to deep focus
  • Distraction prediction: Identify early warning signs before attention breaks
  • Cognitive load optimization: Dynamically adjust task complexity based on current capacity
  • Long-term pattern analysis: Track productivity patterns over weeks and months

Conclusion

Artemis represents a new approach to productivity: instead of blocking or restricting, it understands context and adapts intelligently. By combining eye tracking, browser automation, and AI reasoning with a deep respect for user agency, we've built a system that genuinely helps maintain focus without getting in the way.

The journey taught us that building AI-powered systems requires careful attention to hallucination, explainability, and performance. Real-time cognitive state detection is possible with consumer hardware, but it requires thoughtful integration of multiple data sources and conservative decision-making.

Most importantly, we learned that the best productivity tools are the ones you forget are running - they just make your environment feel naturally conducive to focus. That's the ultimate goal of Artemis: invisible assistance that lets you do your best work.
