Inspiration

The rise of AI coding agents promises that anyone can be a web developer. As people who use these agents all the time, our biggest pain point, one our friends share, has always been closing the gap between what the agent sees, which is just code, and what we actually see, which is the rendered webpage.

As of now, coding agents only understand code, not the visual behavior of the product they’re changing. They can modify CSS, but they don’t see when a button’s hover animation stutters, when a layout shifts unexpectedly, or when a color palette breaks accessibility.

Tweak code → rebuild → screenshot → explain → repeat. This manual cycle is not just tedious; it is slow and mentally draining.

When Google released the Gemini 2.5 Computer Use API, which can interact with real browsers and reason about screenshots, we realized we could finally give coding agents "eyes". That's how we came up with Ayn, eyes for coding agents.

What it does

So far, Ayn can take a developer’s natural-language request, parse it, and autonomously explore a live web app using the Gemini 2.5 Computer Use API.

Core functionality achieved

Instruction → Structured Task

Ayn accepts instructions such as “Inspect the hover animation on the main button at localhost:5173.” The instruction is routed through OpenRouter, which converts it into a JSON TaskSpec describing the target URL, selector, desired properties, and tolerances.
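For illustration, a TaskSpec could look roughly like this, assuming a Pydantic model (the field names here are a sketch, not the exact schema):

from pydantic import BaseModel

class TaskSpec(BaseModel):
    url: str                        # e.g. "http://localhost:5173"
    selector: str                   # CSS selector of the target element
    action: str = "hover"           # interaction to perform
    expected: dict[str, str] = {}   # desired CSS properties, e.g. {"transition-duration": "0.2s"}
    tolerance: float = 0.05         # allowed deviation when comparing values

spec = TaskSpec(
    url="http://localhost:5173",
    selector="button.main-cta",
    expected={"transform": "scale(1.05)", "transition-duration": "0.2s"},
)
print(spec.model_dump_json(indent=2))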

Automated Browser Exploration

The orchestrator calls Gemini Computer Use to plan actions (navigate, hover, click, etc.) and executes them through a Playwright-based executor. The executor captures screenshots and key CSS metrics (transform, transition-duration, timing-function) for every step.
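A minimal sketch of one executor step, using Playwright's sync API (the URL and selector are placeholders, not Ayn's actual code):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://localhost:5173")

    button = page.locator("button.main-cta")  # placeholder selector
    button.hover()                            # the action Gemini planned
    page.wait_for_timeout(300)                # give the transition time to run

    # Capture the CSS metrics we care about for this step
    metrics = button.evaluate(
        """el => {
            const s = getComputedStyle(el);
            return {
                transform: s.transform,
                transitionDuration: s.transitionDuration,
                transitionTimingFunction: s.transitionTimingFunction,
            };
        }"""
    )
    page.screenshot(path="step_01.png")
    browser.close()

print(metrics)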

Observation Logging

Each run generates a unique ID and stores structured output under /runs/<run_id>/, including:

Screenshots at each step

Computed style metrics

JSON observations describing what Gemini saw and measured
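For illustration, one way such an observation record could be written to disk (the directory layout and field names are assumptions, not Ayn's exact format):

import json
import uuid
from pathlib import Path

run_id = uuid.uuid4().hex
run_dir = Path("runs") / run_id
run_dir.mkdir(parents=True, exist_ok=True)

observation = {
    "step": 1,
    "action": "hover",
    "selector": "button.main-cta",              # placeholder
    "screenshot": str(run_dir / "step_01.png"),
    "metrics": {
        "transform": "matrix(1.05, 0, 0, 1.05, 0, 0)",
        "transition-duration": "0.2s",
        "timing-function": "ease-in-out",
    },
}
(run_dir / "observations.json").write_text(json.dumps([observation], indent=2))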

Error and Progress Reporting

Ayn produces a clear JSON result listing any issues it detects (e.g., hover scale mismatch, missing transition). It also tracks run state through /status and /result endpoints for real-time progress updates.
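An illustrative result payload might look like the sketch below (the exact fields are an assumption, not the final schema):

result = {
    "run_id": "example-run-id",
    "state": "done",
    "issues": [
        {
            "type": "hover-scale-mismatch",
            "selector": "button.main-cta",
            "expected": "scale(1.05)",
            "observed": "scale(1.0)",
        },
        {"type": "missing-transition", "selector": "button.main-cta"},
    ],
    "artifacts": "runs/example-run-id/",
}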

Contextual Prompts for Coding Agents

At the end of every run, Ayn generates a summary prompt tailored for coding agents (like Cursor or Copilot). This prompt encapsulates Ayn’s observations, including detected discrepancies, current style metrics, and screenshots, so the coding agent gains real, visual context about how the website behaves. Instead of editing code blindly, the coding agent can now reason about how its changes affect the actual UI and make more informed adjustments.
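A simplified sketch of how such a summary prompt can be assembled from the logged observations (the helper name and template are illustrative):

def build_agent_prompt(observations: list[dict], issues: list[dict]) -> str:
    """Assemble a context prompt for a coding agent from a run's observations."""
    lines = ["Ayn inspected the running app and observed the following UI behavior:"]
    for obs in observations:
        lines.append(
            f"- step {obs['step']}: {obs['action']} on {obs['selector']} -> "
            f"{obs['metrics']} (screenshot: {obs['screenshot']})"
        )
    if issues:
        lines.append("Discrepancies to fix:")
        for issue in issues:
            lines.append(
                f"- {issue['type']}: expected {issue.get('expected')}, observed {issue.get('observed')}"
            )
    lines.append("Adjust the CSS/JS so the observed behavior matches the expected values.")
    return "\n".join(lines)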

Simple CLI / API Interface

Developers can trigger a run with

curl -X POST localhost:8000/tasks \
  -H 'Content-Type: application/json' \
  -d '{"instruction":"Check hover animation on signup button"}'

and receive structured results without manually taking screenshots or describing visuals.
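The same flow from Python, polling until the run finishes (a sketch; the exact endpoint paths and response fields are assumptions):

import time

import requests

resp = requests.post(
    "http://localhost:8000/tasks",
    json={"instruction": "Check hover animation on signup button"},
)
run_id = resp.json()["run_id"]  # assumed response field

# Poll the status endpoint until the run is no longer in progress
while requests.get(f"http://localhost:8000/status/{run_id}").json().get("state") == "running":
    time.sleep(2)

result = requests.get(f"http://localhost:8000/result/{run_id}").json()
print(result.get("issues", []))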

How we built it

We started by throwing around different ideas for what kind of problem we wanted to solve. It took several discussions and a few disagreements before we finally agreed to focus on automating debugging using multiple AI systems. Once we locked in that vision, we spent time brainstorming how to actually make it work, from defining the flow between AIs to deciding which tools would handle what. After testing a few prototypes, we finalized our stack: Gemini 2.5 Computer Use API for system-level automation, OpenRouter for custom prompt generation, and a Python FastAPI backend to connect everything together.

After choosing our tools, we divided responsibilities based on each team member’s strengths. One handled backend communication, another focused on frontend logic and error simulation, one specialized in AI prompt design and OpenRouter integration, and another worked on overall system coordination and testing. Once everyone began coding, we regularly checked in with each other to debug issues, share progress, and align on how each module would interact. Those quick syncs helped us fix problems early and stay consistent despite the time pressure. As the night went on, the pieces slowly came together until we finally had a working multi-agent AI system running end-to-end.

Challenges we ran into

One of the first major challenges we faced was agreeing on an idea: our team had several competing concepts, and it took time to settle on one that everyone felt passionate about. Even after deciding on this project, we struggled to agree on how to approach it, especially with so many moving parts across backend integration, model coordination, and prompt optimization. On the technical side, integrating several APIs and AI models proved extremely difficult. The Gemini Computer Use API required precise state management across asynchronous tasks, and synchronizing file edits with live feedback caused frequent data inconsistencies. Designing a backend that could route tasks between three separate AI agents while maintaining consistent context was especially challenging. We also faced prompt drift and environment conflicts between FastAPI, PowerShell, and Python that slowed down iteration.

Accomplishments that we're proud of

We successfully built a functioning multi-agent debugging pipeline that can autonomously detect and verify frontend errors. Getting Gemini to interact seamlessly with our backend and the OpenRouter API was a major technical milestone. Another accomplishment was turning the default system prompt into a smarter, adaptive prompt, one that learns from prior runs to produce more targeted debugging instructions. Even within the sub-24-hour hackathon window, we managed to deploy a proof of concept demonstrating a full feedback cycle of human prompt → AI action → verification → re-prompt. It showed the potential of automating IDE-level reasoning.

What we learned

This project taught us how to coordinate multiple AI systems into a unified workflow rather than using them as isolated tools. We gained a deeper understanding of prompt engineering, backend orchestration, and how context management affects large-scale API calls. Working under time pressure also reinforced the importance of modular design: breaking the system into micro-components made debugging and testing much faster. Beyond the technical lessons, this experience brought our team closer together and showed us how well we collaborate under pressure. Since it was our first time using tools like Gemini and OpenRouter, we were proud of how quickly we adapted and how capable we turned out to be. It gave us confidence to tackle even more ambitious AI-driven projects in the future.

What's next for Ayn

Our vision for Ayn extends beyond live UI observation: we want it to become the core of a self-improving web ecosystem, where coding agents can see, reason, and repair without constant human intervention.

To do that, we plan to integrate Ayn with coding agents inside environments like VS Code, Cursor, or JetBrains IDEs. AI agents will be able to trigger Ayn directly from their editor, visualize live observations, and get context-rich fix suggestions in real time. They can iterate over this context until the desired UI implementation is reached.

Ayn’s ultimate goal is to iterate on its own findings: generating insights, prompting code edits, and verifying the results through repeated cycles until the target behavior is achieved. This will create a true “see–reason–fix–verify” feedback loop between the codebase and the running web app.

We envision a future where websites can maintain themselves automatically. If a deployment introduces bad CSS, broken layouts, or failed interactions, Ayn could detect the issue visually, propose a fix, and even open a pull request that AI coding agents can implement, all without human oversight. At the rate frontier models are improving, we believe this is possible very soon.

Built With

  • fastapi
  • gemini-api
  • gemini-computer-use
  • openrouter
  • playwright
  • python