Presenting, The Guiding Agent. Remember the last time you spent an agonising amount of time trying to find "that one setting" you needed on a website? Remember how frustrating it is to dive into the Setting of a website/app, only to spiral down into the weird mislabelled subcategories and ... (I can go on forever)
This isn't just a frustrating experience, it can also be severely hindering for many sections of society (like grandparents/non-tech-savvy folks who often don't even know the meaning of the various settings), preventing accessibility to technology.
Apart from access, not configuring settinga can also lead to security issues (like having Multi-Factor-Authentication turned OFF)
This project aims to solve the simple question: What if there was something which could perform these tasks for me?
This is a python project built using Google ADK.
There are three main Agents:
- Steps Generator: Takes user input and generates a set of steps to be performed to achieve the task
- Code Generator: Takes in the steps generated by Steps Generator, and creates Playwright code to perform those steps
- Code Executor: Takes the code from Code Generator and executes it to perform actually perform the task.
The three agents are executed sequentially in that order.
The project consists of a SequentialAgent orchestrating three sub-agents: steps_provider, code_generator, and code_runner.
- Description: An instruction simplifier that converts detailed, conversational instructions into a concise, numbered list of explicit, actionable steps.
- Role: To break down complex user requests into a series of manageable, atomic actions.
- Output: A numbered list of action-oriented steps.
- Example Input: "To sign in to Google, navigate to google.com and click the 'Sign in' button, usually found in the top right corner. Then, enter your Google Account email or phone number and password which you set up earlier."
- Example Output:
- Goto google.com
- Click button named "Sign In"
- Type email or phone number in text box
- Press Enter
- Type password in text box
- Press Enter
- Description: An expert Playwright test automation engineer focused tasked with translating sequential natural language instructions into runnable Playwright Python code.
- Role: To generate Playwright code based on the simplified steps provided by
steps_provider. - Key Functionality:
- Browser Connection: Always uses
playwright.chromium.connect_over_cdp("http://localhost:9222", slow_mo=50)to connect to an existing Chrome browser instance. - Context and Page Selection: Always picks the first available browser context and page.
- Locator Strategy: Implements a hierarchical prioritization for locators, including
get_by_role,get_by_text,get_by_label,get_by_placeholder, and contextual CSS selectors as a last resort. - Robustness: Employs mandatory
try-exceptblocks for every interaction (click(),fill(),goto(), etc.) to implement a multi-locator fallback mechanism. If one locator fails, it systematically tries the next prioritized one.
- Browser Connection: Always uses
- Output: A complete, self-contained Playwright Python script.
- Description: Runs Python code.
- Role: To execute the Playwright Python code generated by the
code_generatorin a separate process. - Key Functionality:
- Executes the provided Python code using
subprocess.run. - Handles both successful and unsuccessful code execution.
- Cleans up the Python code string before execution (removes ````python` and ````` wrappers).
- Executes the provided Python code using
