Inspiration
Remember the last time you spent an agonising amount of time trying to find "that one setting" you needed on a website? Remember how frustrating it is to dive into the Settings of a website/app, only to spiral down into weird, mislabelled subcategories and ... (I could go on forever)
This isn't just a frustrating experience; it can also be a serious barrier for many sections of society (like grandparents/non-tech-savvy folks who often don't even know what the various settings mean), limiting their access to technology.
Apart from access, leaving settings unconfigured can also lead to security issues (like having Multi-Factor Authentication turned OFF).
This project aims to answer a simple question: What if there was something which could perform these tasks for me?
What it does
Guiding Agent finds the setting you're searching for and changes settings for you on any website you use.
How we built it
Architecture
This is a Python project built using Google's Agent Development Kit (ADK).
There are three main Agents:
- Steps Generator: Takes user input and generates a set of steps to be performed to achieve the task
- Code Generator: Takes in the steps generated by Steps Generator, and creates Playwright code to perform those steps
- Code Executor: Takes the code from Code Generator and executes it to actually perform the task.
The three agents are executed sequentially in that order.
[image of agents]
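To make the sequential flow concrete, here is a minimal plain-Python sketch. It does not use the ADK API; the shared state dictionary, the key names (request, steps, code, result) and the stub logic inside each stage are assumptions made purely for illustration. In the real project each stage is an LLM-backed agent.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Stage = Callable[[State], State]


def steps_provider(state: State) -> State:
    # Stand-in for the LLM call: one numbered step per sentence of the request.
    sentences = [s.strip() for s in state["request"].split(".") if s.strip()]
    state["steps"] = "\n".join(f"{i}. {s}" for i, s in enumerate(sentences, 1))
    return state


def code_generator(state: State) -> State:
    # Stand-in: emit one comment line per step instead of real Playwright code.
    state["code"] = "\n".join(f"# {line}" for line in state["steps"].splitlines())
    return state


def code_runner(state: State) -> State:
    # Stand-in: report how many steps would be executed.
    state["result"] = f"{len(state['code'].splitlines())} step(s) queued"
    return state


def run_sequentially(stages: List[Stage], state: State) -> State:
    # Each stage reads what the previous one wrote, in a fixed order.
    for stage in stages:
        state = stage(state)
    return state


PIPELINE: List[Stage] = [steps_provider, code_generator, code_runner]
```

Calling run_sequentially(PIPELINE, {"request": "..."}) passes one state object through all three stages, which is the same hand-off pattern the three agents follow.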
Agents
The project consists of a SequentialAgent orchestrating three sub-agents: steps_provider, code_generator, and code_runner (the Steps Generator, Code Generator, and Code Executor from the Architecture section).
1. steps_provider
- Description: An instruction simplifier that converts detailed, conversational instructions into a concise, numbered list of explicit, actionable steps.
- Role: To break down complex user requests into a series of manageable, atomic actions.
- Output: A numbered list of action-oriented steps.
- Example Input: "To sign in to Google, navigate to google.com and click the 'Sign in' button, usually found in the top right corner. Then, enter your Google Account email or phone number and password which you set up earlier."
- Example Output:
- Goto google.com
- Click button named "Sign In"
- Type email or phone number in text box
- Press Enter
- Type password in text box
- Press Enter
2. code_generator
- Description: An expert Playwright test automation engineer tasked with translating sequential natural language instructions into runnable Playwright Python code.
- Role: To generate Playwright code based on the simplified steps provided by steps_provider.
- Key Functionality:
- Browser Connection: Always uses playwright.chromium.connect_over_cdp("http://localhost:9222", slow_mo=50) to connect to an existing Chrome browser instance.
- Context and Page Selection: Always picks the first available browser context and page.
- Locator Strategy: Implements a hierarchical prioritization for locators, including get_by_role, get_by_text, get_by_label, get_by_placeholder, and contextual CSS selectors as a last resort.
- Robustness: Employs mandatory try-except blocks for every interaction (click(), fill(), goto(), etc.) to implement a multi-locator fallback mechanism. If one locator fails, it systematically tries the next prioritized one.
- Output: A complete, self-contained Playwright Python script.
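The multi-locator fallback described above can be sketched roughly as follows. This is an illustration, not the code the agent actually generates: the helper names first_working and click_by_name, the 2000 ms timeout, and the choice to treat any exception as "try the next locator" are all assumptions. The page argument is expected to be a Playwright Page, and only get_by_role, get_by_text, get_by_label, get_by_placeholder and click are called on it.

```python
from typing import Callable, List, Tuple


def first_working(candidates: List[Tuple[str, Callable[[], object]]]) -> str:
    """Run candidate actions in priority order; return the label of the first that succeeds."""
    errors = []
    for label, action in candidates:
        try:
            action()
            return label
        except Exception as exc:  # assumption: any failure means "try the next locator"
            errors.append(f"{label}: {exc}")
    raise RuntimeError("all locators failed: " + "; ".join(errors))


def click_by_name(page, name: str, timeout_ms: int = 2000) -> str:
    """Click the element with the given visible name, trying locators from most to least specific."""
    return first_working([
        ("role", lambda: page.get_by_role("button", name=name).click(timeout=timeout_ms)),
        ("text", lambda: page.get_by_text(name).click(timeout=timeout_ms)),
        ("label", lambda: page.get_by_label(name).click(timeout=timeout_ms)),
        ("placeholder", lambda: page.get_by_placeholder(name).click(timeout=timeout_ms)),
    ])
```

A short per-locator timeout matters here: with Playwright's default timeout, every failed candidate would block for a long time before the next one is tried.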
3. code_runner
- Description: Runs Python code.
- Role: To execute the Playwright Python code generated by the code_generator in a separate process.
- Key Functionality:
- Executes the provided Python code using subprocess.run.
- Handles both successful and unsuccessful code execution.
- Cleans up the Python code string before execution (removes the Markdown code-fence wrappers: the opening triple-backtick python marker and the closing triple backticks).
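A rough sketch of what such a runner could look like is below. The function names strip_code_fences and run_generated_code, the returned dictionary shape and the 120-second timeout are assumptions for illustration; only the use of subprocess.run and the fence clean-up come from the description above.

```python
import subprocess
import sys

# Markdown code-fence marker (three backticks), built from a single character.
FENCE = "`" * 3


def strip_code_fences(text: str) -> str:
    """Remove a leading fence line (with or without a language tag) and a trailing fence line."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)


def run_generated_code(text: str, timeout_s: float = 120.0) -> dict:
    """Run the cleaned code in a separate Python process and report success or failure."""
    code = strip_code_fences(text)
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout_s} s"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}
```

Returning stderr alongside the success flag gives the caller something to show the user, or to feed back to the model, when the generated script fails.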
Challenges we ran into
This wasn't easy. The following challenges made the task technically difficult:
Generating Steps
The steps_provider agent needed to create simple, accurate, and predictable steps. It struggled at first, but with better instructions, its performance improved significantly.
Generating Playwright Code
This part was challenging, and it's still not perfect. Playwright finds elements on a web page using "locators" (for example by role, visible text, or label). It was hard for the model to know the exact locator for each element, because a site's internal element names are not public. My solution was to have the model search mainly by the element's name. This gives several possible locators, and the code then tries each one until it finds one that works.
Dynamic Code Execution
I needed to run Python code that changes each time. Initially, I tried Playwright's synchronous API, but this didn't work because Google ADK already runs inside an asyncio event loop. Then I tried the asynchronous Playwright API, but I couldn't figure out how to start a new task on the existing event loop. My final solution was to run the code in a completely separate Python process using subprocess.run().
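The underlying constraint can be reproduced with the standard library alone. The sketch below is not Playwright or ADK code; job and inside_running_loop are hypothetical names, and inside_running_loop merely stands in for any code the framework calls from its own event loop. It shows that a second loop cannot be started from inside a running one, while a child process is unaffected because it starts with no loop at all.

```python
import asyncio
import subprocess
import sys

# Source for the child process: the same kind of job, started from a fresh interpreter.
CHILD_CODE = (
    "import asyncio\n"
    "async def job():\n"
    "    return 'done'\n"
    "print(asyncio.run(job()))\n"
)


async def job() -> str:
    return "done"


async def inside_running_loop() -> tuple:
    coro = job()
    try:
        asyncio.run(coro)  # tries to start a second event loop in this thread
        nested_failed = False
    except RuntimeError:
        coro.close()  # the coroutine never ran; close it to avoid a warning
        nested_failed = True
    # A fresh interpreter has no running loop, so the same call works there.
    # Note: subprocess.run blocks this loop until the child exits.
    child = subprocess.run(
        [sys.executable, "-c", CHILD_CODE], capture_output=True, text=True
    )
    return nested_failed, child.stdout.strip()
```

Running asyncio.run(inside_running_loop()) from a normal script exercises both paths: the nested call is rejected and the child process completes.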
Accomplishments that we're proud of
- Overcoming the above-mentioned challenges
- It works really well!
What we learned
- LLMs are great. Agents make them better
- The primary problem with using LLMs is that they're probabilistic, unlike code, which is deterministic.
- ADK provides good constraints that help make LLM behaviour more predictable
- The quality of the prompt still matters a lot. Many instructions still have to be laid out very carefully in the prompt.
- Results still differ from run to run, even when following best practices
What's next for Guiding Agent
Make it an agent that works 100% of the time. It should be able to reason through the steps and accept even vague task descriptions. Guiding Agent shouldn't be limited to the web; having it on our smartphones is the long-term goal.
Scope for Improvement
- Ask for Permission: The agentic system currently performs the task directly. It should ask the user for permission before executing anything
- Halt Execution: The user should have the option to halt execution of the code
- Scrape the webpage to get the exact locators
- Recursive execution strategy with backtracking: enables the agent to return to the homepage if it goes down an incorrect path
Built With
- adk
- playwright
- python