Guiding Agent

Inspiration

Remember the last time you spent an agonising amount of time trying to find "that one setting" you needed on a website? Remember how frustrating it is to dive into the Settings of a website or app, only to spiral down into weird, mislabelled subcategories and ... (I can go on forever)

This isn't just a frustrating experience; it can also be a serious barrier for many sections of society (like grandparents and non-tech-savvy folks who often don't even know what the various settings mean), limiting their access to technology.

Apart from access, leaving settings unconfigured can also lead to security issues (like having multi-factor authentication turned off).

This project aims to answer a simple question: What if there was something that could perform these tasks for me?

What it does

Guiding Agent shows you the setting you're searching for and changes settings for you on any website you use.

How we built it

Architecture

This is a Python project built using Google ADK (Agent Development Kit).

There are three main Agents:

Steps Generator: Takes user input and generates a set of steps to be performed to achieve the task
Code Generator: Takes in the steps generated by Steps Generator, and creates Playwright code to perform those steps
Code Executor: Takes the code from Code Generator and executes it to actually perform the task.

The three agents are executed sequentially in that order.

[image of agents]

Agents

The project consists of a SequentialAgent orchestrating three sub-agents: steps_provider, code_generator, and code_runner.
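The hand-off between the three sub-agents can be sketched in plain Python. This is only an illustration of sequential orchestration, not the ADK API: the `run_sequential` helper, the shared `state` dictionary, and the stub bodies are all hypothetical stand-ins for the real LLM-backed agents.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Stage = Callable[[State], State]


def steps_provider(state: State) -> State:
    # Stub: the real agent asks an LLM to simplify state["request"].
    return {"steps": "1. Goto example.com"}


def code_generator(state: State) -> State:
    # Stub: the real agent turns the steps into Playwright code.
    return {"code": f"# code for: {state['steps']}"}


def code_runner(state: State) -> State:
    # Stub: the real agent executes the generated code.
    return {"result": f"ran {len(state['code'])} characters of code"}


def run_sequential(stages: List[Stage], state: State) -> State:
    """Run each stage in order, merging its output into the shared state."""
    for stage in stages:
        state = {**state, **stage(state)}
    return state
```

Calling `run_sequential([steps_provider, code_generator, code_runner], {"request": "open example.com"})` returns a state in which each stage has read the output of the one before it.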

1. `steps_provider`

Description: An instruction simplifier that converts detailed, conversational instructions into a concise, numbered list of explicit, actionable steps.
Role: To break down complex user requests into a series of manageable, atomic actions.
Output: A numbered list of action-oriented steps.
Example Input: "To sign in to Google, navigate to google.com and click the 'Sign in' button, usually found in the top right corner. Then, enter your Google Account email or phone number and password which you set up earlier."
Example Output:
1. Goto google.com
2. Click button named "Sign In"
3. Type email or phone number in text box
4. Press Enter
5. Type password in text box
6. Press Enter
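
Because the output is a plain numbered list, a downstream consumer could check it before using it. The following is a minimal sketch of such a check; `parse_steps` is a hypothetical helper and not part of the project, and it assumes every step sits on its own line in the form "N. action".

```python
import re
from typing import List

# One step per line: a number, a dot, whitespace, then the action text.
STEP_LINE = re.compile(r"^\s*(\d+)\.\s+(.+?)\s*$")


def parse_steps(text: str) -> List[str]:
    """Return the actions in order, rejecting malformed or misnumbered lines."""
    steps: List[str] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        match = STEP_LINE.match(line)
        if match is None:
            raise ValueError(f"not a numbered step: {line!r}")
        number, action = int(match.group(1)), match.group(2)
        if number != len(steps) + 1:
            raise ValueError(f"expected step {len(steps) + 1}, got {number}")
        steps.append(action)
    return steps
```

Rejecting misnumbered lines early is one way to catch a model response that drifted from the expected format before any code is generated from it.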

2. `code_generator`

Description: An expert Playwright test automation engineer tasked with translating sequential natural language instructions into runnable Playwright Python code.
Role: To generate Playwright code based on the simplified steps provided by steps_provider.
Key Functionality:
- Browser Connection: Always uses playwright.chromium.connect_over_cdp("http://localhost:9222", slow_mo=50) to connect to an existing Chrome browser instance.
- Context and Page Selection: Always picks the first available browser context and page.
- Locator Strategy: Implements a hierarchical prioritization for locators, including get_by_role, get_by_text, get_by_label, get_by_placeholder, and contextual CSS selectors as a last resort.
- Robustness: Employs mandatory try-except blocks for every interaction (click(), fill(), goto(), etc.) to implement a multi-locator fallback mechanism. If one locator fails, it systematically tries the next prioritized one.
Output: A complete, self-contained Playwright Python script.
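
A generated script might look roughly like the sketch below. It is an illustration of the multi-locator fallback idea, not the code the agent actually emits: the helper names `first_working`, `click_sign_in`, and `main`, the CSS selector, and the timeout value are assumptions. Running `main()` requires Playwright to be installed and Chrome to be listening for CDP connections on port 9222.

```python
from typing import Any, Callable, List, Tuple

Candidate = Tuple[str, Callable[[], Any]]


def first_working(candidates: List[Candidate], action: Callable[[Any], None]) -> str:
    """Try each (label, make_locator) pair in priority order.

    Returns the label of the first locator on which the action succeeds.
    """
    errors = []
    for label, make_locator in candidates:
        try:
            action(make_locator())
            return label
        except Exception as exc:  # Playwright raises its own error types here
            errors.append(f"{label}: {exc}")
    raise RuntimeError("all locators failed: " + "; ".join(errors))


def click_sign_in(page: Any, timeout_ms: int = 3000) -> str:
    # Priority order follows the strategy above; the CSS selector is a guess.
    candidates: List[Candidate] = [
        ("role", lambda: page.get_by_role("button", name="Sign in")),
        ("text", lambda: page.get_by_text("Sign in", exact=True)),
        ("label", lambda: page.get_by_label("Sign in")),
        ("css", lambda: page.locator("a[href*='signin']")),
    ]
    return first_working(candidates, lambda loc: loc.click(timeout=timeout_ms))


def main() -> None:
    # Imported here so the helpers above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp("http://localhost:9222", slow_mo=50)
        page = browser.contexts[0].pages[0]  # first available context and page
        page.goto("https://www.google.com")
        print("clicked via", click_sign_in(page))
```

A short per-locator timeout keeps a failing candidate from stalling the whole script before the next one is tried.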

3. `code_runner`

Description: Runs Python code.
Role: To execute the Playwright Python code generated by the code_generator in a separate process.
Key Functionality:
- Executes the provided Python code using subprocess.run.
- Handles both successful and unsuccessful code execution.
- Cleans up the Python code string before execution (removes the Markdown code-fence wrappers, i.e. the opening fence tagged `python` and the closing fence).
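
The behaviour described above can be sketched as follows. The function names, the returned dictionary shape, and the timeout are assumptions made for illustration; only the use of `subprocess.run` and the fence clean-up come from the description.

```python
import subprocess
import sys
from typing import Dict

FENCE = "`" * 3  # the Markdown code-fence marker


def strip_code_fences(text: str) -> str:
    """Drop a leading fence line (with or without a language tag) and a trailing one."""
    lines = text.strip().splitlines()
    if lines and lines[0].lstrip().startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)


def run_generated_code(raw: str, timeout_s: int = 120) -> Dict[str, str]:
    """Run the cleaned code in a separate Python process and report the outcome."""
    code = strip_code_fences(raw)
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"status": "error", "output": f"timed out after {timeout_s} seconds"}
    if completed.returncode == 0:
        return {"status": "success", "output": completed.stdout}
    return {"status": "error", "output": completed.stderr}
```

Returning the captured stderr on failure gives the caller something concrete to show the user or feed back to the model.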

Challenges we ran into

This wasn't easy. The following challenges made this task technically difficult:

Generating Steps

The steps_provider agent needed to create simple, accurate, and predictable steps. It struggled at first, but with better instructions, its performance improved significantly.

Generating Playwright Code

This part was challenging, and it's still not perfect. Playwright finds elements on a web page using "locators" (for example, by role, visible text, or label). It was hard for the model to know the exact locator for an element, because it does not see the page's markup ahead of time. My solution was to have the model search mainly by the element's name. This gives several possible locators, and the code then tries each one until it finds one that works.

Dynamic Code Execution

I needed to run Python code that changes each time. Initially, I tried Playwright's synchronous API, but this didn't work because Google ADK runs inside an asyncio event loop, and the synchronous API can't be used there. Then I tried Playwright's asynchronous API, but I couldn't figure out how to start a new task in the already-running event loop. My final solution was to run the code in a completely separate Python process using subprocess.run().
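
The isolation that a separate process provides can be demonstrated with a small sketch. It assumes Python 3.9 or later for `asyncio.to_thread`; the names `CHILD_CODE`, `run_child`, and `run_from_event_loop` are hypothetical. The child script only checks whether an event loop is running in its own interpreter, standing in for a generated Playwright script.

```python
import asyncio
import subprocess
import sys

# Stand-in for generated code: report whether an asyncio loop is running.
CHILD_CODE = """
import asyncio
try:
    asyncio.get_running_loop()
    print("loop running")
except RuntimeError:
    print("no loop")
"""


def run_child(code: str, timeout_s: int = 60) -> str:
    """Run code in a fresh interpreter and return its trimmed stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return completed.stdout.strip()


async def run_from_event_loop(code: str) -> str:
    # The blocking subprocess call runs in a worker thread, so the
    # caller's event loop stays responsive while the child executes.
    return await asyncio.to_thread(run_child, code)
```

Even when `run_from_event_loop` is awaited from a running loop, the child starts with no loop of its own, which is why synchronous code can run there without conflict.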

Accomplishments that we're proud of

Overcoming the above-mentioned challenges
It works really well!

What we learned

LLMs are great. Agents make them better
The primary problem with using LLMs is that they're probabilistic, unlike code, which is deterministic.
ADK provides good constraints that help make LLM behaviour more predictable
The quality of the prompt still matters a lot. Many instructions still have to be laid out very clearly in the prompt.
Results differ from run to run, even when following best practices

What's next for Guiding Agent

Make it an agent that succeeds 100% of the time. It should be able to reason through the steps and accept even vague descriptions of a task. Guiding Agent shouldn't be limited to the web. Having it on our smartphones would be the long-term goal.

Scope for Improvement

Ask Permission: The agentic system currently performs the task directly. It should ask the user for permission before execution
Halt Execution: The user should have the option to halt execution of the code
Scrape the webpage to get the exact locators
Recursive execution strategy with backtracking: enables the agent to return to the homepage if it goes down an incorrect path

Built With

adk
playwright
python

Updates

Shivam Agarwal started this project — Jun 23, 2025 06:39 PM EDT
