Inspiration

As a data scientist, I often field questions from internal teams about user behaviour, SEO metrics, and other KPIs. The queries are usually simple, but answering each one by hand means writing SQL every time. That repetitive workflow can be automated, which led to EchoQL: a natural-language-to-SQL interface powered by agents that lets business users get structured answers without writing any code.


What it does

EchoQL is a multi-agent system that converts plain English questions into SQL and returns the results. It operates as a sequential pipeline with specialized agents:

  1. Data Availability Checker – Ensures the relevant fields exist in the dataset.
  2. SQL Generator – Constructs a SQL query using the available schema.
  3. SQL Validator – Confirms the SQL is valid.
  4. SQL Repair Agent – Fixes any issues in the query if validation fails.
  5. SQL Fetcher – Executes the query on BigQuery and returns a DataFrame.
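
The five steps above can be sketched as one control flow. This is a minimal illustration, not the project's actual code: it uses plain Python functions instead of ADK agents, an in-memory SQLite database as a stand-in for BigQuery, and caller-supplied generate and repair callables where EchoQL would call an LLM. The function names and the max_repairs parameter are assumptions; only the table layout comes from the project update.

```python
import sqlite3
from typing import Callable, List, Optional

# Table layout from the project update; everything else here is illustrative.
SCHEMA = {
    "mock_users": ["id", "name", "email", "birthday"],
    "mock_questions": ["question_id", "user_id", "created_at", "content"],
    "mock_answers": ["id", "question_id", "user_id", "created_at", "content"],
    "mock_user_sessions": ["session_id", "user_id", "duration_min", "session_date"],
}


def check_availability(table: str, columns: List[str]) -> List[str]:
    """Step 1: return the requested columns that the schema does not contain."""
    known = SCHEMA.get(table, [])
    return [c for c in columns if c not in known]


def validate_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Step 3: return None if the query compiles, otherwise the error text."""
    try:
        conn.execute(f"EXPLAIN {sql}")  # compiles the statement without running it
        return None
    except sqlite3.Error as exc:
        return str(exc)


def run_pipeline(
    conn: sqlite3.Connection,
    table: str,
    columns: List[str],
    generate: Callable[[], str],          # step 2, hypothetical LLM call
    repair: Callable[[str, str], str],    # step 4, hypothetical LLM call
    max_repairs: int = 2,
) -> list:
    missing = check_availability(table, columns)
    if missing:
        raise ValueError(f"unknown columns in {table}: {missing}")

    sql = generate()
    error = validate_sql(conn, sql)
    attempts = 0
    # Repair and re-validate until the query compiles or the budget runs out.
    while error is not None and attempts < max_repairs:
        sql = repair(sql, error)
        error = validate_sql(conn, sql)
        attempts += 1
    if error is not None:
        raise RuntimeError(f"could not repair query: {error}")

    return conn.execute(sql).fetchall()   # step 5: fetch the results
```

Bounding the repair loop keeps a query that cannot be fixed from cycling forever between the validator and the repair step.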

How we built it

We built EchoQL with the Google Agent Development Kit (ADK) and deployed it on Google Cloud Run. The stack:

  • ADK for defining and orchestrating custom Python agents
  • Google Cloud Run for containerized deployment of the agent service
  • Google BigQuery as the query execution backend
  • Google Cloud Storage for data staging and artifacts

The system runs as an autonomous pipeline in which each agent handles one part of the natural-language-to-SQL workflow.


Challenges we ran into

  1. The default ADK agent templates were sometimes limiting. We solved this by subclassing BaseAgent to define custom behavior and control.
  2. Deployment was challenging at first because we weren’t using Vertex AI's Agent Engine or its UI tools. Instead, we deployed with the adk deploy cloud_run command, which required understanding GCP's Cloud Run service, service accounts, environment variables, and requirements management.

Accomplishments that we're proud of

  • Successfully built a functional NL→SQL pipeline with multi-agent coordination.
  • Learned to deploy ADK agents on Cloud Run, making the solution cloud-accessible without depending on Vertex AI Agent Engine.
  • Built something that could be immediately valuable inside a company as a self-service data access tool.

What we learned

Through EchoQL, we gained experience with:

  • Google Agent Development Kit (ADK) and its agentic programming model
  • Building agentic workflows with BaseAgent classes and orchestration
  • Deploying to Cloud Run using the adk deploy cloud_run workflow
  • BigQuery + pandas + db-dtypes integration for querying at scale
  • Containerizing ML-powered APIs for business usage

We also deepened our understanding of production-grade agentic systems and scalable deployment on GCP.
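
The BigQuery, pandas, and db-dtypes combination listed above might look like the sketch below. It assumes the google-cloud-bigquery client library with Application Default Credentials; the function name fetch_dataframe and the optional client parameter are hypothetical, added so the function can be exercised without a live project.

```python
from typing import Any, Optional


def fetch_dataframe(sql: str, client: Optional[Any] = None):
    """Run sql on BigQuery and return the result as a pandas DataFrame."""
    if client is None:
        # Imported lazily so a stand-in client can be passed in tests.
        from google.cloud import bigquery

        client = bigquery.Client()  # picks up Application Default Credentials
    # query() starts the job; to_dataframe() waits for it to finish and needs
    # pandas and db-dtypes installed to build the DataFrame.
    return client.query(sql).to_dataframe()
```

Called with a real client, the query string would reference a fully qualified table, with the project and dataset names supplied by the deployment.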


What's next for EchoQL

We plan to improve EchoQL by integrating a semantic table retriever that filters relevant table schemas using vector similarity before passing context to the generator. This will:

  • Reduce LLM context size
  • Improve accuracy of generated SQL
  • Lower overall latency and cost
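
As a rough sketch of the retrieval idea, the code below ranks the four demo tables against a question by cosine similarity. It is a toy under stated assumptions: a bag-of-words count vector stands in for a real embedding model, and the names tokenize, cosine, and retrieve_tables are hypothetical. Only the table and column names come from the project update.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Table and column names from the project update.
SCHEMA: Dict[str, List[str]] = {
    "mock_users": ["id", "name", "email", "birthday"],
    "mock_questions": ["question_id", "user_id", "created_at", "content"],
    "mock_answers": ["id", "question_id", "user_id", "created_at", "content"],
    "mock_user_sessions": ["session_id", "user_id", "duration_min", "session_date"],
}


def tokenize(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts, splitting on underscores too."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_tables(question: str, k: int = 2) -> List[str]:
    """Return the k table names whose schema text is most similar to the question."""
    q = tokenize(question)
    scored = [
        (cosine(q, tokenize(" ".join([table] + columns))), table)
        for table, columns in SCHEMA.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [table for _, table in scored[:k]]
```

Only the schemas of the returned tables would then be passed to the SQL generator, which is where the smaller context comes from.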

We also plan to package EchoQL as a web app with authentication, workspace isolation, and an internal schema explorer, making it suitable as an enterprise internal tool.

Updates

To test EchoQL, ask a clear question about data from Jan 1 to Dec 31, 2024, using one of four tables:

  • mock_users (id, name, email, birthday)
  • mock_questions (question_id, user_id, created_at, content)
  • mock_answers (id, question_id, user_id, created_at, content)
  • mock_user_sessions (session_id, user_id, duration_min, session_date)

Only questions that reference existing columns and valid dates will work; others trigger an error. Examples: “Total number of questions?”, “Who asked the most questions?”, “Oldest or youngest user?”, or “Total number of answers?” EchoQL checks the schema, generates SQL, validates it, and fetches the results from BigQuery.
