
AI-Powered E2E Test Generation Platform

Describe tests in plain English. AI writes the code.

Enterprise-grade platform to generate and execute Cypress, Playwright, and WebdriverIO end-to-end tests from natural language requirements.

This project combines LLM-driven generation, LangGraph workflow orchestration, and vector-based pattern learning to improve test authoring speed while maintaining repeatability and CI/CD readiness.



Table of Contents

  • Getting Started
  • Platform Design
  • Setup & Configuration
  • Using the Platform
  • Operations
  • Project Info

Overview

The platform translates natural language requirements into executable E2E tests for:

| Framework | Output | Style |
|---|---|---|
| Cypress | `.cy.js` | Traditional & prompt-powered |
| Playwright | `.spec.ts` | TypeScript async/await |
| WebdriverIO | `.spec.js` | Mocha runner with Jest-like `expect` |

It supports both local engineering workflows and automated pipeline execution. The generator uses contextual data from live HTML analysis and historical pattern matching to produce stable, maintainable test assets.
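To make the "live HTML analysis" idea concrete, here is a minimal sketch of selector extraction using only the Python standard library. The `FormFieldExtractor` class and `extract_selectors` helper are illustrative names, not the platform's actual analyzer:

```python
# Hypothetical sketch: pull id-based selectors for form elements out of a
# page so a generator can target real controls. Standard library only.
from html.parser import HTMLParser


class FormFieldExtractor(HTMLParser):
    """Collects id-based CSS selectors for form inputs and buttons."""

    def __init__(self):
        super().__init__()
        self.selectors = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "button", "select", "textarea"):
            attr_map = dict(attrs)
            if "id" in attr_map:
                self.selectors.append(f"#{attr_map['id']}")


def extract_selectors(html: str) -> list:
    parser = FormFieldExtractor()
    parser.feed(html)
    return parser.selectors


page = '<form><input id="username"><input id="password"><button id="login">Go</button></form>'
print(extract_selectors(page))  # ['#username', '#password', '#login']
```

The real analyzer also feeds the extracted context into fixture generation; this sketch shows only the selector-harvesting step.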

Business Value

Note

  • Reduces manual test authoring effort and onboarding time.
  • Standardizes generated test structure across teams.
  • Improves reuse through vector-based pattern memory.
  • Supports enterprise delivery with CI/CD and Docker workflows.
  • Enables faster root-cause diagnosis using AI-assisted failure analysis.

Core Capabilities

| Capability | Detail |
|---|---|
| Test Generation | Natural language to executable E2E test generation |
| Orchestration | LangGraph-based multi-step orchestration |
| URL Analysis | Dynamic URL analysis and fixture generation |
| Pattern Memory | Pattern storage and semantic retrieval using ChromaDB |
| LLM Support | Multi-provider: OpenAI, Anthropic, Google |
| Cypress Modes | Traditional mode and Cypress prompt-powered mode |
| Playwright | TypeScript generation |
| WebdriverIO | JavaScript `.spec.js` generation with Mocha and Chrome runner support |
| Execution | Optional immediate test execution after generation |
| Tracing | OpenTelemetry trace export to Grafana Tempo |
| Logging | Optional log shipping to Grafana Loki |
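The pattern-memory capability rests on vector similarity. The toy sketch below shows only the retrieval math, using hand-made three-dimensional vectors; in the platform itself, embeddings come from a sentence-transformer model and ChromaDB handles storage and search:

```python
# Toy illustration of vector-based pattern retrieval: rank stored test
# patterns by cosine similarity to a new requirement's embedding.
# The vectors here are made up purely for demonstration.
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# Hypothetical stored patterns: (name, embedding)
patterns = [
    ("login-success", [0.9, 0.1, 0.0]),
    ("search-results", [0.1, 0.9, 0.2]),
    ("logout-session", [0.7, 0.0, 0.5]),
]

# Pretend embedding of "Test login with valid credentials"
query = [0.85, 0.05, 0.1]

ranked = sorted(patterns, key=lambda p: cosine(query, p[1]), reverse=True)
print(ranked[0][0])  # most similar stored pattern
```

Real embeddings have hundreds of dimensions and the database performs approximate nearest-neighbor search, but the ranking principle is the same.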

Architecture

```mermaid
graph TB
    subgraph "User Input"
        A[Natural Language<br/>Requirements]
        B[URL/HTML Data<br/>--url flag]
        C[JSON Test Data<br/>--data flag]
    end

    subgraph "AI & Workflow Engine"
        D[LangGraph Workflow<br/>5-Step Process]
        E[Multi-Provider LLM<br/>OpenAI / Anthropic / Google]
        F[Vector Store<br/>Pattern Learning<br/>Chroma DB]
    end

    subgraph "Framework Generation"
        G{Cypress Framework}
        H{Playwright Framework}
        W{WebdriverIO Framework}
        I[Cypress Tests<br/>.cy.js files<br/>Traditional & cy.prompt&#40;&#41;]
        J[Playwright Tests<br/>.spec.ts files<br/>TypeScript]
        X[WebdriverIO Tests<br/>.spec.js files<br/>Mocha + expect]
    end

    subgraph "Execution & Analysis"
        K[Cypress Runner<br/>npx cypress run]
        L[Playwright Runner<br/>npx playwright test]
        M[AI Failure Analyzer<br/>--analyze flag<br/>Multi-Provider LLM]
        P[WebdriverIO Runner<br/>npx wdio run]
    end

    A --> D
    B --> D
    C --> D
    D --> E
    E --> F
    F --> D
    D --> G
    D --> H
    D --> W
    G --> I
    H --> J
    W --> X
    I --> K
    J --> L
    X --> P
    K --> M
    L --> M
    P --> M

    style D fill:#e3f2fd,color:#333333,stroke:#666666
    style E fill:#f3e5f5,color:#333333,stroke:#666666
    style F fill:#fff3e0,color:#333333,stroke:#666666
    style G fill:#c8e6c9,color:#333333,stroke:#666666
    style H fill:#ffcdd2,color:#333333,stroke:#666666
    style W fill:#ffe0b2,color:#333333,stroke:#666666
```
High-Level Components
  • CLI interface (qa_automation.py)
  • LangGraph workflow engine
  • LLM provider adapters
  • HTML analysis and fixture writer
  • Vector store pattern manager
  • Test file generation and optional execution
  • Observability layer (OpenTelemetry + Loki)

Workflow

```mermaid
flowchart TD
    A[Start: User Input<br/>Requirements + Framework] --> B[Step 1: Initialize Vector Store<br/>Load/Create Chroma DB<br/>Pattern Database]
    B --> C[Step 2: Fetch Test Data<br/>Analyze URL/HTML<br/>Extract Selectors<br/>Generate Fixtures]
    C --> D[Step 3: Search Similar Patterns<br/>Query Vector Store<br/>Find Matching Test Patterns<br/>From Past Generations]
    D --> E[Step 4: Generate Tests<br/>Use AI + Patterns<br/>Create Framework-Specific Code<br/>Cypress, Playwright, or WebdriverIO]
    E --> F[Step 5: Run Tests<br/>Execute via Framework Runner<br/>Optional --run flag]
    F --> G[End: Tests Executed<br/>Ready for CI/CD]

    style A fill:#e1f5fe,color:#333333,stroke:#666666
    style B fill:#fff3e0,color:#333333,stroke:#666666
    style C fill:#c8e6c9,color:#333333,stroke:#666666
    style D fill:#ffcdd2,color:#333333,stroke:#666666
    style E fill:#f3e5f5,color:#333333,stroke:#666666
    style F fill:#e8f5e8,color:#333333,stroke:#666666
    style G fill:#f3e5f5,color:#333333,stroke:#666666
```

Generation follows a deterministic five-step flow:

| Step | Name | Description |
|---|---|---|
| 1 | Initialize Vector Store | Load or create the Chroma pattern database |
| 2 | Fetch Test Data | Analyze URL/HTML, extract selectors, generate fixtures |
| 3 | Search Similar Patterns | Query vector store for matching historical patterns |
| 4 | Generate Tests | Use AI + patterns to create framework-specific code |
| 5 | Run Tests | Optionally execute via framework runner (`--run`) |
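The five steps can be sketched as a chain of state-transforming functions. This is an illustrative outline, not the LangGraph implementation in `qa_automation.py`; every function body below is a stand-in:

```python
# Hedged sketch of the five-step flow as plain functions passing a shared
# state dict. The step names mirror the table above; the bodies are fakes.
def init_vector_store(state):
    state["store"] = {}  # stands in for the Chroma pattern database
    return state

def fetch_test_data(state):
    state["selectors"] = ["#username", "#password"]  # from URL/HTML analysis
    return state

def search_patterns(state):
    state["patterns"] = state["store"].get(state["requirement"], [])
    return state

def generate_tests(state):
    state["test_file"] = f"// test for: {state['requirement']}"
    return state

def run_tests(state):
    if state.get("run"):  # only when the --run flag was passed
        state["result"] = "executed"
    return state

STEPS = [init_vector_store, fetch_test_data, search_patterns,
         generate_tests, run_tests]

state = {"requirement": "Test login", "run": False}
for step in STEPS:
    state = step(state)
print(state["test_file"])  # // test for: Test login
```

In the real workflow each step is a LangGraph node, so the engine (not a plain `for` loop) controls sequencing and state hand-off.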

Technology Stack

| Layer | Technology |
|---|---|
| Orchestration | Python CLI |
| Workflow | LangChain + LangGraph |
| Vector Store | ChromaDB |
| LLM Backends | OpenAI / Anthropic / Google |
| Test Runners | Cypress, Playwright, WebdriverIO |
| Observability | OpenTelemetry SDK with OTLP exporter |
| Logging | Grafana Loki handler (optional) |

Repository Structure

View repository tree

```text
ai-natural-language-tests/
|-- cypress/
|   |-- e2e/
|   |   |-- generated/
|   |   `-- prompt-powered/
|   `-- fixtures/
|-- tests/
|   `-- generated/
|-- webdriverio/
|   `-- tests/
|       `-- generated/
|-- prompt_specs/
|-- vector_db/
|-- qa_automation.py
|-- cypress.config.js
|-- playwright.config.ts
|-- wdio.conf.js
|-- package.json
|-- requirements.txt
|-- Dockerfile
|-- docker-compose.yml
`-- README.md
```

Prerequisites

| Requirement | Version / Notes |
|---|---|
| Python | 3.10+ |
| Node.js | 22+ |
| npm | latest |
| Git | latest |
| Playwright browsers | `npx playwright install chromium` |

Installation

Local Setup

```shell
git clone https://github.com/aiqualitylab/ai-natural-language-tests.git
cd ai-natural-language-tests
pip install -r requirements.txt
npm ci
npx playwright install chromium
```

Create `.env`:

```
OPENAI_API_KEY=your_key
```

Optional: GitAgent (Repo-Specific)

This repository includes a targeted gitagent setup for its QA automation workflow:

  • agent.yaml (manifest)
  • SOUL.md and RULES.md (behavior and constraints)
  • knowledge/ (framework and repo references)

In short: agent.yaml defines the repo agent, SOUL.md and RULES.md define how it should behave, and knowledge/ gives it project-specific framework guidance.

Quick commands:

```shell
npm run gitagent:validate
npm run gitagent:info
npm run gitagent:export
```

Docker Setup

```shell
git clone https://github.com/aiqualitylab/ai-natural-language-tests.git
cd ai-natural-language-tests
docker compose build
```

Docker Compose loads .env and now explicitly forwards observability variables for Tempo and Loki to the container runtime.

Run in container:

```shell
docker compose run --rm test-generator "Test login" --url https://the-internet.herokuapp.com/login
```

Run with observability enabled:

```shell
docker compose run --rm test-generator \
  "Test login" --url https://the-internet.herokuapp.com/login --framework playwright --run
```

GitHub Registry (GHCR)

Pre-built Docker images are published to GitHub Container Registry. No local clone or build required.

| Without GHCR | With GHCR |
|---|---|
| Clone, install, build, run | `docker run` and done |
| Each user builds their own image | One image built once, shared everywhere |
| "Works on my machine" problems | Identical environment for every user |

Pull and run

```shell
docker pull ghcr.io/aiqualitylab/ai-natural-language-tests:latest

docker run --rm \
  -e OPENAI_API_KEY=your_key \
  ghcr.io/aiqualitylab/ai-natural-language-tests:latest \
  "Test login" --url https://the-internet.herokuapp.com/login
```

Image tags

| Tag | Use case |
|---|---|
| `latest` | Most recently published version; use for quick runs |
| `v4.0.0` | Pinned to a specific release; use in CI/CD for reproducibility |

For publishing and release management, see CONTRIBUTING.md.

Configuration

Core API Keys

```
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
GOOGLE_API_KEY=your_key
```

OpenTelemetry (Grafana Tempo)

```
OTEL_PROVIDER=grafana
OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-eu-north-0.grafana.net/otlp
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <base64(instance_id:api_token)>
```

Loki Logging (Optional)

```
GRAFANA_LOKI_URL=https://logs-prod-eu-north-0.grafana.net
GRAFANA_INSTANCE_ID=<instance_id>
GRAFANA_API_TOKEN=<logs_write_token>
```
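Before a run, it can help to check that these variables are coherent, e.g. that at least one provider key is set. A minimal sketch, assuming a hypothetical `check_config` helper (not part of the platform's actual startup code):

```python
# Illustrative pre-flight check over the environment variables documented
# above. The validation rules here are assumptions, not platform behavior.
import os

PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def check_config(env: dict) -> list:
    """Return a list of configuration problems (empty list means OK)."""
    problems = []
    if not any(env.get(k) for k in PROVIDER_KEYS):
        problems.append("no LLM provider key set")
    if env.get("OTEL_PROVIDER") and not env.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        problems.append("OTEL_PROVIDER set but OTLP endpoint missing")
    return problems


# In practice you would pass os.environ; a literal dict keeps the demo simple.
print(check_config({"OPENAI_API_KEY": "sk-..."}))  # []
print(check_config({"OTEL_PROVIDER": "grafana"}))
```

Running such a check early turns a confusing mid-run LLM authentication error into a clear startup message.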

Usage

Quick Reference

| Mode | Command |
|---|---|
| Cypress (default) | `python qa_automation.py "requirement" --url <url>` |
| Playwright | `python qa_automation.py "requirement" --url <url> --framework playwright` |
| WebdriverIO | `python qa_automation.py "requirement" --url <url> --framework webdriverio` |
| Prompt-powered Cypress | `python qa_automation.py "requirement" --url <url> --use-prompt` |
| Generate + Execute | `python qa_automation.py "requirement" --url <url> --run` |
| Failure Analysis | `python qa_automation.py --analyze "error message"` |
| Pattern Inventory | `python qa_automation.py --list-patterns` |

Natural Language Prompt Examples

| What you type | What AI generates |
|---|---|
| "Test login with valid credentials" | Login form fill + submit + success assertion |
| "Test login fails with wrong password" | Negative test with error message assertion |
| "Test contact form submission" | Form field detection + submit + confirmation |
| "Test search returns results" | Search input + trigger + results count assertion |
| "Test signup with missing fields" | Validation error coverage for required fields |
| "Test logout clears session" | Post-login logout + redirect assertion |

Tip

Writing effective AI requirements

  • Be specific about the action: "Test login" vs "Test login with valid credentials and verify dashboard loads"
  • Mention the expected outcome when it matters: "...and verify error message appears"
  • Use --url to give the AI real page context — it reads the HTML and picks the right selectors automatically
  • Chain multiple requirements in one run: "Test login" "Test logout" --url <url>

Generate Cypress Test

```shell
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login
```

Generate Playwright Test

```shell
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --framework playwright
```

Generate WebdriverIO Test

```shell
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --framework webdriverio
```

Prompt-Powered Cypress Mode

```shell
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --use-prompt
```

Generate and Execute

```shell
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --framework playwright --run
```

Failure Analysis

```shell
python qa_automation.py --analyze "CypressError: Element not found"
python qa_automation.py --analyze -f error.log
```

Note

The AI failure analyzer returns a structured diagnosis:

| Field | Description |
|---|---|
| CATEGORY | Error type: SELECTOR, TIMEOUT, ASSERTION, NETWORK, etc. |
| REASON | Root cause explanation in plain English |
| FIX | Suggested code change or configuration fix |
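Because the diagnosis arrives as plain text in this three-field shape, downstream tooling can read it mechanically. A small illustrative parser (hypothetical; the platform's own output handling may differ):

```python
# Illustrative parser for a CATEGORY / REASON / FIX diagnosis block.
# Assumes one "KEY: value" pair per line, which is an assumption about
# the analyzer's output format, not a documented guarantee.
def parse_diagnosis(text: str) -> dict:
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            key = key.strip().upper()
            if key in ("CATEGORY", "REASON", "FIX"):
                fields[key] = value.strip()
    return fields


sample = """CATEGORY: SELECTOR
REASON: The element #login-btn was removed in the latest UI change.
FIX: Update the selector to use a stable data attribute."""

print(parse_diagnosis(sample)["CATEGORY"])  # SELECTOR
```

A parser like this makes it easy to route TIMEOUT and NETWORK failures to a retry step in CI while escalating SELECTOR failures to a human.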

Pattern Inventory

```shell
python qa_automation.py --list-patterns
```

CI/CD Integration

```mermaid
flowchart TD
    A[Code Changes<br/>Pushed to Repo] --> B[CI/CD Pipeline<br/>Triggers]
    B --> C[Install Dependencies<br/>pip install -r requirements.txt<br/>npm install]
    C --> D[Generate Tests<br/>python qa_automation.py<br/>--url or --data]
    D --> E[Run Tests<br/>npx cypress run<br/>npx playwright test<br/>npx wdio run]
    E --> F{Tests Pass?}
    F -->|Yes| G[Deploy Application<br/>Success]
    F -->|No| H[AI Failure Analysis<br/>--analyze in pipeline]
    H --> I[Auto-Fix & Regenerate<br/>If possible]
    I --> E
    H --> J[Notify Developers<br/>Manual intervention]

    style A fill:#e1f5fe,color:#333333,stroke:#666666
    style B fill:#fff3e0,color:#333333,stroke:#666666
    style C fill:#c8e6c9,color:#333333,stroke:#666666
    style D fill:#ffcdd2,color:#333333,stroke:#666666
    style E fill:#f3e5f5,color:#333333,stroke:#666666
    style G fill:#e8f5e8,color:#333333,stroke:#666666
    style J fill:#ffebee,color:#333333,stroke:#666666
```

Recommended pipeline stages:

| Stage | Action |
|---|---|
| 1 | Install Python and Node dependencies |
| 2 | Validate environment variables and secrets injection |
| 3 | Generate tests from requirements |
| 4 | Execute generated tests |
| 5 | Publish artifacts and reports |
| 6 | Export telemetry to observability stack |

Security and Compliance Guidance

Important

  • Store secrets only in secure secret managers (never commit .env).
  • Use scoped API tokens with least-privilege access.
  • Rotate provider keys and Grafana tokens on a fixed cadence.
  • Keep generated tests and reports free of sensitive production data.
  • Apply repository protection rules and mandatory CI checks.

Troubleshooting

Warning

Traces Not Visible in Grafana Tempo

  • Verify OTLP endpoint region and datasource selection.
  • Verify Authorization=Basic <base64(instance_id:api_token)> format.
  • Query with: `{resource.service.name="ai-natural-language-tests"}`

Note

Loki Authentication Errors

  • Ensure token has logs:write scope.
  • Confirm instance ID and logs endpoint match the same Grafana stack.

Tip

Docker Observability Validation

  • Confirm .env includes OTLP and Loki keys before docker compose run.
  • Use docker compose config to verify environment interpolation.
  • In Grafana Explore, query Tempo with service.name="ai-natural-language-tests".
  • In Grafana Loki, query labels: {service_name="ai-natural-language-tests"}.

Tip

Switching to Headed Mode for Debugging

Tests run headless by default. To debug interactively, switch your framework config:

Cypress:

  • Edit cypress.config.js and add headed: true after browser: 'chrome'
  • Or run: npx cypress run --headed --spec 'cypress/e2e/generated/*.cy.js'

Playwright:

  • Edit playwright.config.ts and change `headless: true` to `headless: false`
  • Or run: npx playwright test --headed tests/generated/

WebdriverIO:

  • Edit wdio.conf.js and comment out '--headless=new' from the args array

Docker Headed Mode (with X11 forwarding):

```shell
docker build --target debug -t ai-tests:debug .
docker run -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix ai-tests:debug
```

  • Optional: mainly useful for visual debugging on Linux.
  • To retry a single failing test, re-run the generated single-spec command printed in the logs.

Changelog

Release notes are maintained in CHANGELOG.md using a standard Keep a Changelog format.


Built with AI. Tested by AI. Ready for CI.

© 2026 AI Quality Lab / Sreekanth Harigovindan. tests.aiqualitylab.org

Documentation licensed under CC BY 4.0.