Describe tests in plain English. AI writes the code.
Enterprise-grade platform to generate and execute Cypress, Playwright, and WebdriverIO end-to-end tests from natural language requirements.
This project combines LLM-driven generation, LangGraph workflow orchestration, and vector-based pattern learning to improve test authoring speed while maintaining repeatability and CI/CD readiness.
The platform translates natural language requirements into executable E2E tests for:
| Framework | Output | Style |
|---|---|---|
| Cypress | `.cy.js` | Traditional & prompt-powered |
| Playwright | `.spec.ts` | TypeScript async/await |
| WebdriverIO | `.spec.js` | Mocha runner with Jest-like expect |
It supports both local engineering workflows and automated pipeline execution. The generator uses contextual data from live HTML analysis and historical pattern matching to produce stable, maintainable test assets.
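The HTML-analysis idea can be sketched with Python's standard-library parser: scan a page for form controls and derive a stable selector for each, preferring `id`, then `name`, then `type`. This is an illustrative stand-in, not the project's actual analyzer:

```python
from html.parser import HTMLParser

class SelectorScanner(HTMLParser):
    """Collect candidate selectors for form controls on a page.

    Illustrative only: the real URL analyzer lives inside
    qa_automation.py; this sketch just shows the idea of deriving
    stable selectors (id > name > type) from live HTML.
    """
    def __init__(self):
        super().__init__()
        self.selectors = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("input", "button", "select", "textarea", "form"):
            return
        a = dict(attrs)
        if "id" in a:
            self.selectors.append(f"#{a['id']}")
        elif "name" in a:
            self.selectors.append(f"{tag}[name='{a['name']}']")
        elif "type" in a:
            self.selectors.append(f"{tag}[type='{a['type']}']")

scanner = SelectorScanner()
scanner.feed("""
<form id="login">
  <input type="text" name="username">
  <input type="password" name="password">
  <button type="submit">Login</button>
</form>
""")
print(scanner.selectors)
# ['#login', "input[name='username']", "input[name='password']", "button[type='submit']"]
```

The same preference order (unique `id` first, attribute selectors as fallback) is what keeps generated tests resilient to cosmetic page changes.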
**Note**
- Reduces manual test authoring effort and onboarding time.
- Standardizes generated test structure across teams.
- Improves reuse through vector-based pattern memory.
- Supports enterprise delivery with CI/CD and Docker workflows.
- Enables faster root-cause diagnosis using AI-assisted failure analysis.
| Capability | Detail |
|---|---|
| Test Generation | Natural language to executable E2E test generation |
| Orchestration | LangGraph-based multi-step orchestration |
| URL Analysis | Dynamic URL analysis and fixture generation |
| Pattern Memory | Pattern storage and semantic retrieval using ChromaDB |
| LLM Support | Multi-provider: OpenAI, Anthropic, Google |
| Cypress Modes | Traditional mode and Cypress prompt-powered mode |
| Playwright | TypeScript generation |
| WebdriverIO | JavaScript .spec.js generation with Mocha and Chrome runner support |
| Execution | Optional immediate test execution after generation |
| Tracing | OpenTelemetry trace export to Grafana Tempo |
| Logging | Optional log shipping to Grafana Loki |
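The pattern-memory capability works by embedding past generations and retrieving the nearest ones for a new requirement. The real project uses ChromaDB; the sketch below substitutes hand-made 3-d vectors and a plain cosine-similarity ranking just to show the retrieval shape:

```python
import math

# Toy stand-in for the ChromaDB-backed pattern memory. The vectors here
# are hand-made 3-d stubs, not real embeddings.
patterns = {
    "login happy path":     [0.9, 0.1, 0.0],
    "login wrong password": [0.8, 0.3, 0.1],
    "search results count": [0.1, 0.9, 0.2],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query_vec, k=2):
    # Rank stored patterns by similarity to the query embedding.
    ranked = sorted(patterns, key=lambda name: cosine(query_vec, patterns[name]),
                    reverse=True)
    return ranked[:k]

print(most_similar([0.85, 0.2, 0.05]))
# ['login happy path', 'login wrong password']
```

In the real pipeline the query vector comes from embedding the natural-language requirement, so "login"-flavored requirements pull back previously generated login tests as context.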
```mermaid
graph TB
    subgraph "User Input"
        A[Natural Language<br/>Requirements]
        B[URL/HTML Data<br/>--url flag]
        C[JSON Test Data<br/>--data flag]
    end
    subgraph "AI & Workflow Engine"
        D[LangGraph Workflow<br/>5-Step Process]
        E[Multi-Provider LLM<br/>OpenAI / Anthropic / Google]
        F[Vector Store<br/>Pattern Learning<br/>Chroma DB]
    end
    subgraph "Framework Generation"
        G{Cypress Framework}
        H{Playwright Framework}
        W{WebdriverIO Framework}
        I[Cypress Tests<br/>.cy.js files<br/>Traditional & cy.prompt()]
        J[Playwright Tests<br/>.spec.ts files<br/>TypeScript]
        X[WebdriverIO Tests<br/>.spec.js files<br/>Mocha + expect]
    end
    subgraph "Execution & Analysis"
        K[Cypress Runner<br/>npx cypress run]
        L[Playwright Runner<br/>npx playwright test]
        M[AI Failure Analyzer<br/>--analyze flag<br/>Multi-Provider LLM]
        P[WebdriverIO Runner<br/>npx wdio run]
    end
    A --> D
    B --> D
    C --> D
    D --> E
    E --> F
    F --> D
    D --> G
    D --> H
    D --> W
    G --> I
    H --> J
    W --> X
    I --> K
    J --> L
    X --> P
    K --> M
    L --> M
    P --> M
    style D fill:#e3f2fd,color:#333333,stroke:#666666
    style E fill:#f3e5f5,color:#333333,stroke:#666666
    style F fill:#fff3e0,color:#333333,stroke:#666666
    style G fill:#c8e6c9,color:#333333,stroke:#666666
    style H fill:#ffcdd2,color:#333333,stroke:#666666
    style W fill:#ffe0b2,color:#333333,stroke:#666666
```
### High-Level Components
- CLI interface (`qa_automation.py`)
- LangGraph workflow engine
- LLM provider adapters
- HTML analysis and fixture writer
- Vector store pattern manager
- Test file generation and optional execution
- Observability layer (OpenTelemetry + Loki)
```mermaid
flowchart TD
    A[Start: User Input<br/>Requirements + Framework] --> B[Step 1: Initialize Vector Store<br/>Load/Create Chroma DB<br/>Pattern Database]
    B --> C[Step 2: Fetch Test Data<br/>Analyze URL/HTML<br/>Extract Selectors<br/>Generate Fixtures]
    C --> D[Step 3: Search Similar Patterns<br/>Query Vector Store<br/>Find Matching Test Patterns<br/>From Past Generations]
    D --> E[Step 4: Generate Tests<br/>Use AI + Patterns<br/>Create Framework-Specific Code<br/>Cypress, Playwright, or WebdriverIO]
    E --> F[Step 5: Run Tests<br/>Execute via Framework Runner<br/>Optional --run flag]
    F --> G[End: Tests Executed<br/>Ready for CI/CD]
    style A fill:#e1f5fe,color:#333333,stroke:#666666
    style B fill:#fff3e0,color:#333333,stroke:#666666
    style C fill:#c8e6c9,color:#333333,stroke:#666666
    style D fill:#ffcdd2,color:#333333,stroke:#666666
    style E fill:#f3e5f5,color:#333333,stroke:#666666
    style F fill:#e8f5e8,color:#333333,stroke:#666666
    style G fill:#f3e5f5,color:#333333,stroke:#666666
```
Generation follows a deterministic five-step flow:
| Step | Name | Description |
|---|---|---|
| 1 | Initialize Vector Store | Load or create the Chroma pattern database |
| 2 | Fetch Test Data | Analyze URL/HTML, extract selectors, generate fixtures |
| 3 | Search Similar Patterns | Query vector store for matching historical patterns |
| 4 | Generate Tests | Use AI + patterns to create framework-specific code |
| 5 | Run Tests | Optionally execute via framework runner (`--run`) |
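A dependency-free sketch of that five-step flow, with each step as a function threading a shared state dict. The real project wires these steps as LangGraph nodes; the function names and state keys below are hypothetical:

```python
# Hypothetical sketch of the five-step pipeline; the actual LangGraph
# node names and state shape in qa_automation.py may differ.
def initialize_vector_store(state):
    state["patterns_db"] = {}  # stands in for the Chroma pattern DB
    return state

def fetch_test_data(state):
    # In the real flow this analyzes the --url page and writes fixtures.
    state["selectors"] = ["#username", "#password"]
    return state

def search_similar_patterns(state):
    state["matches"] = [p for p in state["patterns_db"] if state["requirement"] in p]
    return state

def generate_tests(state):
    # The real step prompts the configured LLM with selectors + patterns.
    state["test_file"] = f"// {state['framework']} test for: {state['requirement']}"
    return state

def run_tests(state):
    state["executed"] = state.get("run_flag", False)
    return state

steps = [initialize_vector_store, fetch_test_data,
         search_similar_patterns, generate_tests, run_tests]

state = {"requirement": "Test login", "framework": "playwright", "run_flag": False}
for step in steps:
    state = step(state)
print(state["test_file"])  # // playwright test for: Test login
```

The loop-free, strictly ordered step list is what makes the generation flow deterministic: every run visits the same five stages in the same order.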
| Layer | Technology |
|---|---|
| Orchestration | Python CLI |
| Workflow | LangChain + LangGraph |
| Vector Store | ChromaDB |
| LLM Backends | OpenAI / Anthropic / Google |
| Test Runners | Cypress, Playwright, and WebdriverIO runners |
| Observability | OpenTelemetry SDK and OTLP exporter |
| Logging | Loki logging handler (optional) |
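One way a multi-provider LLM layer like the one above can pick a backend is from whichever API key is configured. This is a hypothetical sketch; `qa_automation.py`'s actual selection logic may differ:

```python
# Hypothetical provider selection by available API key; illustrative
# only, not the project's actual logic.
PROVIDER_KEYS = [
    ("openai", "OPENAI_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("google", "GOOGLE_API_KEY"),
]

def pick_provider(env):
    # First provider whose key is set wins; order encodes preference.
    for provider, key in PROVIDER_KEYS:
        if env.get(key):
            return provider
    raise RuntimeError("No LLM API key configured; set one of "
                       + ", ".join(k for _, k in PROVIDER_KEYS))

print(pick_provider({"ANTHROPIC_API_KEY": "sk-ant-..."}))  # anthropic
```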
View repository tree
```text
ai-natural-language-tests/
|-- cypress/
|   |-- e2e/
|   |   |-- generated/
|   |   `-- prompt-powered/
|   `-- fixtures/
|-- tests/
|   `-- generated/
|-- webdriverio/
|   `-- tests/
|       `-- generated/
|-- prompt_specs/
|-- vector_db/
|-- qa_automation.py
|-- cypress.config.js
|-- playwright.config.ts
|-- wdio.conf.js
|-- package.json
|-- requirements.txt
|-- Dockerfile
|-- docker-compose.yml
`-- README.md
```
| Requirement | Version / Notes |
|---|---|
| Python | 3.10+ |
| Node.js | 22+ |
| npm | latest |
| Git | latest |
| Playwright browsers | `npx playwright install chromium` |
### Local Setup
```shell
git clone https://github.com/aiqualitylab/ai-natural-language-tests.git
cd ai-natural-language-tests
pip install -r requirements.txt
npm ci
npx playwright install chromium
```

Create `.env`:

```shell
OPENAI_API_KEY=your_key
```

This repository includes a targeted gitagent setup for its QA automation workflow:

- `agent.yaml` (manifest)
- `SOUL.md` and `RULES.md` (behavior and constraints)
- `knowledge/` (framework and repo references)
In short: `agent.yaml` defines the repo agent, `SOUL.md` and `RULES.md` define how it should behave, and `knowledge/` gives it project-specific framework guidance.
Quick commands:
```shell
npm run gitagent:validate
npm run gitagent:info
npm run gitagent:export
```

### Docker Setup
```shell
git clone https://github.com/aiqualitylab/ai-natural-language-tests.git
cd ai-natural-language-tests
docker compose build
```

Docker Compose loads `.env` and explicitly forwards the Tempo and Loki observability variables to the container runtime.
Run in container:
```shell
docker compose run --rm test-generator "Test login" --url https://the-internet.herokuapp.com/login
```

Run with observability enabled:

```shell
docker compose run --rm test-generator \
  "Test login" --url https://the-internet.herokuapp.com/login --framework playwright --run
```

Pre-built Docker images are published to GitHub Container Registry (GHCR), so no local clone or build is required.
| Without GHCR | With GHCR |
|---|---|
| Clone → install → build → run | `docker run`, done |
| Each user builds their own image | One image built once, shared everywhere |
| "Works on my machine" problems | Identical environment for every user |
```shell
docker pull ghcr.io/aiqualitylab/ai-natural-language-tests:latest
docker run --rm \
  -e OPENAI_API_KEY=your_key \
  ghcr.io/aiqualitylab/ai-natural-language-tests:latest \
  "Test login" --url https://the-internet.herokuapp.com/login
```

| Tag | Use case |
|---|---|
| `latest` | Always the most recently published version; use for quick runs |
| `v4.0.0` | Pinned to a specific release; use in CI/CD for reproducibility |
For publishing and release management, see CONTRIBUTING.md.
### Core API Keys
```shell
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
GOOGLE_API_KEY=your_key
```

### OpenTelemetry (Grafana Tempo)

```shell
OTEL_PROVIDER=grafana
OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod-eu-north-0.grafana.net/otlp
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <base64(instance_id:api_token)>
```

### Loki Logging (Optional)

```shell
GRAFANA_LOKI_URL=https://logs-prod-eu-north-0.grafana.net
GRAFANA_INSTANCE_ID=<instance_id>
GRAFANA_API_TOKEN=<logs_write_token>
```

### Quick Reference
| Mode | Command |
|---|---|
| Cypress (default) | `python qa_automation.py "requirement" --url <url>` |
| Playwright | `python qa_automation.py "requirement" --url <url> --framework playwright` |
| WebdriverIO | `python qa_automation.py "requirement" --url <url> --framework webdriverio` |
| Prompt-powered Cypress | `python qa_automation.py "requirement" --url <url> --use-prompt` |
| Generate + Execute | `python qa_automation.py "requirement" --url <url> --run` |
| Failure Analysis | `python qa_automation.py --analyze "error message"` |
| Pattern Inventory | `python qa_automation.py --list-patterns` |
### Natural Language Prompt Examples
| What you type | What AI generates |
|---|---|
| "Test login with valid credentials" | Login form fill + submit + success assertion |
| "Test login fails with wrong password" | Negative test with error message assertion |
| "Test contact form submission" | Form field detection + submit + confirmation |
| "Test search returns results" | Search input + trigger + results count assertion |
| "Test signup with missing fields" | Validation error coverage for required fields |
| "Test logout clears session" | Post-login logout + redirect assertion |
**Tip: Writing effective AI requirements**

- Be specific about the action: "Test login" vs. "Test login with valid credentials and verify dashboard loads".
- Mention the expected outcome when it matters: "...and verify error message appears".
- Use `--url` to give the AI real page context; it reads the HTML and picks the right selectors automatically.
- Chain multiple requirements in one run: `"Test login" "Test logout" --url <url>`.
Show command

```shell
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login
```

Show command

```shell
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --framework playwright
```

Show command

```shell
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --framework webdriverio
```

Show command

```shell
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --use-prompt
```

Show command

```shell
python qa_automation.py "Test login" --url https://the-internet.herokuapp.com/login --framework playwright --run
```

Show commands

```shell
python qa_automation.py --analyze "CypressError: Element not found"
python qa_automation.py --analyze -f error.log
```

**Note**
The AI failure analyzer returns a structured diagnosis:
| Field | Description |
|---|---|
| `CATEGORY` | Error type: SELECTOR, TIMEOUT, ASSERTION, NETWORK, etc. |
| `REASON` | Root cause explanation in plain English |
| `FIX` | Suggested code change or configuration fix |
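If the analyzer emits one `FIELD: value` line per field (an assumption about the output format, not something the docs above guarantee), a pipeline consumer could parse the diagnosis like this:

```python
import re

# Assumes the analyzer prints one "FIELD: value" line per field; the
# actual output format of --analyze may differ.
def parse_diagnosis(text):
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^(CATEGORY|REASON|FIX):\s*(.+)$", line.strip())
        if m:
            fields[m.group(1)] = m.group(2)
    return fields

sample = """\
CATEGORY: SELECTOR
REASON: The #submit button no longer exists on the page.
FIX: Update the selector to [data-testid='submit'].
"""
print(parse_diagnosis(sample)["CATEGORY"])  # SELECTOR
```

Parsing the `CATEGORY` field this way lets a CI job branch on failure type, e.g. auto-regenerating on SELECTOR errors but paging a developer on NETWORK ones.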
Show command

```shell
python qa_automation.py --list-patterns
```

```mermaid
flowchart TD
    A[Code Changes<br/>Pushed to Repo] --> B[CI/CD Pipeline<br/>Triggers]
    B --> C[Install Dependencies<br/>pip install -r requirements.txt<br/>npm install]
    C --> D[Generate Tests<br/>python qa_automation.py<br/>--url or --data]
    D --> E[Run Tests<br/>npx cypress run<br/>npx playwright test<br/>npx wdio run]
    E --> F{Tests Pass?}
    F -->|Yes| G[Deploy Application<br/>Success]
    F -->|No| H[AI Failure Analysis<br/>--analyze in pipeline]
    H --> I[Auto-Fix & Regenerate<br/>If possible]
    I --> E
    H --> J[Notify Developers<br/>Manual intervention]
    style A fill:#e1f5fe,color:#333333,stroke:#666666
    style B fill:#fff3e0,color:#333333,stroke:#666666
    style C fill:#c8e6c9,color:#333333,stroke:#666666
    style D fill:#ffcdd2,color:#333333,stroke:#666666
    style E fill:#f3e5f5,color:#333333,stroke:#666666
    style G fill:#e8f5e8,color:#333333,stroke:#666666
    style J fill:#ffebee,color:#333333,stroke:#666666
```
Recommended pipeline stages:
| Stage | Action |
|---|---|
| 1 | Install Python and Node dependencies |
| 2 | Validate environment variables and secrets injection |
| 3 | Generate tests from requirements |
| 4 | Execute generated tests |
| 5 | Publish artifacts and reports |
| 6 | Export telemetry to observability stack |
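One possible GitHub Actions shape for those stages is sketched below; the workflow name, step layout, and report path are illustrative assumptions, not config shipped with the repository:

```yaml
# Illustrative pipeline sketch; adapt paths and secrets to your setup.
name: generate-and-run-e2e
on: [push]
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.10" }
      - uses: actions/setup-node@v4
        with: { node-version: "22" }
      - run: pip install -r requirements.txt
      - run: npm ci
      - run: npx playwright install chromium
      - name: Generate and run tests
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: >
          python qa_automation.py "Test login"
          --url https://the-internet.herokuapp.com/login
          --framework playwright --run
      - name: Publish report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: playwright-report/
```

Injecting the API key via `secrets` (stage 2) and uploading reports with `if: always()` (stage 5) keeps credentials out of the repo while preserving artifacts from failed runs.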
**Important**

- Store secrets only in secure secret managers (never commit `.env`).
- Use scoped API tokens with least-privilege access.
- Rotate provider keys and Grafana tokens on a fixed cadence.
- Keep generated tests and reports free of sensitive production data.
- Apply repository protection rules and mandatory CI checks.
**Warning: Traces Not Visible in Grafana Tempo**

- Verify the OTLP endpoint region and datasource selection.
- Verify the `Authorization=Basic <base64(instance_id:api_token)>` header format.
- Query with: `{resource.service.name="ai-natural-language-tests"}`
**Note: Loki Authentication Errors**

- Ensure the token has `logs:write` scope.
- Confirm the instance ID and logs endpoint match the same Grafana stack.
**Tip: Docker Observability Validation**

- Confirm `.env` includes the OTLP and Loki keys before `docker compose run`.
- Use `docker compose config` to verify environment interpolation.
- In Grafana Explore, query Tempo with `service.name="ai-natural-language-tests"`.
- In Grafana Loki, query labels: `{service_name="ai-natural-language-tests"}`.
**Tip: Switching to Headed Mode for Debugging**

Tests run headless by default. To debug interactively, switch your framework config:

Cypress:

- Edit `cypress.config.js` and add `headed: true` after `browser: 'chrome'`
- Or run: `npx cypress run --headed --spec 'cypress/e2e/generated/*.cy.js'`

Playwright:

- Edit `playwright.config.ts` and change `headless: true` → `headless: false`
- Or run: `npx playwright test --headed tests/generated/`

WebdriverIO:

- Edit `wdio.conf.js` and comment out `'--headless=new'` in the args array

Docker Headed Mode (with X11 forwarding):

```shell
docker build --target debug -t ai-tests:debug .
docker run -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix ai-tests:debug
```

- Optional: mainly for Linux visual debugging.
- Retry with the generated single-spec command from the logs.
Release notes are maintained in `CHANGELOG.md` using the standard Keep a Changelog format.
Built with AI. Tested by AI. Ready for CI.
© 2026 AI Quality Lab / Sreekanth Harigovindan · tests.aiqualitylab.org
Documentation licensed under CC BY 4.0.
