Ship fast.
Don't break things.

Lastest is a visual testing tool for solo founders who ship fast. An AI agent records your tests, your dashboard organizes them, CI runs them — and every release is diffed, before against after, so only the changes you intended actually ship.

Free forever · Self-host or cloud · Docker · 5 min setup
30s
Full suite. A sprint of testing, every commit.
3 engines
Pixel · Structural · Perceptual diffs
0 regressions
Reach prod. Visual diffs block merges automatically.
Why Lastest

Speed without surprises.

Manual visual testing eats sprints. Skipping it ships bugs. Lastest gives you both: testing fast enough that you don't notice it, thorough enough that you don't worry.

Record once at staging, replay forever in CI · three diff engines turn intent into a verdict.

Don't take time away from building.

Point Lastest at your staging app. The agent maps every flow and records visual tests in seconds — work that took your QA process hours of clicking and writing selectors. After the first record, every run is a deterministic replay. Run on every commit. Run while you sleep.

Relax with every release.

Three diff engines — pixel, structural, perceptual — compare every page before and after. If a button moved, a layout broke, a color drifted, or the checkout silently regressed on iOS Safari, you see it before merge. Only the changes you intended ship to users.

“Your QA team spends entire sprints writing tests that break on the next redesign. The alternative charges you per screenshot. Neither is acceptable.”

01 — Painfully slow

Manual testing devours your roadmap

Writing visual regression tests by hand takes hours, days, sometimes entire sprints. Maintaining selectors and updating baselines after every UI change adds even more drag. Your QA team becomes a bottleneck instead of a safeguard.

40+ hours per sprint

02 — Blind automation

Agentic tools with no human oversight

Fully autonomous testing agents generate and run tests with zero human visibility. No review step. No approval workflow. No dashboard to inspect what was tested or why. When they get it wrong, you find out in production.

03 — Endlessly expensive

AI token costs that never stop compounding

Other AI testing tools burn tokens on every single test run. The more you test, the more you pay. Your CI bill grows linearly with your test suite, turning testing frequency into a financial decision.

$199 – $5,000+/mo in tokens + SaaS

The model

Generate once.
Replay forever.
Pay nothing.

Agentic AI with human-in-the-loop. The AI explores and generates tests. You review, approve, and manage them from a purpose-built oversight dashboard. After the first AI-powered generation, every subsequent run is a deterministic replay — zero tokens, zero cost.

One AI-powered generation, one human approval, infinite deterministic replays.
01

AI explores your entire application

Point Lastest at your running app. The agentic Play Agent autonomously navigates every flow, maps every state, and builds a comprehensive visual baseline — work that would take a QA engineer days, finished in seconds.

02

You review from the oversight dashboard

Unlike black-box agentic tools, Lastest gives you a management interface to inspect, approve, modify, and reject generated tests. Human judgment where it matters. AI speed everywhere else.

03

Replays run forever at zero token cost

The first generation uses AI tokens. Every run after that is a deterministic replay — no AI calls, no token costs. Three diff engines — pixel, structural, and perceptual — catch regressions that single-engine tools miss. Self-hosted on your infrastructure or run on Lastest Cloud.

The final frontier

Generation is cheap. Replay is cheap.
Validation has to answer two questions.

Today the agent ships features in an afternoon — and shipping is still slow, because the bottleneck moved. Validation is the new frontier, and validation is not one check, it’s two: nothing that was working is now broken, and what was actually built matches what was expected. The agent can fail either check independently. You need both checks, in the same loop, on every change.

Diagram: the Generate → Validate → Replay pipeline, with Validate highlighted as the frontier — reviewer toolbar, baseline-vs-current diff preview, audit-trail row, three diff-engine indicators, and a bottom strip comparing what type checks, unit tests, and scripted E2E miss versus what visual + HITL validation catches.
Generate is solved. Replay is solved. Validation is the only stage where a human still belongs in the loop — once, at the seam.
Q1 · regression-safety

Nothing is broken

Diff the current render against the previously approved baseline. Catches the layout regression, the contrast collapse, the silently dead button — every category of bug that ends in a user noticing, without you having to author the assertion in advance.

Q2 · intent-conformance

Built matches what was expected

Diff the current render against the spec, ticket, or uploaded design. Catches the agent that ships a clean implementation of the wrong feature — green tests, broken intent. The brief is the source of truth, not the agent’s read of the brief.

How · 03

Three engines, one verdict

Pixel screams on noise. Structural screams on layout drift. Perceptual screams only on what a user would actually see. Three engines, run against both references — baseline and spec — so the same diff infra answers both questions.

How · 04

Human at the seam, with receipts

A reviewer approves the new baseline once, at the moment new behavior becomes canonical. Who, when, which engine, signed. Trust requires receipts — the kind LLM-as-judge cannot produce.

How it works

Record. Organize. Run. Compare.

Four steps, one dashboard. From a fresh checkout to full visual coverage in under five minutes.

Step 01

Record

Point at your staging URL. The agent navigates flows and snapshots every key state. Or hit “record” and click through manually.

Step 02

Organize

Tests group by area in the dashboard — checkout, PDP, nav. Approve baselines, edit names, retire dead routes. You own the test code.

Step 03

Run in CI

One step in GitHub Actions, GitLab, CircleCI. Replays cost nothing. Run on every push, every PR, every nightly — or trigger from the cloud dashboard.

Step 04

Compare

Side-by-side before / after with the diff highlighted in red. Approve the change, or reject the regression. Merge with the dashboard green.

Compare

Before. After. Only what changed.

Every visual change surfaces side-by-side. Approve the intentional ones, reject the regressions. Nothing slips.

Price stepper alignment on iOS Safari · tests/checkout/price-stepper.spec.ts · +2.1% diff · review
Three lenses, one verdict — pixelmatch flags pixel noise, SSIM catches layout breaks, Butteraugli matches human perception.

Everything ships in the box. No tier. No upgrade prompt.

Four capability families, every entry on every plan.

Intelligence

AI test generation

Claude writes resilient Playwright code with a 7-layer selector fallback (data-testid → id → role → aria-label → text → CSS → OCR). Survives refactors that break hand-written tests.
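
To make the layering concrete, here is a rough sketch of what a selector fallback can look like in plain Playwright. The helper name, the hint shape, and the exact ordering are illustrative assumptions, not Lastest's generated code, and the OCR layer is only noted in a comment.

```ts
import type { Page, Locator } from '@playwright/test';

// Hypothetical helper: try the most refactor-resistant selectors first,
// fall back toward brittler ones, and accept only unambiguous matches.
async function resolveTarget(page: Page, hints: {
  testId?: string; id?: string; role?: Parameters<Page['getByRole']>[0];
  name?: string; label?: string; text?: string; css?: string;
}): Promise<Locator | null> {
  const candidates = [
    hints.testId && page.getByTestId(hints.testId),                 // 1. data-testid
    hints.id && page.locator(`#${hints.id}`),                       // 2. id
    hints.role && page.getByRole(hints.role, { name: hints.name }), // 3. ARIA role
    hints.label && page.getByLabel(hints.label),                    // 4. aria-label / label
    hints.text && page.getByText(hints.text, { exact: true }),      // 5. visible text
    hints.css && page.locator(hints.css),                           // 6. CSS
    // 7. OCR: locate the text in a screenshot — out of scope for this sketch
  ];
  for (const locator of candidates) {
    if (locator && (await locator.count()) === 1) return locator;
  }
  return null;
}
```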

Modes

Three ways to work

AI-Free recording for air-gapped teams. AI-Assisted with human review on every change. Full Autonomous via the Play Agent. Pick a mode per test — or per team.

Economics

Zero-token replays

AI runs only when you create or fix a test. Every replay is pure Playwright execution. Test a thousand times a day for $0 in tokens. Self-hosted screenshots are unlimited regardless of volume.
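
For a sense of what "pure Playwright execution" means in practice, here is a sketch of a recorded spec. The URL, test IDs, snapshot name, and threshold are placeholders, not actual generator output.

```ts
import { test, expect } from '@playwright/test';

// A deterministic replay: no AI calls, just recorded steps plus a screenshot assertion.
test('checkout price stepper', async ({ page }) => {
  await page.goto('https://staging.example.com/checkout');
  await page.getByTestId('qty-increase').click();

  // Mask regions that legitimately change between runs so the diff stays stable.
  await expect(page).toHaveScreenshot('price-stepper.png', {
    mask: [page.getByTestId('cart-total')],
    maxDiffPixelRatio: 0.01,
  });
});
```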

Automation

Play Agent — 11-step pipeline

Specialized sub-agents (Orchestrator, Planner, Scout, Diver, Generator, Healer) plan, generate, run, and fix tests autonomously. Pause, approve, or skip any step. Resumes where it left off.

Precision

3 diff engines

pixelmatch (pixel-perfect), SSIM (structural similarity), and Butteraugli (human-perception-aligned). Choose the trade-off per test — or run all three for a benchmark verdict.
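
pixelmatch is the one engine of the three with a widely used open-source npm API, so a bare-bones diff looks roughly like this; file paths are placeholders, and the SSIM and Butteraugli passes are not shown.

```ts
import fs from 'node:fs';
import { PNG } from 'pngjs';
import pixelmatch from 'pixelmatch';

// Decode baseline and current captures (must share dimensions), diff, and report the change ratio.
const baseline = PNG.sync.read(fs.readFileSync('baseline.png'));
const current = PNG.sync.read(fs.readFileSync('current.png'));
const { width, height } = baseline;
const diff = new PNG({ width, height });

const changed = pixelmatch(baseline.data, current.data, diff.data, width, height, {
  threshold: 0.1, // per-pixel color-distance sensitivity
});

fs.writeFileSync('diff.png', PNG.sync.write(diff));
console.log(`${((changed / (width * height)) * 100).toFixed(2)}% of pixels differ`);
```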

Stabilization

12 flaky-test guards

Text-region-aware OCR diffing, timestamp freezing, network idle wait, DOM stability detection, font loading wait, page-shift detection, burst capture, auto-mask of dynamic content. Cross-OS consistency baked in.
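
A few of these guards map onto plain Playwright primitives; the sketch below shows an assumed approach for illustration (placeholder URL, not Lastest's internals).

```ts
import { test } from '@playwright/test';

test.beforeEach(async ({ page }) => {
  // Timestamp freezing: pin Date.now so "2 minutes ago" renders identically on every run.
  await page.addInitScript(() => {
    const fixed = new Date('2024-01-01T00:00:00Z').valueOf();
    Date.now = () => fixed;
  });
});

test('stable capture', async ({ page }) => {
  await page.goto('https://staging.example.com');
  await page.waitForLoadState('networkidle');          // network idle wait
  await page.evaluate(() => document.fonts.ready);     // font loading wait
  await page.screenshot({ path: 'home.png', animations: 'disabled' }); // animation freezing
});
```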

Ownership

Self-hosted & open source

FSL-1.1-ALv2 license. Your infra, your network boundary. Screenshots never leave your servers. Full source on GitHub. Or run it on Lastest Cloud and skip ops entirely.

Integration

CI/CD native + Smart Run

Reusable GitHub Action, GitLab MR comments (self-hosted GitLab supported), webhook triggers, scheduled cron runs. Smart Run analyzes git diffs and runs only the tests your change touched.
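
A back-of-the-envelope sketch of the Smart Run idea: map changed paths to test areas and run only those. The path-to-area mapping and branch names below are assumptions for illustration, not Lastest's actual heuristics.

```ts
import { execSync } from 'node:child_process';

// Hypothetical mapping from source prefixes to dashboard areas.
const areaByPrefix: Record<string, string> = {
  'src/checkout/': 'checkout',
  'src/pdp/': 'pdp',
  'src/nav/': 'nav',
};

const changedFiles = execSync('git diff --name-only origin/main...HEAD', { encoding: 'utf8' })
  .split('\n')
  .filter(Boolean);

const affected = new Set<string>();
for (const file of changedFiles) {
  for (const [prefix, area] of Object.entries(areaByPrefix)) {
    if (file.startsWith(prefix)) affected.add(area);
  }
}

console.log('Areas to run:', [...affected]); // e.g. [ 'checkout' ]
```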

Operations

Embedded Browser pool

Containerized Chromium with live JPEG streaming back to the dashboard. No local Playwright install. Dynamic provisioning into k3d locally or your cluster in production. Distributed Remote Runners for CI fan-out.

Triage

AI failure classification

Every failure is auto-classified as real regression, flaky test, environment issue, or test maintenance — with confidence scores and reasoning. Stop drowning in red checkmarks.

Accessibility

WCAG 2.2 AA scoring

Axe-core checks every screenshot. 0–100 score with severity-weighted deductions (critical / serious / moderate / minor) and trend sparklines per build. A11y becomes part of your regression pipeline.
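
For orientation, a sketch of an axe-core pass scoped to WCAG 2.2 AA using @axe-core/playwright. The severity weights below are made up for illustration — only the fact that deductions are severity-weighted comes from the description above.

```ts
import { test } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

// Illustrative weights — not Lastest's actual deduction table.
const WEIGHTS: Record<string, number> = { critical: 10, serious: 5, moderate: 2, minor: 1 };

test('a11y scan', async ({ page }) => {
  await page.goto('https://staging.example.com'); // placeholder URL
  const results = await new AxeBuilder({ page })
    .withTags(['wcag2a', 'wcag2aa', 'wcag22aa'])
    .analyze();

  const deduction = results.violations.reduce(
    (sum, v) => sum + (WEIGHTS[v.impact ?? 'minor'] ?? 1) * v.nodes.length,
    0,
  );
  console.log(`a11y score: ${Math.max(0, 100 - deduction)} / 100`);
});
```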

Providers

Bring your own AI

Claude CLI, OpenRouter, Anthropic API, OpenAI, or Ollama (local models). Use a different provider for diff analysis than for test generation. No lock-in, no surprise pricing.

Spec-driven

OpenAPI & user stories → tests

Drop in an OpenAPI spec, a markdown PRD, or a list of user stories. AI extracts the cases and generates tests automatically. Route Discovery scans your source for paths the spec missed.

Extensibility

MCP server — 29 AI tools

Model Context Protocol server (npx @lastest/mcp-server) exposes 29 tools that let external AI agents run tests, review diffs, approve baselines, and create or heal tests programmatically.
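
A sketch of how an external agent could attach to that server over stdio with the MCP TypeScript SDK. The tool name and arguments are hypothetical — enumerate the real 29 via listTools() first.

```ts
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

async function main() {
  // Spawn the server the same way the docs invoke it.
  const transport = new StdioClientTransport({ command: 'npx', args: ['@lastest/mcp-server'] });
  const client = new Client({ name: 'example-agent', version: '0.1.0' }, { capabilities: {} });
  await client.connect(transport);

  const { tools } = await client.listTools();
  console.log(tools.map((t) => t.name)); // enumerate the exposed tools

  // Hypothetical tool call — substitute a real tool name from the listing above.
  const result = await client.callTool({ name: 'run_tests', arguments: { area: 'checkout' } });
  console.log(result);

  await client.close();
}

main().catch(console.error);
```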

Versioning

Full test history & composition

Every edit, AI fix, or restore is versioned with a reason. Compose builds by cherry-picking specific test versions. Branch baselines fork on PR open and merge back on PR merge.

Engagement

Beat the Bot

Optional gamified leaderboard where humans compete with AI bots to author and triage tests. Points for catching regressions and clearing review todos. Seasonal play, achievements, Bug Blitz events.

Full Platform

A comprehensive inventory of capabilities

Lastest consolidates test authoring, execution, visual comparison, and team review into a single self-hosted platform.

Three ways to author tests

Manual recording

Point-and-click browser recording via Playwright. No code, no AI, no API keys. Generates deterministic scripts you can edit by hand.

No AI needed
or

AI-assisted

AI generates, fixes, or enhances tests — but every change requires your review and approval. Import URLs, OpenAPI specs, or user stories.

You review everything
or

Autonomous Play Agent

11-step pipeline driven by specialized sub-agents (Orchestrator, Planner, Scout, Diver, Generator, Healer): scan routes, plan areas, generate, run, heal failures up to 3 attempts, re-run, report. Pauses only when stuck — resume from where it left off.

Full automation

Two ways to execute

Remote runners v2

Distributed execution via WebSocket. Install @lastest/runner from npm, connect with a token, and test across OS and browsers. Concurrent multi-task support, SHA256 code integrity, DB-backed command queue, heartbeat polling, per-test abort.

Multi-machine
or

Embedded browser

Containerized Chromium with CDP live streaming. No local Playwright needed. JPEG streaming with configurable quality/framerate, WebSocket auth, concurrent contexts. Supports recording and execution.

Zero install

The testing pipeline

Record

Capture interactions or let AI explore

Generate

AI creates tests (one-time tokens)

Run

Deterministic replay, zero tokens

Compare

Three diff engines analyze changes

Review

Humans approve before deploy

Core platform

Multi-step screenshots

Capture multiple labeled screenshots per test for multi-page flow coverage.

Approval workflow

Review visual diffs before they become baselines. Structured review process.

Git-aware builds

Test per branch and commit. Compare across PRs. Track coverage over time.

Branch comparison

Side-by-side branch-to-branch test result diffing.

Test suites

Ordered suites for structured, sequential execution.

Test versioning

Full history with change reasons: manual, AI fix, AI enhance, restore.

Test composition

Cherry-pick tests and pin specific versions per build.

Functional hierarchy

Nested parent/child areas with drag-and-drop reordering.

Debug mode

Step-by-step execution with live feedback per action.

8 testing templates

Presets for SaaS, Marketing, Canvas, E-commerce, Docs, Mobile, SPA, CMS.

Planned screenshots

Compare against Figma exports with planned-vs-actual tracking.

Setup & teardown

Multi-step orchestration with per-test overrides. Playwright, API, and script types.

Dashboard health score

Weighted 0–100 score combining pass rate (60%), non-flaky rate (20%), route coverage (20%). Sparkline trend tracking.
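
The stated weights make the arithmetic simple enough to show inline — a sketch of the 0–100 score from those three inputs.

```ts
// Weights as stated above: pass rate 60%, non-flaky rate 20%, route coverage 20%.
// Inputs are fractions in [0, 1].
function healthScore(passRate: number, nonFlakyRate: number, routeCoverage: number): number {
  return Math.round(100 * (0.6 * passRate + 0.2 * nonFlakyRate + 0.2 * routeCoverage));
}

// 96% passing, 90% non-flaky, 75% route coverage → 0.576 + 0.18 + 0.15 = 0.906 → 91
console.log(healthScore(0.96, 0.9, 0.75)); // 91
```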

Batch approval

Select and approve multiple visual diffs in a single action. Accelerates review for large builds.

Comparison runs

A/B testing mode: compare baseline vs. current branch builds side-by-side with per-test diff detail.

Scheduled runs

Cron-based automation with preset schedules or custom cron expressions. Auto-disables after consecutive failures. Optional branch targeting.

Guided onboarding

8-step setup guide for new users: connect Git, configure AI, scan routes, record first test, run, set baselines, re-run, check results. Auto-detects completion.

In-app bug reports

Auto-captures URL, viewport, console errors, failed requests, breadcrumbs, and screenshot attachment. Files GitHub issues directly.

Review todos

Branch-specific actionable items created when reviewers flag a diff. Track review feedback as todos tied to specific builds and tests.

Early adopter mode

Team-level toggle to access experimental features before general release. Opt in per-workspace.

AI capabilities

6 AI providers

Claude CLI, OpenRouter, Claude Agent SDK, direct Anthropic API, OpenAI, or Ollama (local, free). Separate provider for diff analysis.

AI test generation

Multi-selector fallback: data-testid, id, role, aria-label, text, CSS, OCR.

AI diff analysis

Classifies changes with confidence scores. Separate provider from generation.

AI test fixing

Auto-proposes fixes. Review before accepting, or let the agent auto-fix.

Spec-driven testing

Import OpenAPI specs, user stories, or markdown. AI extracts and generates tests.

Route discovery

AI scans source code to discover routes and suggest tests.

AI prompt audit trail

Full logging of all AI requests and responses.

AI confidence scores

Every AI recommendation includes a numeric confidence score. Filter and prioritize reviews by certainty.

MCP server

29 Model Context Protocol tools for external AI agents. Programmatic access to test creation, execution, and review via npx @lastest/mcp-server.

Agent monitoring

Real-time SSE activity feed tracking Play Agent sessions step-by-step. Session history, pause/resume, and active/paused/completed status from the dashboard.

Codebase intelligence

Auto-detects framework, CSS, auth, state management, API layer, and key dependencies. 100+ package database maps stacks to testing recommendations to enrich AI prompts.

AI failure triage

Automatic classification of test failures into real regression, flaky test, environment issue, or test maintenance — with confidence scores and reasoning.

MCP selector validation

Real-time selector validation against live pages via Claude MCP. Catches brittle selectors during generation, not after they break.

Visual comparison

3 diff engines

Pixelmatch, SSIM, and Butteraugli. Speed vs. perceptual accuracy trade-off.

Text-region diffing

OCR-based two-pass comparison with separate text/non-text thresholds.

Page shift detection

Detects inserted/deleted rows with fuzzy matching, not full-page flags.

Ignore regions

Mask dynamic areas with solid-color or placeholder-text styles.

Configurable sensitivity

Pixel and percentage thresholds for unchanged/flaky/changed classification.

SHA256 fast-path

Hash match = instant pass. No pixel comparison needed on every run.
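
The fast-path is plain content hashing; here is a sketch with Node's crypto module (file names are placeholders).

```ts
import { createHash } from 'node:crypto';
import fs from 'node:fs';

const sha256 = (path: string) =>
  createHash('sha256').update(fs.readFileSync(path)).digest('hex');

// Identical bytes → instant pass; otherwise fall through to the pixel engines.
if (sha256('current.png') === sha256('baseline.png')) {
  console.log('unchanged — skip pixel comparison');
} else {
  console.log('changed — run pixelmatch / SSIM / Butteraugli');
}
```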

Diff engine benchmarks

Built-in benchmark framework comparing all three engines across synthetic test scenarios with timing and accuracy metrics.

Stabilization — 12 features for flaky test prevention

Timestamp freezing · Random value seeding · Cross-OS consistency · Burst capture · Auto-mask dynamic content · Network idle waiting · DOM stability detection · Third-party blocking · Font loading wait · Loading indicator hiding · Animation freezing · Auto-detect capabilities

Integrations

GitHub — OAuth, PR comments, webhooks, Action
GitLab — OAuth, MR comments, self-hosted
Google Sheets — Test data via OAuth
Notifications — Slack, Discord, custom webhooks
VSCode & API — REST + SSE at /api/v1/
Accessibility — WCAG 2.2 AA scoring (0–100), axe-core on every capture
Network & console — Request + error capture per run
CLI runner — pnpm test:visual for any CI

Analytics & insights

Impact timeline

Track visual regression impact across PRs and commits. Author contribution analysis and trend visualization.

WCAG 2.2 AA compliance

Automated 0–100 accessibility scoring with severity-weighted violations. Per-test violation detail across builds.

Infrastructure & team

Smart Run

Git-diff analysis runs only affected tests.

Parallel execution

Configurable max parallel tests for local and remote.

Branch baselines

Fork/merge baselines per branch. SHA256 carry-forward matching.

Docker deployment

Multi-stage build, persistent volumes, health checks, non-root.

App state inspection

Access window.__APP_STATE__ and Redux stores during tests.

Multi-tenant teams

Slug-based workspaces with email invitations.

Role-based access

Owner, admin, member, and viewer permissions.

Multiple auth

Email/password (Argon2), GitHub, GitLab, Google OAuth.

Storage states

Save and restore browser storage snapshots — localStorage, sessionStorage, and cookies — for authenticated flows.
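
In plain Playwright this maps onto storage states; the sketch below logs in once and reuses the snapshot. Login steps, selectors, and paths are placeholders, and Playwright's own storageState covers cookies and localStorage.

```ts
import { chromium } from 'playwright';

async function main() {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Log in once and persist the authenticated browser storage.
  await page.goto('https://staging.example.com/login');
  await page.getByLabel('Email').fill('qa@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await page.context().storageState({ path: 'auth.json' });

  // Later runs restore the snapshot instead of repeating the login flow.
  const authed = await browser.newContext({ storageState: 'auth.json' });
  const authedPage = await authed.newPage();
  await authedPage.goto('https://staging.example.com/account');

  await browser.close();
}

main().catch(console.error);
```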

Assertion tracking

Parsed expect() calls with expected vs. actual values, error messages, and code line references per test.

Selector stats

Track selector success rates and response times. Identify brittle selectors before they cause flaky tests.

CLI test runner

pnpm test:visual for GitHub Actions and CI pipelines. Auto-captures GITHUB_HEAD_REF, GITHUB_REF_NAME, and GITHUB_SHA for git tracking.

Reusable GitHub Action

las-team/lastest/action@main — zero-config CI/CD integration. No local Playwright install; tests run on your Lastest server via a remote runner. Results posted to Actions step summary.

Webhook triggers

Builds fire on PR opened/updated, push to monitored branches, or manual click. Smart Run filters to only affected tests.

Background jobs

Queue tracking for long-running operations — AI route scans, agent sessions, and builds — with live status and history.

The comparison the industry prefers you don't see.

Tool · Price / mo · Human oversight · AI tests · Zero-token replays · Health analytics · A11y scoring · Self-hosted
Lastest — Free (self-host) · Cloud free tier
Percy — $199+ · manual only · n/a
Applitools — $699+ · limited · partial · limited · partial
Chromatic — $179+ · manual only · n/a
Pure agentic tools — per-token · varies
“Pay for AI once to generate your tests. Then replay them ten thousand times for free. That is how testing economics should work.”

What people are saying

Honest words from the community.

Unedited reactions from Reddit threads, users, and early adopters — including the parts where they told us what to fix.

  • Reddit
    OP · u/Quirky_Research_949
    Great idea, and I like that it’s open source. That said, I found the design a bit confusing. It took me more than a couple of minutes to understand what the project does. You might want to rethink the design to make the value clearer.
  • Reddit
    OP · Top 1% Commenter · u/hiten1818726363
    Thats just crazy ui bro. And concept is good too. It think you should add some video representation on how it works.
  • User
    Tested by LS
    Overall, the app feels pretty polished and thoughtfully designed. The onboarding was easy to follow, and I liked that the recorder automatically replayed the test after saving — that made the flow feel smooth and fast. The tooltips during recording were also genuinely helpful without being annoying. The biggest friction for me was needing to connect GitHub before getting much real value from the product. As someone just wanting to quickly test a URL, it felt a bit limiting because a lot of features seem blocked until a repo is connected. The recording experience itself worked well for the most part, although I noticed some clicks inside the embedded browser didn’t always trigger navigation correctly. Other than that, the UI is clean, defaults make sense, and the overall experience feels solid.
    Wed, May 6, 2026 · 08:58 AM

Pricing

Free forever to self-host. Platinum Edition from $299/mo.

Self-hosting is free for everyone, forever — bring your own AI via OpenRouter, Anthropic, or local Ollama. Try the cloud free to evaluate. Platinum Edition is deeply discounted during our early adopter window, and the price you sign up at is locked in for the lifetime of your subscription.

Self-host

Free forever

$0 / forever

Run Lastest on your own hardware. Unlimited everything, no seat limits. FSL-1.1-ALv2.

  • Unlimited tests, replays, browsers, seats, projects
  • Bring your own AI — OpenRouter, Anthropic, Ollama
  • Full human-in-the-loop dashboard + 3 diff engines
  • CI/CD integration (GitHub, GitLab, Bitbucket)
  • Community support (GitHub, Discord)
Star on GitHub

Cloud free

Free to try

$0 / evaluation

Hosted by us. No install, no credit card. For evaluation only.

  • 100 runs per month
  • 1 project, 3 seats, 1 concurrent runner
  • Shared cloud infrastructure
  • Upgrade anytime — tests carry over

No data retention guarantee — builds may be pruned.

No functional consistency — we ship daily/weekly.

Not for production workloads.

Try cloud free
“Percy Pro: $449/mo, 25k screenshots. Lastest Self-Host: $0, unlimited replays. That is how testing economics should work.”

Two paths, zero friction

Pick your runtime. Tested forever either way.

Run on your laptop, your VPS, or skip the install entirely and use Lastest Cloud. Same engine, same dashboard, same diff. The first run uses AI tokens; every replay after that is free, on either path.

Cloud run · No install
$ open https://app.lastest.cloud
# sign in · paste your staging URL · agent records
→ 24 visual tests recorded · 1m 42s
Self-host · FSL-1.1-ALv2
$ git clone github.com/las-team/lastest
$ cd lastest && docker-compose up -d
$ lastest record --url https://staging.acme.io

From the blog

Recent writing

Field notes on AI test generation, diff engines, and self-hosting visual regression at every team size.

Culture

Gamify QA Testing: How to Turn Boring Test Sessions into a Sprint Challenge (and Beat the Bot)

Manual testing is a grind, and "please do more exploratory testing" is the most-ignored line in any standup. Add a leaderboard, a weekly bot to beat, a karma penalty for spam reports, and a dinner on the line — testing stops being chore-work and starts being competitive sport. Here's the points system, the anti-spam guardrails, and the Slack ritual that makes it stick.

Read post

Enterprise

AI speed meets enterprise governance

Lastest is free to self-host for every team. The human-in-the-loop dashboard already provides the oversight layer enterprises require — approval workflows, audit trails, and full test visibility. For organizations needing SLAs, managed infrastructure, SSO integration, or dedicated engineering support, we offer enterprise agreements. No minimum commitment. No sales presentations.

SSO at the edge, RBAC and audit log in the middle, governance outputs out the other side — all inside your network boundary.

SLA-backed uptime guarantees

SSO / SAML integration

Managed cloud deployment

Dedicated support channel

On-premise installation support

Priority feature development