Ship fast.
Don't break things.
Lastest is a visual testing tool for solo founders who ship fast. An AI agent records your tests, your dashboard organizes them, CI runs them — and every release diffs before against after so only the things you intended to change actually change.
Speed without surprises.
Manual visual testing eats sprints. Skipping it ships bugs. Lastest gives you both: testing fast enough that you don't notice it, thorough enough that you don't worry.
Don't take time away from building.
Point Lastest at your staging app. The agent maps every flow and records visual tests in seconds — work that took your QA process hours of clicking and writing selectors. After the first record, every run is a deterministic replay. Run on every commit. Run while you sleep.
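A recorded test is plain Playwright code you own. A minimal sketch of the shape, assuming a hypothetical staging route and test IDs; the agent's actual output will differ:

```ts
import { test, expect } from '@playwright/test';

// Illustrative sketch only: the exact code Lastest generates may differ.
test('checkout: guest flow renders correctly', async ({ page }) => {
  await page.goto('https://staging.example.com/checkout');

  // Deterministic replay: the same steps, in the same order, every run.
  await page.getByTestId('add-to-cart').click();
  await page.getByRole('button', { name: 'Checkout' }).click();

  // Visual assertion using Playwright's built-in screenshot comparison.
  await expect(page).toHaveScreenshot('checkout-guest.png');
});
```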
Relax with every release.
Three diff engines — pixel, structural, perceptual — compare every page before and after. If a button moved, a layout broke, a color drifted, or the checkout silently regressed on iOS Safari, you see it before merge. Only the changes you intended ship to users.
The Industry Problem
“Your QA team spends entire sprints writing tests that break on the next redesign. The alternative charges you per screenshot. Neither is acceptable.”
01 — Painfully slow
Manual testing devours your roadmap
Writing visual regression tests by hand takes hours, days, sometimes entire sprints. Maintaining selectors and updating baselines after every UI change adds even more drag. Your QA team becomes a bottleneck instead of a safeguard.
40+ hours per sprint
02 — Blind automation
Agentic tools with no human oversight
Fully autonomous testing agents generate and run tests with zero human visibility. No review step. No approval workflow. No dashboard to inspect what was tested or why. When they get it wrong, you find out in production.
03 — Endlessly expensive
AI token costs that never stop compounding
Other AI testing tools burn tokens on every single test run. The more you test, the more you pay. Your CI bill grows linearly with your test suite, turning testing frequency into a financial decision.
$199 – $5,000+/mo in tokens + SaaS
Generate once.
Replay forever.
Pay nothing.
Agentic AI with human-in-the-loop. The AI explores and generates tests. You review, approve, and manage them from a purpose-built oversight dashboard. After the first AI-powered generation, every subsequent run is a deterministic replay — zero tokens, zero cost.
AI explores your entire application
Point Lastest at your running app. The Play Agent autonomously navigates every flow, maps every state, and builds a comprehensive visual baseline — work that would take a QA engineer days, finished in seconds.
You review from the oversight dashboard
Unlike black-box agentic tools, Lastest gives you a management interface to inspect, approve, modify, and reject generated tests. Human judgment where it matters. AI speed everywhere else.
Replays run forever at zero token cost
The first generation uses AI tokens. Every run after that is a deterministic replay — no AI calls, no token costs. Three diff engines — pixel, structural, and perceptual — catch regressions that single-engine tools miss. Self-hosted on your infrastructure or run on Lastest Cloud.
Generation is cheap. Replay is cheap.
Validation has to answer two questions.
Today the agent ships features in an afternoon — and shipping is still slow, because the bottleneck moved. Validation is the new frontier, and validation is not one check, it’s two: nothing that was working is now broken, and what was actually built matches what was expected. The agent can fail either check independently. You need both checks, in the same loop, on every change.
Nothing is broken
Diff the current render against the previously approved baseline. Catches the layout regression, the contrast collapse, the silently dead button — every category of bug that ends in a user noticing, without you having to author the assertion in advance.
Built matches what was expected
Diff the current render against the spec, ticket, or uploaded design. Catches the agent that ships a clean implementation of the wrong feature — green tests, broken intent. The brief is the source of truth, not the agent’s read of the brief.
Three engines, one verdict
Pixel screams on noise. Structural screams on layout drift. Perceptual screams only on what a user would actually see. Three engines, run against both references — baseline and spec — so the same diff infra answers both questions.
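A sketch of how three scores could fold into one verdict; the thresholds and names here are illustrative, not Lastest's actual logic:

```ts
// Illustrative verdict logic; real thresholds would be tuned per test.
type EngineScores = {
  pixelMismatchRatio: number; // pixelmatch: fraction of pixels that changed
  ssim: number;               // SSIM: 1.0 means structurally identical
  butteraugli: number;        // Butteraugli: higher means more visible to a human
};

type Verdict = 'unchanged' | 'flaky' | 'changed';

function verdict(s: EngineScores): Verdict {
  const layoutDrift = s.ssim < 0.98;           // structural screams on layout drift
  const humanVisible = s.butteraugli > 1.5;    // perceptual screams on what users see
  const pixelNoise = s.pixelMismatchRatio > 0; // pixel screams on everything

  if (humanVisible || layoutDrift) return 'changed';
  if (pixelNoise) return 'flaky'; // pixels moved, but no user would notice
  return 'unchanged';
}
```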
Human at the seam, with receipts
A reviewer approves the new baseline once, at the moment new behavior becomes canonical. Who, when, which engine, signed. Trust requires receipts — the kind an LLM-as-judge cannot produce.
Record. Organize. Run. Compare.
Four steps, one dashboard. From a fresh checkout to full visual coverage in under five minutes.
Record
Point at your staging URL. The agent navigates flows and snapshots every key state. Or hit “record” and click through manually.
Organize
Tests group by area in the dashboard — checkout, PDP, nav. Approve baselines, edit names, retire dead routes. You own the test code.
Run in CI
One step in GitHub Actions, GitLab, CircleCI. Replays cost nothing. Run on every push, every PR, every nightly — or trigger from the cloud dashboard.
Compare
Side-by-side before / after with the diff highlighted in red. Approve the change, or reject the regression. Merge with the dashboard green.
Before. After. Only what changed.
Every visual change surfaces side-by-side. Approve the intentional ones, reject the regressions. Nothing slips.
Platform capabilities
Everything ships in the box. No tiers. No upgrade prompts.
Intelligence
AI test generation
Claude writes resilient Playwright code with a 7-layer selector fallback (data-testid → id → role → aria-label → text → CSS → OCR). Survives refactors that break hand-written tests.
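In Playwright terms, the fallback chain looks roughly like this. A sketch of the ordering only, with the OCR layer (Lastest's own) stubbed out:

```ts
import type { Page, Locator } from '@playwright/test';

// Sketch of the 7-layer fallback order. All hints are optional; the
// first layer that resolves to a real element wins.
async function resolveTarget(page: Page, hints: {
  testId?: string; id?: string;
  role?: Parameters<Page['getByRole']>[0];
  label?: string; text?: string; css?: string;
}): Promise<Locator | null> {
  const layers: (Locator | null)[] = [
    hints.testId ? page.getByTestId(hints.testId) : null,             // 1. data-testid
    hints.id ? page.locator(`#${hints.id}`) : null,                   // 2. id
    hints.role ? page.getByRole(hints.role, { name: hints.label }) : null, // 3. role
    hints.label ? page.getByLabel(hints.label) : null,                // 4. aria-label
    hints.text ? page.getByText(hints.text) : null,                   // 5. visible text
    hints.css ? page.locator(hints.css) : null,                       // 6. CSS
  ];
  for (const layer of layers) {
    if (layer && (await layer.count()) > 0) return layer;
  }
  return null; // 7. OCR would run here as the last resort
}
```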
Modes
Three ways to work
AI-Free recording for air-gapped teams. AI-Assisted with human review on every change. Full Autonomous via the Play Agent. Pick a mode per test — or per team.
Economics
Zero-token replays
AI runs only when you create or fix a test. Every replay is pure Playwright execution. Test a thousand times a day for $0 in tokens. Self-hosted screenshots are unlimited.
Automation
Play Agent — 11-step pipeline
Specialized sub-agents (Orchestrator, Planner, Scout, Diver, Generator, Healer) plan, generate, run, and fix tests autonomously. Pause, approve, or skip any step. Resumes where it left off.
Precision
3 diff engines
pixelmatch (pixel-perfect), SSIM (structural similarity), and Butteraugli (human-perception-aligned). Choose the trade-off per test — or run all three for a benchmark verdict.
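pixelmatch is an open npm library, so the pixel engine is easy to picture. Plain pixelmatch usage below, not Lastest's wrapper:

```ts
import fs from 'node:fs';
import { PNG } from 'pngjs';
import pixelmatch from 'pixelmatch';

// Decode both screenshots, diff them, and write a highlighted diff image.
const before = PNG.sync.read(fs.readFileSync('baseline.png'));
const after = PNG.sync.read(fs.readFileSync('current.png'));
const { width, height } = before;
const diff = new PNG({ width, height });

// threshold is per-pixel color-distance sensitivity: 0 is strictest.
const changedPixels = pixelmatch(
  before.data, after.data, diff.data, width, height, { threshold: 0.1 },
);

fs.writeFileSync('diff.png', PNG.sync.write(diff));
console.log(`${changedPixels} of ${width * height} pixels differ`);
```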
Stabilization
12 flaky-test guards
Text-region-aware OCR diffing, timestamp freezing, network idle wait, DOM stability detection, font loading wait, page-shift detection, burst capture, auto-mask of dynamic content, and more. Cross-OS consistency baked in.
Ownership
Self-hosted & open source
FSL-1.1-ALv2 license. Your infra, your network boundary. Screenshots never leave your servers. Full source on GitHub. Or run it on Lastest Cloud and skip ops entirely.
Integration
CI/CD native + Smart Run
Reusable GitHub Action, GitLab MR comments (self-hosted GitLab supported), webhook triggers, scheduled cron runs. Smart Run analyzes git diffs and runs only the tests your change touched.
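Conceptually, Smart Run is a filter over the git diff. A minimal sketch, assuming each test declares the source paths it covers; the real mapping is Lastest's own:

```ts
import { execSync } from 'node:child_process';

// Hypothetical metadata shape: each test lists the paths it covers.
type TestMeta = { name: string; coveredPaths: string[] };

function affectedTests(tests: TestMeta[], baseRef = 'origin/main'): TestMeta[] {
  const changed = execSync(`git diff --name-only ${baseRef}`)
    .toString()
    .trim()
    .split('\n');
  // Keep only tests whose covered paths intersect the change set.
  return tests.filter((t) =>
    t.coveredPaths.some((p) => changed.some((c) => c.startsWith(p))),
  );
}
```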
Operations
Embedded Browser pool
Containerized Chromium with live JPEG streaming back to the dashboard. No local Playwright install. Dynamic provisioning into k3d locally or your cluster in production. Distributed Remote Runners for CI fan-out.
Triage
AI failure classification
Every failure is auto-classified as real regression, flaky test, environment issue, or test maintenance — with confidence scores and reasoning. Stop drowning in red checkmarks.
Accessibility
WCAG 2.2 AA scoring
Axe-core audits every screenshotted page. 0–100 score with severity-weighted deductions (critical / serious / moderate / minor) and trend sparklines per build. A11y becomes part of your regression pipeline.
Providers
Bring your own AI
Claude CLI, OpenRouter, Anthropic API, OpenAI, or Ollama (local models). Use one provider for test generation and a different one for diff analysis. No lock-in, no surprise pricing.
Spec-driven
OpenAPI & user stories → tests
Drop in an OpenAPI spec, a markdown PRD, or a list of user stories. AI extracts the cases and generates tests automatically. Route Discovery scans your source for paths the spec missed.
Extensibility
MCP server — 29 AI tools
Model Context Protocol server (npx @lastest/mcp-server) exposes 29 tools that let external AI agents run tests, review diffs, approve baselines, and create or heal tests programmatically.
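Wiring it up is standard MCP client code. A sketch using the official TypeScript SDK; the tool name at the end is hypothetical, so check listTools() for the real 29:

```ts
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Launch the Lastest MCP server over stdio, as any MCP client would.
const transport = new StdioClientTransport({
  command: 'npx',
  args: ['@lastest/mcp-server'],
});
const client = new Client({ name: 'example-agent', version: '1.0.0' });
await client.connect(transport);

// Discover the exposed tools, then call one.
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

// 'run_tests' and its arguments are hypothetical, for illustration only.
await client.callTool({ name: 'run_tests', arguments: { area: 'checkout' } });
```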
Versioning
Full test history & composition
Every edit, AI fix, or restore is versioned with a reason. Compose builds by cherry-picking specific test versions. Branch baselines fork on PR open and merge back on PR merge.
Engagement
Beat the Bot
Optional gamified leaderboard where humans compete with AI bots to author and triage tests. Points for catching regressions and clearing review todos. Seasonal play, achievements, Bug Blitz events.
A comprehensive inventory of capabilities
Lastest consolidates test authoring, execution, visual comparison, and team review into a single self-hosted platform.
Three ways to author tests
Manual recording
Point-and-click browser recording via Playwright. No code, no AI, no API keys. Generates deterministic scripts you can edit by hand.
No AI needed
AI-assisted
AI generates, fixes, or enhances tests — but every change requires your review and approval. Import URLs, OpenAPI specs, or user stories.
You review everything
Autonomous Play Agent
11-step pipeline driven by specialized sub-agents (Orchestrator, Planner, Scout, Diver, Generator, Healer): scan routes, plan areas, generate, run, heal failures up to 3 attempts, re-run, report. Pauses only when stuck — resume from where it left off.
Full automation
Two ways to execute
Remote runners v2
Distributed execution via WebSocket. Install @lastest/runner from npm, connect with a token, and test across OS and browsers. Concurrent multi-task support, SHA256 code integrity, DB-backed command queue, heartbeat polling, per-test abort.
Embedded browser
Containerized Chromium with CDP live streaming. No local Playwright needed. JPEG streaming with configurable quality/framerate, WebSocket auth, concurrent contexts. Supports recording and execution.
Zero install
The testing pipeline
Record
Capture interactions or let AI explore
Generate
AI creates tests (one-time tokens)
Run
Deterministic replay, zero tokens
Compare
Three diff engines analyze changes
Review
Humans approve before deploy
Core platform
Multi-step screenshots
Capture multiple labeled screenshots per test for multi-page flow coverage.
Approval workflow
Review visual diffs before they become baselines. Structured review process.
Git-aware builds
Test per branch and commit. Compare across PRs. Track coverage over time.
Branch comparison
Side-by-side branch-to-branch test result diffing.
Test suites
Ordered suites for structured, sequential execution.
Test versioning
Full history with change reasons: manual, AI fix, AI enhance, restore.
Test composition
Cherry-pick tests and pin specific versions per build.
Functional hierarchy
Nested parent/child areas with drag-and-drop reordering.
Debug mode
Step-by-step execution with live feedback per action.
8 testing templates
Presets for SaaS, Marketing, Canvas, E-commerce, Docs, Mobile, SPA, CMS.
Planned screenshots
Compare against Figma exports with planned-vs-actual tracking.
Setup & teardown
Multi-step orchestration with per-test overrides. Playwright, API, and script types.
Dashboard health score
Weighted 0–100 score combining pass rate (60%), non-flaky rate (20%), route coverage (20%). Sparkline trend tracking.
Batch approval
Select and approve multiple visual diffs in a single action. Accelerates review for large builds.
Comparison runs
A/B testing mode: compare baseline vs. current branch builds side-by-side with per-test diff detail.
Scheduled runs
Cron-based automation with preset schedules or custom cron expressions. Auto-disables after consecutive failures. Optional branch targeting.
Guided onboarding
8-step setup guide for new users: connect Git, configure AI, scan routes, record first test, run, set baselines, re-run, check results. Auto-detects completion.
In-app bug reports
Auto-captures URL, viewport, console errors, failed requests, breadcrumbs, and screenshot attachment. Files GitHub issues directly.
Review todos
Branch-specific actionable items created when reviewers flag a diff. Track review feedback as todos tied to specific builds and tests.
Early adopter mode
Team-level toggle to access experimental features before general release. Opt in per workspace.
AI capabilities
6 AI providers
Claude CLI, OpenRouter, Claude Agent SDK, direct Anthropic API, OpenAI, or Ollama (local, free). Separate provider for diff analysis.
AI test generation
Multi-selector fallback: data-testid, id, role, aria-label, text, CSS, OCR.
AI diff analysis
Classifies changes with confidence scores. Separate provider from generation.
AI test fixing
Auto-proposes fixes. Review before accepting, or let the agent auto-fix.
Spec-driven testing
Import OpenAPI specs, user stories, or markdown. AI extracts and generates tests.
Route discovery
AI scans source code to discover routes and suggest tests.
AI prompt audit trail
Full logging of all AI requests and responses.
AI confidence scores
Every AI recommendation includes a numeric confidence score. Filter and prioritize reviews by certainty.
MCP server
29 Model Context Protocol tools for external AI agents. Programmatic access to test creation, execution, and review via npx @lastest/mcp-server.
Agent monitoring
Real-time SSE activity feed tracking Play Agent sessions step-by-step. Session history, pause/resume, and active/paused/completed status from the dashboard.
Codebase intelligence
Auto-detects framework, CSS, auth, state management, API layer, and key dependencies. 100+ package database maps stacks to testing recommendations to enrich AI prompts.
AI failure triage
Automatic classification of test failures into real regression, flaky test, environment issue, or test maintenance — with confidence scores and reasoning.
MCP selector validation
Real-time selector validation against live pages via Claude MCP. Catches brittle selectors during generation, not after they break.
Visual comparison
3 diff engines
Pixelmatch, SSIM, and Butteraugli. Speed vs. perceptual accuracy trade-off.
Text-region diffing
OCR-based two-pass comparison with separate text/non-text thresholds.
Page shift detection
Detects inserted/deleted rows with fuzzy matching, not full-page flags.
Ignore regions
Mask dynamic areas with solid-color or placeholder-text styles.
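The masking itself is simple. A sketch that paints a solid block over dynamic regions before the diff runs; coordinates and color are illustrative:

```ts
import { PNG } from 'pngjs';

type Region = { x: number; y: number; w: number; h: number };

// Paint a solid magenta block over each region in both images before
// diffing, so dynamic content (timestamps, avatars) can never flag a change.
function maskRegions(img: PNG, regions: Region[]): void {
  for (const { x, y, w, h } of regions) {
    for (let row = y; row < y + h; row++) {
      for (let col = x; col < x + w; col++) {
        const i = (row * img.width + col) * 4;
        img.data[i] = 255;     // R
        img.data[i + 1] = 0;   // G
        img.data[i + 2] = 255; // B
        img.data[i + 3] = 255; // A
      }
    }
  }
}
```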
Configurable sensitivity
Pixel and percentage thresholds for unchanged/flaky/changed classification.
SHA256 fast-path
Hash match = instant pass. No pixel comparison needed on every run.
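The fast-path is ordinary content hashing. A minimal sketch with Node's built-in crypto:

```ts
import { createHash } from 'node:crypto';
import fs from 'node:fs';

// Byte-identical screenshots hash identically, so the pixel engines
// never need to run. Anything else falls through to a full diff.
const sha256 = (path: string): string =>
  createHash('sha256').update(fs.readFileSync(path)).digest('hex');

function needsPixelDiff(baselinePath: string, currentPath: string): boolean {
  return sha256(baselinePath) !== sha256(currentPath);
}
```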
Diff engine benchmarks
Built-in benchmark framework comparing all three engines across synthetic test scenarios with timing and accuracy metrics.
Stabilization — 12 features for flaky test prevention
Integrations
pnpm test:visual for any CI
Analytics & insights
Impact timeline
Track visual regression impact across PRs and commits. Author contribution analysis and trend visualization.
WCAG 2.2 AA compliance
Automated 0–100 accessibility scoring with severity-weighted violations. Per-test violation detail across builds.
Infrastructure & team
Smart Run
Git-diff analysis runs only affected tests.
Parallel execution
Configurable max parallel tests for local and remote.
Branch baselines
Fork/merge baselines per branch. SHA256 carry-forward matching.
Docker deployment
Multi-stage build, persistent volumes, health checks, non-root.
App state inspection
Access window.__APP_STATE__ and Redux stores during tests.
Multi-tenant teams
Slug-based workspaces with email invitations.
Role-based access
Owner, admin, member, and viewer permissions.
Multiple auth
Email/password (Argon2), GitHub, GitLab, Google OAuth.
Storage states
Save and restore browser storage snapshots — localStorage, sessionStorage, and cookies — for authenticated flows.
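A storage state is a saved JSON snapshot, and the underlying APIs are plain Playwright. A sketch with a hypothetical login URL:

```ts
import { chromium } from 'playwright';

const browser = await chromium.launch();

// Log in once, then persist cookies and storage to disk.
const setup = await browser.newContext();
const page = await setup.newPage();
await page.goto('https://staging.example.com/login'); // hypothetical URL
// ...perform the login steps here...
await setup.storageState({ path: 'auth-state.json' });

// Every later context starts already authenticated.
const authed = await browser.newContext({ storageState: 'auth-state.json' });
```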
Assertion tracking
Parsed expect() calls with expected vs. actual values, error messages, and code line references per test.
Selector stats
Track selector success rates and response times. Identify brittle selectors before they cause flaky tests.
CLI test runner
pnpm test:visual for GitHub Actions and CI pipelines. Auto-captures GITHUB_HEAD_REF, GITHUB_REF_NAME, and GITHUB_SHA for git tracking.
Reusable GitHub Action
las-team/lastest/action@main — zero-config CI/CD integration. No local Playwright install; tests run on your Lastest server via a remote runner. Results posted to Actions step summary.
Webhook triggers
Builds fire on PR opened/updated, push to monitored branches, or manual click. Smart Run filters to only affected tests.
Background jobs
Queue tracking for long-running operations — AI route scans, agent sessions, and builds — with live status and history.
Market reality
The comparison the industry would prefer you didn't see.
| Tool | Price / mo | Human oversight | AI tests | Zero-token replays | Health analytics | A11y scoring | Self-hosted |
|---|---|---|---|---|---|---|---|
| Lastest | Free (self-host) · Cloud free tier | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Percy | $199+ | manual only | ✗ | n/a | ✗ | ✗ | ✗ |
| Applitools | $699+ | limited | partial | ✗ | limited | partial | ✗ |
| Chromatic | $179+ | manual only | ✗ | n/a | ✗ | ✗ | ✗ |
| Pure agentic tools | per-token | ✗ | ✓ | ✗ | ✗ | ✗ | varies |
“Pay for AI once to generate your tests. Then replay them ten thousand times for free. That is how testing economics should work.”
What people are saying
Honest words from the community.
Unedited reactions from Reddit threads, users, and early adopters — including the parts where they told us what to fix.
Pricing
Free forever to self-host. Platinum Edition from $299/mo.
Self-hosting is free for everyone, forever — bring your own AI via OpenRouter, Anthropic, or local Ollama. Try the cloud free to evaluate. Platinum Edition is deeply discounted during our early adopter window, and the price you sign up at is locked in for the lifetime of your subscription.
Self-host
Free forever
Run Lastest on your own hardware. Unlimited everything, no seat limits. FSL-1.1-ALv2.
- Unlimited tests, replays, browsers, seats, projects
- Bring your own AI — OpenRouter, Anthropic, Ollama
- Full human-in-the-loop dashboard + 3 diff engines
- CI/CD integration (GitHub, GitLab, Bitbucket)
- Community support (GitHub, Discord)
Cloud free
Free to try
Hosted by us. No install, no credit card. For evaluation only.
- 100 runs per month
- 1 project, 3 seats, 1 concurrent runner
- Shared cloud infrastructure
- Upgrade anytime — tests carry over
No data retention guarantee — builds may be pruned.
No functional consistency — we ship daily/weekly.
Not for production workloads.
Platinum Edition
From $299/mo
Locked in for the lifetime of your subscription.
All DLC included. Managed cloud, SLA, SSO, dedicated support.
- Managed cloud tenant with SLA-backed uptime
- SSO / SAML, audit log, data retention
- Dedicated support channel
- Priority feature development
- Custom implementations available per quote
“Percy Pro: $449/mo, 25k screenshots. Lastest Self-Host: $0, unlimited replays. That is how testing economics should work.”
Two paths, zero friction
Pick your runtime. Test forever either way.
Run on your laptop, your VPS, or skip the install entirely and use Lastest Cloud. Same engine, same dashboard, same diff. The first run uses AI tokens; every replay after that is free, on either path.
From the blog
Recent writing
Field notes on AI test generation, diff engines, and self-hosting visual regression at every team size.
Enterprise
AI speed meets enterprise governance
Lastest is free to self-host for every team. The human-in-the-loop dashboard already provides the oversight layer enterprises require — approval workflows, audit trails, and full test visibility. For organizations needing SLAs, managed infrastructure, SSO integration, or dedicated engineering support, we offer enterprise agreements. No minimum commitment. No sales presentations.
SLA-backed uptime guarantees
SSO / SAML integration
Managed cloud deployment
Dedicated support channel
On-premise installation support
Priority feature development


