Guide · 13 Mar, 2026

A/B Testing & Experimentation


A/B testing compares two versions of a webpage to see which one performs better. You show Version A to half your visitors and Version B to the other half. Then you measure which version gets more signups, clicks, or sales.

That’s the whole concept. Three sentences.

The execution is where people get lost. So here’s everything you need: how testing works, which tools are worth your money, what the research says about win rates, and how to avoid the mistakes that waste most teams’ time.

What is A/B testing?

Show two versions of something to real visitors. Measure which one performs better. Keep the winner.

You have a webpage. Maybe it’s your homepage, a pricing page, or a signup form. You think a different headline might get more people to click. A/B testing lets you find out without guessing.

Here’s the process:

  1. Pick what to change. A headline, a button, a call-to-action, an image. One thing.
  2. Create Version B. Your current page is Version A (the original). Your change becomes Version B.
  3. Split traffic. Half your visitors see A, half see B. The split is random.
  4. Wait for data. The test runs until you have enough visitors for a reliable answer.
  5. Read the results. One version wins, or they perform about the same. Both outcomes are useful.

| Step | What happens | What you do |
| --- | --- | --- |
| Setup | Pick a page and an element to test | 3 minutes in most tools |
| Split | Visitors randomly see Version A or B | Nothing. The tool handles this |
| Measure | Conversions are tracked per version | Check back in a week or two |
| Decide | One version gets more conversions | Keep the winner, or test again |
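The random split in step 3 is usually implemented as a deterministic hash of a visitor ID, so a returning visitor always sees the same version. Kirro's actual implementation isn't public, so the sketch below is a generic illustration of the idea; the visitor IDs and test name are made up:

```python
import hashlib

def assign_variant(visitor_id: str, test_name: str) -> str:
    """Deterministically bucket a visitor into version A or B.

    Hashing (test_name + visitor_id) keeps the split random across
    visitors but stable for any one visitor, across page loads.
    """
    digest = hashlib.md5(f"{test_name}:{visitor_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# The same visitor always lands in the same bucket:
print(assign_variant("visitor-42", "headline-test"))
print(assign_variant("visitor-42", "headline-test"))  # same answer
```

Over many visitors the buckets come out very close to 50/50, which is all the randomness a test needs.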

The term “split testing” means the same thing. Some people use the two terms interchangeably; others draw a technical distinction between them. For most purposes, they’re the same idea.

With Kirro, setup takes about three minutes. Paste a small script on your site, open the visual editor, click the element you want to change, and hit start. No code. No developer.

What can you test?

Almost anything visible on a page. Headlines, button text, images, hero sections, form layouts, pricing displays, page structure. If a visitor can see it, you can test it.

How do you know if a test worked?

After enough visitors have seen both versions, the tool runs the numbers. You’ll see something like: “Version B gets 23% more signups. Kirro is confident this works.” Or: “Both versions performed about the same.” Both answers are useful. A “no difference” result means your original was already decent. That’s good information.

The key word is “enough visitors.” Run a test with 50 visitors and you’re basically flipping a coin. Run it with 5,000 and you’re getting a real answer. How many you need depends on your traffic and the size of the change you’re testing. (More on sample size later.)

When you’re testing multiple elements at once (a headline AND a button AND an image), that’s called multivariate testing. It needs much more traffic. Most small businesses don’t need it. Start with one change at a time.

Why A/B testing matters

One tested headline change made Bing $100 million a year. Most websites still guess.

According to BuiltWith data compiled by Convert.com, only 0.2% of all websites use any A/B testing tool. Even among the top 10,000 highest-traffic sites, just 32% run tests.

That means the vast majority of businesses are guessing what works. And guessing is expensive.

Here’s what happens when companies actually test:

| Example | What was tested | Result | Source |
| --- | --- | --- | --- |
| Bing (Microsoft) | Ad headline layout | 12% revenue increase (~$100M/year in the US) | HBR, 2017 |
| Obama campaign | Signup page variations | 140% more signups, $75M in added donations | TrueList |
| Bing page speed | 100ms faster load time | $18M additional annual revenue | HBR, 2017 |

These are big companies with big numbers. But the principle scales down perfectly.

Say your landing page converts at 2% and gets 10,000 visitors a month. That’s 200 conversions. A better headline pushes it to 2.5%. Now you’re getting 250 conversions. Same traffic, 50 more customers every month, zero extra ad spend. Over a year, that’s 600 additional customers from a single headline test.

And the wins compound. Run 10-15 tests a year, find 3-4 winners, and each improvement stacks on top of the last. A 2% conversion rate becomes 2.5%, then 2.8%, then 3.1%. Your website gets better at turning visitors into customers without any increase in traffic budget.

This is why Ron Kohavi, who ran experimentation at Microsoft, Amazon, and Airbnb, calls controlled experiments “the best scientific way to establish causality”. You’re not guessing what works. You’re measuring it.

Our take: Most A/B tests don’t produce winners. At Google and Bing, only 10-20% of experiments show positive results. That’s normal. The 10-20% that DO win more than pay for the effort. Testing isn’t about winning every time. It’s about finding the wins that matter.

Booking.com runs roughly 1,000 tests at any given time. Google runs over 10,000 per year. They don’t do this because every test is a slam dunk. They do it because the ones that hit, hit big.

Frequently asked questions

Quick answers to the most common A/B testing questions.

What is A/B testing?

A/B testing shows two versions of a webpage (or email, or ad) to different groups of visitors. You measure which version gets more conversions, whether that’s signups, purchases, or clicks. The version that performs better wins.

It’s the simplest way to make data-backed decisions about your website instead of guessing. You don’t need a statistics degree or a dedicated CRO team. With modern tools like Kirro, you can set up and run a test in a few minutes.

How long should an A/B test run?

At minimum, two weeks. Even if you hit your target sample size faster, you need at least one full business cycle to account for weekday vs weekend differences.

The biggest mistake is stopping a test early because it “looks like” a winner. Checking results before the test is done inflates your false positive rate from 5% to 26%. Let the test run.
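The peeking effect is easy to reproduce with a small simulation: run an A/A test (two identical versions), check significance at ten interim looks, and count how often a “winner” gets declared. Every parameter below (5% conversion rate, 2,000 visitors, 1,000 simulated tests) is invented for illustration:

```python
import math
import random

def significant(x_a, n_a, x_b, n_b):
    """Two-sided two-proportion z-test at the 5% level."""
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return se > 0 and abs(x_a / n_a - x_b / n_b) / se > 1.96

random.seed(7)
SIMS, N, PEEKS = 1000, 2000, 10
step = N // PEEKS
peeking_fp = one_look_fp = 0

for _ in range(SIMS):
    x_a = x_b = 0
    flagged = False
    for look in range(1, PEEKS + 1):
        # both versions convert at the same 5% rate: any "winner" is a false positive
        x_a += sum(random.random() < 0.05 for _ in range(step))
        x_b += sum(random.random() < 0.05 for _ in range(step))
        if significant(x_a, look * step, x_b, look * step):
            flagged = True  # peeking: we would have stopped at this look
    peeking_fp += flagged
    one_look_fp += significant(x_a, N, x_b, N)  # disciplined: one look at the end

print(f"false positives with peeking:  {peeking_fp / SIMS:.1%}")
print(f"false positives, single look: {one_look_fp / SIMS:.1%}")
```

The single-look rate hovers near the promised 5%; the peeking rate comes out several times higher, which is exactly the inflation described above.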

What is a good sample size for A/B testing?

There’s no universal number. It depends on your current conversion rate, the minimum improvement you’d find meaningful, and your desired confidence level.

Rough benchmark: most tests need at least 350-1,000 conversions per version. Sites with fewer than 10,000 monthly visitors should focus on testing big, obvious changes (headlines, layout) rather than subtle tweaks. Use our free sample size calculator to get a number for your situation.
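For a feel of where numbers like these come from, here is the standard two-proportion sample-size approximation at 95% confidence and 80% power. The baseline rate and the lift worth detecting are illustrative inputs, not recommendations:

```python
import math

def sample_size_per_variant(baseline, mde_relative, z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed per variant.

    baseline     -- current conversion rate, e.g. 0.02 for 2%
    mde_relative -- smallest relative lift worth detecting, e.g. 0.25 for +25%
    z_alpha and z_beta default to 95% confidence and 80% power.
    """
    p1 = baseline
    p2 = baseline * (1 + mde_relative)
    p_bar = (p1 + p2) / 2
    delta = p2 - p1
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / delta ** 2)

# Detecting a 2.0% -> 2.5% lift needs roughly 14,000 visitors per variant
print(sample_size_per_variant(0.02, 0.25))
```

Notice how the answer scales: chasing a subtle lift on a low-converting page blows up the required sample fast, which is why low-traffic sites should test big, obvious changes.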

What’s the difference between A/B testing and multivariate testing?

A/B testing compares two versions with one change. Multivariate testing changes multiple elements at the same time and measures how they interact.

The tradeoff: multivariate tests reveal more but need significantly more traffic (often 10x or more). In practice, less than 1% of all tests run are multivariate. For most businesses, A/B testing is the right starting point. Test one thing, get a clear answer, move on.


Ready to stop guessing and start measuring? Try Kirro free for 30 days. Set up your first test in 3 minutes. No credit card. No setup guide. See what actually works.

A/B Testing Tools & Software

Your testing tool should match your team size. Most teams pay for features they’ll never touch.

The A/B testing tools market is approaching $1 billion and growing at about 14% per year. Yet only 11.5% of the top million websites actually run a testing tool. That’s a lot of sites leaving money on the table.

The market itself is lopsided. Most tools were built for enterprise teams with dedicated optimization specialists and six-figure budgets. If you’re a marketer or founder at a smaller company, the options look expensive and complicated. They don’t have to be.

When Google Optimize shut down in September 2023, an estimated 2-3 million websites lost the only simple, free testing tool that worked inside their Google stack. That gap still hasn’t been properly filled. Enterprise tools are too expensive. Developer tools assume you write code.

Kirro was built for that gap. EUR 99/month, unlimited tests, unlimited visitors, visual editor, GA4 integration. No per-visitor pricing that punishes you for growing. Try it free for 30 days.

How to pick the right tool

Answer three questions: what’s your budget, who’s running the tests, and how much traffic do you have?

A/B testing tools fall into four categories. Knowing which one fits saves you from buying a race car when you need a bicycle.

| Category | Who it’s for | Price range | How it works | Examples |
| --- | --- | --- | --- | --- |
| Visual editor | Marketers, founders, small teams | $0-$1,200/year | Point-and-click changes on your live site. No code. | Kirro, Crazy Egg, VWO |
| Full-stack platform | Mid-market to enterprise CRO teams | $3,500-$100,000+/year | Visual editor plus server-side testing, personalization, and analytics. | VWO, Convert, Optimizely, AB Tasty |
| Developer-first | Engineering teams, feature flagging | $0-$5,000/year | Code-based. Tests live in your codebase, not a visual editor. | GrowthBook, Statsig, LaunchDarkly |
| Server-side only | Large sites needing zero-flicker performance | $10,000+/year | Tests run on your server before the page loads. Fastest, but requires dev work. | Optimizely Full Stack, Kameleoon |

Visual editor tools are the starting point for most small teams. You click on a headline, change it, and hit start. Kirro fits here. So does VWO’s basic plan and Crazy Egg.

Full-stack platforms bundle everything: visual editor, server-side tests, personalization, heatmaps, session recordings. VWO and Optimizely live here. You pay for the bundle whether you use all of it or not.

Developer-first tools are free or cheap but need someone who can write code. GrowthBook is open source and genuinely free. PostHog and Statsig have generous free tiers. If you have a developer on the team and a data warehouse, these are worth a look. If you don’t, skip them.

Pricing models matter too. Some tools charge per seat (per person who logs in). Others charge per MTU (monthly tested users, meaning visitors who see a test). MTU pricing means your bill goes up as your traffic grows. Seat pricing stays flat. Kirro charges a flat EUR 99/month regardless of traffic or team size. That’s unusual. Most tools in this space charge more as you grow.

One thing every “best tools” roundup skips: only about 1 in 8 A/B tests produce a clear winner. The tool you pick matters less than actually running tests consistently. A EUR 99/month tool used weekly beats a $36,000/year tool used quarterly.

Find the right guide for your question

Each of these posts goes deep on a specific angle. Here’s where to start based on what you’re looking for.

Want a full buyer’s guide? Our A/B testing software comparison reviews 13 tools side by side. Total cost of ownership, real pricing (not just the sticker), and tradeoffs nobody else mentions. It’s the deep dive.

Short on time? The best A/B testing tools post is the 5-minute version. Five picks organized by use case: best for small teams, best for enterprise, best for ecommerce, best for developers. Clear winner recommendation upfront.

Need redirect testing? Split testing (sending visitors to completely different URLs) is a different job than element-level A/B testing. Our split testing software guide covers 8 tools that handle real URL redirects and explains when you need that instead of standard A/B testing.

Building a mobile app? Mobile testing has unique headaches: app store review delays, version fragmentation, and smaller sample sizes. The mobile app A/B testing guide covers the best tools and what makes mobile different from web testing.

The bigger picture

A/B testing tools are one piece of the conversion puzzle. The full CRO stack includes heatmaps, session recordings, surveys, and analytics alongside testing. Our CRO tools guide breaks down the full toolkit.

For the fundamentals of A/B testing itself (how it works, what to test first, how to read results), start with our A/B testing pillar guide. Everything in this section lives under that umbrella.

All A/B Testing Tools & Software posts →

CRO Tools & Software

You don’t need a CRO suite. You need a testing tool, a behavior tool, and the analytics you already have. Total cost: EUR 99/month.

Most “CRO tools” articles list 15 products and tell you to pick a few from each category. That’s advice built for teams with dedicated optimization specialists and five-figure budgets. If you’re a marketer or founder at a smaller company, you need a different approach.

Here’s the reality: only 0.2% of websites use any A/B testing tool at all. The gap isn’t knowledge. It’s that the industry keeps selling full “CRO platforms” to people who need one or two tools.

A/B testing tools and CRO tools are different things, even though people use the terms like they mean the same thing. An A/B testing tool does one job: it shows two versions of a page to different visitors and tells you which one performs better. A CRO tool is a broader category that includes heatmaps, session recordings, surveys, form analytics, and personalization alongside testing. Think of A/B testing as one wrench. CRO tools are the whole toolbox.

The question is whether you need the toolbox or just the wrench. For most small teams, the answer is the wrench plus one or two free tools you probably already have access to.

The 3-tool stack that covers 90% of what you need

GA4 (free) + Microsoft Clarity (free) + Kirro (EUR 99/month). That’s a complete CRO setup for less than what most tools charge for heatmaps alone.

This isn’t a theoretical framework. It’s what actually works for small teams:

Google Analytics 4 handles the “measure” part. Track traffic, conversions, funnels, and user behavior. It’s free and it’s the foundation.

Microsoft Clarity handles the “watch” part. Free heatmaps and session recordings with no limits on traffic or sessions. See where people click, how far they scroll, and where they rage-click. No credit card, no caps, no catch. Hotjar still owns 55% of the heatmap market, but Clarity does the same core job for zero dollars.

Kirro handles the “test” part. Change a headline, swap a button, try a new hero section. EUR 99/month, unlimited tests, unlimited visitors, 9KB script. The visual editor means no developer needed.

Total cost: EUR 99/month. The enterprise equivalent (Optimizely + Hotjar Business + GA4 360) runs $50,000+ per year. Same workflow. Different price tag.

Our take: The VWO and AB Tasty merger in January 2026 signals where the industry is heading. PE-backed consolidation pushes tools upmarket and prices up. The gap for small teams keeps getting wider. Building your stack from focused, affordable tools protects you from that trend.

Which post should you read first?

Every post in this cluster covers a different part of the CRO toolkit. Start with the one that matches where you are right now.

Choosing tools by category? Our CRO tools guide breaks down all six types of conversion optimization tools (analytics, heatmaps, testing, surveys, form analytics, personalization). It includes a decision framework based on your traffic level and budget, plus honest recommendations for each category. Start here if you’re building your stack from scratch.

Ready to buy software? The CRO software buyer’s guide is for people who already know what they need and want to compare specific products. It covers pricing, stacks by budget tier, and the trade-offs between all-in-one platforms and best-of-breed setups. Includes the post-merger VWO/AB Tasty picture.

Need to understand user behavior first? Our session replay tools roundup compares every major heatmap and recording tool, starting with free options. If you want to see what visitors actually do on your site before deciding what to test, start there.

Thinking about switching from Hotjar? The Hotjar alternatives guide covers why teams leave (pricing, session limits on the free plan) and what to use instead. Includes a head-to-head with Microsoft Clarity and the full comparison with FullStory, Mouseflow, and Lucky Orange.

The thread connecting all of this

Watch what visitors do. Figure out why. Test a fix. Repeat. That’s the whole CRO workflow.

Most businesses get stuck because they either skip straight to testing (without knowing what to test) or they buy observation tools and never act on what they find.

The posts in this cluster cover the full picture, from choosing the right tool categories to comparing specific products. The parent A/B Testing pillar page ties this into the broader testing and experimentation strategy.

When you’re ready to start testing, Kirro’s free trial gives you 30 days with the full product. No features locked. No credit card required. Pair it with Clarity and GA4, and you’ve got the same CRO workflow the enterprise teams use.

All CRO Tools & Software posts →

Competitor Comparisons

Skip the feature matrix. Your budget, team size, and traffic tell you which tool to pick in about 30 seconds.

The A/B testing market just got a lot smaller. VWO and AB Tasty merged in January 2026, creating a $100M+ revenue company backed by private equity. Optimizely keeps moving upmarket. The tools are consolidating into bigger, pricier bundles.

And the gap for small teams keeps getting wider.

High pricing already restricts about 35% of potential market growth among small and mid-size businesses. If that describes you, the comparisons below cut through the noise. No feature spreadsheets. Just: here’s what each tool costs, who it’s built for, and whether it fits your situation.

How to pick the right comparison

Start with your biggest constraint (budget, team skills, or traffic volume) and read the matching post.

Every post in this cluster covers a different buying scenario. Here’s which one to read based on where you are right now.

Comparing the two biggest names? Our VWO vs Optimizely comparison uses real G2 data, pricing breakdowns, and 796 verified TrustRadius reviews. VWO wins on ease of use and price. Optimizely wins on enterprise features. For teams under 10 people, the honest answer is: both are overkill.

Leaving Optimizely? The best Optimizely alternatives guide covers seven replacements, organized by why you’re switching. Price, complexity, privacy, open source. Each reason points to a different tool. Updated for the VWO/AB Tasty merger.

Lost Google Optimize? When Google shut down Optimize in September 2023, 500,000+ websites lost their testing tool. Our Google Optimize alternatives guide ranks 10 replacements with real pricing and honest drawbacks.

Need to know what Optimizely actually costs? Optimizely pricing breaks down the real numbers. Spoiler: it starts at $36,000/year, and total cost runs 35-50% above the license fee. None of this is on their website.

Evaluating Convert? Our Convert A/B testing review covers the privacy-focused mid-market option. Strong on data compliance and support. Starts at $299/month. But limitations in the visual editor and a HIPAA compliance gap are worth knowing about before you commit.

The market right now

Enterprise tools are merging and getting pricier. The middle is thinning out. Small teams have fewer options than they did a year ago.

Here’s how the A/B testing tool market breaks down in 2026:

| Tier | Tools | Starting price | Built for |
| --- | --- | --- | --- |
| Enterprise | Optimizely, Adobe Target | $36,000+/year | 50+ person teams, dedicated CRO specialists |
| Mid-market (post-merger) | VWO/AB Tasty, Convert | $299-599/month | Agencies, mid-size companies, privacy-focused teams |
| SMB | Kirro, Mida, Zoho PageSense | EUR 99-299/month | Marketers, founders, small teams |
| Developer/open-source | GrowthBook, Statsig | Free-$150/month | Engineering-led product teams |

The mid-market is where the most change happened. VWO and AB Tasty combining means fewer choices at the $300-500/month price point. And PE-backed consolidation usually pushes prices up, not down.

If you’re a small team or solo marketer, the parent A/B Testing pillar page covers the full picture, from methodology to tools to strategy.

Our take: The tool matters less than actually using it. CXL analyzed 28,304 experiments and found that the companies producing results weren’t the ones with the fanciest tools. They were the ones running tests consistently. Only 1 in 8 A/B tests creates a significant lift. That means you need volume. A EUR 99/month tool used weekly beats a $36,000/year tool used quarterly.

Ready to stop comparing and start testing? Kirro’s free trial gives you 30 days with the full product. No credit card. No feature limits. Three minutes to set up.

All Competitor Comparisons posts →

Platform-Specific Testing

Every platform has its own A/B testing tools. None of them test the thing that matters most: your landing page.

A/B testing on your own website is straightforward. You control the traffic split, the test duration, and what “winning” means. Platform testing is different. Amazon, Meta, Google Ads, and Webflow each have built-in testing features, but they all play by their own rules.

The biggest difference most guides skip: on ad platforms, the algorithm decides who sees what. A 2025 study published in the Journal of Marketing found that platform algorithms create “divergent delivery,” where different ads get shown to different types of people. Your “winning” ad might only look better because the algorithm showed it to more receptive users, not because the creative itself was stronger.

That’s a problem if you’re trying to learn what actually works.

What you can (and can’t) test on each platform

| Platform | What you can test | What you can’t test | Big limitation |
| --- | --- | --- | --- |
| Amazon | Titles, images, bullet points, A+ Content | Pricing, layout, storefront structure | Requires Brand Registry + undisclosed traffic minimum |
| Meta | Ad creative, audiences, placements | Landing page experience, post-click behavior | Algorithm redistributes budget mid-test |
| Google Ads | Ad copy, bidding strategies, asset groups | Cross-campaign comparisons, landing pages | One asset group at a time (10 groups = 40-60 weeks) |
| Webflow | Any page element via Webflow Optimize or third-party tools | Backend logic, pricing, checkout flows | Webflow Optimize starts at $299/month |

Our take: Every platform tests the ad or the listing. None of them test what happens after someone clicks. That’s where most conversions are won or lost. A dedicated tool like Kirro fills that gap: you test your actual landing page, checkout flow, or homepage with a clean 50/50 traffic split that you control.

Pick the right guide for your situation

If you sell on Amazon and want to test product listings, our Amazon A/B testing guide walks through Manage Your Experiments step by step, including the eligibility requirements Amazon doesn’t make obvious and the third-party alternatives when their tool falls short.

Running Facebook or Instagram ads? The Meta A/B testing guide covers how their Experiments tool changed in 2025, when to use it vs. testing landing pages externally, and how to avoid the algorithm interfering with your results.

For Google Ads, the Google Ads A/B testing guide explains campaign experiments, the new Performance Max asset testing beta, and the math behind why PMax testing takes so long.

If your site runs on Webflow, Webflow A/B testing compares your options: Webflow Optimize ($299/month), Optibase (from $19/month), and external tools like Kirro that work with any site, Webflow included.

The handoff problem

Here’s the thing nobody in the ad platform world talks about: the ad got someone to click. Great. Now what?

If the landing page doesn’t convert, the best ad creative in the world doesn’t matter. Platform tests stop at the click. Your website is where the actual conversion happens, and that’s where you need a separate testing tool.

Think of it as a relay race. The ad platform runs the first leg. Your landing page runs the second. Most teams only time the first runner.

All of these guides live under the A/B Testing & Experimentation pillar, alongside our tools and methodology deep dives.

All Platform-Specific Testing posts →

SEO A/B Testing

SEO testing splits pages into groups and measures what Google does. Regular A/B testing splits visitors and measures what people do. Different method, different tools, different use cases.

Most testing guides treat SEO A/B testing and regular A/B testing as the same thing. They’re not. The difference changes everything: which tools you need, how long tests take, how many pages you need, and whether your site even qualifies.

Regular A/B testing (the kind Kirro does) shows half your visitors one version of a page and the other half a different version. Same URL, two experiences. You’re testing how people behave.

SEO testing can’t work that way. Google is one crawler. You can’t show it two versions of the same URL. So instead, you group similar pages (say, 200 product pages), change half of them, and compare organic traffic between the two groups over weeks. You’re testing how the search engine behaves.

That distinction is why SEO testing is its own discipline. It needs different tools, larger sites, and more patience. For a full breakdown of how it works, read our SEO A/B testing guide. It covers the methodology, the tools, the real failure rates (75% of tests are inconclusive), and what smaller sites can do instead.

Who SEO testing is for (and who it isn’t)

If you have 300+ similar pages and 30,000+ monthly organic sessions, SEO split testing is worth exploring. Everyone else should focus on regular A/B testing first.

SEO split testing is built for publishers, large e-commerce sites, and marketplaces with hundreds or thousands of pages on the same template. Etsy, Pinterest, and Booking.com run it because they have the page volume to produce reliable results.

Most small and mid-size sites don’t have that volume. And that’s fine. If your site has fewer than 100 similar pages, the math doesn’t work for a proper split test. The noise in your traffic data drowns out any real signal.

The better move? Test what happens after visitors arrive. Regular A/B testing on your landing pages, headlines, and calls to action works at any traffic level. Try Kirro free and test your highest-traffic page today. That’s where the fastest wins are for most businesses.

Where to start reading

Our complete SEO A/B testing guide covers the full picture: how the methodology works step by step, what you can actually test (title tags, headings, structured data, internal links), why most tests show no clear result, and the honest alternatives for sites that don’t have enterprise-scale traffic. It also breaks down every major SEO testing tool and when each one makes sense.

If you’re new to testing in general, start with the A/B testing pillar page for the fundamentals. Want to understand how A/B testing affects conversion rates? That guide covers what to expect by industry. And if the term “split testing” is new, our split testing explainer starts from zero.

The bottom line: SEO testing is powerful for sites with the right setup. For everyone else, regular A/B testing delivers faster, cheaper results with way less complexity.

All SEO A/B Testing posts →

Testing Methodology

The method matters less than you think. What matters: picking one that fits your traffic and sticking with it.

Only 1 in 7 A/B tests reaches statistical significance. That’s an 86% failure rate. And yet 58% of companies have no framework for deciding what or how to test.

Those two stats are connected. Bad methodology is why most tests fail. Not bad ideas.

This section covers the science behind reliable testing. Every method below has a specific use case. Some need thousands of visitors. Some work with a few hundred. Some give you fast answers. Others give you precise ones. The trick is matching the method to your situation.

Kirro uses Bayesian statistics and handles the math for you. You get answers like “Version B has an 89% chance of being better,” not p-values.

This cluster is part of the A/B Testing & Experimentation pillar.

Which method do I need?

Answer three questions and you’ll know exactly which testing approach fits.

Start here. Don’t read all 18 articles. Answer these three questions, then go to the one that matches.

Question 1: How much traffic does your page get?

Question 2: What’s your goal?

  • “I need to learn which version is better.” Classic A/B test. You want precision and confidence. This is what most teams need most of the time.
  • “I need to optimize revenue right now.” Multi-armed bandits send more traffic to the winning version while the test runs. Less learning, more earning. Good for short campaigns and sales events.
  • “I need to test several elements at once.” Multivariate testing lets you test headlines, images, and buttons simultaneously. But you’ll need serious traffic (think 50,000+ visitors) to get reliable results.

Question 3: How comfortable are you with statistics?

Our take: A December 2025 Harvard Business Review study found that traditional significance testing demands 24 to 55 times more data than you actually need for a good business decision. Speed often matters more than certainty. Most small teams are better off running more tests with slightly less precision than running fewer “perfect” tests.

The core methods

Six approaches, each built for a different situation. Here’s what each one does and when it’s worth your time.

Standard A/B testing splits traffic 50/50 between your current page and one change. It’s the workhorse. Reliable, easy to understand, works at almost any traffic level. If you’re new to testing, split testing meaning explains the concept, and landing page split testing walks through a real example. For the stats behind it, see A/B testing conversion rate.

Bayesian A/B testing updates results as visitors arrive instead of making you wait for a fixed sample. Kirro uses this approach because the results make sense to non-statisticians. “89% chance Version B wins” is a sentence your boss can act on. Our Bayesian A/B testing guide covers when it helps and when it’s overkill.
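The “89% chance Version B wins” style of answer can be sketched with a Beta-Binomial model: put a Beta posterior on each version’s conversion rate, sample from both, and count how often B comes out ahead. This is a minimal illustration of the idea, not Kirro’s actual implementation, and the conversion counts are invented:

```python
import random

random.seed(3)

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000):
    """Monte Carlo estimate of P(rate_B > rate_A) under flat Beta(1, 1) priors."""
    wins = 0
    for _ in range(draws):
        rate_a = random.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = random.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# With 200/5000 conversions on A and 250/5000 on B, B is very likely better
print(f"{prob_b_beats_a(200, 5000, 250, 5000):.1%}")
```

The output is a single probability, which is why Bayesian results are easy to act on: no p-value translation required.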

Multivariate testing tests combinations of changes. Different headlines paired with different images paired with different buttons. Powerful, but hungry for traffic. Our multivariate testing guide includes the traffic calculator so you can check if your site qualifies before committing.

Multi-armed bandits automatically shift traffic toward whichever version is winning. Less learning, faster revenue. Good for flash sales or time-limited campaigns where waiting for full statistical confidence would cost you money. Deep dive: multi-armed bandit testing.
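One common bandit algorithm, Thompson sampling, fits in a dozen lines: before each visitor, sample a plausible conversion rate for each version from its Beta posterior and show whichever sample is higher. The two “true” conversion rates below are invented, and deliberately far apart so the traffic shift is visible quickly:

```python
import random

random.seed(11)
true_rates = {"A": 0.03, "B": 0.09}   # unknown in real life; used only to simulate visitors
wins = {"A": 0, "B": 0}               # conversions observed per version
shown = {"A": 0, "B": 0}              # visitors sent to each version

for _ in range(5000):
    # sample a plausible rate for each arm from its Beta posterior
    sampled = {
        arm: random.betavariate(1 + wins[arm], 1 + shown[arm] - wins[arm])
        for arm in true_rates
    }
    arm = max(sampled, key=sampled.get)  # show the currently optimistic arm
    shown[arm] += 1
    wins[arm] += random.random() < true_rates[arm]

print(shown)  # traffic drifts heavily toward the better-converting version
```

That drift is the trade described above: more revenue during the test, but a lopsided sample that teaches you less about the losing version.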

Sequential testing lets you stop a test early (or keep it running longer) based on ongoing results, without inflating your false positive rate. It solves the “peeking problem” that Evan Miller famously showed raises error rates from 5% to 26%. Full guide: sequential testing.

CUPED (variance reduction) uses data you already have about your visitors to reduce the noise in your results. The practical result: 30 to 40% smaller sample sizes for the same precision. If your tests always seem to take too long, this is probably the fix. Guide: CUPED and variance reduction.

The statistics that actually matter

You don’t need a stats degree. You need to understand four numbers.

Every testing method above relies on the same handful of statistical concepts. You don’t need to calculate them (that’s the tool’s job). But knowing what they mean helps you avoid the most common A/B testing mistakes.

Sample size is “how many visitors do I need?” Too few and your test can’t tell a real winner from random noise. Our sample size formula guide breaks down the math, and the free calculator does it for you.

Minimum detectable effect is “what’s the smallest improvement worth finding?” If you’d only act on a 20% improvement, don’t set up a test designed to detect 2% changes. It’ll take forever. MDE guide.

Type 1 and type 2 errors are the two ways a test can lie to you. A type 1 error says B wins when it doesn’t (false alarm). A type 2 error misses a real winner (missed opportunity). Understanding both helps you set up tests that balance speed with accuracy.

Statistical power is the probability your test will actually detect a real difference. Low power means you’ll miss winners. Microsoft runs 10,000+ experiments annually and still obsesses over power calculations. If it matters to them at that scale, it matters to you. Power analysis guide.
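Power can be checked directly with the normal approximation: given a sample size, a baseline rate, and the lift you care about, what is the chance the test flags a real difference? All the inputs below are illustrative:

```python
import math

def power(n_per_variant, p1, p2, z_alpha=1.96):
    """Approximate power of a two-sided two-proportion test at 95% confidence."""
    se = math.sqrt(p1 * (1 - p1) / n_per_variant + p2 * (1 - p2) / n_per_variant)
    z = abs(p2 - p1) / se - z_alpha
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

# ~14,000 visitors per variant gives roughly 80% power to detect 2.0% -> 2.5%
print(f"{power(14000, 0.02, 0.025):.0%}")

# The same test with only 3,000 visitors per variant is badly underpowered
print(f"{power(3000, 0.02, 0.025):.0%}")
```

Run the underpowered version anyway and you will usually “find” nothing, even when Version B really is better. That is the missed-winner failure mode.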

For the full theoretical foundation (what a null hypothesis is, how to think about probability): null hypothesis in A/B testing.

Implementation and architecture

The method is one half. How you run the test is the other half.

Picking the right statistical method gets you halfway. The other half is the practical setup: where the test runs, how visitors get assigned to versions, and what happens when cookies disappear.

Experiment design covers the full process: forming a clear guess about what will happen, choosing the right metric, picking the page, and setting up controls. How to design a marketing experiment walks through this start to finish.

Client-side vs server-side testing is about where the test runs. Client-side (in the browser) is easier to set up but can cause page flicker. Server-side (on your server) is invisible to visitors but needs developer involvement. Most small teams start client-side and it works fine. Client-side vs server-side A/B testing helps you decide.

Cookieless testing matters more every year. Safari already blocks third-party cookies. Chrome offers users a choice. If your testing tool relies on third-party cookies, you’re losing data on a growing chunk of visitors. Cookieless A/B testing covers the alternatives.

Feature flags vs A/B testing confuses a lot of teams. Feature flags let developers turn features on and off. A/B tests measure which version performs better. They solve different problems, and some platforms bundle them together. If you’re wondering whether you need a feature flag tool or a testing tool, feature flags vs A/B testing sorts it out. (Short answer for most marketers: you need a testing tool.)

AI-powered testing is the newest addition to the toolkit. AI can help prioritize what to test, generate variations, and analyze results faster. But it’s not magic, and the fundamentals still apply. AI A/B testing separates the real applications from the hype.

Start somewhere

Most teams overthink the methodology and underthink the action. Microsoft found that a 1% improvement to Bing’s revenue equals over $10 million per year. Those gains came from running thousands of simple tests, not from picking the “perfect” statistical method.

Pick a high-traffic page. Change one thing. Run the test. Three minutes to set up in Kirro. The methodology guides above are here for when you want to go deeper. But the first test? Just run it.

All Testing Methodology posts →