4 Steps to Run Statistically Significant A/B Tests

Jasmine LeBlanc 2/8/18 9:56 AM

1. Select a Statistic to Optimize
2. Create an A/B Test
3. Collect Your Data
   1. Sample Size
   2. Test Duration
4. Measure Your Results

How do you know if the keywords in your ad copy are why your click-through rate increased? What about the color of the button on your landing page: did it increase the number of downloads? Don’t be too quick to jump to conclusions. The only way to know with any confidence is to determine whether your results are statistically significant. Yes, you’re going to have to do a little bit of math, but it’s worth it to understand whether there is a solid method to your marketing and advertising or whether the results of your changes were just due to chance.

We’ll walk you through the steps of determining the statistical significance of your data, so that you can understand how to best test it and learn why it’s significant.

In a nutshell, here are the 4 steps you’ll take to calculate the statistical significance of your data:

1. Select a Statistic to Optimize

To begin, set a goal for this test. Do you want to see an increase in conversion rates or click-through rates? Then, select a piece of content (ad copy, an email, a landing page, etc.) and create two slightly different variations of it (different keywords, different images, different call-to-action buttons, etc.). Change only one element between the two, so that any difference in results can be attributed to that change.

You’ll be using this statistic to perform an A/B test to see which variation is more effective at meeting your goal. Make sure that whatever you select will yield valuable results for you to apply in the future.

2. Create an A/B Test

The goal of an A/B test is to determine the most successful version of something. To begin, you’ll need to create two different items (buttons, landing pages, ad copy, etc.) and present them to two different groups.

Here are some sample A/B tests that you could do:

Create two different, randomly assigned groups of people to receive your two test emails. One group will receive version A, and the other will receive version B.

Use one set of keywords in your ad copy for a couple of days and then switch to a different set of keywords for the next couple of days.

Use one image on your landing page for a couple of days, then switch to a different image for a couple more days.
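For the email example above, the random split can be sketched in a few lines of Python. This is only an illustration: the function name `split_into_groups`, the `seed` parameter, and the example addresses are our own assumptions, and most email tools have a built-in split feature that does the same job.

```python
import random


def split_into_groups(recipients, seed=None):
    """Shuffle the recipients and split them into two near-equal groups.

    `seed` is an illustrative parameter: passing a fixed number makes the
    split repeatable, leaving it as None gives a fresh split each time.
    """
    rng = random.Random(seed)
    shuffled = list(recipients)  # copy so the original list is untouched
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Hypothetical mailing list, for illustration only
emails = [f"user{i}@example.org" for i in range(10)]
group_a, group_b = split_into_groups(emails, seed=42)

print("Version A goes to:", group_a)
print("Version B goes to:", group_b)
```

Shuffling before splitting matters: if you simply cut an alphabetical or sign-up-date list in half, the two groups may differ in ways that have nothing to do with your email.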

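For the two time-based examples, keep in mind that results collected on different days can differ for reasons unrelated to your change, so show both versions during the same period if your tools allow it. Once you decide what you would like to optimize and how you would like to do it, the next step is to decide on your sample size.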

3. Collect Your Data

To collect the most accurate data, you’ll need to ensure that you use an appropriate sample size and that you run your test for the right amount of time.

Sample Size

Your data collection is a huge part of correctly calculating the statistical significance of your data. Be sure to collect the right number of samples! Unfortunately, the number will differ for everyone based on what they’re testing. However, online sample size calculators can guide you to the right number.
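If you’re curious what those calculators do behind the scenes, here is a rough sketch of one common approach, the standard sample size formula for comparing two rates. The function name and the default of 80% statistical power are our assumptions (power is the chance of detecting a real difference, and 80% is a frequent default), so treat the output as a ballpark figure and not a replacement for your calculator of choice.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, expected_rate,
                              confidence=0.95, power=0.80):
    """Approximate visitors needed in EACH variation.

    baseline_rate: current conversion rate, e.g. 0.10 for 10%
    expected_rate: the rate you hope the new version reaches
    confidence:    confidence level for a two-sided test
    power:         assumed chance of detecting a real difference
    """
    if baseline_rate == expected_rate:
        raise ValueError("The two rates must differ.")
    alpha = 1 - confidence
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    effect = expected_rate - baseline_rate
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Example: hoping to lift a 10% conversion rate to 12%
needed = sample_size_per_variation(0.10, 0.12)
print(f"About {needed} visitors per variation")
```

With these example numbers, the result is roughly 3,800 visitors per variation. Notice how quickly the number grows when the difference you want to detect gets smaller.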

Test Duration

Are you wondering how long you should run your test? Well, like sample size, it differs for everyone. However, we do recommend running your test for at least a week. If you have the time, run it longer. Don’t stop the test just because you start to see the results you want to see; stopping early makes a chance result look like a real one. You want your data to be as accurate as possible. If you need help narrowing down a time frame, try using a test duration calculator.
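As a rough illustration of what a duration calculator does, the sketch below divides the total sample you need by your daily traffic and never goes below the one-week minimum recommended above. The function name, parameters, and traffic figures are hypothetical.

```python
from math import ceil


def estimate_duration_days(sample_size_per_variation, daily_visitors,
                           variations=2, minimum_days=7):
    """Estimate how many days a test needs to reach its sample size.

    daily_visitors is the total daily traffic shared across all variations.
    The result never drops below minimum_days (one week by default).
    """
    if daily_visitors <= 0:
        raise ValueError("daily_visitors must be positive.")
    total_needed = sample_size_per_variation * variations
    days_needed = ceil(total_needed / daily_visitors)
    return max(days_needed, minimum_days)


# Example: 4,000 visitors per variation, 500 visitors per day in total
print(estimate_duration_days(4000, 500), "days")
```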

4. Measure Your Results

After your A/B test is complete, you’ll want to analyze your results for their statistical significance. Don’t worry! There are plenty of online calculators that will do the math for you.

When you use these tools, they will ask you to input your data in specific fields to give you accurate results. Here’s a glossary of what they’ll ask for, so that you can enter your data correctly:

Confidence level: Your confidence level shows how sure you want to be that the difference you measured is real and not just due to chance. The higher your confidence level, the stronger the evidence you need before calling a result significant. Calculators often pair it with a significance level, which is simply 1 minus the confidence level: a 95% confidence level corresponds to .05, and a 99% confidence level corresponds to .01. At 95%, you accept about a 5% chance of seeing a difference this large when the two versions actually perform the same. It is common to use 95% as a confidence level.

Confidence interval/margin of error: For the sake of these tools, these terms are treated as synonyms. Since you’re using a small group of people to reflect the views of the overall population, your confidence interval is a range around your measured result that indicates how precise it is. Usually, the smaller your sample size, the larger your margin of error will be. This is because a smaller sample represents your population less reliably, so the chances of inaccuracies in your data are higher. With a sample size of 50 people, your margin of error would be around 14 percentage points at a 95% confidence level; it takes roughly 100 people to bring it down to around 10. To check your margin of error, use a margin of error calculator such as the one from SurveyMonkey.
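To see how sample size and margin of error relate, here is a small sketch using the standard formula for a proportion. The function name is ours, and the default `proportion=0.5` is an assumption: it is the most cautious choice and gives the widest margin.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sample_size, confidence=0.95, proportion=0.5):
    """Margin of error for a proportion, returned as a fraction (0.1 = 10 points)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(proportion * (1 - proportion) / sample_size)


# Print the margin for a few illustrative sample sizes
for n in (100, 400, 1000):
    print(f"n = {n}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Because of the square root in the formula, you need about four times as many people to cut the margin of error in half.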

Hypothesis (one-sided v. two-sided): Some calculator tools will ask if your hypothesis is one-sided or two-sided. Here’s what that means:
- One-sided hypothesis: A one-sided hypothesis only tests for a difference in one direction, for example whether Test B performed better than Test A. So if your one-sided hypothesis is that Test B was going to perform better, but it didn’t, your results would simply show that your data is not significant. They would not tell you whether Test B actually performed worse.
- Two-sided hypothesis: A two-sided hypothesis tests for a difference in either direction. If that same hypothesis was two-sided and Test B performed significantly worse than Test A, your results would show it. From these results, you can be confident that it is better to go with Test A.

We recommend selecting a two-sided hypothesis for A/B tests so you can determine which of the two is more successful, even if it means your hypothesis was wrong.
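For readers who want to see the math, the sketch below uses a two-proportion z-test, one common method for comparing two conversion rates. We’re assuming this method for illustration; the calculator you pick may use a different one. The function name and the visitor counts are made up. A result is usually called significant at the 95% confidence level when the p-value is below .05.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Return (z, one-sided p-value for "B beats A", two-sided p-value)."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate: the conversion rate if A and B were really the same
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_error == 0:
        raise ValueError("Cannot test when nobody or everybody converted.")
    z = (rate_b - rate_a) / std_error
    p_one_sided = 1 - NormalDist().cdf(z)             # only asks: is B better?
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))  # asks: is B different?
    return z, p_one_sided, p_two_sided


# Hypothetical results: A converts 100 of 1,000 visitors, B converts 130 of 1,000
z, p_one, p_two = two_proportion_z_test(100, 1000, 130, 1000)
print(f"B ahead:  z = {z:.2f}, one-sided p = {p_one:.3f}, two-sided p = {p_two:.3f}")

# Same numbers with the roles reversed, so B is now the weaker version
z_rev, p_one_rev, p_two_rev = two_proportion_z_test(130, 1000, 100, 1000)
print(f"B behind: z = {z_rev:.2f}, one-sided p = {p_one_rev:.3f}, two-sided p = {p_two_rev:.3f}")
```

In the reversed case, the one-sided p-value is close to 1, so it only tells you that B did not win. The two-sided p-value stays small, which flags that the two versions really do differ.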

Population size: This number represents how many people there are in the group that your sample is representing. If you have an exact number or an accurate guess, use that. If not, and your population is much larger than your sample, it has very little impact on determining your sample size, so it’s okay to leave it blank in these calculator tools.
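To see why population size rarely matters, here is a sketch of the standard finite population correction, which shrinks a required sample size when the population is small. The function name and the example figures are ours, and whether a given calculator applies this exact adjustment is an assumption.

```python
from math import ceil


def adjust_for_population(sample_size, population_size):
    """Apply the finite population correction to a required sample size."""
    return ceil(sample_size / (1 + (sample_size - 1) / population_size))


# The same starting sample size, checked against two hypothetical populations
print(adjust_for_population(385, 1_000_000))  # very large population
print(adjust_for_population(385, 2_000))      # small population
```

For the very large population, the required sample barely changes. The correction only makes a noticeable difference when your sample is a sizable share of the whole population.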

A/B testing can help you determine which version of your content is more effective with your target audience. It also gives you reassurance that there is a reliable method to your marketing strategy and that your results aren’t just due to chance. Luckily, there are plenty of online tools to help you determine the statistical significance of your data. Once you have calculated that your data is statistically significant, you can reuse your successful methods in the future! If you have any questions about conducting A/B tests or interpreting your test results, we’d be glad to answer them!