Say you've started running a test. It could be a landing page conversion test, an email subject line test or a PPC ad variation test. You want to know what the numbers are telling you, but you need some math for that, and you're not sure where to start.

#### How to Define Your Experiments

First, let's define a few terms.

Clicks & Impressions -- If you're running a PPC test, "Clicks" refers to the number of clicks a specific variation received over the course of a certain number of impressions (abbreviated as "Imps" here). If the test is a conversion-based test, "Clicks" would mean conversions and "Imps" would mean hits to the page. In general terms, I'm using "Clicks" to mean the desired action you're trying to increase and "Imps" to mean an opportunity for that action to happen.

Expected CTR -- If you're running a test of a PPC ad you've had running for a while, you'll have a historic CTR for that ad. This is the "ExpectedCTR" for your test. If you're testing conversion rate on a previously existing conversion pathway, "ExpectedCTR" would be equal to the historic conversion rate of that pathway.

If you're testing a PPC ad you've never run before, a new conversion event or a new email list, you'll have to calculate the "ExpectedCTR" by taking the weighted average CTR (or conversion rate) of the variations in your test. For example, if you're testing two email subject lines and variation A gets a CTR of 1% and variation B gets 2%, each with the same number of impressions, then your ExpectedCTR is 1.5%.

Expected Clicks -- The "ExpectedClicks" value for each formula should be calculated by multiplying the ExpectedCTR of your test by the number of impressions the variation you're analyzing received. This gives you the number of clicks you would have expected to see for the test period.
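To make these definitions concrete, here is the subject-line example above worked in Python (the impression counts are hypothetical, chosen only to produce the 1% and 2% CTRs):

```python
# Hypothetical numbers: two email subject lines with equal impressions,
# variation A at a 1% CTR and variation B at 2%.
imps_a, clicks_a = 5000, 50   # 1% CTR
imps_b, clicks_b = 5000, 100  # 2% CTR

# ExpectedCTR is the weighted average CTR across all variations.
expected_ctr = (clicks_a + clicks_b) / (imps_a + imps_b)

# ExpectedClicks for a variation = ExpectedCTR * that variation's impressions.
expected_clicks_a = expected_ctr * imps_a

print(expected_ctr)       # 0.015, i.e. 1.5%
print(expected_clicks_a)  # 75.0
```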

For our purposes we're assuming a few things to make the math simpler. First, we assume that the populations we're studying have a normal distribution and that differences of less than 0.5% are insignificant.

#### Is Your Sample Size Big Enough?

The first thing to determine is whether your sample size (the number of impressions each variation in your test received) is large enough to be considered useful. The formula below is simplified to serve marketing test purposes easily. It's based on the formula found here.
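As a sketch of what such a calculation looks like, here is the standard sample-size formula for estimating a proportion in Python. The 95% confidence level (z = 1.96) and the 0.5% margin of error (matching the "differences under 0.5% are insignificant" assumption above) are my assumptions, not necessarily the exact simplified version the article refers to:

```python
import math

def sample_size(expected_ctr, margin=0.005, z=1.96):
    """Minimum impressions per variation to estimate a click rate to
    within `margin` at ~95% confidence (z = 1.96). Standard proportion
    formula: n = z^2 * p * (1 - p) / margin^2."""
    return math.ceil(z**2 * expected_ctr * (1 - expected_ctr) / margin**2)

print(sample_size(0.015))  # 2271 impressions per variation
```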

Once you've determined the needed sample size for your test, you should look at each variation in your test and analyze whether its observed performance is significantly different from the average or expected performance.

When each variation of your test has been exposed to a sufficiently large sample, you need to determine whether or not your result is significant. You can do this with either Pearson's Chi-Squared or a G test.

Pearson's Chi-squared is the best-known test of significance; you probably remember it from your college stats class. Below is a simplified-for-marketing version of the formula, based on the formula listed on Wikipedia.
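In code, the marketing version of the Chi-squared test amounts to comparing the observed clicks and misses of a variation against the counts ExpectedCTR predicts. A sketch in Python, with hypothetical numbers:

```python
def chi_square(clicks, imps, expected_ctr):
    """Pearson's chi-squared statistic for one variation: sum of
    (observed - expected)^2 / expected over clicks and non-clicks."""
    expected_clicks = expected_ctr * imps
    expected_misses = imps - expected_clicks
    misses = imps - clicks
    return ((clicks - expected_clicks) ** 2 / expected_clicks
            + (misses - expected_misses) ** 2 / expected_misses)

# Hypothetical variation: 100 clicks over 5,000 impressions,
# against an ExpectedCTR of 1.5%.
stat = chi_square(100, 5000, 0.015)
print(round(stat, 2))  # 8.46 -- above 3.84, so significant at 95%
```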

Now that calculators and personal computers are common, there is another formula that is preferred to Pearson's Chi-squared. The G-test is being used more often now because it uses a log function, which prior to the ubiquity of electronics was a hassle to calculate. Below is the marketing-friendly version, based on Wikipedia's G-test formula.
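A matching sketch of the G-test in Python; the only change from the Chi-squared version is the log-based statistic, and it is compared against the same critical values (numbers again hypothetical):

```python
import math

def g_test(clicks, imps, expected_ctr):
    """G-test statistic for one variation: 2 * sum of
    observed * ln(observed / expected) over clicks and non-clicks.
    Assumes both clicks and misses are nonzero (ln(0) is undefined)."""
    expected_clicks = expected_ctr * imps
    expected_misses = imps - expected_clicks
    misses = imps - clicks
    return 2 * (clicks * math.log(clicks / expected_clicks)
                + misses * math.log(misses / expected_misses))

# Same hypothetical variation as before: 100 clicks over 5,000
# impressions, against an ExpectedCTR of 1.5%.
stat = g_test(100, 5000, 0.015)
print(round(stat, 2))  # 7.66 -- also above the 3.84 critical value
```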

To evaluate the output of either of these two tests we must refer to their "critical values." When testing clicks or conversions you have 1 degree of freedom. For our purposes we'll assume that you're looking to be 95% sure of the significance of the results, so if your Chi-squared or G-test value is above 3.84, the difference is significant and you can confidently start using the winning variation.

#### What If My Sample Is Too Small?

If you have a small email list or your PPC ads or landing pages don't get enough impressions to pass the sample size test, there are still ways to determine if your test is significant. This is where the math starts to get a bit complicated.

The least complicated of the tests of significance for small sample sizes is Fisher's exact test. Some complexity arises from this formula's use of factorials: the factorial of even a modest number like 100 is a huge number with many digits, which will present a problem for some calculators and programming languages. This formula produces a p-value directly, so if your result is 0.05 or below, you're good to go. Below is the formula for a marketing-test context, based on the formula found on Wikipedia.
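Here is one way to compute the Fisher's exact probability for two variations in Python. `math.comb` works with arbitrary-precision integers, which sidesteps the factorial-overflow problem just mentioned. Note this is the single-table probability, which is what the simplified formula appears to output; a strict one-tailed test would also sum the probabilities of the more extreme tables:

```python
from math import comb

def fisher_probability(clicks_a, imps_a, clicks_b, imps_b):
    """Hypergeometric probability of exactly this 2x2 split of clicks
    between two variations, per Fisher's exact test."""
    total_clicks = clicks_a + clicks_b
    total_imps = imps_a + imps_b
    return (comb(imps_a, clicks_a) * comb(imps_b, clicks_b)
            / comb(total_imps, total_clicks))

# Hypothetical small test: 4 clicks over 40 imps vs. 12 clicks over 40.
p = fisher_probability(4, 40, 12, 40)
print(p)
```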

Another method of determining significance with small sample sizes is called a randomization test of goodness of fit or a Monte Carlo simulation.

The idea here is that you conduct a series of simulations of your experiment using random input data to see how often random chance generates results at least as extreme as the results of your actual test. There isn't a simple formula for this, but it is pretty easy to code in PHP, so I've included some code below. The caveat with this method is that running a decent number of simulations (say, 100,000) takes a lot of processing time. This function returns true if the Monte Carlo simulations indicate that your results are significant, and false if they are not.

```
function monteCarlo($expectedCTR, $imps, $obs_chi, $obs_p, $runs=100000) {
    // Clicks we'd expect to see at the expected CTR.
    $expected_clicks = round($expectedCTR * $imps);
    $above = 0;
    for ($x = 0; $x < $runs; $x++) {
        // Simulate one run of the test: each impression "clicks"
        // with probability $expectedCTR.
        $run = 0;
        for ($y = 0; $y < $imps; $y++) {
            if (rand(0, $imps - 1) < $expected_clicks) {
                $run++;
            }
        }
        // Count simulated runs at least as extreme as the observed
        // chi-squared value.
        if ((pow($run - $expected_clicks, 2) / $expected_clicks) >= $obs_chi) {
            $above++;
        }
    }
    // If the fraction of at-least-as-extreme simulations is at or below
    // the p-value threshold (e.g. 0.05), the result is significant.
    return ($above / $runs) <= $obs_p;
}
```


#### Written by Dan Zarrella

HubSpot's Social Media Scientist