For decades, researchers have preached the gospel of statistical significance. When you run a study and compare two groups, they caution, do not merely compare the raw numbers. Instead, test whether the difference is statistically significant.
For instance, let’s say a brand is looking to roll out a new logo. They run a test and find that 52% of people prefer the new logo, while 48% prefer the old one. They should switch to the new logo, right? Not so fast. Before making a change, they should check whether the difference is statistically significant, or whether it is likely just an artifact of random sampling variation.
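As a sketch of what that check looks like, the snippet below runs a two-sided test of the 52/48 split against an even 50/50 preference using a normal approximation to the binomial. The sample size of 1,000 respondents is hypothetical; the original example does not state one.

```python
import math

def preference_significant(successes, n, alpha=0.05):
    """Two-sided test of H0: true preference share = 50%,
    using a normal approximation to the binomial."""
    p_hat = successes / n
    se = math.sqrt(0.5 * 0.5 / n)        # standard error under H0
    z = (p_hat - 0.5) / se
    # Two-sided p-value from the normal CDF (via the error function).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_value, p_value < alpha

# Hypothetical sample of 1,000 respondents: 52% prefer the new logo.
p, significant = preference_significant(520, 1000)
print(round(p, 3), significant)  # not significant at alpha = 0.05
```

Even though 520 vs. 480 looks like a clear win, the p-value is well above 0.05: a split this lopsided happens often by chance alone at this sample size.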
Paradigm Shift in Data Interpretation
Relying on statistical significance when evaluating a new logo makes perfect sense. It’s a one-time test (you’re unlikely to change logos every month), it’s an all-or-nothing one (you don’t want to have more than one logo for your brand), and you can control when the switch happens (you can postpone it a week to gather more data, for example). It’s also costly to implement, as you have to update everything — not just your product or website, but your business cards and packaging as well.
Unfortunately, those habits have carried over into the online world, where the same rules may not apply. Traditional research firms conduct online brand lift studies using the same methodologies once used for things like logo tests, and as a result the data has been interpreted the same way. More modern brand lift studies, however, produce finer-grained lift data much more frequently and can be used for optimization. This calls for a new paradigm in data interpretation.
Looking Through the Multi-Armed Bandit Lens
The setup is as follows: Imagine you are a gambler sitting down in front of multiple slot machines (various one-armed bandits, or one multi-armed bandit) with no knowledge of payout rates. How should you distribute your coins to maximize your returns?
The application of this class of problems to advertising should be obvious, and companies like Google and Facebook use it frequently to optimize campaigns. Suppose you have several creatives for a campaign and you don’t know which ones perform best. How should you divide spend between them in your ad server? There is a trade-off between exploration (trying out the different creatives in case they are better) and exploitation (putting your money behind the creatives most likely to be the best).

Viewed through the multi-armed bandit lens, making optimizations based on non-statistically-significant data becomes more intuitive. If you were 75% sure one slot machine was the best, wouldn’t you put more coins in it until you were 95% sure? What if you were 94% sure? Further, you are going to play the game regardless: sticking with an equal-weight strategy is itself a decision about how you spend your money. Once you realize you are going to run some set of creatives, you must decide how to allocate the budget. You can ignore the data entirely (equal weights), or you can act on the data you have, even if it is imperfect.
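The exploration/exploitation trade-off can be made concrete with a minimal epsilon-greedy sketch: most coins go to the machine that looks best so far, while a small fraction keeps exploring. The payout rates below are hypothetical and hidden from the player.

```python
import random

def epsilon_greedy(rates, pulls=10_000, epsilon=0.1, seed=0):
    """Play slot machines with hidden payout rates: explore a random
    arm with probability epsilon, otherwise exploit the arm with the
    best observed average so far."""
    rng = random.Random(seed)
    wins = [0] * len(rates)
    plays = [0] * len(rates)
    total = 0
    for _ in range(pulls):
        if rng.random() < epsilon or 0 in plays:
            arm = rng.randrange(len(rates))  # explore
        else:
            # exploit: highest observed win rate
            arm = max(range(len(rates)), key=lambda i: wins[i] / plays[i])
        reward = 1 if rng.random() < rates[arm] else 0
        wins[arm] += reward
        plays[arm] += 1
        total += reward
    return total, plays

# Hypothetical payout rates -- unknown to the gambler.
total, plays = epsilon_greedy([0.2, 0.3, 0.5])
print(total, plays)
```

After enough pulls, the bulk of the coins end up in the machine with the highest payout rate, without ever running a formal significance test.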
At the other end of the spectrum, should you be willing to make an all-or-nothing decision based on even a 95% confidence interval? There is still a 1-in-20 chance that the data support the conclusion (wrongly) due to random variation. That sounds low, but if you are evaluating many decisions, there is a high chance that at least one of them is driven purely by random variation. And again, in our logo test example, we had to make a decision at the end, because we can use only one logo at a time. In online advertising, however, as the multi-armed bandit illustrates, we can keep a small amount of money flowing to a creative that appears to be losing, reducing the cost of acting on a false positive. For a decision that is costly to implement and difficult to undo, though, it may be wise to hold out for a standard higher than a 95% confidence interval and/or look for corroborating data pointing in the same direction.
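The many-decisions point is easy to quantify: if each decision is made independently at a 95% confidence level, the chance of at least one false positive grows quickly with the number of decisions.

```python
def false_positive_chance(n_tests, confidence=0.95):
    """Probability that at least one of n independent tests yields a
    false positive at the given per-test confidence level."""
    return 1 - confidence ** n_tests

print(round(false_positive_chance(1), 2))   # 0.05
print(round(false_positive_chance(20), 2))  # 0.64
```

With twenty independent decisions, the odds are nearly two in three that at least one "significant" result is pure noise.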
Optimizing Campaigns without Statistical Significance
A technique like Thompson sampling, where the allocation to a strategy matches the probability that it is the best strategy, is appropriate to use, but there are some conditions:
- It requires that you can make small allocation changes. This can work when changing creative weights, for instance, but probably doesn’t help you when looking at homepage takeovers.
- You should optimize frequently. The exploration half of the exploration-vs.-exploitation trade-off only works if you come back and re-optimize based on new data.
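Given those conditions, the allocation rule itself can be sketched directly. Below is a minimal Monte Carlo version of Thompson sampling, assuming each creative's brand-lift results are summarized as hypothetical (successes, failures) counts; a creative's weight is the estimated probability that it is the best arm, computed by repeatedly sampling Beta posteriors.

```python
import random

def thompson_weights(arms, n_samples=100_000, seed=0):
    """Thompson-sampling weights for creatives. Each arm is a tuple of
    (successes, failures) observed so far. The weight assigned to an
    arm is the probability that it is the best one, estimated by
    drawing from each arm's Beta posterior and counting wins."""
    rng = random.Random(seed)
    best_counts = [0] * len(arms)
    for _ in range(n_samples):
        # Beta(s + 1, f + 1) posterior under a uniform prior.
        draws = [rng.betavariate(s + 1, f + 1) for s, f in arms]
        best_counts[draws.index(max(draws))] += 1
    return [c / n_samples for c in best_counts]

# Hypothetical lift data for three creatives: none of the differences
# are statistically significant, but the weights still tilt spend
# toward the stronger-looking creative.
weights = thompson_weights([(52, 48), (48, 52), (50, 50)])
print([round(w, 2) for w in weights])
```

Note that the losing creatives keep a nonzero share of spend, so exploration continues and a false positive can still be corrected on the next re-optimization.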
With these conditions met, though, we can feel confident in leveraging modern brand lift solutions to make changes to a campaign, even when the brand lift results are not statistically significant. Under the right assumptions, allocating spend this way is provably near-optimal.
Learn more about how a cutting-edge brand lift technology solution can help you elevate your brand measurement and campaign optimization game.