Q: When is it OK to run 2 A/B tests simultaneously on a site? Is it ever OK?
We recently posed this question to the Experiment Nation community on LinkedIn. This is what they said.
Surprisingly, people can get quite passionate about this topic. As you can imagine, there are 2 camps of thinking: “Duh, yes you can”, and “Hell no you cannot”. The arguments generally go as follows:
The No camp usually says:
- This ignores interaction between experiments
- Traffic should be isolated
The Yes camp usually says:
- If you randomize properly, it all comes out in the wash
- It is impractical to run tests serially (i.e., one at a time). Who has time to wait?
- Isolating makes no difference because once you go live you will be releasing a variant in untested conditions anyway.
As is my typical style, I am somewhere in the middle, so I’d say “Yes, but with caveats”. Here are my general rules of thumb, based on my time testing in various environments, but mostly from supporting Loblaw Digital, which runs dozens of tests at any given time:
If you can test one thing at a time, why not?
If the test is critical enough, and you can afford it, test serially (i.e. only one thing at a time). This will give you the cleanest data, and you don’t have to worry about test interaction. With that said, virtually no one in this space will be able to do this. If you’re one of the few that is able to, I’m jealous. For the rest of us, let’s move on.
Avoid overlapping experiences and KPIs
You should definitely avoid running 2 tests at the same time that are targeting the same KPI or are on the same user flow (i.e., upstream or downstream of each other). That’s like two bakers trying to figure out the perfect cake recipe at the same time, on the same cake – you just won’t have confidence in what contributed to success (or failure) – i.e., you can’t make any inferences on causality.
Don’t bother to silo traffic – quickly check interactions instead (generally)
The main reason I don’t bother siloing traffic is that once a test has concluded, most Experimenters roll out the winning experience. If you’re releasing features constantly, this will be happening continuously. Good luck trying to silo.
Another reason not to silo is that it probably won’t matter much. Say you ran 2 tests at the same time on siloed audiences and rolled out the winner of each. Because you tested in silos, once you launch the winners, they will run in conditions they were never tested in – so where’s the gain?
To make matters worse, the more you slice and dice your traffic, the longer each test has to run, because smaller samples reduce your statistical power to detect changes.
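A rough sketch of why this matters: the sample size a test needs is fixed by the effect you want to detect, so giving a test half the traffic roughly doubles its runtime. The numbers below (baseline rate, lift, daily traffic) are illustrative assumptions, not anyone’s real data, and the formula is the standard normal-approximation sample size for a two-proportion test.

```python
def required_n_per_variant(p_base, lift, alpha_z=1.96, power_z=0.84):
    """Approximate sample size per variant for a two-proportion test
    (normal approximation, 95% confidence, 80% power by default)."""
    p_var = p_base * (1 + lift)
    p_bar = (p_base + p_var) / 2
    delta = p_var - p_base
    return 2 * (alpha_z + power_z) ** 2 * p_bar * (1 - p_bar) / delta ** 2

# Illustrative assumptions: 5% baseline conversion, +10% relative lift target
n = required_n_per_variant(p_base=0.05, lift=0.10)
daily_visitors = 20_000  # assumed site traffic

for fraction in (1.0, 0.5, 0.25):  # share of traffic the test receives
    days = (2 * n) / (daily_visitors * fraction)
    print(f"traffic share {fraction:.0%}: ~{days:.1f} days to finish")
```

Halving the traffic share doubles the days to completion – which is exactly the cost siloing imposes.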
I find it’s better practice, once an experiment (call it “Experiment A”) has ended, to perform a quick check on its results by:
- Looking at the overall results of Experiment A
- Looking at the results of Experiment A among only those visitors who were also exposed to Experiment B
If the results are directionally different (e.g., one is increasing and one is decreasing), or if one seems to amplify the results of the other, there could be an interaction going on – and you may want to retest or isolate (but only if it really matters, of course. Time is money, after all).
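The quick check above can be sketched in a few lines. This is a minimal illustration, not the author’s tooling, and every conversion count below is made up: compare Experiment A’s lift overall against its lift among visitors who were also exposed to Experiment B, and flag a directional disagreement.

```python
def lift(conv_control, n_control, conv_variant, n_variant):
    """Relative lift of the variant's conversion rate over the control's."""
    return (conv_variant / n_variant) / (conv_control / n_control) - 1

# Experiment A, all visitors (illustrative counts)
overall = lift(conv_control=500, n_control=10_000,
               conv_variant=560, n_variant=10_000)

# Experiment A, restricted to visitors also exposed to Experiment B
overlap = lift(conv_control=150, n_control=3_000,
               conv_variant=140, n_variant=3_000)

print(f"overall lift: {overall:+.1%}, overlap lift: {overlap:+.1%}")

if (overall > 0) != (overlap > 0):
    print("Directionally different -> possible interaction; consider retesting.")
```

Note the overlap segment will be smaller and noisier than the overall result, so treat a disagreement as a prompt to investigate, not as proof of an interaction.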
Understand and accept you will never quantify effects perfectly
Everything is an estimate. Especially in business experimentation, there are so many factors outside of your control that you’ll never be 100% sure of the impact of a particular factor. You can get a rough idea of magnitude, but you’ll never be completely accurate in your estimate. Focus on making things better – and this is particularly true if you’re in product management, where you are often focused on not harming performance in order to unlock future potential.
But that’s just my opinion. What’s your opinion? We’d love to hear your thoughts! Find us on Slack and continue this conversation in the #ask-experiment-nation channel!
Until next time,
Founder, Experiment Nation