Create a more impactful testing roadmap with this framework – ft. Edmund Beggs

AI-Generated Summary

Documentation. The unsung yet incredibly important part of any experimentation program. Great documentation allows you to look back at past experiments and find patterns in which levers were most effective, to inform your testing roadmap, and which levers you can probably deprioritize. In today’s episode of Experiment Nation, our own Charlotte April Bomford spoke to Edmund Beggs about the “Levers Framework,” which helps his team deliver results for their clients in less time. If you’d like to learn more about this interesting framework, watch the full interview.



AI-Generated Transcript

Edmund Beggs 0:00
The Levers framework is a way of really organizing these levers, these changes in the mind of a user which we think are going to impact behavior, into a standardized set of categories. It’s organized in a hierarchical structure, with each level describing subcategories of the level above it.

Rommil Santiago 0:23
My name is Rommil Santiago. I’m the founder of Experiment Nation. In today’s episode, Charlotte April Bomford speaks to Edmund Beggs about his Levers framework for categorizing conversion rate optimization experiments and insights based on how they impact the user mindset. The framework provides a common vocabulary and structure for practitioners to organize data, prioritize which experiments to run, and conduct meta-analysis across many programs. We hope you enjoy the episode.

Charlotte April Bomford 0:48
Hello, and welcome to another Experiment Nation podcast. I am your host, Charlotte Bomford. Today’s podcast is quite exciting, because we will be interviewing Edmund Beggs, who is a CRO consultant and the mastermind behind our topic. Would you mind introducing yourself further?

Edmund Beggs 1:08
Yeah, I mean, that’s a pretty good introduction for me. I’m a conversion optimization consultant, but I have a particularly strong interest in conversion rate optimization methodology: developing frameworks to help us work, and particularly learn, at scale in more efficient, more robust ways.

Charlotte April Bomford 1:27
Awesome. That’s actually a good introduction as well. So, let’s get into it, because this is something that you guys have been working on, and it’s exciting to announce: the Levers framework. First, what is a lever? And why is it important to CRO?

Edmund Beggs 1:55
Sure. So levers can be described, really, as a way of talking about, describing, denoting something that’s going on in the mind of a user which can influence their behavior. Practically, for us, they are ways to group experiments based on the similarities in how these experiments are likely to impact the mind of the user, and that focus on how the user is thinking is really central to it. They’re intended to apply across lots of possible executions, the different ways you might run different kinds of experiments, to offer us broader learnings. So we’re not interested just in the specific components involved; we’re really interested in the broader themes that come out across experiments. I think that can sound a bit abstract, but with an example it’s pretty straightforward. Take a really simple experiment, like adding a 4.6-star Trustpilot rating to a website homepage, maybe at the bottom of the hero section.

And you run the experiment, and you want to think about what you’re learning. Likewise, how would you even describe what it is you tested? There’s a very literal way in which you could just describe the exact implementation: it was a 4.6-star rating, maybe with a number of reviews associated with it, in that particular position on the page, on that particular website. But we really wouldn’t want to limit our learning from that experiment there; it wouldn’t tell us a lot about what we should do next.

And so that’s really where levers come in. They’re a way of describing your independent variable, what we’re trying to change in an experiment, at a slightly higher level.

So in this case, adding a Trustpilot rating, a broad way you could construe that experiment is that it’s probably something to do with trust. If you wanted to be a bit more specific, you might think in terms of the credibility of your site: how reliable do you seem? How plausible are the positive claims you’re going to make about your product or service?

And you could get even more specific and ask: are we talking about social proof specifically, the way we can appeal to people like you as a user in order to seem more credible? So I think you can see that it’s all about finding another way to think about your change, one that can be applied to other changes. Even within social proof, you might think about testimonials as an alternative way to do it. Or, if you’re taking a slightly broader view of the independent variable, again credibility more widely, you might look at what we call authority-driven credibility: appeals to authoritative third parties, which could be reputable brands you’re associated with or third-party institutions that have validated you in some way. And critically, when we start placing these categories of levers on top of experiments that are, in a sense, about testing an exact execution, it brings out a central idea: the independent variable and the hypothesis of a CRO experiment can be defined in a lot of different ways. That leaves room for interpretation, and how we decide to describe our experiments and what we’re changing is going to impact what we’re able to learn from an experiment, and what kinds of claims the outcome of an experiment can be brought as evidence for or against.

That really is the core of the Levers framework itself, because we’re looking to describe the changes in an experiment at three levels of generality, again based on what they’re doing in the mind of the user as we understand it.

Charlotte April Bomford
Yeah, it’s very interesting. You said it very well. We want to expand further: can you tell us what the Levers framework is?

Edmund Beggs
Sure.

Rommil Santiago
This is Rommil Santiago from Experiment Nation. Every week we share interviews with, and conference sessions by, our favorite conversion rate optimizers from around the world. So if you liked this video, smash that like button and consider subscribing; it helps us a bunch. Now back to the episode.

Edmund Beggs
The Levers framework is a way of really organizing these levers, these changes in the mind of a user which we think are going to impact behavior, into a standardized set of categories. It’s organized in a hierarchical structure, with each level describing subcategories of the level above it. At the highest level of generality, right at the top, we have just five pretty simple but very broad categories: cost, trust, usability, comprehension, and motivation. I think those concepts will be familiar to CRO practitioners. But each of these concepts then has some categories that sit within it, and any experiment that sits in one of those categories also belongs to the parent above. Usability, for example, might be broken out into specific things related to user flows: how do users understand where they are, where they need to go next, what the next action is? Effort: how much friction are they feeling in the experience, and how much effort do they anticipate it’s going to take? And even how we prioritize attention, and so on.

Cost, again, can be broken out. Probably the most intuitive side of it is financial cost: how are you paying for the product, and how are we able to position that pricing in a way that makes it seem as appealing as possible? But the other side of it might be soft costs. In order to benefit from some kinds of products, you need to put a little bit of work in. If you’re looking at something like Duolingo, lots of educational products, and even newspaper subscriptions, there is something you’re going to have to contribute, some kind of additional downside baked into the use of the product, which you might also look to mitigate in your experiments. So that already brings us, I think, to a clearer image of how it’s structured. We have these broad categories at the top, then we get a bit narrower within them. And finally, we have a third level, which breaks these mid-level levers out into what we call sub-levers, the most granular kind of change in the mind of the user that we’re looking to categorize.

So, to return to an example I mentioned before, credibility is one specific kind of trust, and it might be broken out into two contrasting appeals: appeals to authority on the one hand, as I mentioned, and social proof on the other. These are two different ways of achieving a similar thing in the mind of a user, but with a slightly distinct foundation for each in terms of how we expect it to work. It’s a different thing to appeal to people like you versus authorities, and in different contexts, one or the other might work better.
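To make the three-level hierarchy concrete, here is a minimal Python sketch of a lever tree. It assumes a tree built only from the categories named in this conversation (the five top-level categories plus the few mid-level levers and sub-levers mentioned); the real framework contains more categories, the depth at which each named item sits is our assumption, and the names `Lever`, `TAXONOMY` and `path_to` are hypothetical. What it illustrates is the inheritance Edmund describes: tagging an experiment with a sub-lever also files it under every level above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Lever:
    """One node in a hypothetical lever hierarchy."""
    name: str
    children: list[Lever] = field(default_factory=list)


# Partial, illustrative tree using only categories named in this episode.
# The real framework has more categories; placements here are assumptions.
TAXONOMY = Lever("root", [
    Lever("cost", [Lever("financial cost"), Lever("soft cost")]),
    Lever("trust", [
        Lever("credibility", [Lever("authority"), Lever("social proof")]),
    ]),
    Lever("usability", [Lever("user flow"), Lever("effort"), Lever("attention")]),
    Lever("comprehension"),
    Lever("motivation"),
])


def path_to(node: Lever, target: str, trail: tuple[str, ...] = ()) -> tuple[str, ...] | None:
    """Depth-first search returning lever names from the top level down to target."""
    here = trail if node.name == "root" else trail + (node.name,)
    if node.name == target:
        return here
    for child in node.children:
        found = path_to(child, target, here)
        if found is not None:
            return found
    return None  # target is not in this (partial) tree


# Tagging the Trustpilot test with its sub-lever also places it under every ancestor.
trustpilot_test = {"change": "4.6-star Trustpilot rating in hero",
                   "levers": path_to(TAXONOMY, "social proof")}
print(trustpilot_test["levers"])  # ('trust', 'credibility', 'social proof')
```

Because each experiment carries its full path, it can later be counted at any of the three levels without re-tagging.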

And yeah, that’s the core structure of it. Fundamentally, what it’s trying to manage, and why we have three levels, is the trade-off between bigger and smaller categories. We started putting this together with something like 130 fairly disparate lever categories in use. Each practitioner could define their own categories for their own work, which obviously meant there was a degree of duplication and overlapping concepts, but also a real variety in scope: big concepts like UX sitting next to concepts that were very specific to an industry or even one specific website. So we had to decide how we were going to approach this problem, and there are benefits and downsides to each option. You can take a small list of very broad categories to describe the space of experiments we’re running, the space of changes we’re making in the mind of the user. That gives you a few very memorable, easy-to-learn categories with large sample sizes. We do have thousands of experiments in our database, but often you’re segmenting by things like industry or the kind of page you’re working on; there are lots of other variables that are relevant here. So there is an advantage to keeping a reliable sample size, even if it belongs to a very broad category. The narrow categories, on the other hand, very specific things like sub-levers, have the advantage that, because the scope of the category is smaller, the average difference between the experiments you might categorize under it is going to be smaller.

And because that difference is smaller, we expect there to be more predictive power when you’re going from historical experiments to estimating how these experiments might behave in the future. Understanding how a category of very similar experiments has behaved is going to give you a slight advantage over broader categories that contain more disparate changes. So we tried to bridge these two things: how can we have the flexibility to sometimes use very broad categories and sometimes very refined, narrow ones? That way you can adjust the level of the framework you use, how specific the categories are, to the kind of query you have and the amount of data you have to answer it. You can also bridge some of the usability gap that would open up if we provided just a massive list of very narrow categories. If you start at the top of the tree, you have just five options to choose from, and by choosing one of those five, you’ve already eliminated most of the possible categories. You can repeat the same thing at the next level, and you’ve not had to compare a lot of things in order to sort quite efficiently through all of the options. And I think the final piece of the core structure that can be really useful is that, by grouping your narrow concepts for describing kinds of experiment underneath broader ones, we invite you to make horizontal movements. If you see that something like authority isn’t working particularly well to solve your credibility problem, it invites you to consider the alternative solutions which might belong under social proof, for example.

So, as much as we’ve worked a lot on the specific concepts that fit into this framework, and they’re ever evolving, this core structure is, I think, a big part of what gives it the strength it has when you start looking at how you can build strategies around it and conduct meta-analysis.
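The trade-off between broad categories (more data) and narrow ones (more similar experiments) suggests a simple rule, sketched below under our own assumptions: start from the most specific lever an experiment is tagged with and climb one level at a time until there are enough past experiments to say something. The `Experiment` record, the `min_n` threshold and the sample data are all hypothetical; the episode does not prescribe a threshold.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    # Hypothetical record: full lever path, e.g. ("trust", "credibility", "social proof").
    levers: tuple[str, ...]
    won: bool


def narrowest_reliable_level(experiments: list[Experiment], query: tuple[str, ...],
                             min_n: int) -> tuple[tuple[str, ...], int, float] | None:
    """Start at the most specific lever in `query` and climb toward the top level,
    returning the first level with at least `min_n` past experiments, plus its
    sample size and win rate. `min_n` is a judgement call, not a framework constant."""
    for depth in range(len(query), 0, -1):
        prefix = query[:depth]
        matches = [e for e in experiments if e.levers[:depth] == prefix]
        if len(matches) >= min_n:
            return prefix, len(matches), sum(e.won for e in matches) / len(matches)
    return None  # too little data even at the broadest level


# Illustrative data only: 4 social proof tests and 8 authority tests.
history = (
    [Experiment(("trust", "credibility", "social proof"), w) for w in (True, True, False, True)]
    + [Experiment(("trust", "credibility", "authority"), w) for w in (False,) * 6 + (True,) * 2]
)
print(narrowest_reliable_level(history, ("trust", "credibility", "social proof"), min_n=10))
```

With only four social proof tests in the illustrative data and `min_n=10`, the function falls back to the credibility level, which pools the social proof and authority tests; lowering `min_n` to 4 lets it answer at the sub-lever itself.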

Charlotte April Bomford 12:00
That’s actually a very interesting thing that you’re saying right now. And I think CROs around the world would be like, okay, if I’m going to start this experimentation program, how would that framework work into the program that we’re working on?

Edmund Beggs 12:23
Sure. So, the short answer is that levers get embedded, and it’s a pretty cool concept, really, across the strategy. If you commit to this process, they’re going to inform how you group and interpret insights about your users as you gather them through user research, all the way through to the strands of experiments you’re going to explore: how you’re going to structure your ability to learn, pivot and iterate on what kinds of experiments are likely to be most effective, through prioritization, meta-analysis and so on.

But if we walk through each step a little bit, I think it’ll become clearer how that works. So let’s say you’re right at the start of an experimentation program, and you haven’t necessarily run any experiments yet. You’re trying to think about how you can develop a better understanding of users. You’ve hopefully gone a bit beyond best practice and tried to develop a richer understanding of how users behave in the specific context of your site, maybe deploying some mixed-methods research: analytics data combined with usability testing, surveys, and so on. Now you have some foundational insights, and one of the first things you can look to do is categorize these insights, grouping them together based on the categories from the Levers framework. For example, if you see a lack of brand confidence, you could categorize it under trust at the broadest level, but you might have data specifically suggesting that this is probably credibility, and maybe even a lack of social proof or authority; it really depends on what the data is saying. The key thing is that if you have these broader themes to group your specific insights under, it’s already helping you build out your mental model of what matters most to users. It’s a way to put all the different insights you might get together, manage the fact that some of them may be positive and some negative, work through them, group things, and get a top-line priority: where is it that we need to start looking and focusing with our initial experiments?
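Grouping early research insights by lever and reading off a top-line priority could look something like the sketch below. The insight records and the 1-to-3 evidence-strength weighting are illustrative assumptions, not part of the Levers framework; changing `level` lets you view the same insights at the broadest category or at the sub-lever.

```python
from collections import Counter

# Hypothetical research insights: (source, lever path, strength of evidence 1-3).
# The strength scale is an illustrative assumption, not part of the framework.
insights = [
    ("usability testing", ("trust", "credibility", "social proof"), 3),
    ("survey", ("trust", "credibility", "authority"), 2),
    ("analytics", ("usability", "effort"), 2),
    ("survey", ("cost", "financial cost"), 1),
]


def top_line_priorities(insights, level):
    """Total insight strength per lever, truncated to `level` (1 = broadest)."""
    totals = Counter()
    for _source, levers, strength in insights:
        totals[levers[:level]] += strength
    return totals.most_common()  # highest total first


for levers, score in top_line_priorities(insights, level=1):
    print(" > ".join(levers), score)
```

With this sample data, trust comes out on top at level 1 because two of the four insights point at credibility; at level 3 the same data singles out social proof.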

From there, when you have your top-line priorities from levers, it can be really useful, if you have access to the data, to look at what other solutions already exist. At Conversion, we benefit from having thousands of experiments in our database. Once we have at least a heuristic mental model, or, as we add more research, a more fleshed-out view of how users think on our site, we can use that to look at what kinds of solutions have been effective in the past at resolving problems or opportunities under a given lever.

But even if you don’t have that data, it gives you a way to structure your thinking, to get out there and start looking for solutions or to create entirely new ones. It’s all about having a clear-cut mental model that’s going to help you build out a backlog of experiments that are actually aligned to the problems or opportunities users are indicating in your research.

Charlotte April Bomford 15:16
And so it’s kind of like a list of experiments that have already been done based on the methodology. So it saves time, especially for your CRO experimentation program.

Edmund Beggs 15:36
Yeah, no, exactly that. And that actually comes back, in a way, to the structure of the framework. One advantage of having those narrow categories that you can work your way down to is that you can restrict the set of possible things that might come up as solutions. Particularly if you’re looking at a larger database, the set of solutions that might be applicable to social proof can be much smaller than the set applicable to trust more broadly. So if you can get that more specific understanding of your problem, you benefit from a more efficient search for what kinds of solutions work well.

Charlotte April Bomford
I like this. That’s very interesting.

Edmund Beggs
But yeah, that’s obviously a key part of the process, actually going from research through to experiments. But often just as important is determining which of those experiments you want to run, which things you want to prioritize. And again, if you’ve organized your insights around these lever-level categories, that should tell you something about which ones should be prioritized and which are most important. This is even stronger if you have the benefit of a database that you build up over time, because it’s going to give you data from meta-analysis, which we can maybe go into in more detail; I think it’s a rich topic.
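Restricting the search for solutions to a specific sub-lever, rather than a broad lever, amounts to a prefix filter over a database of tagged past tests. The sketch below assumes each entry stores its full lever path; `PastTest`, `DATABASE` and `candidate_solutions` are hypothetical names, and the entries are illustrative placeholders rather than real test results.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PastTest:
    # Hypothetical entry in an experiment database.
    change: str
    levers: tuple[str, ...]


DATABASE = [
    PastTest("Customer testimonials near CTA", ("trust", "credibility", "social proof")),
    PastTest("Review star rating in hero", ("trust", "credibility", "social proof")),
    PastTest("Press logos bar", ("trust", "credibility", "authority")),
    PastTest("Price anchoring on plan page", ("cost", "financial cost")),
]


def candidate_solutions(database: list[PastTest], lever_path: tuple[str, ...]) -> list[PastTest]:
    """Return past tests filed under lever_path or any of its descendants."""
    depth = len(lever_path)
    return [t for t in database if t.levers[:depth] == lever_path]


print(len(candidate_solutions(DATABASE, ("trust",))))                                # 3
print(len(candidate_solutions(DATABASE, ("trust", "credibility", "social proof"))))  # 2
```

The more specific the lever path, the smaller the candidate pool, which is the efficiency gain described above.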

But we’ve invested a lot at Conversion in trying to find robust and automated ways to use the massive amount of data in the database. That means using the performance of experiments that may be outside a particular program, so they’re not necessarily run on your website; maybe they were run in that industry or more broadly, but under that same kind of change in the mind of the user. That can already tell you something about how you ought to rank these experiments and which ones you ought to focus on running, and, following that thread through, what you’re learning when you run each experiment and how you should look to iterate. You take that really valuable data, the A/B test itself, which can be really robust, and use this real-world data to teach you even more and tell you that you need to do more of one kind of thing and less of another.
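One way to turn historical performance under a lever into a backlog ranking is to score each idea by its lever’s past win rate, shrunk toward a prior so that levers with only a few tests are not over- or under-rated. The Beta-prior smoothing used here is our own choice of technique, not something described in the episode, and `prior_rate`, `prior_weight`, the idea names and the numbers in `history` are placeholder assumptions.

```python
def smoothed_win_rate(wins: int, tests: int, prior_rate: float, prior_weight: float) -> float:
    """Posterior mean of a Beta prior worth `prior_weight` pseudo-tests at
    `prior_rate`, updated with the observed wins out of `tests`."""
    return (wins + prior_rate * prior_weight) / (tests + prior_weight)


def rank_backlog(backlog, history, prior_rate, prior_weight):
    """backlog: list of (idea, lever); history: {lever: (wins, tests)}.
    Levers with no history fall back to the prior rate."""
    def score(item):
        _idea, lever = item
        wins, tests = history.get(lever, (0, 0))
        return smoothed_win_rate(wins, tests, prior_rate, prior_weight)
    return [idea for idea, _lever in sorted(backlog, key=score, reverse=True)]


history = {"social proof": (6, 10), "authority": (1, 10)}  # illustrative numbers
backlog = [("Press logos", "authority"),
           ("Video testimonials", "social proof"),
           ("Benefit-led headline", "motivation")]
print(rank_backlog(backlog, history, prior_rate=0.2, prior_weight=10))
# ['Video testimonials', 'Benefit-led headline', 'Press logos']
```

The untested motivation idea lands between the two tested levers because it simply inherits the prior; a larger `prior_weight` pulls every score closer to `prior_rate`.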

And actually, as a side note, when you’re going from one experiment and deciding what else to do under that lever, it’s really important to pay attention to the losing experiments as well. It could become easy to just double down on the winner and ignore the loser. But what we’ve seen, again in some of our meta-analysis, is that direct iterations on a loser actually outperform direct iterations on a winner. So often it’s as much about trying to understand, in your specific context, what matters most to users, and how we can find a way to actually solve that: again, thinking in terms of which levers seem to have the biggest impact on users, whether positive or negative, and then how we leverage that to generate a positive experience out of it.

And that’s, I guess, a foundational part of how we want to work. We want to use our A/B tests to learn over time, and levers are fundamentally about trying to do that. We don’t abandon our most specific sub-levers after just one test, because we know the execution itself, and the specifics of the context, can be really responsible for what’s going on. Instead, we’re looking for cumulative data over time, all organized around these levers, which gives us the opportunity to make that horizontal shift once we’ve accumulated more data: if we know from running several experiments that these appeals to authority aren’t as effective, how can we solve this credibility problem with social proof? So you can see how, all the way from how we organize our insights, through how we follow this thread, to coming up with concepts, prioritizing those concepts, and then making sure we learn from the experiments we run over time, it can really form quite a deep part of the strategy, following that thread from the start through the iterative process of an experimentation program.
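The idea of not abandoning a sub-lever after one test, but making a horizontal move once cumulative evidence is weak, can be sketched as a small log of outcomes per sub-lever. The thresholds `min_tests` and `min_win_rate`, the `SUB_LEVERS` mapping and the `LeverLog` class are hypothetical; a real decision would likely also weigh effect sizes and context, not just win counts.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical slice of the hierarchy: mid-level lever -> its sub-levers.
SUB_LEVERS = {"credibility": ["authority", "social proof"]}


class LeverLog:
    """Accumulates A/B test outcomes (wins and losses) per sub-lever across a program."""

    def __init__(self) -> None:
        self.outcomes: dict[str, list[bool]] = defaultdict(list)

    def record(self, sub_lever: str, won: bool) -> None:
        self.outcomes[sub_lever].append(won)

    def next_levers(self, parent: str, sub_lever: str,
                    min_tests: int, min_win_rate: float) -> list[str]:
        """Keep iterating on sub_lever until it has min_tests results; if its
        cumulative win rate is then below min_win_rate, suggest its siblings
        under the same parent (a 'horizontal' move). Thresholds are assumptions."""
        results = self.outcomes[sub_lever]
        if len(results) < min_tests or sum(results) / len(results) >= min_win_rate:
            return [sub_lever]
        return [s for s in SUB_LEVERS[parent] if s != sub_lever]


log = LeverLog()
log.record("authority", False)
print(log.next_levers("credibility", "authority", min_tests=3, min_win_rate=0.3))  # ['authority']
for _ in range(2):
    log.record("authority", False)
print(log.next_levers("credibility", "authority", min_tests=3, min_win_rate=0.3))  # ['social proof']
```

A single loss keeps the program on authority; only after three losses does the log suggest moving sideways to social proof within credibility.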

Charlotte April Bomford 19:48
Amazing. So how does the Levers framework enable meta-analysis, and why is this so important?

Edmund Beggs 20:00
Yeah, this is, I think, maybe where the Levers framework is most valuable, or at least where the work being done with it is most interesting, and it was a major reason we started working on it in the first place. Fundamentally, it comes back to wanting to approach CRO like a science. When we’re doing A/B testing, proper A/B testing, we’re adopting a scientific method: we’re using randomized controlled trials and statistical analysis to make informed decisions. We want to develop a robust understanding of the phenomena of interest, and in our context that is user behavior on the websites we study. So we want to take all the lessons we can from science, with the rigor of a scientific method, and do that properly. And one specific thing I think we need to be really conscious of is the problems facing other areas of science, particularly those concerned with understanding and predicting human behavior. It’s now pretty well known that quite a few high-profile findings in psychology and behavioral science have been shown to be misleading, less impactful than claimed, or only partly true, and there have been lots of prominent examples of this.

A couple of things that really captured the popular imagination, I think, were concepts like power poses and grit, which were popularized by TED Talks and which have both attracted a bit of controversy, because some components of them seem to replicate and others don’t.

Those may be familiar examples, but there’s a much deeper problem going on in behavioral science. Some of the longest-standing foundational work on things like priming effects, associated with Daniel Kahneman, has come under really extensive criticism about how impactful it is, and even about whether some of these effects, where little cues in the environment lead to big changes in behavior, exist at all. Some people are concerned about the entire idea of small nudges being an effective way to induce substantial behavior change. And if you look at the mainstream estimates of how often experiments replicate, for psychology it’s less than 50%, and it may be as low as 20 to 25%. So very often, things that seem very interesting in a narrow set of experiments, or in one-off programs, don’t replicate very well when they’re subjected to the wider challenge of being tested in different ways, in different contexts, stretching the scope of the theory to check that it actually works everywhere it’s predicted to.

We know that CRO experiments aren’t going to be exempt from this. We’re often looking at bigger theories and claims that come out of our experiments, bigger claims like “credibility is really important for subscription companies,” for example. These are claims that are going to be underdetermined by the data from any one experiment, and we need to go through that rigorous scientific process of replicating experiments, running experiments and collecting data across lots of iterations to see how effective they actually are. In practice, making novel predictions and seeing that they replicate is, I think, probably the fundamental lesson in terms of how we can learn beyond the mistakes that can happen with individual experiments: because they were tested in too narrow a context, because there were mistakes in the statistics used, because of genuine randomness, or because of errors in the research design itself. These are all things that can throw off any one experiment. It’s on aggregate, over time, testing things again and again, that we learn, and lever-level claims in particular are not going to be exempt from that. They are by definition broader than the claim made about any one execution. So if we want to be able to make those big claims, that credibility is important, as I say, or that certain kinds of usability are particularly important in one industry or another, or that certain kinds of motivation work well in one industry or another, they can’t be confirmed or falsified based on one or even a handful of A/B tests. We need a way to accumulate data from a larger sample across various executions, and that’s what we’re using the Levers framework to do. In fact, one of the main motivations for developing it in the first place was to have a common vocabulary and mental model to actually collect data at that kind of scale. We have programs run in different industries, on different websites, but also by different practitioners over months and years. So how can we have an underlying framework to help us capture this data? It starts in a fairly straightforward way: do we get a better estimate of the overall win rate or effect size associated with a particular group of experiments?

Just some simple descriptive data from your program can already yield some great results. We’ve seen this particularly in longer-term strategies where a lot has been tested: they can really exploit those sub-levers, those narrower sets of really similar levers, and find under- and over-exploited levers, meaning things we need to do more or less of based on what we’ve done historically. And I think one really useful lesson from that, for anyone who already has an experimentation program but is interested in using levers to learn more, is that we’ve recategorized the strategy behind experiments for programs which were using a different set of categories before we started implementing the Levers framework, and it’s actually opened up learning. We found that when you define things in terms of a particular sub-lever, you end up discovering that, in fact, this has always worked well, but we’ve not tested it as much as we’ve tested other things. So it’s an under-exploited area.
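Simple descriptive meta-analysis of this kind can be sketched as grouping experiments by sub-lever and reporting a test count, a win rate and a pooled effect. Pooling by inverse-variance weighting (a standard fixed-effect meta-analysis) is our choice here, not a method described in the episode; the record layout `(sub_lever, won, relative_lift, std_error)`, the thresholds and the data are illustrative assumptions.

```python
from __future__ import annotations

import math
from collections import defaultdict


def summarize_by_lever(experiments):
    """experiments: iterable of (sub_lever, won, relative_lift, std_error).
    Returns per-lever test count, win rate and a fixed-effect
    (inverse-variance weighted) pooled lift with its standard error."""
    groups = defaultdict(list)
    for lever, won, lift, se in experiments:
        groups[lever].append((won, lift, se))
    summary = {}
    for lever, rows in groups.items():
        weights = [1 / se ** 2 for _won, _lift, se in rows]
        total_w = sum(weights)
        summary[lever] = {
            "n": len(rows),
            "win_rate": sum(won for won, _lift, _se in rows) / len(rows),
            "pooled_lift": sum(w * lift for w, (_won, lift, _se) in zip(weights, rows)) / total_w,
            "pooled_se": math.sqrt(1 / total_w),
        }
    return summary


def under_exploited(summary, max_tests, min_win_rate):
    """Levers that have tended to win but have been tested comparatively rarely."""
    return sorted(lever for lever, s in summary.items()
                  if s["n"] <= max_tests and s["win_rate"] >= min_win_rate)


experiments = [  # illustrative numbers only
    ("social proof", True, 0.04, 0.02),
    ("social proof", True, 0.02, 0.02),
    ("authority", False, -0.01, 0.01),
    ("authority", False, 0.00, 0.01),
    ("authority", True, 0.01, 0.01),
    ("authority", False, -0.02, 0.01),
]
summary = summarize_by_lever(experiments)
print(under_exploited(summary, max_tests=3, min_win_rate=0.5))  # ['social proof']
```

A fixed-effect pool assumes one common true effect across experiments; when executions differ a lot, a random-effects model that allows for between-experiment variance is usually the more defensible choice.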

But once you start to collect large datasets, I think you can get into a really exciting and powerful place when you start to deploy machine learning, as we’ve been doing. We’ve started to use Confidence AI to take these thousands of data points and generate stronger predictions about the outcome of an experiment. I think we’re looking at something like 60% accuracy at the minute. We want to feed more data through it and train it more to get better and better predictions, but so far the data is really promising and seems likely to substantially outperform standard industry win rates.

And while levers are just one part of this prediction (we also categorize based on the executions, psychological principles, and so on), we know from the data that the lever, combined with the industry, is the most important thing. These are the two variables that, taken together, are weighted most strongly in this model. So if there’s one kind of new tagging you’d want to introduce for meta-analysis, I’d strongly suggest that the Levers framework is a really powerful way to generate these more robust learnings in the long term.
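This is not Confidence AI, whose internals the episode does not describe. It is a hypothetical lookup baseline that uses the same two inputs Edmund singles out, lever and industry: it predicts a win probability from the historical win rate of matching experiments, backing off to the lever alone and then to all experiments when data are thin. The class name, `min_n`, the `accuracy` helper and the data are assumptions; comparing a learned model against a simple baseline like this is one way to check that the model adds value.

```python
from __future__ import annotations

from collections import defaultdict


class LeverIndustryBaseline:
    """Hypothetical lookup baseline (not Confidence AI): win rate of the same
    (lever, industry) pair, backing off to the lever, then to all experiments."""

    def __init__(self, min_n: int):
        self.min_n = min_n
        self.by_pair = defaultdict(lambda: [0, 0])   # (lever, industry) -> [wins, tests]
        self.by_lever = defaultdict(lambda: [0, 0])  # lever -> [wins, tests]
        self.overall = [0, 0]

    def fit(self, history):
        """history: iterable of (lever, industry, won)."""
        for lever, industry, won in history:
            for counts in (self.by_pair[(lever, industry)], self.by_lever[lever], self.overall):
                counts[0] += int(won)
                counts[1] += 1
        return self

    def predict_proba(self, lever: str, industry: str) -> float:
        for counts in (self.by_pair.get((lever, industry)), self.by_lever.get(lever), self.overall):
            if counts is not None and counts[1] >= self.min_n:
                return counts[0] / counts[1]
        # Use the overall rate even below min_n; 0.5 if there is no data at all.
        return self.overall[0] / self.overall[1] if self.overall[1] else 0.5


def accuracy(model, test_set, threshold=0.5):
    """Share of (lever, industry, won) cases where the thresholded prediction matches."""
    hits = sum((model.predict_proba(lever, industry) >= threshold) == won
               for lever, industry, won in test_set)
    return hits / len(test_set)


history = ([("social proof", "retail", True)] * 3 + [("social proof", "retail", False)]
           + [("social proof", "saas", False)] * 2 + [("authority", "retail", False)] * 2)
model = LeverIndustryBaseline(min_n=3).fit(history)
print(model.predict_proba("social proof", "retail"))  # 0.75 (pair has enough data)
print(model.predict_proba("authority", "saas"))       # 0.375 (falls back to all experiments)
```

The back-off order mirrors the trade-off discussed earlier: use the narrowest slice of history that has enough data, and widen only when it does not.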

Charlotte April Bomford 27:12
It does sound like it’s a robust learning tool. This looks like a huge advantage for CRO worldwide. Just a question, then: how is the Levers framework being used as an educational tool?

Edmund Beggs 27:28
Yeah, it’s an interesting one, because it’s not necessarily one that we had in mind. My intention, when we first set out to consolidate our lever categories into a consistent framework, was really about gathering high-quality data for meta-analysis. It was really more about this kind of background, but I think more exciting, work of trying to improve the way we collect data so that we can generate big-picture learnings from it. I never necessarily imagined it was going to be a more public-facing tool, something we would put out there in the world to open up wider discussions. But in the years since the project first started back in 2020, we’ve grown a lot at Conversion; we’ve been fortunate to grow from something like 25 people to 60. This has meant a lot of new consultants coming in, a lot of new CRO practitioners with their own models and a lot of different ways of doing things: diverse and interesting ways that can feed into our processes, but nonetheless different ways of looking at things. In particular, we took on quite a few junior consultants who, like me when I joined Conversion in 2020, were fresh graduates from a behavioral science background.

Edmund Beggs 28:47
They’re used to thinking about the way people think in a scientific way, but not necessarily about the specific kinds of changes we make in CRO. We have this intensive process of learning where you get exposed to all these different kinds of experiment implementations really quickly, which is a fantastic and exciting experience. But it became clear that the Levers framework was one way we could give people who are newer to CRO a way to structure this wide range of interventions on websites. There are so many different things you can do, in so many different site contexts; how do we give people a common vocabulary, a common mental model, to start organizing that? Now it’s much more widely spread at Conversion. Consultants, when they join, become quite closely acquainted with the framework from early on, and it’s much more embedded in how we work, so we have better-aligned mental models and more of a shared understanding. And I guess our hope is that, as we put this out into the world and invite more conversation and feedback around it, some of this vocabulary and way of thinking might spread further. But critically, I think we can evolve it, cover more use cases that other people may have spotted, and keep improving it over time.

And I think that comes back to the idea behind a lot of this: we want to be doing good scientific work within our context and the kinds of behaviors we’re interested in. These kinds of theories and approaches have to be able to evolve over time; they have to be able to be corrected and updated as we get better and more data. That’s a big part of the ambition here. I’m really interested to hear more about how people are using it, how it’s working well for them, but critically, how it’s not working well for them, and how we can refine it and make it a more effective tool.

Charlotte April Bomford 30:38
I'm actually really interested in this topic and this Levers framework, because when I started as a CRO specialist, you'd have all this quantitative and qualitative data, and the first problem that I usually encountered was: where do I start? What's the first solution? Having this framework eliminates a lot of that. It stops you from wasting so much time on creating a solution that might not be helpful in the end, or that won't be as effective with your users, given the sample size, like you've said. So it's very, very interesting that you guys have put this out. And, yeah, so the next topic, I think, that we're going to talk about is benchmarking. Am I correct? Yeah. Okay. So the first question I have would be: what is CRO competitor benchmarking at Conversion?

Edmund Beggs 31:52
So the way we use it at Conversion is as a way to evaluate the strengths and weaknesses of websites: in particular, to gather a set of sites that share some kind of common challenge, theme, value proposition, whatever it is, and try to come up with metrics to describe how well they're doing across a range of things which, going back to the Levers framework, we have reason to think are likely to be important for their users. Specifically, I guess it's a way of assigning numerical values when, similar to what you were mentioning, you're merging different kinds of data. So you may have a mix of, you know, lots of qualitative data in particular, and even some heuristic analysis in there. It's about trying to build out a set of questions that you can answer with these research methods, which can give you a more systematic way of looking at the strengths and weaknesses of a site. So how well are the tactics being deployed that might make the site more usable, or persuasive, or trustworthy? And it's all about being able to compare as directly as possible, by having these structured questions, these fixed criteria.

It makes it much easier for different practitioners working on these projects over time, and for all the research designs behind the user testing and surveys that might feed into this, to be structured in a way which is actually going to feed back into an overall, coherent set of questions, where we're at least always asking ourselves the same things and we have some agreement on what's going to make up good CRO architecture. And from there, we can look at filling in more specific questions that might ladder up. You can never have, I guess, a complete common understanding of exactly how to interpret every bit of data, and particularly where you're leaning on the expert review component, there are always going to be some differences, but we know that everyone's at least asking themselves the same questions and doing it in the same structured way. And I think that's probably a fundamental thing, really, if you have the ambition that we have with this, which is to make it something which is viable to be deployed at six-month intervals, to give more data on improvements and insights over time. Obviously, it's notoriously tricky when you start looking at a directly measurable metric like conversion rate over time, which is always fluctuating for all sorts of reasons. So building out benchmarking is meant to be a way of demonstrating that progress is being made, but also of finding where the opportunities are over time. So you have not just one-off projects, but repeated projects, and even comparison points across projects and different industries that may have common features, and in the long term you're building up a database of this kind of stuff that can then provide, I think, some really interesting bigger-picture insights.

So the simple answer to the initial question, I guess, would be that it's about trying to apply this standardized set of criteria to compare across websites. But as you can see, hopefully there are important reasons for doing this, and things that it allows you to do in the long term.

Charlotte April Bomford 35:12
So what's different about your approach to benchmarking?

Edmund Beggs 35:22
I think, aside from the fact that the questions that make it up are based on this collaborative understanding, drawn, we like to think, from the decades of cumulative experience at Conversion of the people who are feeding into this, and, I guess, validated through the meta-analysis, it's certainly built around these categories from the Levers framework, which we think we're increasingly demonstrating, through our meta-analysis in particular, is a really powerful way to categorize changes in the mindset of a user. So knowing that the Levers framework is conceptually coherent and picks out real things in the world gives us confidence that the way we're grouping these scores isn't arbitrary. It's not just that, in the opinion of one practitioner, it makes sense; it's actually something which has been demonstrated to work in a lot of contexts.

And this can provide a nice headline, top-level view of how each site is performing compared to the others, just taking the five broad master levers, so that's cost, trust, usability, comprehension, and motivation, to give you an at-a-glance view of what's likely to be performing well or poorly on a site.

I think, to get into how you arrive at one of those scores: they are very broad things. They're composites of lots of individual questions. So for cost, you might ask: is compelling price framing being deployed, and, you know, are discounts being communicated effectively? Something like usability is going to be made up of lots of questions like, you know, are the action buttons well placed and clearly labeled? Do the field names clearly communicate what's needed for user input? Very practical things. Each of these top-line lever scores is going to be made up of 15 to 30 questions, depending on the context of the study, which is going to make some questions more relevant, which pages are being looked at, and so on. But it gives you a way of taking very practical, detailed things and building them up again into this broader prioritization framework: where is it that we should look in order to have the best impact? And because it's structured in that way, it's a very complementary research exercise.

If you are building a program around levers already, you can feed it directly into your strategic approach: you have all of your insights gathered around these key top-line metrics, and a sense of which ones are performing well against other sites. And then you can follow that same conceptual thread, the idea of a lever, through the kinds of solutions and roadmaps that you're then going to go through, all the way into meta-analysis. So again, just by having a coherent mental model in your head of how you should describe the things that are going on on your website, in the mind of a user, you get the ability to jump from one step to the next without having to switch mental models every time. It gives you these groupings.
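Illustration (not part of the interview): the roll-up described here, where many structured question scores feed one composite score per master lever and those scores are then compared across a set of sites, can be sketched in a few lines of Python 3.9+. This is a minimal sketch under stated assumptions: each question is scored 1.0 (criterion met) or 0.0 (not met), a lever score is the simple mean scaled to 0 to 100, and a site's gap is its score minus the average of the other sites. The site names, the scores, and the functions `lever_scores`, `gaps_vs_competitors`, and `weakest_lever` are hypothetical; the interview does not describe Conversion's actual scoring method.

```python
from statistics import mean

# Hypothetical benchmark data (assumption): for each site, each master lever maps
# to question-level scores (1.0 = criterion met, 0.0 = not met). A real study would
# cover all five master levers with 15-30 questions each; this toy set uses two.
SITES: dict[str, dict[str, list[float]]] = {
    "site_a": {"cost": [1.0, 0.0, 0.0, 1.0], "trust": [1.0, 1.0, 1.0, 0.0]},
    "site_b": {"cost": [1.0, 1.0, 1.0, 1.0], "trust": [1.0, 1.0, 0.0, 0.0]},
    "site_c": {"cost": [1.0, 1.0, 0.0, 1.0], "trust": [1.0, 1.0, 1.0, 1.0]},
}


def lever_scores(answers: dict[str, list[float]]) -> dict[str, float]:
    """Collapse each lever's question scores into a 0-100 composite (simple mean)."""
    return {lever: round(100 * mean(scores), 1) for lever, scores in answers.items()}


def gaps_vs_competitors(site: str, sites: dict[str, dict[str, list[float]]]) -> dict[str, float]:
    """Each lever's score for `site` minus the average score of all other sites."""
    own = lever_scores(sites[site])
    others = [lever_scores(a) for name, a in sites.items() if name != site]
    return {lever: round(own[lever] - mean(o[lever] for o in others), 1) for lever in own}


def weakest_lever(site: str, sites: dict[str, dict[str, list[float]]]) -> str:
    """The lever where the site trails its competitors most: a candidate focus area."""
    gaps = gaps_vs_competitors(site, sites)
    return min(gaps, key=gaps.get)


print(lever_scores(SITES["site_a"]))         # {'cost': 50.0, 'trust': 75.0}
print(gaps_vs_competitors("site_a", SITES))  # {'cost': -37.5, 'trust': 0.0}
print(weakest_lever("site_a", SITES))        # cost
```

Picking the lever with the largest negative gap mirrors the idea of looking first where a site trails the comparison set; a real study might weight questions differently or use a finer rating scale than 0/1.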

And again, I think it's important to think about how it fits together with the Levers framework, the Levers framework not really being a prescriptive set of recommendations of what you should invest in on your site. So it's not saying that you should try and do all these things at once. We obviously know that, in one context or another, each one can be important, but you need to fill in the data to understand, on your specific site, or in your industry, or whatever the scope of your learning is, what the most important things are in that particular context. And this is a really nice way of structuring that research, because you have a comprehensive, very broad model in the Levers framework, and you also have a comprehensive view of your research data by structuring it around these benchmarks for each lever. So you have data that matches up, in the breadth and scope it's going for, with the way that you're actually then going to pursue your strategy. And I think this is, I guess, a distinctive and important thing about how we're using it: it is so complementary with the rest of how we're working.

Charlotte April Bomford 39:31
It kind of matches up. So, as you've said, you have the framework, and then you have the benchmarking, which complements the work that you've done in the framework and tells you what the next steps are and what the impact is of that specific change on the website, or on any type of, you know, platform. That's interesting.

Okay, so my question is: how has it been deployed, and what has worked well?

Edmund Beggs 40:15
Yeah. So it's primarily been deployed as, I guess, a foundational research piece, right at the start of an experimentation strategy and program, when you're trying to understand: where are we now, and where should we focus our efforts? And this is a great time to run it, as I say, because you start filling in the details of the boxes that a framework like Levers can give you.

I guess, to give a tangible sense of what that might look like: one of these competitive benchmarking projects was run for an online food vendor. And they were found, in these top-line reviews, to have a real growth opportunity around the way that users perceive costs or product options. So how are they best able to present the price incentives that are there for users? And obviously, you know, in food, selling online, this is a really important thing: deals and bundles are something that we know from other contexts users are always looking for. But taking that top-line view could then lead us into specific questions that we know are important to this big-picture opportunity. So, for this component of the site that we know they certainly need to do better at, how can they go about doing better at it? Well, then we can use that to dig down into the data behind it. And we found that there was a perceived lack of clarity about what deals and bundles were actually available. In fact, users who weren't logged in weren't seeing any of these; they weren't having them surfaced to them as an incentive. So the natural step, as one of the first experiments you'd want to run, was just to introduce a variation which surfaced these deal options for users. But interestingly, it was still gated content: you could show users an image of what the bundle was and a name for it, but not actually show what the contents were, just to still maintain an incentive to sign up.

And actually,

both signups and the conversion rate itself. So that's a pretty tangible example of how you can start from: okay, what's my top-line dashboard, almost, where do we need to look? And then follow that thread down until it gives you the specifics that you need from the more detailed parts of user research, and what you're going to do.

I think one very neat example I heard about recently at Conversion was for a supermarket brand where they actually had, I guess, a really clear big-picture challenge that goes beyond just the website; it's about how they're perceived as a brand more broadly. And that was basically being seen as economical, but stereotyped as not as prestigious or reliable as some of the alternatives. And those are exactly the kinds of bigger-picture problems which we're going to see prioritized in this kind of benchmarking approach, when you see that the trust metrics just aren't up there: there's not enough being done, and the feedback from users isn't trusting enough. So, working down the framework a bit, we thought: well, how can we apply authority as a way to resolve this problem? And it's through, I guess, quite a simple implementation, but one which scales well across an actual user journey on that kind of supermarket website: just highlighting all the award-winning products that are there. So they have this rich collection of products that you can apply great authority to.

And yeah, so I think, while this experiment hasn't yet been run, it's a really nice example of why prioritizing these bigger-picture things like trust is important, because users aren't necessarily going to call out specifics on a listing page, like, "oh, there's no award-winning signposting going on here." They're going to raise this wider concern around trust across lots of different contexts and pages, and then you can take that and combine it with the user research you've done to try and narrow down what the solutions are. So that's how it's working broadly for the client programs that we run at Conversion, but what we're trying to do then is deploy it more broadly, into more public-facing industry insights stuff. At the minute, we have a really interesting project going on to benchmark five different subscription brands. So it's Netflix, Spotify, Kyoko, HelloFresh, and Duolingo, to really understand how they're dealing with the shared challenge of subscription. So the idea is that you take a load of websites with very different value propositions, which in some ways don't share a lot with one another, and try and dissect out the common theme, or aim, which is that they're all trying to effectively sell subscriptions.

Yeah, this should be a great piece of work that should come out relatively soon. And our hope is that these kinds of initiatives, projects that take a broader cross-industry view, can be shared more widely as something that provides really practical examples and really useful insights for people across the CRO industry. And I think it comes back to some of the same ambition that's there with the Levers framework, around trying to be part of a wider conversation. We really want to get more feedback from people, understand more about how other people are working, and try to put resources out there that can be as helpful as possible to other CRO practitioners.

Charlotte April Bomford 45:42
Amazing, amazing. Thank you so much again for your time, Edmund Beggs, CRO consultant at Conversion and the mastermind behind all of these crazy things. Your mind is amazing. And thank you so much for bringing us this amazing framework, the benchmarking, and all that. And yeah, glad to have you here. Again, this is Charlotte Bomford, and thank you for watching Experiment Nation.

Rommil Santiago 46:11
This is Rommil Santiago from Experiment Nation. Every week we share interviews with, and conference sessions by, our favorite conversion rate optimizers from around the world. So if you liked this video, smash that like button and consider subscribing. It helps us a bunch.


Rommil Santiago