Stop having bad test ideas – with Bart Schutz

Transcript

Bart Schutz 0:00
So this is why this is the problem. The whole problem, again, and I cannot say that enough, is the fact that our ideas are so bad, you know,

Bart Schutz 0:11
75% of the things we think of do not bring more sales in reality. If we had only 10% more good ideas, or 10% fewer bad ideas, it would bring, in this analysis, 40% more growth. So by doing an explorative phase and a confirmative phase, you will already filter a lot of the bad ideas out, and in the confirmative phase you'll be testing a lot more good ideas; it's definitely going to be more than 10%. And then, you know, exponentially, you will definitely see an increase in your effectiveness of over 40%. Right, probably 80% more real growth added to your program. So this is very important.

Richard Joe 0:58
Hi, it's Richard here from the Experiment Nation Podcast. Today I've got a special guest: it's Bart Schutz from Online Dialogue. He's one of the founding members of Online Dialogue, one of the world's most renowned CRO agencies, which has been around since 2009. He has a background in consumer psychology and is a renowned guest on many podcasts and conferences. Welcome to the podcast, Bart.

Bart Schutz
Very good to be here. Thank you for the introduction, me being a psychologist. Very proud to be one, actually.

Richard Joe
Well, part of the reason why I wanted to interview you is because, you know, one of Online Dialogue's main, sort of,

Richard Joe 1:40
I guess, points of difference compared to other CRO agencies is the huge emphasis on in-depth research into psychology and consumer behavior, even in the kinds of tests that you run. So maybe, just as background, could you tell me how you got started in the world of CRO?

Bart Schutz 2:06
Yeah. So my background is, obviously, consumer psychology. I'm probably the worst case of a psychologist you can get, because both my parents are psychologists, my only sister is a psychologist, and half the company are psychologists. So let's say I've just had a huge interest in human behavior since I was born. With that background, I very quickly started a usability testing company, so I got into qualitative research, but that was back in the 90s, right? The internet was still sort of niche, the new sexy thing. Then I got into online market research, which was already more quantitative. And then I got into consultancy, to use the knowledge we got from our online research, combined with insights from behavioral science, and apply it on a more strategic level for companies. But then at some point web data, web analytics, became available. And I thought: yeah, I'm a behavioral scientist, I love studying people's behavior. I'm not so much a fan of listening to what they say, because they don't do what they say, right, and they don't say what they do. So this behavioral data became sort of ubiquitous, right? It was everywhere. And that just gave me goosebumps, and I wanted to do something with it. But I was only a minor shareholder in the consultancy firm I was working for. We tried to buy two web analytics companies, but it didn't really work out. And then one of my best friends (he was a witness at my wedding, that type of best friend) told me that he had met this person by the name of Ton Wesseling, who was fully into web analytics and experimentation, A/B testing. So I needed to meet him, and he connected us. And yeah, two years later the magic started happening, because I quit my job and we started Online Dialogue.
My idea was to combine the insights from behavioral science, the methods and tools we had developed within behavioral science to study human behavior, with the data available in the digital scene. That was so much fun. And it was a real thrill to work together with Ton Wesseling, who was able to make sense of all that data and run the experiments. Back in the early days he was even still coding the experiments himself. It was a lot of fun. Absolutely.

Richard Joe 4:59
What would you say? I mean, it sounds like he was the technical arm of the company, being able to code up tests, run them, and look at the data and analytics and

Bart Schutz 5:13
Data analysis, right? Yeah, data analysis. I wasn't even able to do a simple analysis in Google Analytics. That was absolutely his part. And not only analyzing the experiments we did, but also the pre-analysis: what does the behavior look like? We would sit together and dive into the data, and he would unveil all these click paths and user journeys, where people switched off, and I saw the pages where they spent too much time. And then I came up with ideas like: it has to be this type of behavior, maybe uncertainty is at play. And he came up with a lot of ideas himself as well, right? But that combination was really strong. And I have to say, I mean, we're not the only company. I mean, you're saying we're sort of the thought leader of

Richard Joe 6:06
The world? Or one of them. Yeah, right.

Bart Schutz 6:09
But there are so many that have also embraced it. It feels like we sort of started a movement, because there's a lot of attention for it. I think the average knowledge of how the brain makes decisions is a lot higher within the CRO scene than in a lot of other digital marketing scenes. So yeah, I'm very proud of that, and I love it. And I love the fact that we're a sharing community; everyone shares ideas. And, yes, I like that aspect of it.

Richard Joe 6:38
So, awesome. Sure. Awesome story. Yeah, I think our listeners have got a lot to learn from you in today's chat. So the first thing you mentioned, you know, before we went on the podcast, is that a lot of CROs don't realize that a third to a half of their winning tests aren't actual winners. Can you explain that, and explain the significance of that? And,

Bart Schutz 7:16
Yeah. So let me start with something else, I'm sorry. Within CRO we do lots of types of research to unveil what the best decision is, right? And the more users we have, and the better the conversion rate, the more we are able to run A/B tests. So we do a lot of A/B testing within CRO, which is a very good thing. For listeners who are unaware of the hierarchy of evidence, or the pyramid of evidence, try to look it up, right? It's a scientific pyramid in which the types of research available within science are ranked on the quality of their evidence. The higher you go in the hierarchy, the higher the quality of the evidence, or, as I prefer to say, the less risk you are taking by basing your decision on that type of research. Down at the bottom are expert opinions and, you know, more qualitative studies. Halfway up you get into the more data-analysis types of studies: case studies, cross-sectional studies, case-control studies, cohort studies. That's more like web analytics, where you see two segments, you compare them, and then you base your decision on that. But the highest in the pyramid is what science calls the randomized controlled trial, which we call the A/B test, right? Yes. And if I compare it to science, our numbers are usually even bigger; we tend to have hundreds of thousands of users in a randomized controlled trial, or A/B test. That's unprecedented. The scale, the number of experiments we run, the numbers of users, the power levels we have are unprecedented. So we should never stop using the A/B test, right? If you have enough users and statistical power, we should never stop using the A/B test as our primary tool to base decisions on. We should make it faster, cheaper, better, of course, but I want to start with that. Having said that, yes, we're running into the replication crisis. And our scene is not aware of the problem yet. And it's big, it's massive.
I'm going to start with the replication crisis itself, on the winner side, but the second thing I'm going to say is even more important. So, we find a lot of winners, right? Let's say you're working in a company, you run 100 A/B tests, and 25 of them turn into winners. Now, what I experience with a lot of CRO experts is that they are somewhere aware, mostly unaware, but we have this feeling that somehow it cannot be true, all these winners, if we add them up, right? We have this feeling like maybe there's something off, but we don't dive into it. We sort of have what science calls cognitive dissonance, an uncomfortable feeling. So it's an area we stay a little bit away from, and maybe for a reason, because if you really dove into it, you would find that, on average, at least half of your winning experiments are not winning experiments. I spoke at the Dutch CRO Awards last Thursday, and I asked for their significance levels within that group. And these were the best CROs, with the best experimentation programs, etc. It was actually 2/3 of the winning experiments that would not, in reality, be a winner. And the thing is that very often the C-suite, or your boss, knows, because he's running the hotel and the rooms are not fully booked. Or he's running this webshop and the warehouse is not empty. If he added up all your experiments and all the business cases that are often presented, it should be, right? But no one in digital marketing makes it a real problem, because in the other digital marketing channels, like email marketing, search engine advertising, display ads or social, we also have this attribution problem. Everyone says, oh, I caused the sale or the lead, and those don't add up either. So it's sort of the status quo in digital marketing. But so

Richard Joe 12:13
Just to clarify the litmus test, because I did read your article about this: what is the litmus test by which you came to the conclusion that a third to a half of winning tests aren't really winners? So they're not really impacting our business on the primary KPI we have, which is either sales or sales volume or whatever metric you're looking at.

Bart Schutz 12:47
Yeah, so let's just take an example. We're looking at sales. You're a CRO specialist, you work for an e-commerce platform or webshop, and you're running experiments. You do 100 experiments. 25 of them are winners, according to the statistical method you're using. Let's assume some numbers, right? Let's assume you're running at a significance level of 90%. Last week, typically, the audience was classifying a winner above 80%, right? But let's assume 90%. So you're running experiments at 90%, and let's say it's one-sided. This is getting a bit statistical already, and I normally don't talk statistics to people, because they find it boring. But it's some sort of t-test, one-sided. And when the p-value ends up below 0.1 (alpha is 0.1), you classify it as a winner. Then people have this assumption that somewhere around 10% of the winners will not be a winner. But that's not the case, because the 10% threshold we're using applies to all the experiments that don't make any uplift in reality. So let's say you have 100 ideas, of which, let's say, 72 do nothing in reality and 28 bring an uplift in reality. What you're trying to do with your experimentation program is find the 28, right? But of the 72, because you use a 90% significance level, 10%, so about seven, will become a false winner because of all the noise and coincidence. So significance is about how many bad ideas (no uplift, no growth, no extra sales) turn into a winner. Then we have 28 good ideas, which we do want to find. But not all of those will become a winner, because you don't have the power. For the same reason you find false winners, you also find false losers. I call it missed growth, or missed sales. And it's disturbing to see that we're not focusing on balancing this, because this is what the bosses ask me: find me more sales, right?
So the problem is that probably 1/3 of the good ideas will not turn into a winner, because of the power levels. We tend to calculate with 80% power, but if you calculate it backwards, the real power is usually at most somewhere between 65 and 70%. So at least a third of the good ideas will end up in the bin and will not be pushed live, so you miss some growth. If I now look at the real winners and combine them with the false winners, usually, with a 90% significance level, you come to around 50% false winners. And I can do the math for you. I mean, we'll see how many people are going to react. But whether people contact me or do it themselves, it's not a very difficult calculation. But we need some of your metrics,
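
The arithmetic behind this split can be sketched in a few lines. This is a minimal illustration, not Online Dialogue's own calculation: it assumes the 72/28 split from the example, a one-sided alpha of 0.10, and a real power of 70% (the top of the range mentioned). The function and parameter names are hypothetical.

```python
def winner_breakdown(n_ideas: int, share_good: float, alpha: float, power: float) -> dict:
    """Split declared winners into true and false winners.

    All inputs are assumptions: share_good is the fraction of ideas that
    truly lift sales, alpha the false positive rate applied to ideas with
    no real effect, and power the chance a real uplift reaches significance.
    """
    n_good = n_ideas * share_good
    n_bad = n_ideas - n_good
    false_winners = n_bad * alpha            # no-effect ideas that pass by chance
    true_winners = n_good * power            # real uplifts that reach significance
    missed_winners = n_good - true_winners   # real uplifts that end up in the bin
    winners = false_winners + true_winners
    return {
        "false_winners": false_winners,
        "true_winners": true_winners,
        "missed_winners": missed_winners,
        "share_of_winners_false": false_winners / winners,
    }


result = winner_breakdown(100, 0.28, alpha=0.10, power=0.70)
print(f"false winners: {result['false_winners']:.1f}")
print(f"true winners: {result['true_winners']:.1f}")
print(f"missed winners: {result['missed_winners']:.1f}")
print(f"share of winners that are false: {result['share_of_winners_false']:.0%}")
```

With these inputs, about 7.2 of roughly 27 winners are false (about 27%) and 8.4 real winners are missed. The share of false winners climbs quickly as real power or the share of good ideas drops: at 30% power the same split gives 7.2 false winners out of 15.6, about 46%, close to the half described here.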

Richard Joe 16:20
Like your false positive rate? Like, the

Bart Schutz 16:24
I just want to know the significance level, and I just want to know the winner ratio. And I need some indicators of the real power level. So basically I need a list of your experiments, with the numbers in the variations, and then we can do the calculations ourselves, right? If you just have the number of visitors per variation and the conversion rate per variation, we can calculate what the real power was, on average, across your experiments. And then I can calculate these metrics: how many of the ideas were, statistically speaking, good ideas that would bring sales in reality, and how many of those actually turned into a real winner? And of the ideas that bring no growth, how many turned into a false winner? Then you will see a very disturbing reality, because again, at 90% significance, half of your winners are no winners at all. And of the losers, which are usually most of your experiments (we tend to have more inconclusive or negative results than winners), just over 10% are actually winners in reality. And that's actually the biggest problem. People are usually shocked when I tell them that half, or usually even more, of their winning experiments are no winners at all, purely based on statistics. But that's not the problem, because pushing a winner live usually has very little cost, right? Say we're an e-commerce webshop selling, let's say, toys. We run an experiment where we include the target age in the title of each toy: this is a toy, and it's targeted at ages three to five. Maybe that's a winning idea, right? So we run the test, it's a winner, we put it live, perfect. But maybe it's not a good idea. Maybe it doesn't make any difference. We push it live and it does nothing. How much did it cost? Maybe four hours to build and check it, or let's say maybe even $1,000, or euros, whatever.
The costs of a false winner tend to be very low.
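
Bart describes backing these metrics out of a list of past experiments. One way to sketch that idea is below, assuming a one-sided two-proportion z-test with an unpooled standard error and a hypothetical true relative uplift you must choose yourself (the data show observed effects, not true ones). It illustrates the reasoning; it is not the calculation Online Dialogue actually runs, and all names and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def approx_power(visitors_per_variation, baseline_cr, relative_uplift, alpha):
    """Approximate power of a one-sided two-proportion z-test (unpooled SE).

    relative_uplift is an assumed true effect (e.g. 0.10 for +10%).
    """
    p_control = baseline_cr
    p_variant = baseline_cr * (1 + relative_uplift)
    se = sqrt(p_control * (1 - p_control) / visitors_per_variation
              + p_variant * (1 - p_variant) / visitors_per_variation)
    effect_z = (p_variant - p_control) / se
    critical = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(critical - effect_z)


def program_diagnostics(winner_ratio, alpha, power):
    """Back out the share of good ideas from the observed winner ratio.

    Solves winner_ratio = alpha * (1 - share_good) + power * share_good
    for share_good (clipped to [0, 1]), then splits winners and non-winners.
    Requires power > alpha.
    """
    share_good = (winner_ratio - alpha) / (power - alpha)
    share_good = min(max(share_good, 0.0), 1.0)
    false_winners = alpha * (1 - share_good)
    true_winners = power * share_good
    missed_winners = share_good * (1 - power)
    winners = false_winners + true_winners
    return {
        "share_good": share_good,
        "share_of_winners_false": false_winners / winners,
        "share_of_non_winners_real": missed_winners / (1 - winners),
    }


# Hypothetical program: 20,000 visitors per variation, 3% baseline
# conversion, an assumed true uplift of 10%, tested at 90% significance.
power = approx_power(20_000, 0.03, 0.10, alpha=0.10)
for winner_ratio in (0.25, 0.15):
    diag = program_diagnostics(winner_ratio, alpha=0.10, power=power)
    print(f"winner ratio {winner_ratio:.0%}: power {power:.0%}, "
          f"{diag['share_of_winners_false']:.0%} of winners false, "
          f"{diag['share_of_non_winners_real']:.0%} of non-winners real")
```

With these made-up inputs the power comes out at roughly 67%. At a 25% winner ratio about 30% of winners are false; at a 15% winner ratio it is about 60%. The winner ratio of your own program therefore drives the answer as much as the significance level does.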

Richard Joe 18:45
Because it's just production costs, isn't it? You're doing it yourself, or some developer is doing it, whatever. But the real money, the real

Bart Schutz 18:56
problem is basically what your boss is asking: find me real winners. And a third of your real winners you don't find, right? It's because science is so focused on the truth. They're so focused on excluding all false winners, false positives, as they call them. But we're not into finding the truth, we're into finding the money, right? And the problem is that we tend to have only, let's say, 25 or 30% good ideas, and of those good ideas we don't want to miss any, because that's missed growth. That's winning experiments in the bin; you don't want that. So we need more power, right? And I have this term, which I call the discovered positive rate: of the positive experiments, how many do you discover? There's not even really a word for this in science. Or, you know, discovered growth. At 90% significance it's around 75%. At 95% significance it's lower. I see companies increasing their significance level nowadays because science is doing it. But no: the higher you bring your significance level, the more of your good ideas you miss out on. You're stricter, so you will find fewer of the true winners. So you have to balance this. Normally, if I do the calculation, people have to lower their significance level, which increases the power. And therefore you find more winners, and not only false winners, but also true winners.
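
The "discovered positive rate" described here is closely related to what statisticians call power, or the true positive rate: the share of truly positive experiments that get declared winners. A minimal sketch of how it falls as the significance level rises, assuming a one-sided z-test and a hypothetical true effect of 2 standard errors:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def discovered_positive_rate(significance: float, effect_z: float) -> float:
    """Share of truly positive experiments declared winners (one-sided z-test).

    effect_z is the assumed true uplift in standard-error units, a
    hypothetical stand-in for your real sample sizes and effect sizes.
    """
    alpha = 1 - significance
    critical = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(critical - effect_z)


for significance in (0.80, 0.90, 0.95, 0.99):
    rate = discovered_positive_rate(significance, effect_z=2.0)
    print(f"{significance:.0%} significance -> {rate:.0%} of real winners discovered")
```

Under this assumed effect size, about 76% of real winners are discovered at 90% significance and about 64% at 95%. Each step up in strictness throws more good ideas in the bin.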

Richard Joe 20:42
Yeah, I was going to say: if you lower this level, you also increase the risk of getting false positives. You're basically trying to find the balance between false positives and false negatives, I'm guessing. So just to clarify for our audience, finding that balance, I guess, is key, and you've got a lot of experience there. So the higher risk you're posing, from a business perspective, is that if the significance levels are too high, you're going to get more false negatives, right? Yeah. Which is money left on the table: basically winners that don't get declared. Isn't that the scientific term?

Bart Schutz 21:25
I tend to call it missed sales, right? Or missed growth. Because if you're going to talk to your boss, or if you're executive level yourself, you don't talk science; people will disconnect. But if you talk business, they will stay connected. And missed sales, everyone understands. Right? And fake sales: okay, we know the costs of fake sales, fake growth, are not so high. But have the discussion with your boss, your executive, about the balance, maybe for the whole program, but preferably for every experiment: what would it cost to have a fake sale, a fake winner, which is just the execution of pushing the winner live? And what would it cost to have a missed sale, missed growth, a false negative? You will find out it's way more. So let's say false negatives, missed growth, cost you on average, across your program, 100,000, whereas a false positive costs only 1,000. Then I can do the calculation for you, or you dive into your statistics books from back in university, and maybe you can work it out yourself and come to the optimal alpha, the optimal significance level. And the funny thing is, you can do this for experiments you ran in the past, and you will actually increase your conversion rate without running any extra experiments, just by doing the basics right, looking backwards. I wouldn't go too far backwards, though.
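
The cost trade-off described here can be sketched as a simple search over alpha. This is a rough sketch under stated assumptions, not the calculation Bart offers: a one-sided z-test, a hypothetical 28% share of good ideas, a hypothetical true effect of 2 standard errors, and fixed costs per false winner and per missed winner. The sketch ignores the extra downside of shipping a change that actually hurts.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_one_sided(alpha: float, effect_z: float) -> float:
    """Power of a one-sided z-test; effect_z is the true uplift in SE units."""
    critical = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(critical - effect_z)


def expected_cost(alpha, share_good, effect_z, cost_fp, cost_fn, n_ideas=100):
    """Expected cost of false winners plus missed winners at one alpha."""
    false_winners = n_ideas * (1 - share_good) * alpha
    missed_winners = n_ideas * share_good * (1 - power_one_sided(alpha, effect_z))
    return false_winners * cost_fp + missed_winners * cost_fn


def optimal_alpha(share_good, effect_z, cost_fp, cost_fn):
    """Grid-search alpha from 0.01 to 0.50 for the lowest expected cost."""
    grid = [step / 100 for step in range(1, 51)]
    return min(grid, key=lambda a: expected_cost(a, share_good, effect_z, cost_fp, cost_fn))


# Hypothetical inputs: 28% good ideas, true uplift of 2 standard errors.
equal_costs = optimal_alpha(0.28, 2.0, cost_fp=1_000, cost_fn=1_000)
skewed_costs = optimal_alpha(0.28, 2.0, cost_fp=1_000, cost_fn=100_000)
print(f"equal costs: alpha = {equal_costs:.2f}")
print(f"missed sale 100x a fake sale: alpha = {skewed_costs:.2f}")
```

With equal costs the search settles around alpha = 0.07. When a missed winner costs 100 times a false one, as in the example above, the search runs to the top of the grid (0.50). In this sketch, that means the significance threshold should be loosened considerably.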

Richard Joe 23:12
But yeah. What else would you say to people who are listening to this? I mean, maybe they can't hire a statistician, maybe they're in a small CRO team, or they don't have the resources. You know, not all of them are behavioral scientists like yourself, or

Bart Schutz 23:31
All behavioral scientists have a full year of statistics and research methods, so they should be able to do this. But to be very honest, even behavioral scientists think statistics was the worst part of their studies. To be really honest, I also studied physics, right? So yeah, I love human behavior, but I'm also a very technical, science-minded data guy. So I actually like data and statistics. But there's more to it than just optimizing the significance and power levels to strike this balance. If I look back 10 or 12 years, the behavioral science world was in shock, because we found out about this problem within science, right? We call it the replication crisis. That's why I call it the replication crisis in A/B testing nowadays: we're facing exactly the same crisis that the empirical sciences (behavioral science, but also medical science and biology) faced 10 or 12 years ago. Most empirical sciences have now gone through this renaissance; biology, in all honesty, is still not really aware, but whatever. So we found a lot of ways to do this better. One of them was the significance level, which in science we have now increased to 99%, because there we need to find the truth; we're not out to find money. In your business, you know, focus on finding the best alpha for not missing out on winners. But there are more ways. One of the most important things we do nowadays in behavioral science is to separate explorative studies, or experiments, from confirmative, or validating, experiments. So: exploration, validation. And exploration would bring a lot of fun to CRO if we really started doing it, because exploration basically means that instead of running one A/B test, you're going to run, let's say, five A/B tests on the same population.
And everyone, or at least everyone with a good understanding of statistics, will say: oh, but then your power level drops and your chance of finding false positives increases. Yes, it's true. But let's say we ran this experiment, you know, an ABCDE experiment,

Richard Joe 26:11
On the same page, for instance?

Bart Schutz 26:15
On the same group. So each variation just gets 1/5 of the original population, and you find an outcome after, let's say, a week, right? Maybe the variations come out at plus 1%, minus 5%, plus 7%, minus 3% and plus 4%. One of them shows a 7% increase in conversion rate, right? Now, within the explorative phase, this is an indication that it might actually be a good idea. So what are we going to do? We're going to validate whether that's right in a second phase. We keep the experiment running for another week, but we close down all the other variations: of A, B, C, D and E, we keep C, because in the exploratory phase that was the one with the highest uplift. And then we validate it with another week of experimenting, just to validate that idea. Why is this so important? Why is this the way we work now in behavioral science? Because the problem of statistics is, as I was mentioning, that 72 out of 100 are bad ideas that bring no uplift, and only 28 are good ideas that would bring sales in reality. That's the problem: the fact that we have so many ideas that bring no uplift in reality, right? By doing this exploratory phase upfront, we have a lot more good ideas that will bring uplift in reality in the second phase. So in the replicating phase, this division of 72 to 28 is maybe switched around, right? Or maybe it would only be 50/50. But we increase the chances of a winning experiment up front, and that has an exponential effect on the statistics of finding false discoveries, false winners, false positives. And the bigger players in the CRO scene, Microsoft and companies like Booking, are already doing this: running more explorative studies and then replicating or validating them in a confirmatory phase. So we really have to reinvent our approach to experimental design. We should stop doing single A/B tests, right?
We should look at it again. Within science, we do an A/B test two or three times to be more sure, but you don't want that in business, because you're going to end up with too many inconclusive results that would have been winners in reality. Therefore, we should make this distinction between exploration and confirmation. And it will only bring more fun, because we love exploring, yeah? Just come up with all your ideas, put them in one big jar and run five experiments at the same time. The chance of finding a winner among them is pretty big. There's also a large chance that it's a false winner, but that's why we replicate, or confirm, or validate, whatever you call it. And you will find more winners this way. There's a study which I think has a lot of value, done by researchers at the University of Pennsylvania. So if you want to bring more growth to your company, look at this scientific article. It's by Berman and Van den Bulte, from 2021, and maybe I can send you the URL to the PDF of that study. It's called "False Discovery in A/B Testing", and it was published in Management Science. I think it's one of the most undervalued studies in the CRO world. It explains everything I've been talking about, in language that's not even too hard to understand. They also propose this way of doing experimental design, separating an explorative phase and a validating, or confirmative, phase. And they did a simulation. They looked at, I think, almost 5,000 A/B tests, or effect sizes, in Optimizely data, and they ran all their analysis on one metric, I think it's engagement, so any click on the page. You have to realize that, because that's probably different from your situation; I hope most people are looking at conversion rates or sales in the end. But they did this simulation.
And what I loved about it is this: if you would increase the real winners among your ideas by only 10%, so instead of having 72 bad ideas and 28 good ideas in reality, you would change it to 65 bad ideas and 35 good ideas, which is only a small uplift in experiments that would win in reality, right? Because of the exponential effect of it, you would find 40% more growth. So this is why this is the problem. The whole problem, again, and I cannot say that enough, is the fact that our ideas are so bad, you know. 75% of the things we think of do not bring more sales in reality. If we had only 10% more good ideas, or 10% fewer bad ideas, it would bring, in this analysis, 40% more growth. So by doing an explorative phase and a confirmative phase, you will already filter a lot of the bad ideas out, and in the confirmative phase you'll be testing a lot more good ideas; it's definitely going to be more than 10%. And then, you know, exponentially, you will definitely see an increase in your effectiveness of over 40%. Right, probably 80% more real growth added to your program. So this is very important. First of all, you have to realize that these winning experiments don't add up. So just to set the scene: if you thought you brought in a million extra revenue last year, I bet it's probably more like 300,000, right? But it's very easy to increase that 300,000 to, let's say, 600,000. First, by setting your significance level according to the business rules, not the scientific rules; you have to calculate the optimal one, and we can help you with that. And second, you have to rethink your experimental design by adding an explorative and a confirmative phase to it.
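
The mechanism behind the explore-then-confirm design (selection in the exploratory phase raises the share of good ideas entering confirmation) can be shown with a toy simulation. This is a sketch under loudly stated assumptions, not the design or simulation from Berman and Van den Bulte's paper: five variants that are independently good with probability 0.28, a hypothetical true effect of one standard error per variant during the lower-traffic exploratory week, independent variants (the shared control is ignored), and a confirmatory phase judged only on fresh data with an assumed 70% power.

```python
import random


def share_good_after_exploration(n_variants, share_good, effect_z_explore, trials, seed=1):
    """Estimate how often the best-looking exploratory variant is a good idea.

    Each variant is independently good with probability share_good. Its
    observed z-score is effect_z_explore (if good) or 0 (if not) plus
    standard normal noise; the highest score goes on to confirmation.
    Simplification: variants are independent (shared control ignored).
    """
    rng = random.Random(seed)
    good_picks = 0
    for _ in range(trials):
        best_score, best_is_good = float("-inf"), False
        for _ in range(n_variants):
            is_good = rng.random() < share_good
            score = (effect_z_explore if is_good else 0.0) + rng.gauss(0.0, 1.0)
            if score > best_score:
                best_score, best_is_good = score, is_good
        if best_is_good:
            good_picks += 1
    return good_picks / trials


def share_of_winners_false(share_good, alpha, power):
    """False winners as a share of all declared winners."""
    false_winners = (1 - share_good) * alpha
    true_winners = share_good * power
    return false_winners / (false_winners + true_winners)


# Hypothetical inputs: 28% good ideas, five variants, effect of one SE in
# the exploratory week, alpha 0.10 and 70% power in the confirmatory week.
entering = share_good_after_exploration(5, 0.28, effect_z_explore=1.0, trials=20_000)
print(f"good ideas entering confirmation: {entering:.0%}")
print(f"false winners, single A/B test: {share_of_winners_false(0.28, 0.10, 0.70):.0%}")
print(f"false winners, explore then confirm: {share_of_winners_false(entering, 0.10, 0.70):.0%}")
```

Under these assumptions, around half of the ideas reaching the confirmatory phase are good ones, versus 28% going in, and the false-winner share in the confirmatory test falls accordingly. Every input is made up, so treat the output as showing the direction of the effect, not its size.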

Richard Joe 33:45
That's a very, very thorough explanation. I think our audience will get a lot of value out of that. Hopefully we'll link it; I think you can send me the PDF, so we can probably link it in the show notes later on.

Bart Schutz 34:01
Not sure if it's available for free, but I think I sent it.

Richard Joe 34:04
You sent me something, or that might have been something else. So yeah, anyway. Um, that leads me to another point in terms of trying to get more winners, and this is applying consumer psychology to your tests. I know you said before the podcast that you've got a database of tests from the last 13 years, and most of them, would you say 80 to 90%, focus on System 1 as opposed to System 2?

Bart Schutz 34:40
Well, it's a bit more complicated, but I think the point now is very simple: why we are so focused on adding behavioral science insights to CRO programs. Because again, if you would only find 10% more good ideas, or fewer bad ideas... Yeah. And I think everyone realizes that having more scientific knowledge of how the brain makes decisions can probably increase the number of good ideas by 10%, right?

Richard Joe 35:17
Can you explain

Bart Schutz 35:19
That's already 40% more sales. Back in the early days, people said it's only a small difference. No, it's not. It's a small difference in the number of good ideas, maybe, because even a consumer psychologist, even people like me who have run thousands of experiments, cannot predict with 100% certainty whether an experiment will win or whether it's a good idea. But I think we are slightly better, and I think that has more than a slight effect on the real increase in sales, because 10%, again, is 40% more sales. That's why I really urge companies to focus on how the brain works. And I love the fact, as we were discussing before we started the podcast, that within the whole digital scene CRO is one of the thought leaders in adding behavioral science to the way it operates, more than search engine advertising or any of the other marketing channels. I also think CRO should be, I think you talked about this with Ton, like a layer over at least all your digital activities, but even HR and all business decisions, so they will be based on evidence. By applying behavioral science you will truly understand what drives people, and it starts with the fact, as you were mentioning, that there's more happening in the brain than we are consciously aware of. And I think more and more people, especially in the CRO scene, are aware of this. They know about System 1 and System 2 thinking: the distinction between conscious, rational, goal-directed decisions and the more unconscious, automated, intuitive decisions. There's one thing I think is a fun fact, given that we just discussed false discoveries, the replication crisis within the CRO world, right? Daniel Kahneman published his book in 2011, in the midst of the replication crisis within behavioral science.
Kahneman says nowadays that if he wrote his book again, it would be less than half as thick, because more than half of the studies in it are underpowered and probably false discoveries.

Richard Joe 38:03
Wow. I mean, it's a huge bloody book. I actually didn't end up finishing it; it was just so long. Yeah, it's too much for me, I think.

Bart Schutz 38:12
I like it, sort of, if you're a behavioral scientist yourself. One of the funniest facts about it, I think, is that the first scientific publication Kahneman had together with his late friend Tversky was about the belief in small numbers. So his very first scientific article was about the belief in small numbers. And then, like 40 years later, he wrote this massively impactful book, but half of its studies were based on small numbers. He was prone to his own bias that way. That doesn't mean the basic point of the book is wrong: that there's so much happening in our brain that we're not aware of, and that it's way more important in creating our decisions than we realize, still holds very strong. And that's the division that, because of Kahneman, we nowadays call System 2 and System 1 thinking. So just for the listeners, if you're not aware: let's say I'm asking you how much is two and two. All of us instantly answer four. But the funny thing is, we don't have to do the calculation anymore. It's automated. We are not looking at our fingers and counting two and two: 1, 2, 3, 4. No, it's an automated answer. But if I ask you how much is 16 times 24, I think I'm giving you a hard time. You're probably going to multiply 24 by 10 and then add, trying to remember, 24 six more times. Your smartphone would have calculated this a zillion times in the time you take for it. The answer, by the way, is 384. But this is not an automated answer, because you don't get the question, or do this calculation, very often: 16 times 24. So maybe you experienced a bit of the two systems: we have a reflexive, automated one, where you don't have to think, and a rational one, with which you can actually do difficult calculations, but it takes a lot of time. It needs a lot of focus and attention.
Because if I were to distract you by calling out random numbers, I think everyone knows it would become harder. So those are the two systems. There’s one other example for the native English speakers in the audience, which is not a scientific one, but I think it’s a very funny one. My son came up to me and said, “Dad, I have a question for you. Jack’s dad has three sons. What are their names?” Huey, Dewey and Louie? But if you think about it... I went, like, okay, Louie, but there has to be a trick to this, right? So where’s the trick? And I got it: oh wait, it’s Jack’s dad, so one of them has to be Jack. So it’s Huey, Dewey and Jack. I think this is an elegant way to explain the same thing. You have these automated answers, automated behavior, automated decisions. But you also have rational thinking that can inhibit these automated answers, think logically, and come up with, in this case, the right answer. So I was proud that I gave him the right answer, although at first I didn’t get it.

Richard Joe 41:44
Have you got any examples of System 1 or System 2 tests that surprised you? I mean, there must be outliers where System 2 tests work well compared to System 1 tests. What kind of examples could you give our audience?

Bart Schutz 42:06
Right. So one of the examples I like to use was one of the first experiments we did, 13 years ago. Especially in the early days, we were doing a lot of usability testing and qualitative research, and then we would run A/B tests as well. So I think it’s fun to compare the arguments from both. We’ve had a lot of experiments where, in the qualitative usability tests, people would say, “I don’t like that, you shouldn’t do that.” Take pop-ups, for example. Back in those days we were still doing exit-intent pop-ups, and we were even still showing pop-ups when you entered the website. User testing would say: don’t do that, it irritates people. They told us they would leave the website and prefer a website that does not have an entry pop-up. Yet we found so many winners with a pop-up upon entry to the website. I’m not going to explain the behavioral science background of it, but I think it already shows there’s a difference between what people think and what they do in reality. Now, at Online Dialogue we apply System 1 and System 2 thinking a lot, but we go a bit deeper, because you want to know up front whether you’re dealing with users who are in conscious control of their behavior, System 2 type users, or whether they are led by their intuition, automated behavior and habits, System 1. And you can find it in the data: there are a lot of indicators in the data of whether you have goal-directed, rational, in-control users or more intuitive users.
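One way to turn such indicators into something testable, for instance a visitor who uses site search and moves straight through the funnel versus one who arrives from a social-media ad and clicks all over the site, is a simple rule-based classifier. This is a minimal sketch, not Online Dialogue’s actual method: the `Session` fields, the page-type labels, the `IDEAL_FUNNEL` path and the `wander_threshold` value are all hypothetical assumptions.

```python
from dataclasses import dataclass

# Hypothetical "ideal" funnel, loosely following the path described in the
# interview: search listing -> product page -> checkout steps 1-3 -> thank-you page.
IDEAL_FUNNEL = ["search_results", "product", "checkout_1", "checkout_2",
                "checkout_3", "thank_you"]


@dataclass
class Session:
    referrer: str           # hypothetical labels, e.g. "social", "direct"
    used_site_search: bool  # did the visitor use on-site search?
    pages: list[str]        # ordered page types viewed in the session


def follows_ideal_funnel(pages: list[str]) -> bool:
    """True if the pages are a non-empty prefix of the ideal funnel (no detours)."""
    return bool(pages) and pages == IDEAL_FUNNEL[:len(pages)]


def classify_session(s: Session, wander_threshold: int = 8) -> str:
    """Rough label: 'system2' (goal-directed), 'system1' (exploratory) or 'unclear'."""
    if s.used_site_search and follows_ideal_funnel(s.pages):
        return "system2"
    # Many page views without a clean funnel path suggests wandering behavior.
    wandering = not follows_ideal_funnel(s.pages) and len(s.pages) >= wander_threshold
    if s.referrer == "social" and wandering:
        return "system1"
    return "unclear"


focused = Session("direct", True, list(IDEAL_FUNNEL))
browsing = Session("social", False, ["home", "category", "product", "category",
                                     "product", "blog", "product", "category"])
print(classify_session(focused), classify_session(browsing))
```

In practice the rules and thresholds would need tuning against your own analytics data, and many sessions will reasonably end up labeled "unclear".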
So let’s say people search your site for a very specific product, and they follow the ideal steps in your funnel, from the search listing to the product page, and they immediately type in their details and go to checkout steps 1, 2, 3 and the thank-you page. That’s very indicative of System 2: rational, logical, in-control behavior. Whereas if you have people who, first of all, came from social media where you placed an ad, and you see them all over your website, clicking everywhere, those people are not goal-directed, in-control users. They are the more emotional people who click whatever they like, and that’s much more unconsciously driven. That’s the first indicator. So we start with the data, and then we start running experiments. I’d say within System 2, the rational one, we run three types of experiments. Those who dive deeper into behavioral science already know these three types, because it’s based on the BJ Fogg Behavior Model. I love the model. I had a professor back in the Netherlands, Theo Poiesz, and I want to give a big shout-out to him. He’s retired, but Poiesz is my main inspiration, because he had the same model; he was just not so good at marketing the model. It had the same three pillars: ability, motivation, and then what BJ Fogg calls prompts, or used to call triggers, whereas my professor had another name for it: he called it attention. So I still call it attention. And these three behavioral drivers for rational thinking are very logical if you think about it. First, I start with attention. If you process information consciously and you are in control, you need focus and attention, which is very hard nowadays. One of the biggest challenges we have is all these notifications, right?
Even if people are on their laptop, they still have their phone, or push notifications appear on their screen; people are very easily distracted nowadays. So we find a lot of winners just by making sure people keep paying attention to the task at hand. Winning experiments will typically use visual cues: arrows, or pictures in the form of arrows, pointing towards the most important information or call to action, whatever is most relevant at that moment for these consciously in-control people. Or we just make things stand out with very shouting colors or movement. As long as we keep their attention on the task at hand, because if people are distracted, they’re no longer in control of the task they’re doing. That’s one. The second one is motivation. I think this one is the easiest, because it’s about your product itself, or, let’s say, the toy you’re selling in your webshop, so there’s usually been a lot of thinking about motivation already. But motivation also works on a psychological level, because what also motivates people is things like autonomy. We are motivated if we feel autonomous in our decision. So I love to take out mandatory fields in forms, for example, and just see an increase, because people are less hurt in their need for autonomy, and that’s motivating for people. Also motivating is what we call self-efficacy, which is an awkward term for making people feel that they can do it. Anything with green checkmarks in forms, or just making the form field itself green: it’s like a tap on the shoulder. Treat your users as if they’re your kids. Positive reinforcement boosts their self-efficacy. Think of the fact that on Booking.com, at every step you take, it’s “yeah, you got the best price.”
And you’re always there. For those of you who have kids who play sports: you see them on the field and you think, oh, this is so bad. What are you going to say? You’re going to tell them they’re doing well, and that they can do even better. You’re not going to say, “You suck, you’re useless.” So don’t say that to your users either. That’s also the motivation part. So we’ve had attention and motivation. The third one is also not that hard: ability. Consciousness has very scarce mental energy; it depletes. So we need to make it as easy as possible when people are in conscious control of their behavior. So I take out as many fields and steps as possible. I take out distracting images. Just make it as easy as possible. Chunking, for example, is a very easy ability technique. If you have your checkout form on, let’s say, one page, just chunk the information that logically belongs together: your details, so your first name and your family name. And if you ask for nationality, ask it there, not somewhere else. So we tend to shift around a bit with the things we’re asking. The same goes for the product page: you have very detailed product specifications, so put them all together, because people find that easier. It’s what we call, with an awkward term, cognitive fluency: it’s easier for people if it’s chunked into the right pieces. So that’s if you’re dealing with a System 2 type of audience, like business people who are looking for a very specific product. They’re in control of their behavior, and it tends to be a big group. I know there’s a claim from a, I think, just emerging, baby type of science called neuromarketing.
I’m not a big fan of neuromarketing. I love the fact that they can look into the brain, and I’d love to combine the insights more, but I hate the fact that, based on a brain scan or EEG scan, they give advice on how to optimize your page. You should always validate that with an A/B test, and if you can, please share the results, because they still have a lot to learn. So that’s System 2. If you have System 1, we only have two types of experiments that we do. And I was going to say, neuromarketing has a nonsense claim, which is that 95% of your decisions are made unconsciously. I’ve tried my best to look up where that comes from, and as far as I can get, there might have been a professor, I think even

Bart Schutz 51:53
at Harvard, I think, who mentioned that somewhere. But it’s not from a scientific study. And if you realize that System 1 is always on, your unconscious brain is always processing information, even while you’re asleep, which I think is a good thing, because if your house is on fire, you will wake up and be able to save your life. But it’s always on. So it’s either 100%, or, if they mean that only 5% of the time your consciousness is in control, I think that’s way too little. Because as soon as you start separating people from their money, it becomes important to them. They probably have to justify their decision to their partner. And you bet System 2 is woken up.

Richard Joe 52:46
Otherwise they’ll be dead meat. Yeah.

Bart Schutz 52:50
Yeah. And it’s not that I don’t love Cialdini. You know Cialdini, with the biases? Robert Cialdini? Yeah. Six biases and heuristics. He brought psychology into marketing, and I thank him so much for that. Folks, if you start applying these System 1 insights, you can start with Cialdini’s six principles, or seven lately. But it’s just the start. There are hundreds of these biases and heuristics, and they don’t fit into six main ones.

Richard Joe 53:27
Yeah, I mean, it’s a start. I think Cialdini’s work is a good starting platform for System 1, maybe for CROs who are just starting out. And it’s really good that you’re highlighting these things. And

Bart Schutz 53:44
Yeah, the System 1 part is not so hard. So you’ve done your pre-analysis, and you’re aware that there’s a large group that’s probably not so much in control, because they’re all over the place in the data. Then we have only two types: emotions and what we call choice architecture. Emotions I find a bit hard, as a behavioral scientist, because nowadays, because of Kahneman, we put them in System 1, but they might actually be a separate entity, because we are aware of our emotions. We can even ask people about their emotions. So if people start feeling uncertain, or they have a fear of missing out, they are aware of it, which makes it sort of in between: because you are aware of it, it’s also a System 2, conscious thing. But nonetheless, we put emotions on the System 1 side, and you have to find out for your type of business which emotions are most important. I think two of them are really interesting. The first is the dimension of uncertainty, because we see that as soon as people are far down the funnel, when they’re almost buying the product, uncertainty rises. Uncertainties like: is it the right size, when will it be delivered, can I return it without any costs, blah, blah, blah. So any experiment that focuses on reducing uncertainty tends to be, in the e-commerce world, a very big field. We have a lot of clients where we don’t even call the dimension “emotions”; we just call it uncertainty, so that the first focus is on the uncertainty part. But if you have a social platform, fear of missing out might be the main emotional driver you could run experiments with. That’s the emotional part. And then we get to the real System 1 type of experiments, which we call choice architecture, or nudges.
These are typically all the biases and heuristics, starting with maybe Cialdini’s six or seven, and continuing with, well, there’s a very nice Wikipedia page on cognitive biases and heuristics. There are hundreds of these effects, and there’s not really a model for them. So we find it harder to apply them, because there is no model. But on the other hand, they all work, and they also all work together; there’s no such thing as a bias overload. The unconscious part of our brain is able to process massive, massive amounts of information. It’s 180 degrees different from our consciousness, which can only deal with one thing at a time. Our unconscious can deal with a million things at a time. So you can just test all these biases and heuristics and find a lot of winners. Now, just to wrap it up, I’ve dealt with five behavioral drivers, or dimensions, categorized as System 1 and System 2. On the System 2 side: attention, motivation and ability, so BJ Fogg. And on the System 1 side, we have emotions and choice architecture. We have a database of all our experiments, categorized into these five and all the underlying persuasion tactics. Guess which one of these five brings us the highest winner percentage?

Richard Joe 57:35
Choice architecture?

Bart Schutz 57:37
Well, it’s the ultimate System 1 one: choice architecture, the biases, heuristics and nudges. It has a significantly higher winner ratio than all the other four. Attention, motivation, ability and emotions tend to be around the same level, and our level is slightly higher than most others, because we apply these behavioral insights. But choice architecture is significantly higher. So if we have an overall winner ratio of, let’s say, 35% at a client, choice architecture is typically 50%. And again, if you realize that only 10% better ideas already brings you 40% more growth, imagine what this significant difference does. So we are able to shift companies’ attention in their experimentation programs towards way more testing of all these biases and heuristics. Not for every company, because some companies are dealing with very rational users. B2B, let’s say: you are a plumber and you’re looking for a very specific part. That’s very goal-directed. It could also be a habit, by the way. So it’s not the same for everything, but typically on an e-commerce site we find this. And then people start testing these unconscious effects more. And I think everyone turns into a behavioral scientist in the end, because it brings you more money, more enthusiasm, more winners, and a lot more knowledge, things to talk about with your family, friends and colleagues, once you realize that that’s where most winners are.
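Tallying the winner ratio per behavioral dimension, as in the categorized experiment database described above, can be sketched in a few lines. This is a hypothetical illustration: the `(dimension, is_winner)` record format, the dimension keys, and the rule for what counts as a “winner” (for example, a statistically significant uplift on the primary metric) are assumptions, and the log at the bottom is toy data, not Online Dialogue’s results.

```python
from collections import defaultdict

# The five behavioral dimensions from the interview, mapped to the two systems.
DIMENSIONS = {
    "attention": "System 2",
    "motivation": "System 2",
    "ability": "System 2",
    "emotions": "System 1",
    "choice_architecture": "System 1",
}


def winner_ratio_by_dimension(experiments):
    """Return {dimension: share of experiments that were winners}.

    `experiments` is an iterable of (dimension, is_winner) pairs; how a
    "winner" is decided (e.g. a significant uplift) is left to the caller.
    """
    totals = defaultdict(int)
    wins = defaultdict(int)
    for dimension, is_winner in experiments:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension!r}")
        totals[dimension] += 1
        wins[dimension] += int(is_winner)
    return {dim: wins[dim] / totals[dim] for dim in totals}


# Toy log for illustration only (not real results).
toy_log = [
    ("attention", True), ("attention", False), ("attention", False),
    ("choice_architecture", True), ("choice_architecture", False),
]
print(winner_ratio_by_dimension(toy_log))
```

Dimensions with only a handful of experiments will give noisy ratios, so comparisons like the one in the interview only become meaningful once each category holds a substantial number of tests.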

Richard Joe 59:21
Yeah, I mean, in the experiments we’re running, I’m definitely seeing that System 1 choice architecture accounts for quite a huge ratio of the test winners we’ve encountered. And I think there was a guy who wrote a book called Nudge. I have that book on my Audible; I haven’t listened to it yet, but I’ve heard it’s pretty good. So yeah,

Bart Schutz 59:47
Absolutely. Yeah. I think a funny thing, too, is that it’s Sunstein and Thaler who wrote that book, and Sunstein, I think, is a lawyer, so they also deal with the legal and ethical side of things. Which might be a nice little bridge towards another topic we wanted to talk about?

Richard Joe 1:00:13
Yes. If your time is good, let’s talk about the last topic we discussed before the podcast, which is the ethics of influencing behavior at scale. You’ve noted that this is probably a new kind of field of moral awareness: what kinds of experiments are we doing, and on which audiences? Could you please enlighten our audience about this area that you’re promoting?

Bart Schutz 1:00:47
Yeah, absolutely. So nowadays I’m very involved with the government and the authorities that oversee our markets, setting up guidelines and handbooks on how to apply all this knowledge. The basic point about ethics in our field is that most unethical practices happen unintended and unaware. And I think this accounts for at least 98% of the things we’re doing. Of course, there are a few bad boys, and maybe in some countries there are more bad boys and girls than in others. But since we are doing so many experiments and influencing people at scale in the digital marketplace, a lot of negative, hurting side effects are also happening, without the market being aware of it and without anyone paying attention to it. I’ll just give you one or two A/B test examples from our own practice. This is, I think, also one of the very nice things about working with behavioral scientists in your teams: they are, I think, slightly more aware of possible negative side effects. They studied behavioral science, so they are also very aware of how easy it is to influence people in a positive, but of course also in a negative, way. So let me give you an example. We ran an experiment with a telco on the page where you choose your plan. They had six or seven plans, going from cheap to expensive, and from a few gigabytes to a lot of gigabytes. So we tried an experiment based on anchoring theory: if you switch it around and go from expensive to cheap, from a lot of gigabytes to a few, you’re probably going to influence behavior. I think the experiment was actually the other way around, but that doesn’t really matter. The assumption was that if you start with the expensive one, then on average people will choose a more expensive plan.
Because if they see 40 euros a month, and then 35, 30, every step is already cheaper,

Richard Joe 1:03:25
because they sort of see the dearer one first, due to the anchoring effect. Yeah.

Bart Schutz 1:03:29
And if you switch it around, you start with 10 euros, and, I don’t know the exact plan rates, but 15 is already more expensive, 20 is already more expensive. So on average people choose a cheaper plan. That was our assumption. But the assumption was also that because you anchor on the cheaper plans, you might actually get more clients to choose your brand, because this was one of the premium, more expensive brands in the Netherlands, or actually in Europe. But we ran the experiment in the Netherlands. So we ran the experiment, and this is what we found. With the anchoring on a small number, so the cheaper plan first, we got more clients, but with a lower average plan. When we did the calculations: for them, new clients are very important, because you can onboard them onto your TV services and home services. You have a new client and you can start building from there. So it’s way more valuable to have a new client than a slightly higher plan. So the winning variation was the one with the smaller number first. We’ve been working for this client for a long period already, and we always identify and analyze what we call vulnerable segments. One of those vulnerable segments is everyone under 18, so adolescents, because they have a monthly amount of money to spend, let’s say 25 euros they get from their parents, or maybe 50, and it very often includes their telco plan, which may be a substantial part of the money they have every month. And they don’t have so much money. So for us it’s a vulnerable segment. We did the analysis six weeks afterwards, because we first had to get the data in: they bought the product,
then they used the product, and then we analyzed all these negative side effects. And we found what we call in science a bill shock effect, meaning you went over your data allowance. I’m not going to tell you how much it was, but it was really shocking. These kids really had a problem; they would not be able to pay for it with their monthly money. So they all had to go to their parents and say, “I’m sorry, but I bought too cheap a plan.” We thought, okay, this is a pretty negative side effect. So we went to the client’s executives and said: we saw this in the data, and we can do three things. We can leave it as it is, we can roll back the experiment, or we can run another randomized controlled trial purely on this group to find more evidence that this effect is really happening. And I think it’s very cool, because I have, I think, over 15 examples where we went to a client showing them a negative, hurting effect like this, and every single time, they rolled back the experiment. So if you look at this telco now, the most expensive plan comes first. Although they know it’s costing them clients, they don’t want to have this effect on the adolescents. And that’s my point about ethics. There’s no such thing as a dark pattern, let’s say. I love the darkpatterns.org website; please have a look at how you can mislead and manipulate people. But I want to emphasize: it’s usually not the psychological tactic that’s unethical. It’s the effect. It’s also not your intention. If you have a bad intention, okay, chances are very likely that it’s unethical, but even there it gets murky. But if you have a good intention, then it doesn’t matter what technique you’re using, as long as the effect is good. My wife’s specialism is sexual abuse and violence.
She works with people in war zones. A lot of them, mostly women, tend to be reluctant to go into therapy, so she uses tactics that we would typically identify as a dark pattern to get them into therapy. But in the end, they get treated: they don’t have nightmares anymore, they’re able to sleep again, they don’t walk around like witches anymore. So the effect is really good. But she needs to get them into therapy. So she will typically say that the Dutch government, or your insurance if you have one, is paying for your therapy, so there are no costs involved. That’s not entirely true, because you have to go into a clinic, and you do have to pay for your food and maybe even for your bed. So there are costs involved. I would call this a bait-and-switch technique. You’ve seen this. It’s a dark pattern.

Richard Joe 1:08:48
It works. It’s a classic pattern that works: the good old bait and switch.

Bart Schutz 1:08:53
Yeah, she uses it. And I don’t think it’s unethical, because she’s treating people, and typically this type of trauma is very easy to treat, because it’s a one-time trauma. So you get my point, right? Yeah,

Richard Joe 1:09:09
I get your point: the end result is that she’s helping these patients.

Bart Schutz 1:09:13
Yeah, but I want the CRO scene to be aware that in this massive number of experiments out there, if you start digging into the data, especially into vulnerable segments or possible negative side effects, you will find a lot of examples where you might have hurt people, where you might have brought them into financial trouble, or where you might have ended up with people fighting over decisions they made. And it’s up to you to look in the mirror and decide whether you’re fine with it. There are a lot of cultural differences, even within my companies. With some of the experiments we analyze, some people will say, “I don’t think this is good,” and others say, “No, no, people should be more in control of their own behavior.” There’s a lot of discussion around it, but at least do the analysis and talk about it. And on the company side, you have norms and values that you live by as a corporation, and bringing up these insights really helps to get your CRO efforts to board level, because you’re having discussions about your clients and the things you’re doing to them. And if you are even a little bit of a client-centric company, it shows the value not only of optimizing a click-through rate or a conversion rate of a web page, but of truly unveiling what drives clients and customers to choose us, even in light of the purpose of the company and its norms and values. Then you’re starting to talk the executive C-suite language. For us, it’s a very easy way to get there. And yeah, I think we should do that more, rather than staying at the level of increasing CTRs and CRs on pages.
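A first-pass check for the kind of bill shock side effect described in the telco example could look like the sketch below. It assumes hypothetical per-customer records (variant, age, plan allowance, actual usage in the first billing period) and an arbitrary tolerance; it only compares raw rates in the under-18 segment, so a flagged difference would still need a proper significance test or, as suggested in the interview, a dedicated randomized controlled trial on that segment.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    variant: str         # "control" or "treatment" (hypothetical labels)
    age: int
    allowance_gb: float  # data included in the chosen plan
    used_gb: float       # data actually used in the first billing period


def overage_rate(customers, variant, max_age=17):
    """Share of customers aged <= max_age in `variant` who exceeded their allowance.

    Returns None if the segment is empty for that variant.
    """
    segment = [c for c in customers if c.variant == variant and c.age <= max_age]
    if not segment:
        return None
    over = sum(1 for c in segment if c.used_gb > c.allowance_gb)
    return over / len(segment)


def flag_side_effect(customers, tolerance=0.05):
    """Flag if the treatment's under-18 overage rate exceeds control's by more than `tolerance`."""
    control = overage_rate(customers, "control")
    treatment = overage_rate(customers, "treatment")
    if control is None or treatment is None:
        return False
    return treatment - control > tolerance


# Toy records for illustration only.
sample = [
    Customer("control", 16, 10.0, 5.0),
    Customer("control", 17, 10.0, 12.0),
    Customer("treatment", 16, 2.0, 5.0),
    Customer("treatment", 15, 2.0, 3.0),
    Customer("treatment", 30, 2.0, 9.0),  # adult: excluded from the segment
]
print(overage_rate(sample, "control"), overage_rate(sample, "treatment"), flag_side_effect(sample))
```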

Richard Joe 1:11:13
I think you’ve added another dimension to our role as CROs or experimenters: we bear a huge responsibility to our audiences to do good in the world, and not just focus on dollar, dollar, dollar at all costs. I’m guessing this is an ongoing conversation. So, yeah, thanks a lot for coming on the podcast. We’ve talked a lot about, you know, that a lot of experiments are really, really

Bart Schutz 1:11:54
good questions, right? It’s very easy to find me on LinkedIn. I don’t care if it’s about finding the optimal alpha level, so you don’t miss out on any real sales ideas anymore, or about rethinking experimental design. Maybe that’s a step we’re only going to be taking in one or two years, but if you’re one of the front-runners who would love to learn more about it, please contact me. If you want to apply behavioral science, please read the books,

Bart Schutz 1:12:28
or follow one of the courses that are out there. I do a course on CXL.

Bart Schutz 1:12:36
And if you’re into the ethical side of things, just start digging into the data. And if you find anything, please share it, because this is one of the aspects that really drives me as a behavioral scientist in this world of digital experimentation. Absolutely.

Richard Joe 1:12:53
Thanks. It’s been a very, very deep podcast, with a lot of food for thought. Thank you for joining us, and we’ll see you another time. Thank you. Thank you.



