A Conversion Conversation with GoodUI’s Jakub Linowski
Designing a good user interface is a challenge that companies large and small face on a daily basis. Today, I chat with Jakub about how his project, GoodUI, is leveraging experimentation to help designers around the world produce their best work.
Rommil: Hey Jakub — I’m stoked about chatting with you! I’ve been following Goodui.org for a while now. I’d love to know more about you and how you came up with the idea for this project.
Jakub: Thanks Rommil for letting me take part in your exciting chat/interview series.
As for GoodUI, it’s becoming my main project, with the core purpose of truly identifying which UI patterns have the most potential. If we can achieve this, then optimization should become a little easier for everyone as the higher-probability patterns surface to the top and the less successful ones sink to the bottom. We do this, of course, by collecting, comparing and publishing similar experiment results from a wide range of companies that are nice enough to share their data. You might also have noticed that we started publishing leaked experiments to learn what the big companies are testing and deciding on. Although these are exciting tests, they come with a lot less detail, as you can imagine (as far as leaks go, I can only report on what is visible from the public’s point of view).
While doing this project, I also keep the following guidelines in mind to help maintain a degree of honesty while working towards increasing a pattern’s accuracy of prediction:
- Default Neutrality: any new idea or pattern is always tagged with a neutral probability, since it may be good or bad. Only with evidence (A/B tests) does a pattern strengthen or weaken.
- Minimizing Publication Bias: we publish all sorts of experiment results, regardless of whether they win or lose, and whether they are significant or not.
- Replication: in order to generalize, we need to compare two or more similar experiments together, and this is where our patterns come in. Patterns hold numerous similar experiments together, the more the better.
- Trade-Offs: we’re publishing multiple metrics from experiments and are realizing that some patterns may actually hurt some things while improving others.
- Feedback Loops: the project is an active one and we are always collecting new evidence. We are open to the possibility of today’s positive patterns becoming negative in the future.
Your leaks feature many prominent companies. Have any of them ever reached out?
Sure they have. Publishing and collecting experiments is very much a social activity which I can’t do alone.
One of the more prominent leaked experiments, from Netflix for example, did get their attention. I learned that the analysis was circulated internally after one of their engineers reached out and we ended up chatting. But for the most part these companies are not very interested in sharing their more detailed experiments.
The most effective way to actually learn (with real test results) about what companies are experimenting with is by offering coaching. That is, I offer my time to guide companies on what to test in return for test results. This builds a beneficial relationship with a win-win scenario. Some companies that I’ve done/am doing this with include: Microsoft, Reverb, Yummly, Elevate, Backstage, Thomasnet, etc. Such two-way relationships are also more fun. 🙂
But yes, my end goal here is to inspire even more companies to run more successful experiments. I would like to scale this process.
After monitoring so many leaks — can you guess the outcome of any experiment?
Yes, getting better at guessing and predicting outcomes is one of the goals for the project. And no, we’re not talking about being able to guess 100% of the experiments; that’s unrealistic. What I think we’re up against here is doing a little bit better than chance (ok fine, I actually hope we can do more than just a little bit better). Why do I think a 50/50 chance is our baseline here? If we look at some snapshots of past A/B test win rates from various companies, we might see something that looks very similar to a classic random distribution.
That is, by default, around half of experiment ideas might end up being somewhat positive (independent of whether we can detect them or not). So I think it’s important to remind ourselves that just by random chance teams will continue discovering wins. I hope we can help teams improve these rates.
As for our own prediction rates, one subtle signal that we might be going in the right direction came from a self-assessment we did in 2017. When we looked at 51 experiments that were all based on positive patterns (100% of them were predicted to be positive), it turned out that 71% of the experiments ended up being positive at the time of stopping. To me, that’s a good start in taking patterns seriously. 🙂
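For readers who want to see how that 71% compares with the 50/50 baseline, here is a minimal sketch of an exact one-sided binomial check. It assumes that 71% of 51 corresponds to 36 positive experiments (a rounding assumption, not a figure from the interview) and that each experiment is an independent coin flip under the baseline. "Positive at the time of stopping" is not the same as statistically significant, so treat this as an illustration rather than a validation of the result.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Probability of at least `successes` wins in `trials` independent
    experiments when each one wins with probability `p`."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


trials = 51
# Assumption: 71% of 51 experiments rounds to 36 positive outcomes.
positives = round(0.71 * trials)

# Chance of doing at least this well if patterns carried no information
# and every experiment were a 50/50 coin flip.
p_value = binomial_tail(positives, trials)
print(f"{positives}/{trials} positive, tail probability under 50/50: {p_value:.4f}")
```

A small tail probability here only says the result would be unusual under pure chance; it says nothing about how large the individual effects were.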
I’m dying to know, what test do you wish companies would stop running?
I don’t think it is in my or our platform’s power to dictate absolutely that a particular idea should always or never be experimented with.
If you’re asking me about what I think about those button experiments that some might discourage or laugh at, then I disagree with any forms of taboo. I think the right approach is to ask questions and continue to measure effects without any popularized stigma. Taking this approach we have discovered flat patterns, as well as ones with low and high effect potentials — that’s a good thing I think.
By knowing realistic probable effects, different companies can factor in the expected effect when deciding whether they will be able to detect it or not (based on sample size and power calculations). As an example, for some companies a test with a predicted +2% effect will be a waste of time (and perhaps shouldn’t be run in isolation), while for others it might be worth millions. That’s how patterns ought to be looked at.
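As a rough illustration of that kind of calculation, the sketch below uses the standard normal-approximation formula for comparing two conversion rates. The 5% baseline conversion rate, the 5% significance level and the 80% power are all assumptions chosen for the example, and `sample_size_per_arm` is a hypothetical helper, not something GoodUI publishes.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate: float, relative_lift: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in each variation to detect a relative
    lift with a two-sided test (normal approximation, equal-sized arms)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Assumed 5% baseline conversion rate, for illustration only.
small_effect = sample_size_per_arm(0.05, 0.02)  # predicted +2% relative lift
large_effect = sample_size_per_arm(0.05, 0.20)  # predicted +20% relative lift
print(f"+2% lift:  about {small_effect:,} visitors per arm")
print(f"+20% lift: about {large_effect:,} visitors per arm")
```

Under these assumptions the +2% test needs roughly a hundred times more traffic than the +20% test, which is why the same predicted effect can be impractical for one company and very worthwhile for another.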
And for those companies with lower traffic, the more optimal approach might be to take a series of low and high effect patterns and group them into a single leap variation — increasing its chance of detecting a gain. Hence in this view, even small and “laughable” effects might be useful when combined together.
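To show why grouping can help, here is a small sketch under two loud assumptions: the individual effects are independent and multiply together (real patterns may interact, overlap or cancel out), and the required sample size scales roughly with one over the effect squared. The three lift values are made up for illustration.

```python
from math import prod


def combined_lift(lifts: list[float]) -> float:
    """Relative lift of a single variation that bundles several changes,
    assuming the individual lifts are independent and multiplicative."""
    return prod(1 + lift for lift in lifts) - 1


def relative_sample_size(single_lift: float, lifts: list[float]) -> float:
    """Rough fraction of traffic the bundled test needs compared with
    testing `single_lift` alone (sample size ~ 1 / effect squared)."""
    return (single_lift / combined_lift(lifts)) ** 2


# Hypothetical pattern effects: +2%, +3% and +5%.
lifts = [0.02, 0.03, 0.05]
print(f"Combined lift: {combined_lift(lifts):.2%}")
print(f"Traffic needed vs. the +2% test alone: {relative_sample_size(0.02, lifts):.1%}")
```

The trade-off is that a bundled variation tells you whether the package worked, not which individual pattern was responsible.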
So although it’s important to be honest here, the act of experimenting and measuring any idea shouldn’t be shunned.
In your opinion, how important is experimentation to the design process?
I see experimentation as a reality check on any work. Design on the other hand has a strong component of exploration, sketching, concepting and generating a wide variety of ideas. I really believe that teams that utilize both modes of thinking (exploration and experimentation) have a higher chance of success. Good experimenters will check B’s against A’s. Good designers will stretch the playing field from A to ABCDEFGHIJKL and allow a richer range of ideas to choose from in the first place. 🙂
GoodUI has brought together so many people in UI and Experimentation — what’s the most satisfying part of it all?
I think the most satisfying moment is when a new and comparable experiment result arrives and it matches a similar test result run by someone else. This sounds geeky, I know. 🙂 But really, when two completely independent companies, from different countries, with different business models make a similar change and see a similar effect, then something special happens. At that moment a pattern gains predictive strength and accuracy, and becomes ever more generalizable. That is how a good pattern becomes better.
Of course, the reality is that the opposite also happens. Things that should have worked don’t. This might be somewhat stressful, but it’s still important to pay attention to. When patterns behave unpredictably, they might be hinting at undiscovered conditions at play; perhaps particular patterns only perform well in given contexts. This means more important work is needed.
Are there parting words you’d like to share with our readers?
Yes. Don’t view your experiments as throw-away. Instead, find ways to remember, track and share them with others. A/B tests remain valuable beyond the moment they complete, whether they are implemented or discarded. Your experiments contain immensely valuable probability information that you and others can use to predict future experiments, slightly better each time.