To build or buy? Solving the classic Experimenter’s dilemma with Convoy’s Chad Sanderson | Experiment Nation

A conversation with Convoy’s Chad Sanderson about Experimentation

Those in this space eventually face the question of whether to use a third-party Experimentation tool or build one in-house. The choice is a tough one. I recently spoke to Chad about how to decide, as well as how he approaches building amazing Experimentation cultures.

Rommil: Hey Chad! It’s been a while since we last chatted. How’ve you been?

Chad: I’ve been doing well, all things considered. Convoy is weathering the COVID storm and freight appears to be rebounding. In my personal life, I’ve just recently launched experiment.expert, an advisory agency for experimentation culture, platform development, and A/B Testing. I’m working on some exciting projects with some very interesting teams.

Just as a reminder for our readers, could you share a bit about what you do today and what’s been keeping you busy lately?

Sure. I’m the Head of Product for Data Platform at Convoy. Convoy is a digital freight brokerage startup that raised $400M in our Series D last November. I manage our in-house experimentation platform, in-house machine learning platform, data warehouse, data discovery tooling, BI tools, and more. I spent most of 2020 building out an internal metrics, dimensions, and statistical algorithm repository to make metric & statistical computation self-service and flexible, as well as a suite of internal ML products around model experimentation and backtesting. A big part of our team’s goal is to reduce the productivity cost of experiments so data scientists and product managers can make feature decisions rapidly.

Very cool. Ok, let’s dive into it. Buy vs Build is a question that comes up a lot in our space. For those of us considering whether to work with a third-party vendor or build an in-house experimentation platform, in your opinion, what are some of the pros and the cons of each?

I’ve built internal platforms at Convoy and Microsoft, on-boarded 3rd party solutions at SEPHORA and Subway, and worked for a 3rd party tool myself at Oracle Maxymiser. There are many components to a buy vs. build decision that need to be considered extensively before choosing either. That being said, here are a few pros and cons for each.

Buy: Buying a platform after due diligence can be a very smart decision. The maintenance and ongoing development costs are handled elsewhere, the complicated logic behind assignment and managing metrics computation at scale is abstracted, and the UI is often easy to work with. It’s hard to downplay the ease of implementation. These tools can oftentimes be set up and ready to go out of the box in less than a day. However, 3rd party platforms also come with a variety of hidden costs. Many teams have to hire one or more product owners to manage the tool because it is complex. Important use cases are often impossible to implement, which leads to those use cases being ignored or to teams building separate tools with diverging ownership. I’ve seen many organizations where the engineering team has an internal tool and the marketing team has an external one.

A day?! Geez, if only I were that lucky lol. Also, I 100% agree with your point about use-cases being ignored. I’ve seen this time and time again.

With that said, how about build?

Build: When you build a platform, you can tailor it to your specific needs. Your platform can be perfectly integrated into your technology stack, compute huge volumes of data, automatically analyze experiment results, and generate interesting insights in a way that makes your analysts and data scientists radically more efficient. However, the team cost to support such a platform is high. There are many complexities in building an experimentation system that are not as simple as they appear, such as randomization (also called assignment) or metrics definition. There are also some tools with relatively novel technology or usability that would be hard to replicate regardless of the number of engineers you throw at it, simply because these solutions have been tackling very hard problems for many years.

Are there any gotchas or use-cases that we should look out for?

Assignment is a somewhat challenging problem that at first appears trivial: simply flip a coin and assign the user to control or treatment based on the outcome. However, if someone running the experiment decides to dial traffic up or down, customers could be re-randomized into a new treatment group, invalidating your experiment results. You can read about how Convoy solved this problem here: https://medium.com/convoy-tech/small-decision-big-impact-64eddd5fb22d.
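To make the re-randomization problem concrete, here is a minimal sketch of one common way to keep assignments stable while traffic is dialed up or down. It is not Convoy’s implementation (see the linked post for that). The names `bucket_for`, `assign`, `BUCKETS`, and the `exposure` parameter are hypothetical, and the sketch assumes a simple two-arm test with an even split.

```python
import hashlib

BUCKETS = 10_000  # hypothetical resolution: each bucket is 0.01% of traffic


def bucket_for(experiment: str, unit_id: str) -> int:
    """Map a unit to a stable bucket in [0, BUCKETS) using a salted hash."""
    key = f"{experiment}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return int(digest[:15], 16) % BUCKETS


def assign(experiment: str, unit_id: str, exposure: float) -> str:
    """Return 'treatment', 'control', or 'not_in_experiment'.

    `exposure` is the total fraction of traffic in the experiment (0.0 to 1.0),
    split evenly between the two arms. Treatment grows upward from bucket 0 and
    control grows downward from the top bucket, so changing exposure only adds
    or removes units. No unit ever switches arms.
    """
    if not 0.0 <= exposure <= 1.0:
        raise ValueError("exposure must be between 0 and 1")
    per_arm = round(exposure * BUCKETS / 2)
    bucket = bucket_for(experiment, unit_id)
    if bucket < per_arm:
        return "treatment"
    if bucket >= BUCKETS - per_arm:
        return "control"
    return "not_in_experiment"
```

Because the bucket depends only on the experiment name and the unit ID, a customer who was in treatment at 10% exposure is still in treatment at 50%. A fresh coin flip on every ramp change would not give that guarantee.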

This is a great point. Many 3rd parties re-allocate users as you change the delivery percentages. There are workarounds, but let’s just say, they’re not the most graceful.

Thinking about offline testing now. What considerations would you have for testing offline products?

Offline products require special considerations. Instead of asking ‘how do I run an experiment using my existing tool,’ you should start from measurement. For example, if you were interested in running a test across a selection of in-person stores to determine the optimal closing time, you would need to collect a set of metrics for each store relevant to the experiment you’ve designed. Once you have access to a reliable metrics stream, you’ll need to either sample the stores and randomize by hand, or build tooling designed to automatically take the many variables associated with offline products into account for a stratified approach.
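The stratified approach mentioned above can be sketched roughly as follows. This is an illustrative example only: the store fields (`id`, `region`, `size`) and the function name `stratified_assign` are assumptions, and a real tool would likely stratify on many more variables.

```python
import random
from collections import defaultdict


def stratified_assign(stores, strata_keys, seed=0):
    """Randomly split stores into treatment and control within each stratum.

    `stores` is a list of dicts that each have an "id" plus the fields named
    in `strata_keys`. Stores sharing the same values for those fields form a
    stratum, and each stratum is split as evenly as possible.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    strata = defaultdict(list)
    for store in stores:
        strata[tuple(store[k] for k in strata_keys)].append(store["id"])

    assignment = {}
    for key in sorted(strata):
        ids = sorted(strata[key])
        rng.shuffle(ids)
        half = len(ids) // 2
        # With an odd count, a coin flip decides which arm gets the extra store.
        if len(ids) % 2 == 1 and rng.random() < 0.5:
            half += 1
        for store_id in ids[:half]:
            assignment[store_id] = "treatment"
        for store_id in ids[half:]:
            assignment[store_id] = "control"
    return assignment
```

Calling `stratified_assign(stores, ["region", "size"])` keeps each combination of region and size balanced across arms, which matters when there are only a few dozen stores to work with.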

Guilty lol. What else?

Finally, it’s important to consider the experiment design so you can quantify the level of risk associated with a non-controlled A/B Test and the impact that would be required to notice significant movements. Unlike digital experiments which are comparatively cheap, offline experiments can come with a large budget.
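One way to put a number on "the impact that would be required" is a minimum detectable effect calculation. The sketch below uses the standard normal approximation for a two-sided comparison of means with equal-sized arms. It assumes a properly randomized test, so a non-controlled design would need a larger effect than this suggests. The function name and default values are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(std_dev, units_per_arm, alpha=0.05, power=0.8):
    """Approximate smallest true difference in means that a two-sided test
    at significance level `alpha` would detect with the given `power`,
    assuming equal arm sizes and a shared standard deviation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * std_dev * sqrt(2 / units_per_arm)
```

As a hypothetical example, with 20 stores per arm and a standard deviation of $5,000 in weekly revenue per store, `minimum_detectable_effect(5000, 20)` comes out to roughly $4,400. If the change being tested is unlikely to move revenue by that much, the experiment may not be worth its budget as designed.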

Speaking of budget, a common objection I hear is: why would we build one internally if an easy-to-use third-party tool is readily available? The cost of PMs and engineering alone makes building internally prohibitive.

It’s important to start from the problem you are trying to solve, instead of focusing on tooling and working backwards. At Convoy, we had several major problems that would have been incredibly challenging to solve through a third-party SaaS. Many of our experiments required more complex statistical designs and quasi-experimental assignment due to the nature of our business (a two-sided marketplace). We also wanted our experiment results to tie back directly to the same financial impacts our Finance team measured for quarterly and annual reports. These metrics were frequently growing and changing, so data scientists needed the ability to define new metrics in a language they were familiar with (SQL or Python) and use them in experiments with minimal effort. We also had a need for offline experimentation support, considering a huge part of our business is centred around our Operations, Sales, and Account Management teams.

At Convoy, we determined that experimentation was instrumental to growth. Therefore, we were willing to invest resources in a platform that could help us achieve our goals faster and more effectively. Other teams need to conduct the same level of due diligence on the problem they want to solve. They can then either go to a 3rd party with confidence knowing exactly how the tool will help them or can build just as confidently to solve their specific use case.

Sounds like you have a great culture of Experimentation over there, at least in my opinion. Actually, how would you define a strong Experimentation culture?

A culture of experimentation is a top-down science-driven approach to decision-making taken to its logical conclusion: statistical analysis. Because it is almost impossible to separate the impact of new features on a metric from market variability, metric-driven teams will adopt experimentation to guide future resource allocation.

Note that adopting a culture of experimentation does not exclude teams from having a culture of design or innovation. It is simply one facet of decision making.

Totally. So, how key is having an Experimentation culture in terms of scaling up a practice?

For larger businesses, extremely important. If decision-makers don’t care about statistical impact, separating improvements from random movements in the market becomes incredibly challenging. There are some teams whose results are so obvious and direct that this analysis may not be needed (hardware with long release cycles comes to mind), but for software companies that adopt a more iterative approach to feature deployment, this process of validating in production is crucial, especially for companies with many customers.

Do you have any advice on how to build an Experimentation culture from scratch?

Start by identifying the key business metrics that are important to your company and can be tracked over time. Require teams to take goals against these metrics in the long term, and to define goals against inputs to long-term metrics in the short-term. This can be accomplished through Weekly Business Reviews and (in the case of startups or public companies) shareholder financial documents. Encourage leaders and product teams to hold their organizations accountable to their metrics. Ask for proof when claims are made that a metric has been increased, especially if the claims are large. Promote good science. Ask questions a scientist might: Where did you get this data? Is it trustworthy? Convince your leadership of the importance of a scientific approach.

And how would you suggest folks nurture and maintain an Experimentation culture?

The good thing about a scientific metric-driven culture is that once adopted, it’s very hard to go back. Teams act with more purpose, they move quicker, they are more willing to fail, and they are better at recognizing the signs of failure early. It’s always good to make sure your team knows they are not judged by their failures, but that failures are a natural part of product development and can always be improved upon. Crucially, you must make sure everyone has a voice in this landscape, especially the teams who may not at first seem to benefit from it. UX design, for example, often relies on customer metrics (errors, action completions, etc.) during user testing and user research, which do not translate 1:1 into KPI improvements at scale. It’s important to remember that experimentation is merely a validation tool for features that, ideally, should already have been tested in a qualitative setting. A good UX design that doesn’t pass the experiment sniff test shouldn’t be rejected, but it is a great jumping-off point for asking ‘why didn’t this work the way we expected it to?’.

Finally, it’s time for the Lightning round!

Build vs. Buy?

Depends

If you couldn’t work in Experimentation, what would you do?

Build virtual reality infrastructure

Again, these answers surprise me every time. That’s interesting!

Describe Chad in 5 words or less.

More fun than you’d expect.

Chad, thank you for the chat and as always, thanks for joining the conversation!

For the rest of our conversations with Experimenters from around the world, follow us on LinkedIn.
