AI and the Experimentation Lifecycle featuring Joey Herzberg
AI-Generated Summary
In this episode, we have Joey Herzberg who shares his thoughts on how AI fits in the experimentation life cycle, the utility of keeping easily referenced records of test results, and the importance of experimentation strategy with regard to sample ratios.
Audio
AI-Generated Transcript
AI and the Experimentation Lifecycle featuring Joey Herzberg - YouTube
(00:00) as far as like using AI to you know take the place of a human to call a test I just don't feel like we're there yet and once again I feel like there's so many points that you have to take when calling a test that you know you have to know that quantitative and qualitative background data that goes into the test
(00:16) and you know a lot of times the AI just isn't going to know all that information hi my name is Madara and I'm a senior optimization specialist and your host for this episode of The Experiment Nation Podcast on today's episode we have Joey Herzberg who shares his thoughts on how AI fits in the experimentation life cycle the utility
(00:37) of keeping easily referenced records of test results and the importance of experimentation strategy with regards to sample ratios we hope you enjoy the episode hi there and welcome to the experiment Nation Podcast this is the interview Series where people in the cro space talk to other people in the cro
(00:59) space my name is Madara I work with the Upbound Group LLC I'm a senior digital optimization specialist and I am joined here today with Joey Herzberg who is a senior website optimization manager with AWS hey how's it going thank you for having me today it's a pleasure to have you uh really exciting opportunity to get to
(01:24) speak to you and interview you you're kind of where I want to be professionally so yeah if you want to provide me a step-by-step instruction manual on how to get there happy to have it um so you started off kind of in the early 2000s in the realm of graphic design right and have weaved your way here over the years
(01:46) you want to talk me through a little bit what that professional Journey looked like sure of course um yeah early on in my career especially my Early Education I really enjoyed graphic design uh specifically print design doing like newspaper ads and that sort of stuff and uh I did that for many years and really
(02:07) enjoyed dabbling in Photoshop and the Creative Cloud and then I soon realized you know I really enjoy working in digital so it seemed to me the print piece was kind of slowly dying away and I was like I better pivot and find you know my next big thing and it looked like digital was that answer so I moved from print to digital and
(02:28) started uh designing and coding front ends to websites so you know it started pretty simple just you know simple landing pages that sort of stuff and then I worked my way up to actually uh getting a gig at a fintech company and redesigning the front end of their website and yeah I really enjoyed it I
(02:49) really like just interacting with the customer and seeing how they were interacting with the the website and I mean that through analytics so once I you know started creating websites I really wanted to know how they were performing so I started using Google analytics and you know heat mapping Solutions those sort of things and
(03:11) really studying the user journey and I quickly realized you know there's got to be ways I can improve this so I started attending seminars and conferences and I think it was around 2015 time frame somewhere around there I went to one conference in Atlanta and started learning more about CRO and from there I just took off I uh
(03:32) went back to my employer and pitched you know uh buying some you know optimizely at the time so I was like hey we really need to buy some software to uh really dial in on the customer experience so I you know made a pitch deck and long story short we purchased optimizely and then um you know I ran their cro program
(03:53) and then over the years I perfected the the talent more and more and eventually got to where I am now which is at AWS yeah it seems like I've talked to a number of cro people who I don't think intended to enter the space but because of adjacent skill sets and their own inherent curiosity kind of found their
(04:10) way here plus this burning desire to make things better right yes so absolutely uh I'm kind of in a similar boat just for some background I studied originally software engineering in school and then I did an internship and I was like nope I'm good I've seen how these guys work and I'm out I don't want
(04:28) that 70 hour work week as a you know like a senior manager so I I jump ship to design ux design specifically and I was very curious about ux research so I took a lot of classes around like attitudes and behaviors and strategic design and things like that um and then I graduated and I applied for the
(04:46) designer role that's on my team currently and I did not get it and then they called me back a couple months later and we're like hey there's this role that's called testing and tagging can you do that and I was like What are the what's involved they're like your your technical skills fit I was like okay like you need to know GTM
(05:02) because it was like you know tagging architecture for Google analytics and then experiment uh design experimental design and like research and building AB tests with front end Dev and I was like oh okay the latter one I can probably do right now uh what is GTM so that weekend I went and I like did the Google Academy
(05:20) course on Google Tag Manager and uh I went into the interview like that Monday three days later and was just like I kind of know what I'm talking about a little bit what's a custom dimension let's find out together so hey I end up here in the space yeah
(05:38) truly that first year I think I learned as much as I have learned or as much as I will learn across the following several years so sometimes that's the best way I felt like oh yeah my experience was very similar with the front end coding because I just had a manager at the time who was like this is what I want
(06:02) you to do and I had no idea how to how to code anything so I had same thing I had to take some classes do a lot of shadowing and then you know next thing you know I'm you know deploying sloppy code I eventually got more proficient at it but you know you gotta crawl before you can walk so listen the number of
(06:20) times I deployed what I thought was a simple like script fix through GTM that I forgot to wrap in a try/catch and a developer would come to me and be like hello so with AWS now you're a senior website optimization manager what is kind of your day-to-day your workflow what is it Joey does in a day so
(06:37) a lot of different things um the main mission is to promote the website testing Channel at AWS so basically influence other teams to run tests and optimizations within that channel um keep up with the different programmatic um measurement pieces so you know test velocity win rate I'm trying to think
(07:01) what else um just how many net new parts of the business are adopting testing as part of their optimizations so you know a lot of that just promoting experimentation but then like I said earlier there's all the different programmatic pieces like checking in on tests to see you know peeking to see where they're at to see if
(07:23) they reach significance um you know turning off tests iterating writing results out so definitely just all the things you know that are incorporated into running a successful testing program yeah that sounds all too familiar so uh yeah I think that's probably true of a lot of testing programs uh do you mind me asking what
(07:47) kind of like staff size do you have under the testing program currently um currently it's myself and a PM and then we have multiple stakeholders and some of them are pretty proficient as well as in in running their own tests so I train other teams to basically be as self-sufficient as possible it's not a self-service
(08:10) model yet but we're trying to get it there that's really interesting we we flirted with uh empowering stakeholders to build their own tests but uh we ran into difficulties of like uh consistency and stability of of output you know so yes you have to have guard rails in place and you know kind of like that one
(08:30) gatekeeper who basically you know makes sure everything is looking good before it gets released but yes that's a pretty big important part of that so I did see on your LinkedIn that you recently attended the uh uh Adobe Summit yes anything interesting that you took away from that that you would want to talk
(08:52) about I know the big buzzword of the year is personalization and it's something we're doubling down on hard at upbound so right right yes so absolutely personalization was a big buzzword in addition to AI you know and Adobe does a great job of just basically showing you the best case scenario if you have all
(09:09) the pieces of their Tech stack set up just so you know it's like a perfect storm and the reality is most companies aren't going to have that perfect setup you know um I definitely would like to be at that future State one day but um I do enjoy using all their products and tools especially Target and Adobe
(09:30) analytics but um overall they did a great job of just showcasing upcoming new technology yeah so there's a product called Firefly that basically uses AI to create really awesome looking images and stuff so um I feel like the AI thing is kind of scary but kind of cool at the same time I don't really
(09:51) know how to feel about that yet um kind of you know the whole Terminator Skynet thing but um hopefully it doesn't get that bad but uh but yeah the conference was awesome Adobe's conferences are always great if you've never been I highly encourage you to attend one they have excellent speakers they have you
(10:07) know many of the engineers and folks who build the software sitting in the session so you can ask them questions and then of course they have like their Sneaks which is their sneak preview of all the kind of like upcoming under-development software they're working on phenomenal I had the opportunity in October to go to Opticon
(10:25) in San Diego oh yeah I had a pretty similar experience where because we are Optimizely customers as well at Upbound and so um we've been on their web experimentation platform for a few years and you know a couple other small things and uh there was definitely that like oh look at the dream state of if you were
(10:44) 100% in the Optimizely ecosystem with our data platform and our CMP and this and that look at what you could do it'd be unstoppable and I'm like that's amazing we have several contracts I don't know how to tell you this you know so um but yeah there were a lot of really cool things that came out of the
(11:02) conference um very very interesting things coming up the AI piece is very huge so we'll use that as a really clean segue I think yeah there's nothing quite like a transition that you acknowledge out loud right so uh in the world of AI right now I'm curious to know if and how you are using
(11:20) it in your program currently uh and if there are gaps or opportunities you think that AI can really fill this is Rommil Santiago from experiment Nation every week we share interviews with and Conference sessions by our favorite conversion rate optimizers from around the world so if you like this video
(11:38) smash that like button and consider subscribing it helps us a bunch sure so I just started using some AI to assist with writing out test results so when I share out my test results they're usually long form you know with basically a paragraph that goes through all the metrics and how they performed if
(11:58) they were significant or not were they directional um you know did we see any harm to specific metrics that we basically think of as do-no-harm metrics so all that kind of stuff it's like a paragraph that gets written so I'm using AI basically to feed it the results out of a table form and then the
(12:18) AI scrapes the table and writes out the results for me so it's like a time-saving measure um although I'm not using AI to call test results I don't feel like you know I feel like that's kind of like a slippery slope where you could just start feeding AI a bunch of test results and it could just
(12:36) tell you you know did the control win did the variant win does it recommend you to take another step so um I feel like there's ways you can use AI to automate your workflow to get more tests out and to help with that flywheel but as far as like using AI to take the place of a human to
(12:56) call a test I just don't feel like we're there yet and once again I feel like there's so many points that you have to take when calling a test that you know you have to know that quantitative and qualitative background data that goes into the test and you know a lot of times the AI just isn't going to know all that information it's
(13:13) just basically going to know whatever you feed it.
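A rough sketch of the workflow Joey describes here: flatten the metrics table into a prompt, ask a language model to draft the long-form paragraph, and keep the human in charge of actually calling the test. The table columns, the prompt wording, and the call_llm helper are all illustrative assumptions, not anything a particular vendor ships.

```python
# Sketch: draft a long-form results paragraph from a metrics table.
# The columns, prompt wording, and call_llm() stub are hypothetical.
import pandas as pd

results = pd.DataFrame({
    "metric":       ["Checkout starts", "Orders", "Support contacts"],
    "control_rate": [0.124, 0.031, 0.012],
    "variant_rate": [0.131, 0.033, 0.012],
    "p_value":      [0.02, 0.18, 0.91],
    "do_no_harm":   [False, False, True],   # metrics we only need to not hurt
})

PROMPT = (
    "You are helping an experimentation manager write up an A/B test.\n"
    "For each metric below, describe the observed lift, whether it was "
    "statistically significant (p < 0.05) or only directional, and whether any "
    "'do no harm' metric was hurt. Do NOT declare a winner or recommend a decision.\n\n"
    "{table}"
)

def call_llm(prompt: str) -> str:
    # Stand-in for whatever model client you use (Bedrock, OpenAI, a custom GPT, ...).
    raise NotImplementedError

def draft_results_paragraph(df: pd.DataFrame) -> str:
    # Flatten the table into text and let the model write the narrative.
    return call_llm(PROMPT.format(table=df.to_string(index=False)))
```

The guard rail lives in the prompt itself: the model drafts the narrative, the human still calls the test.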
(13:33) I would agree with that truly so strongly uh we're kind of in a similar place in our program where we definitely see the utility of AI I was doing a similar thing of trying to train a custom GPT to parse our test results and just summarize them really easily and then also to help with the briefing process because in our workflow we have to develop like a business brief that at the end of a testing cycle with an enhancement gets handed off to the product team to develop and so there's a fair amount of kind of busy work involved with that unfortunately and we
(13:50) don't have dedicated project managers or anything so there is this challenge inherent to like okay I want to spend my time on the most valuable work let's have AI cut the chaff right so that's definitely kind of where we're at with that as well um I think the generative AI piece for copy for
(14:10) assets for um even like you know simple code blocks although it often does get some stuff wrong or just makes stuff up which is very funny um I think those are all places where you can speed things up a fair bit there was one other thing I wanted to share about AI um I'm also using it
(14:28) you know with copy to basically rewrite hypotheses so I have a really specific format for hypotheses that says basically like you know if we introduce this change then we will see an X lift or increase in a specific metric because of X Y and Z and many times the stakeholders I work with will not you
(14:53) know fully bake out that hypothesis statement so what I do is I take it put it into our model and it spits out the hypothesis in the correct format and that way when I share it out you know it's following the same format that I use for all my tests.
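The hypothesis format Joey describes ("if we introduce this change, then we will see an X lift or increase in a specific metric, because of X, Y and Z") is concrete enough to pin down as a template, whether a stakeholder fills it in by hand or a model rewrites a rough idea into it. A minimal illustrative sketch; the field names and example values are made up:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    change: str          # what we are introducing
    expected_lift: str   # e.g. "a 5% relative lift" or simply "an increase"
    metric: str          # the primary metric we expect to move
    rationale: str       # the quantitative/qualitative evidence behind it

    def statement(self) -> str:
        # The "if ... then ... because ..." format described in the episode.
        return (
            f"If we introduce {self.change}, "
            f"then we will see {self.expected_lift} in {self.metric}, "
            f"because {self.rationale}."
        )

h = Hypothesis(
    change="a simplified, single-column checkout form",
    expected_lift="an increase",
    metric="checkout completion rate",
    rationale="session recordings and heatmaps show users abandoning at the optional fields",
)
print(h.statement())
```

Keeping the structure explicit like this is also what makes the AI-assisted rewrite easy to check: every field either has evidence behind it or visibly does not.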
(15:10) that's really interesting I do think that with intakes from stakeholders and organizational partners one of the biggest challenges is like goal setting and expectation setting in terms of how we're going to meet those goals so I've definitely run into a similar thing where it feels like you know we'll get requests because people know what we're
(15:27) technically capable of but really it's like they don't fully know what they're looking for within that and so I think having that structure to help clarify something and distill a thought or an idea into like well this is what we're changing this is what we're hoping will happen because of that change is really
(15:43) really valuable right exactly yeah you got to be very concise and clear with your hypothesis statement and of course back it up with data and you just want to make sure that it's you know it's always the foundation for your test and if it's not fully baked out then a lot of times your test you know you won't
(16:00) get the results you were thinking you were going to get oh for sure for sure the other challenge I run into sometimes is when uh I know something might not test a certain way but someone's really expecting results and you know they're in a position where you can't necessarily say no to it so you have to
(16:17) be like we're going to run this and I'm going to level set with you this is not going to win but I'll run it because I think you need to see that for yourself but like there's a way to sort of massage that statement so it's you know it's an exploratory thing and not like a better kind of thing yeah I've
(16:33) been surprised I've had like marketing stakeholders who come in and they're like we have to do this it's much better practice and I'll be like I have tested this area a hundred times you are dead wrong and then I'll try it and it like doubles conversion on a certain like proxy metric and I'm just
(16:49) like oh I forget people know what they're doing people are really good at their jobs you know we're not the only experts in a space crazy I should remember that well that's where a really good comprehensive test archive comes in handy that you know you can tag or flag those wins and or those
(17:08) learnings and then that way you can easily call them up when someone requests a very similar test you can be like hey I already did this on a very similar page here's what the results were happy to help you coordinate this test but you know I found very similar wins with you know another test we ran so that's kind of saved me a couple
(17:26) of times that is amazing because we're in the process of fleshing out our sort of repository as we're calling it but um it it is a challenge to find the right tool and the right format and the right you know ontology and categorization to make sure that your your categories are correct your labels are correct how you
(17:45) folder everything so yeah so uh kind of high level if you don't mind me asking what would be your approach to that sort of categorization and cataloging definitely um well the first thing is make sure it's easily accessible to all your different stakeholders and partners and customers so everyone can
(18:03) see it and everyone can access it uh the second part is try to be as intentional as possible with the different categories within the archive so you want to make sure you have the hypothesis you have the business unit you're running the test for um obviously the audience the findings you know what
(18:23) was the winner or what was the outcome and then some kind of brief indicator of how the test performed and next steps and then if they want more you know tease them with the information I would link to our actual test plan itself that has all the full details but I always think of the
(18:41) repository as just like a quick snippet of how the test performed that someone can easily access and filter out based on you know certain business criteria.
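The fields Joey lists map naturally onto one flat, filterable record per test, whatever tool ends up holding the archive. A hedged sketch of what such a record might look like; the field names and example values are illustrative, not his actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class ArchiveEntry:
    """One row in the test repository: a quick, filterable snippet per test."""
    test_name: str
    hypothesis: str
    business_unit: str        # who the test was run for
    audience: str             # who saw it
    outcome: str              # e.g. "variant won", "control won", "inconclusive"
    performance_summary: str  # brief indicator of how the test performed
    next_steps: str
    test_plan_url: str        # link out to the full test plan for all the details
    tags: list[str] = field(default_factory=list)  # e.g. ["checkout", "ux-pattern"]

entry = ArchiveEntry(
    test_name="Checkout trust badges",
    hypothesis="If we add security badges near the pay button, then ...",
    business_unit="Payments",
    audience="All mobile traffic",
    outcome="variant won",
    performance_summary="+3.1% checkout completion, no harm to refund rate",
    next_steps="Roll out, iterate on badge placement",
    test_plan_url="https://example.com/plans/checkout-trust-badges",
    tags=["checkout", "trust", "mobile"],
)
```

Keeping the archive row deliberately short, with a link out to the full plan, is exactly the "quick snippet" idea described above.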
(19:01) no that's really cool we're kind of running into that challenge as well where we would like to label things to where we can sort of see the intersection of ideas as well where we're like well this touches our checkout flow but it also touches these like UX concepts so we know that these patterns are in use in these places and so on and so forth uh managing that architecture becomes incredibly complex really quickly and we're running into
(19:19) the need for almost like a data dictionary of just how we've organized the document so like documentation for the document which is in itself I think very indicative of kind of the space that we're in right it is lots of acronyms lots of nuances you know and all the business metrics you know the performance KPIs
(19:41) all those you have to have some kind of definitions for all those especially for someone who's new to the team oh on our SharePoint we have a site that is just our three-letter acronyms unique to the business so some of them are standard industry three-letter acronyms that we have repurposed and it's oh wow deeply
(19:59) frustrating because you will hear both in a call sometimes and you're like come on come on yeah yeah you're like what is that ABC now there is something interesting there about like the potential use of AI and uh documentation and in your catalog to potentially be able to you know even summarize
(20:17) your repository for you in places or to scrape it for you and pull information absolutely yeah having an AI attached to your repository so you could just type in a query you know like what is the most popular type of test we've run this year you know or what are the you know the trends we're seeing across these
(20:39) types of pages on the website you know just those quick queries that would take you minutes to search for would just be easy to bring up with AI.
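Even before an AI assistant is wired up to the archive, the quick queries Joey mentions ("most popular type of test we've run this year", trends across page types) are one-liners once the repository is exported into tabular form. A small sketch over a hypothetical export:

```python
import pandas as pd

# Hypothetical export of the test repository described above.
archive = pd.DataFrame({
    "launched":  pd.to_datetime(["2024-01-10", "2024-02-03", "2024-03-22", "2024-04-15"]),
    "test_type": ["copy", "layout", "copy", "personalization"],
    "page_type": ["pricing", "checkout", "pricing", "home"],
    "outcome":   ["variant won", "inconclusive", "control won", "variant won"],
})

this_year = archive[archive["launched"].dt.year == 2024]

# "What is the most popular type of test we've run this year?"
print(this_year["test_type"].value_counts().head(1))

# "What trends are we seeing across these types of pages?"
print(this_year.groupby("page_type")["outcome"].value_counts())
```

An AI layer on top mostly saves the step of remembering the column names and writing the query yourself.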
(21:00) definitely and I think maybe we should talk offline about maybe creating a product here no I'm just kidding hey you know that's how these things usually start um no my boss and I have had that conversation a few times of you know like there's an idea here you know if any of us had the time and the ability to invest in it like resource-wise you know right it's a pretty uncharted space right now it is it is because you know over
(21:18) the years I've worked with a couple of different software tech stacks for testing and they're all great at you know running the test and presenting results but they're all missing that one component which is like the categorization of results and archives and all that kind of stuff oh for sure um the last piece on
(21:37) AI that I was making notes on that I'm curious to get your thoughts on uh I was seeing some stuff online about people who are using AI and generative AI chatbots in general to uh sort of act as a sounding board when they are strategizing when they're planning when they're coming up with ideas kind of
(21:59) like what you were describing about like distilling hypotheses down to something that's more templatized like I'm seeing that there's a trend of people beginning to use it creatively as well uh do you see a space for that in your process and like what are the potential dangers inherent to that yeah I mean it's kind
(22:18) of like I mentioned earlier that product from Adobe you know it produces some really great images and assets for advertisers to spin up on the fly I mean I think the downside of that is obviously the creators you know those who spent years honing their talents with you know perfecting themselves at
(22:35) Adobe Illustrator Adobe Photoshop and you know sketching and drawing vectorized images so I think there is you know a fine line to walk with some of that um now in our space it's a little different you know we're just trying to fine-tune and hone in on really good hypotheses and you know maybe using the
(22:55) AI like you said to understand kind of you know opportunities on the website you know feed it a bunch of data here's how folks are performing on certain pages and the average bounce rate is XYZ and on this page it's higher how can we you know create a hypothesis to fix that so you know it's a little bit
(23:16) different compared to like I said the the creatives who are actually creating images and content but um I do think there's opportunity to hone in further with AI with testing it's just a matter of training the models and having them more proficient in you know your specific website with your specific data
(23:35) set and finding a way to feed it into the AI yeah and what's curious is like if say Adobe or Optimizely or whoever were to and I don't know how this would work from a data legality perspective but if they had you know all their different programs that are being run through their system and
(23:58) their AI was trained and like knew about experimentation specifically and it knew your program but it also had access to industry-level data to be like hey other people in like the finance area are doing these kinds of tests right now or are personalizing these kinds of hypotheses exactly uh that
(24:14) could be a game changer really of like keeping you really competitive and bleeding edge yeah cutting edge bleeding edge cutting edge yeah no agreed that's a great analogy for the whole situation and I mean there's lots of opportunity out there and I really haven't seen one tech company with a testing product
(24:34) really you know announce any AI products yet you know outside of what you mentioned I'm sure stuff is on the horizon one of the learnings I had uh at the conference I went to last year was around like testing like the power of tests and like the statistical necessity of like even splits even power splits and so on and so forth so in your
(24:56) experience kind of uh what what makes or breaks that in relation to an experiment or a successful experiment I should say yeah so you know many times you don't want to have an inconclusive experiment you know you want to have some definite results that you can use as recommendations or learnings that you
(25:17) can iterate on so um an underpowered test many times will result in inconclusive results and a lot of times just from the sample size not being large enough or the change that was proposed was just not strong enough to initiate a reaction from a visitor so I always try to get folks who are you know anyone
(25:39) any of my customers or stakeholders who are interested in running a test always make sure that they scope out their data and their opportunity before coming up with a hypothesis and by that I mean just making sure they have ample sample size I mean excuse me traffic yes to actually produce that sample size another thing
(26:00) is you don't want to have to run a test for like months and months you know that's kind of going to go against your whole thing if you're running a testing program you don't have months to run a test so um it's good to scope it out prior to even coming up with a hypothesis to make sure that you have
(26:14) enough traffic and it's not going to take you months to actually reach significance.
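Scoping the data before writing the hypothesis usually comes down to a standard two-proportion power calculation: given the baseline conversion rate and the smallest lift worth detecting, how many visitors does each arm need, and how long will your traffic take to supply them? A sketch using the usual normal-approximation formula; the baseline, lift, and traffic figures are made up:

```python
from scipy.stats import norm

def visitors_per_variant(baseline: float, mde_rel: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per arm for a two-sided, two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + mde_rel)          # minimum detectable effect, relative
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = ((z_alpha + z_beta) ** 2) * variance / (p2 - p1) ** 2
    return int(n) + 1

n = visitors_per_variant(baseline=0.04, mde_rel=0.10)   # 4% baseline, 10% relative lift
weekly_traffic = 30_000                                 # hypothetical eligible visitors per week
weeks = 2 * n / weekly_traffic                          # two arms at a 50/50 split
print(f"{n} visitors per variant, roughly {weeks:.1f} weeks of traffic")
```

If the estimated duration comes back in months rather than weeks, that is the signal to either pick a higher-traffic page, a more sensitive metric, or a bolder change before the hypothesis is ever written.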
(26:31) that's one of those things I'm learning when it comes to prioritizing and building a roadmap is for the lower-traffic areas of our products it makes the most sense to run as close to like a 50/50 split as possible to maintain the power of the sample size but then also to just take bigger swings in those regions because like you know how it is with smaller changes it's harder to eke out major conversion improvements or degradations rather you know it's more valuable to just kind of go all the way with something a little bit out
(27:00) there and then depending on how that performs either reel it in or scale bigger exactly yeah those big swings are always worthwhile you know um sometimes when someone comes up with a crazy idea I'm never you know too hesitant to not do it I'm always like okay let's just make sure we have the traffic to run it and if all the data
(27:22) looks good and we you know can run it then I'll go for it but um I guess the other thing I want to mention too is just like SRMs and all these biases that take place were you going to bring that up or is it okay if I bring that up no please feel free this is as much of your conversation as mine awesome so yeah all the different biases
(27:40) that take place that can potentially ruin your test results or you know make it hard to come up with like a definitive result so many times um and this happens in all programs big and small but there are going to be biases that's why it's important to have some kind of fundamental knowledge of sample ratio
(28:00) mismatches um to basically look out for those flags um because not everyone has software that'll flag an uneven split so it's good just to make sure that you have a fundamental knowledge of what an SRM is and what it means to your test results um and also what is the threshold for an SRM because if one is
(28:20) introduced into your results it definitely can like I said in some instances make your results you know completely unusable oh yeah it makes them real swingy in a real weird way yeah you might think oh I have a really big win here look at this you know but then you say wait let's look why is
(28:38) this visit count so off so um you know having to put your unique visitors or your visit metric into a sample ratio mismatch calculator is one way of doing it another like you said is just having that automated piece that looks for it and flags the test um but yeah so it's definitely something to be aware of when
(28:59) you're running a testing program.
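The fundamental SRM knowledge Joey is talking about boils down to a chi-square goodness-of-fit test on the visitor counts against the intended split; a very small p-value (p < 0.001 is a commonly used alert threshold) flags a mismatch before you even look at conversion numbers. A minimal sketch with made-up counts:

```python
from scipy.stats import chisquare

def srm_check(control_visitors: int, variant_visitors: int,
              expected_split: float = 0.5, threshold: float = 0.001) -> bool:
    """Return True if the observed split deviates suspiciously from the intended one."""
    total = control_visitors + variant_visitors
    expected = [total * expected_split, total * (1 - expected_split)]
    _, p_value = chisquare([control_visitors, variant_visitors], f_exp=expected)
    return p_value < threshold  # True => likely sample ratio mismatch, investigate

# A 50/50 test that is drifting: 50,000 vs 48,500 visitors.
if srm_check(50_000, 48_500):
    print("Possible SRM - check targeting, redirects, bot filtering, tooling.")
else:
    print("Split looks consistent with the intended ratio.")
```

Anything below the threshold is a reason to go hunting for redirect, bot-filtering, or tooling issues rather than a reason to trust the apparent "win".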
(29:20) yeah and one of the solutions potentially there if there's not something built into your testing tool you know if you're using an external session recording slash journey analytics tool in the realm of Contentsquare Heap Quantum Metric etc you know there's ways to set it up to where it can flag like if you're feeding it your experiment data in terms of like you know if there's any sort of ID or identifying details that can be passed directly to that tool very often they're set up in a way where you can alert based on changes in certain kinds of volume so if you're able to set up
(29:39) that monitoring there that could potentially be a solution yeah no definitely could be a very valid solution and then I mean the existence of an SRM also could be an indicator of something nefarious going on in your tooling in your setup so I always recommend folks also to run A/A tests frequently just to rule out you know any
(30:00) underlying issues with their tech stack um because even the best of the best still might have some kind of you know issues going on behind the scenes you're not aware of and the only way to really smoke that out is to uh you know run these A/A tests yeah that makes a lot of sense and it happens all the time like
(30:20) random API service fails or um just like one time our CSP randomly got like reverted to an older version and started blocking a very vital script that we were using right uh from some CDN that we had to have whitelisted and so like yeah these sorts of things definitely happen um and
(30:40) it's it's a really good monitoring uh idea so uh beyond that I like to finish off so whenever we've done uh interviews in my Organization for new roles I like the friendly thing that I like to do is I have some rapid fire like binary questions that I ask so I thought I'd bring that to the table here so we'll
(31:00) start with uh milk or dark chocolate uh dark chocolate excellent choice cats or dogs uh cats phenomenal tea or coffee coffee all day long okay all right I don't mind coffee I am a tea guy myself DC or Marvel if you have an opinion um Marvel for sure although I'm not very happy with you know the last year
(31:25) or so of Marvel movies but hopefully Deadpool will bring us all some joy definitely and then also I like to say to people they gave you 20 years you can give them a couple back like it's okay they'll pick it back up eventually right they have to with that many billions of dollars they have to so exactly um and
(31:42) finally favorite cuisine uh Mexican phenomenal as someone in Texas I agree with that a thousand percent so uh I think that's all the time that we have it's getting late I want to let you go and live your life so I really appreciate your time today Joey nice to talk to you yeah awesome thanks for having me this is
(32:05) Rommil Santiago from experiment Nation every week we share interviews with and Conference sessions by our favorite conversion rate optimizers from around the world so if you like this video smash that like button and consider subscribing
If you liked this post, sign up for Experiment Nation's newsletter to receive more great interviews like this, memes, editorials, and conference sessions in your inbox: https://bit.ly/3HOKCTK