UXchange
Welcome to "UXChange", the podcast where we (ex)change experiences! I am a firm believer that sharing is caring. As we UX professionals are all aspiring to change user experiences for the better, I have put together this podcast to accelerate learning and improvement! In this podcast, I will:
- Share learning experiences from myself and other UX professionals
- Answer the most common questions
- Read famous blogs
- Interview UX professionals
- And much more!

For more info, head over to ux-change.com
Your AI Metrics Could Be Lying to You: The Vanity vs Sanity Problem in UX Measurement
I break down the difference between vanity metrics (what looks good) and sanity metrics (what actually matters), plus the new measurement challenges when users collaborate with AI instead of just using a product. Essential for anyone designing or evaluating AI features.
Topics:
- Why traditional UX metrics fail with AI systems
- The dangerous difference between vanity and sanity metrics
- How collaboration changes everything we measure
- Essential metrics you're probably missing
Hello everybody and welcome back to UXChange, the podcast where I exchange my user experiences to change them, and yours, for the better. I'm Jeremy, a mentor and senior user researcher with over eight years of experience and a background in neuroscience. Throughout my career, I've had the privilege of engaging with hundreds of users around the world, spanning Europe, the USA, Africa and India. This has allowed me to deliver actionable insights and improve user experiences in the nutrition, automotive, delivery and e-commerce sectors. I've created this podcast to provide mentorship and educational content, offering actionable tips, knowledge and guidance in research, specifically UX research, to anyone with an interest in this area. I hope you will find this episode insightful.

[Music]

Hey everyone, happy to be back. Sorry I was out for some time; I took some time with family over the summer, so I wasn't available, but now I'm back to it. Today I want to resume the topic of AI in the field of user experience with two episodes that I'm going to record very quickly. I'm not going to post-process them, so I'm sorry if I stumble, but I need to get these thoughts out of my head. Let's call it the usual brain dump, and thank you in advance for your understanding.

So let's start with one question: how can we measure the user experience of AI? How can we measure a great user experience? What does a great user experience mean in the field of AI? What is a successful user experience, a successful interaction? I have been pondering this, and there are multiple sides to it. First and foremost, I've made a decision: I'm going to make the episodes shorter, so I'm going to go straight to the point.
Each episode will be half the usual duration, in the hope that I'll be able to commit to this in the long term. That's why I'm making this decision; let's see if it holds. Anyway, let's start.

So what do we mean by measuring the user experience of AI? Like anything: what is the value of measuring? Why do we measure something? Usually we measure because we want to track something. We want to track an improvement, or whether what we are doing is the right decision. So measurement is inherent to something more global, which is about improvement and about experiments.

I don't know if you've read the book The Lean Startup; I really love it. It argues that to progress quickly in a given direction, to improve a product, a good way to see it is through the experiment mindset: you want to change an outcome. Let's say you want to change the number of books you read in a given amount of time. You have to come up with a hypothesis: what do you think will help you change this behavior? Is it placing a book on your bedside table, so that every morning when you first open your eyes you see the book, and that leads you to read more?

So you have a goal, then you design an experiment, and then you need to track. You track the before, you track the after, and you change only one variable between the two: put the book beside me, then observe the results. That's the classic experimental mindset: I have a hypothesis and I need to test it. The hypothesis here is that by having a book on my bedside table, I will increase the probability that I'm willing to read it, and so I will increase the amount of time I spend reading. So that's the thing: measurement is part of a bigger strategy.
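To make the experiment mindset concrete, here is a minimal sketch in Python. The reading-habit example is from the episode, but every number and helper name below is invented purely for illustration: track a baseline period, change one variable, track a test period, then compare.

```python
# Minimal sketch of the experiment mindset from the episode:
# hypothesis -> change ONE variable -> measure before vs. after.
# All numbers here are made up for illustration.

def mean(values):
    return sum(values) / len(values)

# Baseline: minutes read per day, before placing the book on the bedside table.
before = [5, 0, 10, 0, 5, 0, 8]

# Test period: same person, same week structure, only one variable changed
# (the book is now on the bedside table).
after = [15, 10, 20, 5, 25, 10, 18]

lift = mean(after) - mean(before)
print(f"baseline avg:  {mean(before):.1f} min/day")
print(f"test avg:      {mean(after):.1f} min/day")
print(f"observed lift: {lift:+.1f} min/day")
```

In a real study you would of course want more than one week of data and some notion of noise, but the shape is the same: one changed variable, a before measurement, an after measurement, and a comparison.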
So my overall point is: how can you measure that the user experience of artificial intelligence, the way we know it today, at least when we interact with it in consumer products, is valuable, worthwhile, and satisfying for an end user? Well, my point is that it's no different from what we already know. It's the same: you need to posit a hypothesis and validate or invalidate it through an experiment.

So imagine you think that by implementing an intelligent agent in your customer service, you will decrease customer churn, meaning the number of customers who leave your service or product. That's your hypothesis. So what do you do? You put an agent in place instead of a customer support rep, and then you measure.

The only problem that I see with that, not a problem but a challenge, is that yes, the mindset is the same. It's really about hypothesis, experiment, measurement, outcome, and then conclusion. The conclusion could be: the problem is being solved in less time, so we spend less money, and on top of that we're not hiring someone to do the work, so we spend even less. Okay, so that's one way to look at it. That's the main outcome, and it depends on what success means to you. What is success? Is it handling requests in less time? Is it higher customer satisfaction? Is it more returning customers? Is it having fewer and fewer of these problems over time? It all depends on how you define success.

So it's the same approach, whether it's artificial intelligence or not: hypothesis, experiment, measure, and decision. Should we keep the AI agent or not, because it's either improving our metrics or degrading them? So that's one side of it.
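As a sketch of what "put the agent in place, then measure" might look like, here is a toy churn comparison. The customer records, field names, and cohort sizes are all assumptions of mine, not a real dataset or API; the point is only the decision step at the end.

```python
# Hedged sketch: measuring the churn hypothesis from the episode.
# Customer records are invented; field names are illustrative, not a real API.

def churn_rate(customers):
    """Share of customers who left during the observation window."""
    churned = sum(1 for c in customers if c["left"])
    return churned / len(customers)

# Cohort served by human support reps (control): 20% churn by construction.
control = [{"id": i, "left": i % 5 == 0} for i in range(100)]
# Cohort served by the AI agent (treatment): 10% churn by construction.
treatment = [{"id": i, "left": i % 10 == 0} for i in range(100)]

print(f"control churn:   {churn_rate(control):.0%}")
print(f"treatment churn: {churn_rate(treatment):.0%}")

# Decision step from the episode: keep the agent only if the metric
# it was meant to move actually moved in the right direction.
keep_agent = churn_rate(treatment) < churn_rate(control)
print("keep AI agent?", keep_agent)
```

Whatever your chosen definition of success is (churn, handling time, satisfaction), it slots into the same hypothesis-experiment-measure-decide loop.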
But then what I want to posit is that, like anything, we have two types of metrics: vanity metrics and sanity metrics. What is a vanity metric? A vanity metric is something usually used when you want to showcase or share with someone the value of what you're presenting.

Imagine you have an agency whose main task is optimizing conversion on e-commerce websites. You help them achieve more conversions: more people adding to the cart, more people buying. There are a trillion ways you can do that, right? But you can do it, for instance, with dark patterns. You could imagine aggressively recommending, at the moment a product is added to the cart, what we call an upsell: you bundle another product and say, look, these two work together, and so on and so forth. And you could even add it to the cart automatically, then say, oh, we see that our customers tend to buy them together. But if there is no way to disable it, of course they are bought together more frequently. Or you could say, oh, very few of our customers left the subscription plan this month compared to last month. Yes, but if you implemented a hidden flow that makes it very difficult for them to leave, that will not help your reputation.

So the vanity metric is what you have at the surface. It's not always the case, but it's usually what you have at the surface, and, like the name implies, it's vanity. Normally it's the top metric used to qualify the value of your service, company, or product: conversions, customer lifetime value, and so on and so forth. But these are vanity. Below them is a set of processes that have to happen before your ultimate outcome can happen.
So for a customer to purchase something, to add it to the cart and finally buy it, there are maybe 20 actions they have to take. And in each of those actions, they are thinking, they are feeling, they are understanding or not understanding. All of that is also super, super important. So what I'm saying is that you have vanity and you have sanity, and the sanity metrics are all the rest, because if you only care about, let's say, the end of the funnel, something could be wrong before that and you would not be aware of it, right? You need the complete picture.

Let me give an example. Imagine you have a CRM, customer relationship management software: a place where you put all your leads, with an indicator of how important each lead is to you. You think that adding AI would significantly improve your workflow. So instead of qualifying the leads yourself, going through each one, this person scores 20, this person scores 50, you use an AI agent that does it for you. Okay, so that could be one way. And you could think the overall outcome is: I saved some time. But then, am I trusting it? That's one aspect. What is the cost of a mistake? Because it will make mistakes; it's like if you hired someone to do the job, so it will make mistakes. Do I want to relinquish control to it? And so on and so forth. So there are multiple aspects you need to have in mind. Or imagine you're a finance executive and you have software which helps you with your finances.
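The "end of the funnel" point can be sketched with a toy funnel. The step names and counts below are invented: if you watch only the final conversion (the vanity view), the big mid-funnel drop is invisible; the step-by-step view is the sanity view.

```python
# Toy funnel: counting only the last step hides WHERE the problem is.
# Step names and counts are invented for illustration.

funnel = [
    ("visited product page", 1000),
    ("added to cart",         400),
    ("started checkout",      380),
    ("entered payment",       120),   # big drop hidden mid-funnel
    ("purchased",             110),
]

top = funnel[0][1]
print(f"vanity view: overall conversion = {funnel[-1][1] / top:.1%}")

print("sanity view: step-to-step retention")
for (prev_name, prev_n), (name, n) in zip(funnel, funnel[1:]):
    print(f"  {prev_name} -> {name}: kept {n / prev_n:.0%}")
```

The overall conversion number alone would tell you something is off, but only the per-step view points at the payment step as the place where users are thinking, feeling, and apparently not understanding.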
And you call an agency to implement some AI, or let's say the software vendor themselves implements some AI in the software, but they put it kind of everywhere, imagine. And so the vanity metric, the adoption rate, the rate of use of the feature, could indicate that it's used more and more. But in the end, what is happening in the background? It's used more and more, but is that really what matters? My point is: when it's being used, what do people think about it? It's being used more and more, but first and foremost, over what period of time? Is it only first uses? Are people coming back to it? Maybe they are not, so you also need to track this. And if they're not coming back to it, what is it due to? Is it because of trust? Is it because of past interactions that went wrong, and so on and so forth?

So what I mean is that measuring is not only measuring the outcome; you also need to measure what happens before, or at least what's below the surface. And that is what distinguishes artificial intelligence from traditional user interfaces. With a traditional user interface, you will probably measure only the usability: the task success rate, the time on task, the number of errors, and so on. But there is way more to it when you think about artificial intelligence, and I want to really defend that deeply. I used to think that in the end it's only a product and we only want to answer users' needs. Yes and no, because artificial intelligence is meant to be something that approximates human intelligence. So at some point, we are not only placed in the position of using a product. It's not me using a product; it's me collaborating with it. More and more, it's becoming that.
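The adoption-versus-returning distinction can be sketched like this, with a usage log I've invented for illustration: the raw usage count (the vanity view) looks healthy, while the return rate after first use (a sanity view) tells a different story.

```python
# Invented usage log for an AI feature: (user_id, week_of_use).
events = [
    ("a", 1), ("b", 1), ("c", 1), ("d", 1), ("e", 1),  # lots of first tries
    ("a", 2), ("b", 2),                                # only two return
    ("a", 3),                                          # only one sticks
]

# Vanity view: the total usage count looks fine.
print("total uses:", len(events))

# Sanity view: how many distinct users came back after their first week?
first_week = {u for u, week in events if week == 1}
returned = {u for u, week in events if week > 1}
print(f"tried it: {len(first_week)}, came back: {len(returned)}")
print(f"return rate: {len(returned & first_week) / len(first_week):.0%}")
```

A 40% return rate on five users is made-up data, but the shape of the question is the one from the episode: is growth in usage first tries, or people coming back, and if they're not coming back, why not?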
And the guidelines, for instance from the Nielsen Norman Group, recommend that when your users might not understand what AI stands for, you can introduce the idea of talking to an intern. You have an intern learning by your side, and you need to explain everything to them: how things work, the context they need, and so on and so forth. So my point is, it's really about collaborating.

That means a whole lot more metrics enter the picture on top of traditional usability metrics. These are, for instance, trust, the notion of control, the notion of reliability, transparency, explainability, all of that, plus the cost of making mistakes, and so on and so forth, because the system is probabilistic, not deterministic. That's my main point. When you use an interface, you click button A, you get outcome B, and you know what's going to happen. When you use an artificial agent, which has an internal decision-making engine that is not deterministic but based on probability, you will need a whole new set of metrics to assess it.

So that's my point. Even before, it was important to measure user experience prior to measuring business outcome metrics like adoption, retention, conversion and so on. But even more so now, because we are using tools that are more collaborative in nature.

I hope I got my point across. I hope you liked this episode and this kind of format. I will be making a new one soon. Thank you for tuning in; hope to see you in the next one, bye bye.