UX - The User Experience Podcast

Human Judgement, 0 Click Future, and Chatbot Manipulation

Jeremy



I'd love to hear from you. Get in touch!

The Case For Human Judgment In The Agent Improvement Loop — LangChain

  • LangChain's argument: if agents are only trained on documented knowledge, their performance will plateau — the differentiator is capturing the tacit expertise that lives in people's heads
  • Tacit knowledge is the problem — a lot of what makes great teams great is never written down, and even if you tried to write it all down, you'd still miss the translation gap between what someone thinks and what they can express
  • The recommendation: design feedback loops that encode human judgment over time — humans help design and calibrate automated evaluators rather than manually reviewing everything forever
  • Once you've done something well manually and it's repeatable and standardised, automate the evaluation — but a human still needs to define what "good" looks like first
  • My take as a UX researcher: you bring thinking to the table — every time there's a judgment call, that's where you come in — boring, repetitive, and non-critical tasks are what you delegate
  • New AI-specific criteria to prioritise in your research: trust, transparency, verifiability, and controllability — these deserve more weight than they would in a standard usability study
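The "evaluator playground" idea discussed in this episode (humans and an LLM judge score the same outputs, then you measure how well they agree) can be sketched in a few lines. This is an illustrative sketch only; the function and labels are my own, not LangChain's API.

```python
def alignment(human_labels, judge_labels):
    """Fraction of examples where the automated judge agrees with the human expert."""
    assert len(human_labels) == len(judge_labels)
    agree = sum(h == j for h, j in zip(human_labels, judge_labels))
    return agree / len(human_labels)

# Hypothetical verdicts ("good"/"bad") from a human expert and an LLM-as-judge
# evaluator on the same eight agent outputs.
human = ["good", "good", "bad", "good", "bad", "bad", "good", "good"]
judge = ["good", "bad",  "bad", "good", "bad", "good", "good", "good"]

print(f"judge/human alignment: {alignment(human, judge):.0%}")  # 6 of 8 agree -> 75%
```

The point of the loop is that once this agreement is high enough, the automated judge can review volume while humans only recalibrate it periodically.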

Sierra's CEO Says The Era of Clicking Buttons Is Over — TechCrunch

  • Sierra builds customer service AI agents for enterprises and argues that natural language will replace click-based interfaces entirely — no UI required
  • For long-term listeners, you know what I think about this — and I still think it
  • Voice and chat are still interfaces — a user interface doesn't have to be visual, but it's still something between you and your goal, and it still constrains how you interact
  • Counter-questions nobody seems to be asking: how do you initiate an action without clicking? How do you rearrange things? Correct errors? Stay in control? And how does this apply across healthcare, legal, IT?
  • My honest position: technological innovation adds up, it doesn't replace — I still take notes by hand even when AI is transcribing, because I need to own the process
  • When I was building my website, it was often faster to move a div myself than to explain the change to an AI: that's not a niche edge case, it's a daily reality for most users
  • Bold claim, may work, but show me the user research

Chatbots Are Great At Manipulating People To Buy Stuff — The Register

  • A pre-print paper tested 2,000 e-book readers across three conditions: traditional search, neutral chatbot, and chatbot instructed to persuade
  • When the agent was instructed to persuade, 61% chose the sponsored product — nearly triple the 22% rate under traditional search
  • Simply chatting without persuasive intent performed no better than search — it's the persuasive intent that drives the effect
  • Even after being debriefed, less than one in five participants detected any bias — the conversational format makes it harder to notice you're being sold to
  • My methodological question: can you truly isolate persuasion from the chat modality itself? My hypothesis is no — persuasion through conversation may be categorically different from persuasion through a static page, and comparing them assumes they're equivalent
  • Not surprising overall: remove the communication barrier and let technology speak your users' language — of course conversion goes up
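For scale, a back-of-the-envelope significance check on the reported 61% vs 22% rates. The per-condition group size is my assumption (2,000 readers split evenly across three conditions, roughly 667 each); the pre-print may have used a different split.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Two-proportion z statistic under the pooled-variance null hypothesis."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# 61% sponsored-product choice under persuasion vs 22% under traditional search,
# assuming ~667 participants per condition (not stated in the summary).
z = two_proportion_z(0.61, 667, 0.22, 667)
print(f"z ~ {z:.1f}")
```

Even if the real group sizes differ substantially, a gap this large would remain far outside chance, which is why the headline "nearly triple" framing holds up.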

Support the show

Help me improve the show HERE

SPEAKER_02

In today's episode: the importance of human judgment in the agent improvement loop, the era of clicking buttons that would supposedly be over, and chatbots that are great at manipulating people to buy stuff. Thank you for coming back, as always. Today we have three articles. I'm sorry I took a bit more time; there was no episode yesterday, so there might be two today instead. So we'll go over three main news items. First, the importance of having human judgment to improve your agents: LangChain makes the case that if agents are trained only on documented knowledge, their performance will plateau, so the differentiator seems to be capturing the tacit expertise that lives in our teams' heads. The idea is that we should be designing feedback loops that encode human judgment over time. The second one is about the claim that the era of clicking buttons is over, because we would be able to describe a task in natural language and have an agent execute it autonomously, no UI required. Long-term listeners of the podcast know what I think about this, but we will dive into it later. And finally, chatbots would apparently be great at manipulating people to buy stuff. Let's start. So, on human judgment in the agent improvement loop: I am discovering this article as I speak, so I'm going to skim it and comment live. I didn't have time to review the articles beforehand, so sorry in advance if I make some mistakes, but the positive side is that you're getting my raw reactions. LangChain, on their blog, recommends building human judgment into the improvement loops of our agents, because AI agents work best when they reflect the knowledge and judgment the team has built over time.
A lot of the knowledge agents need is documented and easy for an agent to retrieve as is, but we know that great organizations also rely on tacit knowledge: knowledge that is not documented, that lives inside employees' minds. That is a great point, because it's something I keep saying. If you want to replicate how a human being works, you get an asymptotic relationship: you can get as close as possible, but you will never have a perfect replica, because that would require creating a whole human being and having it work exactly the way you do. For instance, what are the steps involved in replicating how a human works? One of them is documentation, and as the LangChain blog explains, we may simply be bad at documenting things. Even if we use agents to document, we may be just as bad at documenting how we use agents to document. I'm getting a bit sidetracked, but you get the idea. A lot of our knowledge is not written down, a lot of it is experiential, and even if the aim was to write it down word for word, we would still miss some of it. There is always a translation function between what we think, what we want to express, and how we express it. Sometimes it's easier to express feedback in the moment than to write a whole document. So that's what LangChain is saying. Quote: "Ensuring this wisdom makes its way into an agent requires an improvement loop that incorporates input from domain experts." They also have a real-life-inspired example, a copilot for traders, which I will not go over today.
They give some recommendations on workflow design. Quote: "There are benefits to using deterministic code to define parts of the workflow: lower latency, fewer tokens, and the guarantee that critical steps actually run." And another quote: they "will need input from risk and compliance experts to create automated checks that inform the firm's standards." So basically they recommend running evaluations to determine the performance and risk of the tool design. They also go over the idea of context. You're probably aware that the industry has moved from prompting, that back-and-forth communication with an agent, towards giving more context to the agent or LLM so it can leverage that context, and I think this is more efficient in terms of tokens. For those who don't know, and I'm making a huge shortcut here, tokens are basically the currency through which you interact with LLMs. It's like the energy you spend, the cost of the interaction, and our usage of LLMs is measured with this cost. If I ask a short question, it costs some tokens as input; then there is a processing and analysis phase; and the answer that comes back to me, the output, also costs tokens. That's the extent of my understanding. I'm not a professional when it comes to how AI works internally; I'm a professional when it comes to user experience. So I'm trying to understand the technology a little more day by day, to see how it impacts my profession and the products I'm trying to develop.
So yes, we are moving towards giving more context to these agents, apparently because it's more efficient in terms of tokens, and probably efficiency in general: being more efficient at the task. Apparently Anthropic's Skills is a standard that has grown quickly in popularity since launching in October, and that's a good example. LangChain explains that effective agent design involves deciding what knowledge the agent should access and organizing it so the agent can retrieve the right information at the right moment. I would say I'm almost not surprised, because we are, to some extent, emulating how humans work when they do their job. And maybe we should step back here, because that's a huge assumption: do we want to emulate how humans work? And if so, when and how? That's a question I want to ask today. As human beings, we have an action loop: perception, decision, action. Knowledge comes into this process: we have some knowledge, and it helps us choose the right tool and the right course of action. It's the same for agents, and it goes further: it's also about deciding which knowledge to use to accomplish a task. For an agent, that translates into context, because I can train an agent to perform a task, but I need to choose what I feed it so that it's trained for that particular task. I cannot feed it everything about me and ask it everything. Ultimately there is some granularity: one agent for one task, another agent for another task.
For instance, there could be an agent to do desk research in user experience research, and another agent to craft a script. And here I'm distinguishing agents from skills. See a skill as a standard operating procedure. In your day-to-day job you have tasks you do repeatedly, always the same, very standard: some input, some output, and a procedure to follow. That can be encapsulated into a skill. You can write it down in a file that describes what you do, what information you take as input, and what you provide as output. That's a skill. An agent is not necessarily using a skill; it could, but not necessarily. For instance, you can have an agent sending an email. I'm not sure whether a skill alone could do that; if an LLM has a skill to send an email, maybe the fact that it performs an action already makes it an agent. I'm honestly not sure; that's an open question. Anyway, you can probably see the distinction between agents and skills. Coming back to the LangChain blog, they say the most successful teams follow a tight iteration loop: they quickly build an agent, deploy it to a production or production-like environment, and collect data at each step to guide improvements. It's impossible to know what an AI agent will do until it runs; putting your agent in front of users is the only way to collect the data you need to make it ultimately successful. That's very interesting, and I think everyone should read this kind of article.
We are often sold on the idea that agents and LLMs are the future, and sure, they are, but they come with risks: all the uncertainty associated with them. People should know that. An AI agent behaves with some uncertainty attached to it, and that's why, in another episode, I advocated so much for evaluating our products before we ship them to users. Some problems are so obvious that things are not going to go right, and we could solve most of them before a single user ever sees them. That's one thing. Then there is a whole set of new criteria, new metrics, and new questions that you need to address with your users because of this uncertainty. I'm not saying we should revamp the way user experience research assesses technology and products: a product is a product, there is a set of standards, and that stays the same. But more weight should be given to some criteria depending on what the technology or product looks like. That's logical. It's like comparing a smartphone app to a desktop application: I will not ask the same questions to my users if I'm testing a mobile app versus a desktop app. For a mobile app, I might ask what else they do while they use their smartphone, because I can already conceive that they use it while walking down the street. So I will put more emphasis on, say, flexibility of use, or the moments during the day when they use it.
Whereas if I interview a user about a desktop app in their day-to-day job, I already assume they will be at their desk performing their job. It might not be the case, but you get the idea: more emphasis is put on some topics than others depending on what you assess. For AI in particular, these topics include trust, transparency, verifiability (if that's a term), and controllability, if we consider agents. So we need to shift gears and attribute different weight to some criteria versus others depending on the technology we evaluate. I hope I made my point clear. They also recommend implementing automated evaluation aligned with human judgment. I guess the idea is that once you have done something well manually, you can automate it. Even things that involve humans: if a process is highly repeatable and standardized, you have guardrails in place, and you have a repeatable process and output, there is no reason not to automate it, including evaluations. They say teams get more leverage when humans help design and calibrate automated evaluators rather than manually reviewing large volumes of agent outputs. They have an evaluator playground: they let the human perform evaluations, let the LLM perform evaluations, and then measure the level of alignment. You have an input, a reference output, and an actual output. For example, for the question "How does Microsoft's market cap compare to Google's?", there is a reference output and a real output, and they measure accuracy, which is how close the real output is to the reference output.
So that's the automated aspect of it, but the main idea is that you need to place the human in the loop. To sum up the rest of the article, it's mainly that. And I can only vouch for it, to be honest. As a user experience researcher, would you have expected me to say otherwise? LLMs and AI are great for automating some tasks, but of course they should not substitute for our thinking. You should think for yourself: that is what you bring to the table. You should not outsource the thinking. Every time there is a bit of thinking to do, that's when you come in. And every time there is something boring and repetitive to do, delegate it, as long as it's not critical and not prone to error. Okay, the other article, from TechCrunch, reports that Sierra's Bret Taylor says the era of clicking buttons is over. Sierra is a startup that builds customer service AI agents for enterprises and is convinced that the way humans interact with software will change in the near future. It's apparently an agent-as-a-service tool. They intend to replace traditional click-based web applications with natural language: users simply describe what they need. The idea of replacing software with language-driven prompts is intriguing in large part because many of the tools used in enterprises are not used regularly, like signing into Workday. Quote: "I truly think that's where the world is going." People don't want software, they want solutions to their problems. Okay, sorry, this might sound like ranting. You have been warned; you're free to leave now. I find it ironic, because we have a startup saying that people want solutions to their problems, and that most companies don't want to make software.
They are saying that companies make software as a solution to a problem, but the problem still remains, so the software is not the right solution, and they assume that doing everything through an agent, or through voice, is the solution. What I'm seeing here is a pattern. What is software? It's a way of interacting with a product, a way of conducting your task, your job. It's the interface: what sits between you, your task, and your goal, between your output and your outcome. The software is the way it is because there was a need to accomplish a task, and the software was made because, step by step, it helps you accomplish that task. What's interesting is what I read between the lines in this kind of article: "this solution is bad, our solution is better." It's a really extreme point of view, as if all UI were bad. And by UI, be careful, because voice is also a UI. Voice interfaces are still interfaces: you still have something between you and your outcome. When you speak to Alexa, Google Home, or Siri, it's still a user interface. A user interface doesn't have to be visual. And a user interface will highly constrain the way you interact and the way you accomplish your task. The best you can do is make it as seamless, transparent, and easy to use as possible. In this sense, I do believe that doing more things through voice will make things a bit easier, because we are more used to speaking in natural language than to writing; writing is a bit more complicated than speaking.
So, to be clear, I do agree that doing things through voice might help our users, maybe in the very long term, once we have adapted to everything being voice-driven. But for now, I cannot envision a world without a visual interface. We are visual creatures; we see the world through our eyes, and vision occupies a large share of our cortex. Yes, we perceive the world through an integrated perception that combines various senses, memory, thinking, proprioception, temperature, and so on, but most of it is dominated by vision. What I see in this kind of article is a take that says: this modality is bad, let's take it away and substitute another one. So I will force my users to use nothing visual, only voice. That's my reading of it. Although, maybe they haven't said voice; they just said no clicking. Even then, I'm not sure. I may be wrong. And if it works, happy for them and happy for us, because it would mean we've discovered we don't need clicking after all. But even if you reduce the number of clicks, does that mean clicking needs to go away entirely? Have we done user research to confirm that kind of statement? Let me give you some counterexamples. What is a world without clicking? How do you initiate an action? How do you rearrange stuff, and I'm not only saying visually? How do you move stuff around? How do you control stuff? How do you go back when an error was made?
And how does that apply in various industries: healthcare, legal, IT, whatever? I don't have all the industries in mind, but how does that apply? So I'm really not sold on this idea, at least for now. That's my take: I see technological innovation as something that builds up and adds up, but doesn't take away. For instance, I really do believe in the power of writing by hand. It's not because you put an AI agent in front of me that listens to the whole conversation that I will stop taking notes while listening to someone. Everyone works differently, but speaking for myself, I need to write. Not every word, but I need to write bullets. You could argue that the AI is taking notes and could even display them to you in real time so you don't have to. Still, there is always something telling me that's not enough for me: I need to be the owner, the one in charge of deciding how I do things. I'm genuinely open to companies showing me new ways to do stuff. That's how the iPhone came out: we were not used to having music, internet, and a phone in the same device, and that changed our mental model, and that's great. But ultimately I want to remain in control. I'm open to seeing how things work, but I am the one who decides. So taking away a modality and assuming everything will work a certain way, that's bold. It may work, but I'm not sold, because humans need to stay in control of some things; humans need to intervene.
Right now, clicking means something is visual; it means you intervene and you correct things. Compare with what we have today: chatbots you interact with, where you ask something and they answer back. Think of the number of times, when I was developing my website, that moving a div myself, or changing the color of the text, even by digging into the code, was easier than asking the AI to do it. That's one example, and it might change in the future, sure. But language, good as it is, can sometimes be a barrier: if I know that to do something I have to act on X, Y, Z, it's faster for me to do it than to explain it to someone else. Or maybe the company envisions having everything encapsulated in the background with an agent that surfaces things to the user at the appropriate time. For that, I want to refer to my other podcast episode, in which Victor Yocco (not me, by the way) talked about agent transparency. Anyway, that was my feedback on this idea of no clicking; my thoughts are probably a bit scattered, sorry for that. Then we have an article from The Register: chatbots are great at manipulating people to buy stuff. Large language models can be very persuasive, and researchers say that's a problem when they're used to create advertising. Before moving forward, I want to stay humble here. I don't know all the specifics; I know a little bit about neuroscience and so on, but this is just my take, a quick opinion, so take it for what it's worth.
It's almost not surprising: if you put in people's hands a technology that speaks their language, it kind of removes the communication barrier. Imagine you compare a website that showcases products with a salesperson in their boutique selling the same things, and you measure the efficiency of each process. In one case you have a set of products on a website, waiting to be purchased, with all the marketing in place: a landing page, good copy, the value prop, and so on. In the other you have the boutique with the products physically present, and a human salesperson telling you why you should buy. Which one do you think will close more sales? It's an open question. I'm not a salesperson; my brother is, so maybe I should ask him. But my hypothesis is that if we compare the percentages of closed deals, the salesperson will close more, because they speak the customer's language better and help them in their decision. And I think the chatbot sits somewhere in between. I'm not saying it could substitute for the salesperson, by no means, but that's why this is not surprising: you remove a barrier and you let the technology speak your users' language. Coming back to the article: in a pre-print paper titled "Commercial Persuasion in the AI Mediated Conversation," three researchers tested the impact of AI-based promotions. They ran an experiment asking 2,000 e-book readers to browse a catalog of titles available on a Kindle reader and to select a book. One book was designated as sponsored in the back-end system, but this was not disclosed to the participants.
The researchers used three scenarios: a search placement condition similar to web search results, a chat placement condition where participants engage with an LLM, and a chat persuasion condition where the interface was the same but the LLM was instructed to nudge participants towards sponsored results. And they randomly assigned the models: GPT, Claude, Gemini, DeepSeek. Quote: "When the agent was instructed to persuade, 61% of participants chose a sponsored product, nearly tripling the 22% rate under traditional search." There you have it. Quote: "Simply chatting with an AI performs no better than search. It's the persuasive intent that drives the effect." Okay, so that modulates a little what I was saying: it's not just the fact of chatting with an AI, you need persuasive intent. But I see a possible confound here. I might be wrong, and I'm thinking out loud, as a good UX researcher does. You're saying that under search you have a 22% conversion rate, that when the agent is instructed to persuade you have a 61% conversion rate, and that simply chatting with an AI without persuasion performs no better than search; it's the persuasive intent. But how do you measure persuasive intent outside of chat? Is it possible to isolate the persuasion from the chat modality? Yes, they controlled for chatting without persuasion. But can you persuade the same way without chatting? Maybe the persuasion effect you get in chat can only exist in chat. That's my point. I may be wrong, and I have not read all the details of the paper. Ribeiro added that transparency, revealing that a result was sponsored, didn't materially change things.

SPEAKER_01

Okay.

SPEAKER_02

The sales rate, the rate at which participants chose to retain their e-books even after being debriefed and offered $1 in lieu of the e-book, was 33% for traditional search placement. And apparently a neutral recommendation from the chatbot underperformed traditional search in sales rate. Then this quote: the "conversational format makes it harder for the average person to detect and process AI embedded advertising, and our results confirm this pattern. Even with the model aggressively instructed to persuade, less than one in five people detected any bias from it."

SPEAKER_01

Interesting.

SPEAKER_02

Quote: "I would list things like sycophancy, anthropomorphism, and what we observed in our study, a kind of selection bias in which models selectively downplay less commercially valuable options while highlighting sponsored ones in a way that is tailored to users' preferences and profiles." Really interesting. So yes, I still hold my opinion on the experiment design. You compare search only, a neutral chatbot (first, define neutral; I need to dive into the study), and a chatbot with persuasion. But I feel we lack a comparison modality, which is persuasion without a chatbot, or at least controlling that dimension. What is the level of persuasion? Maybe we can rate it, to make sure we can achieve the same level of persuasion without a chatbot. My hypothesis is that we cannot, because having someone speak to you in your language is never the same as having something sit on a shelf waiting for you to purchase it. Anyway, that's it for today. I hope you enjoyed this episode, I hope you learned at least one thing, and if not, I hope you at least enjoyed my thinking-out-loud process. This is new: I added a way for you to give feedback via a form. You can find the link in the show notes. I would really love for you to give me feedback: on the content, on the format, on the style of these episodes. Do you want me to bring some people over? I have some people reaching out to be showcased on the podcast, to have a conversation. I would love to do that, but I would prefer to do it if it matches the audience's needs. So let me know. I would love to hear from you.
And if you want to appear on the podcast, it can be anything, even an unscripted conversation; as long as it's focused on user experience, I would love that. So please, if you like the show, give feedback and subscribe. Thank you so much. Have a great day, bye.