
24. Artificial Intelligence: What Is It? What Is It Not? (feat. Susan Farrell, Principal UX Researcher at mmhmm.app)

2023/1/6

NN/g UX Podcast

People
Susan Farrell
Therese Fessenden
Topics
Susan Farrell: This episode explores what artificial intelligence is, where it stands today, and how it is applied in user experience design. Farrell stresses that current AI is not truly intelligent; it recombines existing information and recognizes patterns. She notes that designing AI systems involves trade-offs, for example between controllability and the ability to learn. She also emphasizes the importance of data quality and how to keep AI systems from producing racist, sexist, or otherwise harmful output. She argues that designing AI systems requires risk assessment, including the damage malicious users could cause. She discusses the "human in the loop" model and how to balance collaboration between people and machines. Finally, she advises UX practitioners to work hard to prevent problems in AI products and, when problems cannot be avoided, to speak up and call them what they are. Therese Fessenden: Fessenden's conversation with Farrell covers the definition, applications, risks, and ethics of AI. Fessenden raises key questions such as whether AI is intelligent, what the design trade-offs are, and how to assess and mitigate the risks AI introduces. She also explores the concept of "human in the loop" and how to balance the roles of people and machines in AI systems. In addition, she considers the social impact these systems may have, such as effects on employment and threats to data privacy. Finally, she summarizes the discussion with Farrell and stresses the need for critical thinking and ethical awareness when designing and using AI systems.


Chapters
Susan Farrell discusses the current state of artificial intelligence, what it is and isn't, and how it differs from human intelligence. She explains that AI today is a mixed bag of technologies including machine learning and automation, but it lacks true intelligence and self-motivation.

Transcript


This is the Nielsen Norman Group UX Podcast. I'm Therese Fessenden. We're finally back with a new episode to kick off the first Friday of the new year. But before we get into the details about today's episode, I wanted to share some exciting news. We're hiring. We're hiring user experience specialists, which means we're looking for designers and researchers to join our ranks, both entry-level and experienced professionals.

So whether you're graduating grad school and just getting into the UX field, or if you've worked in UX for the past few years, we encourage you to apply. The deadline to submit applications is Monday, January 30th. To learn more and apply, check out the announcements on our website, www.nngroup.com. Now, onto today's topic, artificial intelligence.

The term artificial intelligence, AI, is having a bit of a boom with the explosion in popularity of tools like ChatGPT, Lensa, DALL·E, and many others. This is naturally an exciting time with prospects of increased productivity, creativity, and more interestingly, automation of tasks that in the past would have been considered drudgery, tedious, or to put it bluntly, boring.

These praises have been equally met by skepticism and criticism, with cautionary tales about AI misinformation, plagiarism, and other risks. To sort through the mixed feedback, I spoke with Susan Farrell. Some of you may recognize this name because Susan is what one might call an NN/g alumna. But to be honest, that would really sell short her 18 years of researching and consulting work with Nielsen Norman Group.

She's authored many articles and reports with us over those years. But since 2017, she's been principal UX researcher at All Turtles, an early-stage product studio where she's been spending much of her time researching various products powered by AI. Susan and I discuss what AI is, what it isn't, and the benefits and risks that come with these new systems.

Susan, welcome to the podcast. So to start, could you tell us a bit about yourself and how your journey in the field of user experience got you where you are now, researching AI products? Sure. Well, I started computing in 1983 when I went to college and needed a word processor.

But it wasn't until 1991 that I started working in multimedia on computers. First at Georgia Tech, then at SGI and Sun Microsystems. I became fascinated by user interface design and taught myself

by reading books and magazines, and by going to conferences like CHI. When I found Jakob's newsletter, I got interested in web usability testing, and I started working with Jakob in 1999. After 18 years of interesting client consulting, in 2017 I moved to a product studio called All Turtles.

At the time, they had several chatbot-based products in the studio, which gave me a chance to test some of those with users. It turns out that people like talking to machines in some situations, but it's complicated to make a chatbot that meets people's needs and expectations. And at least in my testing experience, people especially don't like computers pretending to be people or people pretending to be computers.

Yeah, I can imagine it can be hard to overcome the uncanny valley. So for those of you who don't know the uncanny valley, it's that expanse of product fidelity, for lack of a better word, whether that's images or text, where it's high enough fidelity to seem human-like, but not high enough to be convincingly human. So it can kind of feel uncanny or it feels kind of creepy. And I guess more importantly,

that creep factor kind of prevents trustworthiness because it seems like an impersonation of some kind. Either that's an impersonation of a computer or an impersonation of a human. Well, AI is an interesting term in that we've used it for lots of things for a century or so.

And what it actually is keeps kind of receding into the future. So whatever we have now is not AI, but it's coming soon, coming soon, coming soon, kind of the same way that VR has kind of been doing that. But various technologies have emerged on the way. For starters, it's important to understand that it's actually not intelligent, right?

Right? So at least not yet, not in the foreseeable future. Machines don't so much create as they remix and find patterns and relationships between things. What we call AI today is a mixed bag that includes machine learning, machine vision, language models, image recognition, game playing systems, and other types of automation and augmentation.

So we're not talking about Star Trek here or intelligent robots. We're still talking about machines that do clever things, some of which can learn to do things in a more clever way.

You mentioned that they're not intelligent. Could you elaborate a little bit more about what intelligent means in this context? Well, that's a hot topic, and I don't know if I want to wade into it. The main problem is that we don't know how to define human intelligence. So it's very difficult to establish some kind of baseline and then apply that to machine systems. But there's a sense that,

in order for a machine system to be intelligent, it would have to have its own motivations and it would have to have a sense of itself. I see. So it seems like the key distinction is artificial intelligence as we currently have it doesn't necessarily have its own motives. So the AI machines, as you're saying, are capable of doing clever things and maybe doing them quickly, but that's about it. Is that a fair assessment?

Yes, they're augmenting things that humans do or want to do and automating that, hopefully to take the drudge work away from humans. Yeah, I bet that's really top of mind for a lot of people. I mean, obviously there are a lot more current examples like ChatGPT or Lenza and many other organizations working on tech like this, you know, to remove the drudgery of some common tasks. I'm wondering though, while there's a lot of promise in, you know,

maximizing productivity, among other things, there are probably some trade-offs as well. As you may know, we teach a course on design trade-offs where we talk about how every design decision comes with costs in some way. So what do you think are the biggest trade-offs when using AI in a design? Well, the biggest risk is probably the usual over-promising and under-delivering. What people often want and expect from these systems is beyond the technologies that we have today.

Just as you alluded to, the science fiction has kind of set us up for failure already. With chatbots in particular, you have to choose between making a system with a narrow purpose and controllable output that takes lots of scripting,

or using a large language model that learns but tends to absorb things you might not want in your product, such as racism or X-rated language. I see. So it seems like there is a lot of potential for unintended consequences in how a machine learns, basically.

The input seems to be a pretty important factor. That's right. One of the first rules I learned about computers is garbage in, garbage out. Because computers can act only on the data and instructions you give them, it's very important to have the data and instructions be of very high quality. We see a lot of quality shortcuts today in how large language systems are being trained.

You can't have a system just ingest convenient parts of the internet and then expect it to output something great. We also see a lot of mediocre output from these systems being deployed on the web, so it's important not to be a naive consumer of that. And in a continuation of garbage in, garbage out, we're in a tricky time in terms of bad information outputs becoming the next wave of bad information inputs.
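To make the garbage-in, garbage-out point a little more concrete, here is a minimal sketch of the kind of data-quality gate Farrell is describing: filtering obviously low-quality or unwanted text before it ever reaches a training set. The heuristics, thresholds, and blocklist terms below are illustrative assumptions, not anything from the episode or from a real training pipeline, which would also involve deduplication, consent checks, and much more.

```python
# Illustrative, hypothetical quality heuristics; real pipelines are far more involved.
BLOCKLIST = {"slur_example", "explicit_example"}   # placeholder terms to exclude
MIN_WORDS = 20                                     # drop tiny fragments
MAX_SYMBOL_RATIO = 0.3                             # drop markup- or debris-heavy text

def passes_quality_gate(text: str) -> bool:
    """Return True if a candidate training document clears basic quality checks."""
    words = text.split()
    if len(words) < MIN_WORDS:
        return False
    # Reject documents dominated by non-alphanumeric noise (scraped markup, etc.)
    symbols = sum(1 for ch in text if not ch.isalnum() and not ch.isspace())
    if symbols / max(len(text), 1) > MAX_SYMBOL_RATIO:
        return False
    # Reject documents containing terms we never want the model to absorb
    lowered = text.lower()
    if any(term in lowered for term in BLOCKLIST):
        return False
    return True

def filter_corpus(documents):
    """Yield only the documents worth training on: garbage in, garbage out."""
    for doc in documents:
        if passes_quality_gate(doc):
            yield doc
```

Even a crude gate like this illustrates the trade-off in the conversation: every filter you skip to save time shows up later as unwanted behavior in the model's output.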

And it's not just quality of data input that you have to worry about. It's the consent of the people who made that data that's often lacking. The rush to train these systems has left a lot of human beings as well as companies in legal and ethical dilemmas. Yeah, I bet.

And we've seen some awful things come out of these systems, too, such as racism, sexism, violent imagery and other amplifications of big problems that human societies have. I don't know how to mitigate some of these risks, but it seems worth spending the time to do threat and risk assessments before designing. For example, what could possibly go wrong? Yeah, often the

famous last words, if not taken seriously. Yeah. What are the likely consequences of that? Who could be harmed? How might we prevent that? For large language models, organizations tend to train them to a certain point and then clone them and then deploy one and see what happens when it's in the wild. And if it fails, they fall back on a clone and try to learn from their mistakes in training the next one. And in that way, they

mitigate the enormous cost in developing these models because they often fail.

and then have to be redone. You don't want to redo them from scratch. Right. It seems like when teams start building these systems, there's a lot of learning that happens. And in a way, I mean, it sounds a lot like when you're training or educating a person. You don't necessarily want to throw out all of the potty training and elementary school along with whatever advanced topics you're teaching now. I imagine that's got to require a lot of work to retrain without losing

some of those more basic pieces of knowledge that help it function. So is version control sort of what is helping to keep this technology moving forward while still keeping unwanted behaviors, for lack of a better term, in check? I don't use these big systems myself because they are extremely expensive. Only a few companies are able to develop them.

But from what I read, they are extremely expensive to develop. They use a lot of energy, a lot of people time, and it just takes time, time elapsed to train them and get them ready to go.

So that's why companies try to mitigate their risk by cloning them and having base models, I guess I would call them, ready to go so that they can afford to try and try again. But it's really important to realize that you can't just inspect one of these big talking language models to find problems to fix like you can with debugging code. You pretty much have to throw them away and start over. And that's because the complexity is ridiculous.

They have zillions of connections among zillions of data points, and nobody's able to do surgery on that.
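One way to picture the clone-and-fall-back approach Farrell describes is a toy registry that keeps an expensive, known-good base model untouched, fine-tunes a copy, and reverts to the base when the copy misbehaves. Everything here, the class, the callbacks, the idea of weights as a plain dictionary, is a hypothetical sketch for illustration, not how any particular lab actually manages its models.

```python
import copy

class ModelRegistry:
    """Toy illustration of clone-and-fall-back model management."""

    def __init__(self, base_weights: dict):
        # The expensive, known-good base model is kept untouched.
        self.base_weights = base_weights
        self.deployed = copy.deepcopy(base_weights)

    def deploy_candidate(self, fine_tune_fn, evaluate_fn) -> bool:
        """Fine-tune a clone of the base; deploy it only if it passes evaluation."""
        candidate = copy.deepcopy(self.base_weights)   # never mutate the base
        fine_tune_fn(candidate)                        # e.g. further training
        if evaluate_fn(candidate):                     # safety / quality checks
            self.deployed = candidate
            return True
        # Candidate failed (in testing or "in the wild"): fall back to the base,
        # keep it intact, and try again with whatever was learned this round.
        self.deployed = copy.deepcopy(self.base_weights)
        return False
```

The point of the sketch is the one Farrell makes: you cannot patch a failed model the way you debug code, so the only practical safety net is keeping a base you can always roll back to.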

So preventing problems of input and looking for problems in output is what UX folks have to do. We should try to look to the academic researchers and organizations that are publishing on ethics and AI for guidance, for example. Yeah, it definitely seems like a risk or threat assessment would be especially important now. Granted, they've always been important for designs in general, but...

Especially with what you mentioned earlier, the fact that AI tends to be automating the drudgery of certain tasks, but in doing so, that automation ultimately means that a design is being implemented or that actions are happening at a scale much faster and much greater than any individual person would ever do. So it seems like there are a lot of risks that a team would really need to assess because the damage...

is really magnified when operating at this scale. Yeah, and nobody wants to talk about that when you're developing products. So it often falls on UX people or privacy professionals, or if you happen to have some in your company, which is kind of rare, the ethicists, so that people can really look at the risks properly. Security people are often quite helpful in this regard to team up with.

because they're used to doing threat analysis of various kinds and they're used to thinking about bad people doing bad things with computers. And that's kind of the mindset you need to be in, because it's not just what could go wrong in development; you have to figure that not all people who encounter the system are going to have good intentions. And with some of these systems,

that interaction with people is how they learn, and so you may need to be careful how you allow people to interact with them. So we see a lot of systems today where the access is pretty tightly controlled and where there are constraints in place about what you can talk to the system about or what kind of images it will accept

and so on, because people have seen abuses in the past and they want to prevent them in the future. I can definitely see that as being a crucial element, whether that's a chatbot or any other form of AI to get it up to the level it would need to be to operate effectively. Now, these risks, they do seem pretty significant. What are your thoughts on the benefits of AI? We mentioned automation being a big one, but what other benefits can we see from AI?

Well, in a way, these are just better tools. So not only are they automating some of the boring stuff for us, but they're also extending our reach in terms of what we can do. But today we still have what we call humans in the loop.

And these are often sweatshops where people do image labeling, word recognition, content moderation, and so on to correct and augment the machine output so that products seem smarter than they are.

So we're a long way from robots taking our jobs in most cases, but we need to look out for those exploited workers. But it's great to have machines help with medical issues like cancer detection, drug discovery, and precision surgery. And today we also enjoy the first-pass machine translation, captioning, and transcription services.

So machines can help with writing tasks, but you don't know what you don't know. So it's hard to be sure that the output's correct and that references, quotes, or history haven't just been invented. So a lot of these systems just make stuff up. And so you have to be kind of skeptical of these tools. They're very shiny but flawed. That's really interesting.

This makes me think of this term. Actually, we tend to use it in user testing a bit, but it's a method you might know of called Wizard of Oz type of testing, which is where there's a prototype. You might be showing it to someone, but there's an actual person who's working some magic, quote unquote, behind the scenes,

taking the inputs that a user gives and preparing a screen or output in real time that looks convincingly real based on what the user did. So even though it's not necessarily a functional prototype, it seems to work in a convincing way, even though it's a human being doing it and not the system doing it.

Now, with humans in the loop, is this the same thing? Like, is this human in the loop? Is this happening during the testing phase or is this happening like live? Well, there are several different ways that humans in the loop can work with systems. One is fake it till you make it. And that may be testing

prototypes, for example, like you've talked about with the Wizard of Oz method. Or it could be that as the system's developed, it needs some help. And so humans jump in. For example, when the post office in the United States started using optical character recognition to recognize addresses and route mail automatically,

they touted this as a wonderful breakthrough, but it turned out that if your handwriting was messy like mine, it would kick your mail out of the system and send it to a sorting center where people would transcribe the address into the machine.

And they kept this secret for a couple of years because they didn't want anybody to know that their system wasn't smart enough to just do the job. And these people who are in these sorting centers were under a lot of pressure to move as fast as the machines move. So, you know, there's a fine line between helping the machine and having the machine make your life miserable, I suppose.
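The postal OCR story is essentially confidence-based routing: the machine handles what it is sure about and hands the rest to people. Here is a minimal sketch of that human-in-the-loop pattern. The function names, the `read_address` interface, and the threshold are made-up assumptions for illustration only.

```python
from queue import Queue

HUMAN_REVIEW_QUEUE: Queue = Queue()   # stands in for the sorting-center workflow
CONFIDENCE_THRESHOLD = 0.9            # arbitrary illustrative cutoff

def route_mail_piece(image, ocr_model):
    """Read an address automatically when confident; otherwise ask a human.

    `ocr_model.read_address` is a hypothetical interface assumed to return the
    recognized text and a confidence score between 0 and 1.
    """
    address, confidence = ocr_model.read_address(image)
    if confidence >= CONFIDENCE_THRESHOLD:
        return address                      # machine handles the easy cases
    # Low confidence (messy handwriting): a person transcribes it instead.
    HUMAN_REVIEW_QUEUE.put(image)
    return None
```

Where that threshold sits is as much an ethical question as a technical one: set it too low and bad output reaches users, set it too high and the hidden human workforce has to keep pace with the machine, which is exactly the pressure Farrell describes.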

Yeah, I could definitely see that being a really difficult position to be in, basically needing to keep up with or even compete with the machine. And that can be incredibly unrealistic for a human being to be able to do that behind the scenes. Okay, so humans behind the scenes, these humans in the loop, they're working, as you mentioned, often in these tough conditions.

But then there's the other side of it, which is making sure that the people who interact with AI don't abuse the system and that the quality of the data going in doesn't result in, as you say, garbage in, garbage out. Yeah. And I think there's also, you know, always a temptation to try to make the system appear smarter than it is so that you can get funding.

So a lot of these startups will blur the line between the system and the human augmentation of the system in order to sell the dream of the system to funders. We saw some really interesting food delivery robots in the Bay Area that looked like autonomous little robots tooling along on the sidewalk. They looked like, I don't know, robot coolers or something like that, right? These little square things with wheels

and googly eyes or something similar on the front. So they were cute and everything, and people wanted to interact with them, but they were uncannily good at navigating people and sidewalks and stuff. And it turned out that they were actually being driven by people in a different country who were looking through the front cameras. So there are a lot of situations like that where people are trying to

get their startup going and the technology isn't quite ready to go when they need to launch. I see. So once we do get to realize the benefits of AI-driven tools, they certainly seem like they're going to be a great tool for us. But for now, it also seems like there's quite a bit of labor in getting AI to the point where it's considered smart enough to work on its own.

Now, I don't know if there's a straightforward answer to this next question, but do you think the benefits that AI promises that we might see in the future, do you think that these benefits outweigh the risks that you've mentioned? Well, I think some of the benefits outweigh some of the risks.

So we have a long, hard road ahead in preventing awful machine-written news websites and art theft and deepfake impersonation and many other kinds of fraudulent abuses of the technology.

And privacy violations need a lot of scrutiny and probably new legal protections as well. So, you know, should you buy your child a talking bear? Maybe not yet. Some systems and their organizations just can't be trusted because your private data is too valuable. So we have kind of perverse incentives going on in this realm where some companies and organizations are taking advantage of people and their data

or their creative output, people's artwork and people's music. Because they can scoop it up, they do scoop it up. And there are a lot of legal problems around that and cultural problems around that and privacy problems around capturing people's voices and faces and so on. And we really haven't dealt with that effectively in law or society.

So right now we're having a lot of very unhappy people who are experiencing their creative output being scooped up by

machines and then resold as products. It really sounds, and honestly, I hate using this word, it sounds unprecedented. I hate that word because I feel like we've really exhausted it over the last few years, but it really is an unprecedented frontier, legally speaking, as well as in the realm of intellectual property. It sounds like it's honestly only going to get more interesting, I guess, and probably a bit more of an involved process when designing a solution that uses AI output.

Yeah, it's unstoppable, I think. The question is, who's going to get to steer it? And where are we going to draw some lines? So yeah, on the topic of drawing lines, do you have advice on how teams can figure out where those lines are? Like, what I'm thinking is...

Teams are probably trying to push the envelope, right? They're trying to maximize those benefits that we mentioned earlier, whether that's creating the illusion of benefits or whatnot, that's up to the teams, obviously. But do you have advice on how teams can push that envelope while still keeping these ethical boundaries in mind and reducing harm? Well, I think UX people should try hard to prevent problems with AI products.

And when that's not possible, we need to stand up and call it what it is. I think there's a lot to be said for listening to outside critique of your products too. From what we've seen so far, it seems difficult to get traction on ethical issues from inside the organization that pays you. So I think as UX people, we need to test and interview and do the research that we do.

And bring that back to the company and be careful what you measure and go in with a keen eye for what can go wrong. You don't want to assume that revenue is the only thing you should be measuring, for example, or that engagement is the end-all, be-all, because a lot of times engagement just

means people are mad at you. So you have to have some nuance in your measurement. Otherwise, you won't understand what's actually going on with your product.

Also, social listening is part of that too, because people are always going to be talking about your products. And I think that a lot of social listening is hard, and so a lot of companies don't do enough of it. But somewhere on the internet, people are talking about your product. And if you don't know what they're saying, then you're really missing some important information.

And you don't want your product to fail big in public and then wonder if that's because you didn't ask the hard questions. So I think that we're tasked with something big here in our jobs, something that before was probably only in the weaponry or medical fields, where

you have terrific harms that you have to think about and guard against. Yeah, absolutely. So I think you had a lot of really insightful points here. One being measurement, right? Obviously, measurement is a tool we use in UX quite extensively, or at least teams that are really aware of their progress tend to use metrics extensively. And if you're not, it's certainly something you should be doing. But

the key is that you're not just measuring one thing, but rather having a nuanced approach and maybe picking a couple different criteria, right? Not just revenue, not just engagement, but maybe other criteria like how pleased people are with the service. Or, you know,

are we combining metrics with customer satisfaction or Net Promoter Score? Are we thinking of both the positive and negative experiences in that evaluation? Right. One of the things that people don't measure well enough, I find, is the people that didn't use the product. Right. The people that looked at it and went, nope, and they're out of here. We really need to know more about that.

It's one thing to measure the engagement of your customers or how many customers you have as opposed to visitors or people trying things out. But it's the people that went away that can tell you the most about the money you're leaving on the table, or

the functionality that you didn't offer, or the need you didn't fill. Also, competitor analysis. Companies often don't want to pay to do competitor analysis, but some of the most important stuff you might learn is from understanding who your competitors are and why they're competitive with you. Working with

brand-new products as I do, we tend to look more at those, because new products fit into the gaps

that existing solutions don't fill. And so it's important to understand what will make somebody switch from what they're using today to something new. What are their pain points and what can you offer that will solve problems that they know that they have? And I think these things don't change when we're talking about AI systems.

It's just that we need to look at different kinds of competitors because they might not all be AI systems. Just like any new product, your competitor might be the old way that everybody's always done it or whatever, right? So there's a combination of tools or workarounds or human labor that go into solving whatever problem that is today. And

the new system might disrupt many of those things or might augment some of those things or might replace some of those things. And so kind of getting a grip on that is important.

I really appreciate what you said there, that your competitor isn't always going to be, like you said, another AI system. It might be something like the current way or the old way of doing things. And I often encounter it in the courses I teach. I'm often teaching the Emerging Patterns in Interface Design course. And it's a common question. What if I'm working on something that's so new or doesn't actually have competitors?

And I don't think there's such a thing as not having competitors, because there's always another way of doing an activity, right? There's always an alternative to whatever it is we're designing. And I think you're right. It absolutely is important to spot the gaps in those existing ways, as well as the gaps in our current system. Well, so a good example of this is travel sales.

So back in the bad old days, you had to know a travel agent and you would call them up on the phone and tell them where you wanted to go. And then they would do a bunch of research and call you back in a few days with some kind of a plan. And it cost quite a bit, on top of the trip, to pay this person to do that work.

And they were a privileged group who knew the airport codes and really knew the ins and outs of many systems. And they had special access to them. Well, then along comes the web and the airlines want to sell you tickets directly. And now they have...

voice chat systems that sound just like people and do a pretty good job. So I've been pretty impressed by these voice assistants and their ability to sell me airplane tickets and to answer my questions.

But I was looking the other day at what it would take for me to visit one of my coworkers. He lives on an island off of the southern coast of Brazil. And it was a 44-hour flight, according to Google Flights, because they only look at one airline end-to-end.

And I thought, wow, you know, a travel agent would have done some research and figured out how to make that fewer than four flights and maybe combine different airlines and different layovers and airports in order to shorten that journey for me. But the computer's not doing a great job at that because they have other incentives available.

for making that travel plan. And so I think when we're looking at systems like that, disrupting a system like that, we have to ask ourselves, can we do a better job than the human can do? And if so, what are the parameters of that? And how can we also work with the existing infrastructure? So today we still have travel agents and they still do those same kind of jobs, but they also use these new web systems to do that with.

And so their jobs have become easier in some ways because of the systems that have been put in place. So that's the automation piece. But the traveler still has to try to figure out when is a good time to get a human involved.

And so do systems, right? I'm not going to buy that 44-hour flight plan ever if there's some possibility that I could get there as the crow flies by piecing together something else. Right. And I think it also speaks to the value of those human-to-human interactions as well. Like,

it's often easy, at least nowadays, to default to a solution that is technical in nature when maybe the best solution to incorporate in the moment is one that involves human intervention in some way. So you're absolutely right that

it does require us to be a bit more reflective of what solutions we're coming up with, whether that's creating a better version than reality or an augmented sort of version with the current infrastructure. Right. And back with travel, there's a lot of human things involved. Like how are you going to get your luggage from there to there and do it before the next one takes off? Do you have children?

And what does that mean about eating or sleeping on a long journey? And what does that mean about time between flights? So I think that we can't just look at time on task. Yeah, time on task is probably not going to be the most effective measure.

I mean, it certainly can be insightful, but it's probably not going to be the end-all, be-all. Anyway, long story short, it sounds like humans aren't going to be replaced anytime soon, but it does seem like there is some promise in our future, maybe an augmented one,

as long as we're critical of the work that we're putting together with our teams and, you know, cognizant of the potential harms that could come with systems like these. So it seems like as long as we keep these things in mind, then we're off to a decent start. Yes. That's what I think too. I think that, uh,

We have to be careful of these technologies, just like we do with any new technology, but especially keep in mind that we can do harm at scale with some of these big systems and that we need to really understand that before developing things to the point where they can't be steered or mitigated. Absolutely. Well,

Susan, I've learned a lot in this conversation and I think others will find this incredibly valuable as well. So thank you for joining us today. I really appreciate you taking the time to let me get to know a little bit more about you and the work that you do. It's been a real pleasure. Thank you. I really appreciate you asking. That was Susan Farrell, Principal UX Researcher.

She's more recently started working at Mmhmm. Yes, that's the name of the app. It's Mmhmm spelled M-M-H-M-M. It's a meeting app looking to make meetings and online education more engaging. So you can learn more about that at mmhmm.app and you can also find her on LinkedIn.

So if you're on the Fediverse, she also happens to be on Mastodon at @joytrek@hci.social.

And lastly, she's graciously shared some links to recommended readings. So if you're interested in learning more about AI, machine learning, and large language models, LLMs, so those would be the systems powering the likes of ChatGPT, you can find all of those reading recommendations and more in the show notes.

On the next first Friday, on February 3rd, we'll be sharing an interview that we had with NN/g UX specialist Evan Sunwall. In that episode, we talk all about expert reviews and heuristic evaluations. So if you want to get notified when that or other new episodes get released, the best way to do that is to subscribe on your podcast platform of choice. And while you're on our show page, please leave us a rating. It really helps us expand our reach and spread the knowledge of UX all over the world.

And of course, don't forget that you can find thousands of free articles and videos on UX by going to our website at www.nngroup.com. That's N-N-G-R-O-U-P.com. This show was hosted and produced by me, Therese Fessenden. All editing and post-production is by Jonas Zellner. Music is by Tiny Music and Moments. That's it for today's show. Until next time, remember, keep it simple.