Amazon Q Business is the generative AI assistant from AWS, because business can be slow, like wading through mud. But Amazon Q helps streamline work, so tasks like summarizing monthly results can be done in no time. Learn what Amazon Q Business can do for you at aws.com/learn-more.
Many of us remember the image of the pope in a puffy coat that went viral last year. That image of Pope Francis strutting down the street in a chic white coat wasn't real. It was generated by artificial intelligence, but it was extremely convincing. You could argue that an image like that is pretty harmless, but what happens when less harmless fake images, videos or audio are circulated online? AI-generated content can pose serious risks to individuals, governments and even financial markets. In October of 2023, President Biden invoked emergency federal powers to assert oversight of powerful new artificial intelligence, one piece of a new executive order aimed at reining in AI risks.
Everything in cybersecurity is mitigation. If you want to posit some phenomenally complex time-travel, space-aliens conspiracy, yeah, I can't help you. But I can help lots and lots of people who are reasonable and are just being fed lots of lies, and we can pull them out of that echo chamber.
That's Hany Farid, professor of computer science at the School of Information at the University of California, Berkeley. He's been thinking about fake images for over two decades, and his research focuses on digital forensics, misinformation, image analysis and human perception. And he says part of managing AI risks means giving people the tools to determine which images are real or fake.
From The Wall Street Journal, this is the Future of Everything. I'm Charlotte Gartenberg. Today we are bringing you Alex Ossola's conversation with Hany Farid about how easy it is to create convincing AI-generated content and why that's a problem. Plus, Farid tells Alex about the Content Authenticity Initiative, a global coalition of 2,000 members from tech, policy and media, including The Wall Street Journal. Farid works for it as a paid adviser. Hear how companies are working together to combine technology with media literacy to help us spot the fakes. So stay with us.
Amazon Q Business is the new generative AI assistant from AWS, because many tasks can make business slow, as if wading through mud. Help!
Luckily, there's a faster, easier, less messy choice. Amazon Q can securely understand your business data and use that knowledge to streamline tasks. Now you can summarize quarterly results or do complex analysis in no time.
Q got this. Learn what Amazon Q Business can do for you at aws.com/learn-more.
Before Alex and Hany got to the stakes of identifying deepfakes, she asked him about his early experiences with AI-generated images and how his interest in deepfakes began.
I have been thinking about manipulated media since 1997. In '97, film still dominated the landscape. Digital was a glimmer in our eye, and the internet was very much in its infancy. In the early aughts, we were thinking in my lab about how you can detect manipulated images.
And then around 2015, 2016, something very dramatic happened, which is what we now call generative AI, or deepfakes. That was about seven years ago that I heard that term for the first time. And you know, you looked at the images and videos being created. They were terrible. They were grayscale and tiny and super noisy.
But if you watched the next generation and the next generation, which was happening on a month-to-month basis, you very quickly realized that this technology was getting better. Now the pace can be measured in weeks, right? Every few weeks you see something dramatic.
We went from these glitchy deepfakes from five, six, seven years ago to full-on real time, almost at frame rate, running on my laptop, superimposing somebody's face on mine at high res. That is incredible.
Do you think a typical user can tell if an image is fake?
No, I don't think the typical user can tell the difference, especially at the speed at which they're flicking the screen, going through images. It's hard. Look, I do this for a living and it's hard.
So are we at that sort of tipping point?
We're there. I would say that we are passing through the uncanny valley. It's not a hundred percent perfect, but what you have to understand about fraud and disinformation is it doesn't have to be a hundred percent perfect. It just has to be pretty good, and it's pretty good on the way to exceptional.
Are we there for all forms of AI-generated content? Voice, still images, video: are those on different trajectories?
Yes and no. So let me tell you where we are. Stable Diffusion, Midjourney, DALL-E: images are incredibly good. The artifacts now are very, very minor. I think images are almost there.
If we were having this conversation three months ago, I would have said audio is years away. And I would have been spectacularly wrong. Audio really caught up. And with audio, there's two things to think about. One is the naturalness: does it sound like a human? And the other is identity preserving. So if I give you two minutes of audio of you, Alex, can it recreate your voice? And I would say that there are now a number of services out there that have essentially cracked that nut. It's a solved problem, and it's really, really good.
Video is probably the one that is lagging. So if you look at ChatGPT for text; Midjourney, Stable Diffusion and DALL-E for images; ElevenLabs, Uberduck and a number of these for audio, those are all just go to a website and use it, right? There is zero barrier to entry. But with video, you still need some skill. You've got to go over to GitHub and download this repository and get it compiled and run it. It takes a little bit of finessing, but it's a matter of months, less than a year, before somebody creates the website that says, okay, upload the video, upload the image and I'll slap them together.
Tell me, what's unique about this moment? Is it the sheer pace of improvement of these algorithms, or is it access?
It's both. I would actually say there's a third component, which is I can also publish it to the world instantaneously. We can't separate out generative AI from good old-fashioned social media, right? If I had the ability to create a harmful or hateful audio of the president or nonconsensual sexual imagery, and all I could do is send it to my five friends, it's not great, but I can live with that. But the fact is, I can carpet-bomb the internet with it by posting it on Twitter and Facebook and Instagram and YouTube and TikTok. These things are really getting traction on some of these platforms. So I would say it's a confluence of all three of these things: underlying technology, ubiquity and access, and then the ability to distribute widely and effortlessly.
What are the stakes here?
So fraud, illegal activity, disinformation campaigns. I can now create audio and videos of the president or a presidential candidate saying anything I want them to say. And so now you're looking at threats to our individuals, to our societies, to our economies and to our democracies.
And that is with today's technology. And here's what we know about technology. It does what technology always does. It gets better, it gets faster, it gets cheaper and it gets more ubiquitous. That's what technology has done, and it will continue to do so. This trend is going to continue.
AI-generated imagery isn't going anywhere, but tech could help us separate what's real from what's a deepfake. More on that after the break.
How are images like this actually made? Like, what's going on in the back end?
First of all, what you should understand about this AI revolution is I don't think it's an AI revolution. I think it's a data revolution. I think the reason you are seeing generated images and video is because over the last 20 years, we, me, you and everybody else, have uploaded billions and billions of pieces of content that the machines are learning from.
And I mention that because here's how the pope-in-the-puffy-coat image was made. Midjourney, Stable Diffusion, DALL-E: what they did is they went out and scraped billions of images that were annotated with text captions. That last part is really important. So now it has some five billion images. And what it does is it takes an image with a caption: five people sitting at a bar in Napa Valley enjoying a nice Cabernet. That's the caption. And it has an image associated with that.
Then what it does is it takes the image and it adds a little bit of noise to it, degrades the image, and then learns how to go backwards, how to denoise it. And it does that until it gets it right. And then it adds a little bit more noise and then goes back, and a little bit more noise and goes back. And it keeps degrading the image until eventually it's completely unrecognizable. And it's learned how to then generate, from a pure noise image, the image of five people sitting at a bar in Napa Valley enjoying a nice Cabernet.
It's basically forcing itself to degrade and then clean the image. Degrade, clean. Degrade, clean. It does this five billion times, and then what it has is it knows how to go from a pure noise image with the caption to a clean image that depicts what the caption says. There are other techniques for doing other types of synthetic images, but that's the one that is particularly popular these days.
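The degrade-and-clean loop described above can be illustrated with a toy sketch. This is not the code of any of the named products. It assumes the noise schedule commonly used in diffusion research (a clean image is scaled down while Gaussian noise is scaled up), and the function names, the four-pixel "image" and the step counts are all hypothetical. No network is trained here: the true noise stands in for what a trained model would predict from the noisy image and its caption.

```python
import math
import random


def make_noise_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Return, for each step, the fraction of the original signal that survives."""
    survival = []
    running = 1.0
    for t in range(steps):
        # Assumed linear schedule: each step removes a slightly larger share of signal.
        beta = beta_start + (beta_end - beta_start) * t / (steps - 1)
        running *= 1.0 - beta
        survival.append(running)
    return survival


def degrade(pixels, alpha_bar, rng):
    """Forward direction: blend the clean pixels with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    noisy = [keep * p + spread * n for p, n in zip(pixels, noise)]
    return noisy, noise


def clean(noisy, predicted_noise, alpha_bar):
    """Reverse direction: undo the blend, given an estimate of the added noise."""
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    return [(x - spread * n) / keep for x, n in zip(noisy, predicted_noise)]


rng = random.Random(0)
image = [0.1, 0.5, 0.9, 0.3]  # hypothetical 4-pixel "image"
schedule = make_noise_schedule(steps=1000)

noisy, true_noise = degrade(image, schedule[499], rng)
# A trained model would predict the noise from (noisy, caption, step).
# Passing the true noise shows that a perfect prediction recovers the image.
restored = clean(noisy, true_noise, schedule[499])
```

By the last step of this schedule almost none of the original signal survives, which corresponds to the "pure noise image" in the explanation. Training amounts to making the noise prediction accurate across billions of captioned images.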
I'm curious whether any of this stuff leaves a trace. Like, when we're talking about software that can detect whether an image was generated by AI or not, or just manipulated, how do we know?
So the short answer is that there is a really big difference between when I pick up my phone and take a photo, where there's a complex three-dimensional scene with lighting, and it goes through a lens and it goes through the post-processing and eventually gets processed and delivered to me, versus that diffusion process that I described to you earlier, where it synthesizes whole cloth an image.
And so we can also do what we would call ballistics, which is: what architecture generated this? Now, this is an inherently adversarial system, understand. So that means I build a better detector, you build a better attacker. I build a better detector, you build a better attacker. So there will be a bit of an arms race. But the way these techniques work is they learn characteristics, whether they are low-level statistical ones or, for example, we know that a lot of these diffusion-based images do really weird things with how they light the scene. The lighting is not particularly natural.
Okay. So the way things are now, these tools work-ish. What will it take for them to get a lot better?
Okay. So there's some good news and bad news here. The good news is we absolutely should keep developing these technologies, and they will get better. But the synthesis will get better too, and we will do what we always do in cybersecurity. Think spam, anti-spam; virus, antivirus. Everybody keeps getting better, and eventually you hit a steady state.
But here's the problem with what we call passive techniques. So this post hoc analysis is great. This is my bread and butter, understand. But it doesn't really deal with the nonconsensual sexual imagery that has now been carpet-bombed on the internet. It doesn't deal with the manipulation of the market. It doesn't deal with fraud. I get a phone call. I've already transferred $500 by Venmo. Fine, you figure out the audio is fake. Too late.
To really deal with this at the scale of the internet, billions and billions of uploads, we need different solutions. And this is where things like the Content Authenticity Initiative come into play. Because here, instead of having this model of let anybody do what they want, upload something online and then start scrambling to figure that out, the different model is: look, if you're in the business of generating content, whether it's synthetic or real, you are in the best position to tell me whether this is real or not.
And so what the CAI does is it says if you're in the business of synthesizing images, you should watermark and fingerprint every single piece of content. Devices themselves should watermark and fingerprint every single piece of content that they record. The synthesis engines are in the best place to tell me what's real and what's not. And if they watermark and fingerprint every single piece of content, which is a big if, by the way, then downstream, instantaneously, my browser will know, because my browser can say, okay, I know what to look for embedded in the image that will tell me right away. And so I think a combination of these technologies is required.
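The idea of labeling content at the point of creation and checking it downstream can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Content Authenticity Initiative's actual format. The function names and the label fields are hypothetical, and a shared-secret HMAC stands in for the public-key signatures a real provenance system would use. It also covers only the fingerprint half of the proposal: an embedded watermark that survives re-encoding is a separate technique not shown here.

```python
import hashlib
import hmac


def fingerprint(content: bytes) -> str:
    """Compute a fingerprint that changes if even one byte of the content changes."""
    return hashlib.sha256(content).hexdigest()


def sign_content(content: bytes, origin: str, key: bytes) -> dict:
    """Creator side: bind the claimed origin to the content's fingerprint."""
    digest = fingerprint(content)
    message = f"{origin}:{digest}".encode("utf-8")
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {"origin": origin, "sha256": digest, "signature": tag}


def verify_content(content: bytes, label: dict, key: bytes) -> bool:
    """Viewer side: recompute the signature and compare in constant time."""
    expected = sign_content(content, label["origin"], key)
    return hmac.compare_digest(expected["signature"], label["signature"])


# Hypothetical usage: an image generator labels its output at creation time.
key = b"demo-secret-key"
image_bytes = b"...raw image bytes..."
label = sign_content(image_bytes, "computer-generated", key)

untouched_ok = verify_content(image_bytes, label, key)
edited_ok = verify_content(image_bytes + b"edit", label, key)
```

In this sketch the check passes for the untouched bytes and fails once the bytes change or the origin field is rewritten. That is the property a browser would rely on to show an "authenticated" badge.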
So tell me what this would, in a sort of ideal form, look like for the average internet user.
What I would like to see, my ideal implementation, is baking it into the browser. Because if my browser was CAI compliant and it knew about these signatures, then for any image that loads into my browser, I'd just have a little logo over the image saying this has been authenticated as a human-recorded image, or this has been authenticated as a computer-generated image. And it's just baked into the browser.
It actually has a log. It says, here's what happened: Hany imported this image, he modified this part of the image, and then he did this, this, this and this. You've got to give people the information, but then we've got to do other things to make sure that they can incorporate that into the way they think about the world.
Some people who talk about this issue say that media literacy is a solution to this problem. What do you think about that?
I think we need it all. I think we need more responsibility from our technology leaders. I think we need good technology like we've been talking about. I think we need regulatory pressure, and I think we need media literacy. We need it all. There's no magic solution here. There's nothing in and of itself that solves this misinformation problem we've been dealing with for a few decades.
I think we need lots and lots of solutions to this, and everything together starts to really chip away at the problem. It doesn't eliminate it, by the way. You have to understand, everything in cybersecurity is mitigation. If you want to posit some phenomenally complex time-travel, space-aliens conspiracy, yeah, I can't help you. But I can help lots and lots of people who are reasonable and are just being fed lots of lies, and we can pull them out of that echo chamber.
What is this issue going to look like in ten years?
This problem doesn't stop, right? This issue with manipulated media is only going to continue. And I think there's one of two scenarios. It's not even ten years. It's three years, five years. Either we keep going down the road we're going, which is the dystopian hellscape of an internet where everybody's living in their own echo chambers. And I don't think that's out of the question.
I think if we don't get our act together, from a technological perspective, an industry leadership perspective and a regulatory perspective, we are going to keep making the same mistakes of the last 20 years. It's going to make things worse. I don't think it has to end up that way.
I think there is a better scenario for us. And I'm not a techno-utopian. I don't believe that technology is inherently good and leads to good things all the time, but I do believe in the power of technology. And I think we have not been leveraging this technology in a way that's necessarily good for the masses. So I think with the right regulatory pressure, with the right leadership, with the right technology, we can start to right the ship. I think if things go sideways, we are going to continue down this hellscape that is the current social media landscape that we're in. And it's honestly probably a coin flip right now which way it goes.
That was Hany Farid, professor of computer science at the School of Information at the University of California, Berkeley. Want to learn more? Check out our episode "Real or AI? The Tech Giants Racing to Stop the Spread of Fake Images." That's in your feed. The Future of Everything is a production of The Wall Street Journal. This episode was produced by Alex Ossola and me, Charlotte Gartenberg. Thanks for listening.