The Story Behind Gannett's AI Debacle

2023/9/27

On the Media

Gannett's AI service for high school sports coverage quickly faced issues with bizarre phrases and robotic placeholders, leading to a PR crisis.

Listener supported. WNYC Studios. In August, Gannett, the country's largest newspaper company, rolled out a new artificial intelligence service that would provide automated high school sports coverage in Illinois, Texas, Wisconsin, Tennessee, Ohio, Arizona, and Indiana. And within a matter of days, it had gone horribly wrong.

People on Twitter/X quickly discovered that bizarre phrases like "close encounters of the athletic kind" or how one team "took victory away from another" had shown up in hundreds of local news stories.

As Scott Simon explained on NPR, in some of these AI articles, there were robotic placeholders where there should have been a mascot's name. The Worthington Christian winning team mascot defeated the Westerville North losing team mascot 2-1 in an Ohio boys soccer game on Saturday.

It's according to a story that ran last month in the Columbus Dispatch. Go winning team mascots! Our client had a PR problem on their hands. Jay Allred is the CEO of Source Media Properties, which includes Richland Source, a local news organization in Ohio.

He's also the co-founder of Lead AI, the company that built the technology that Gannett was using to automate some of its coverage. Gannett put an indefinite pause on the project of reporting high school sports results using AI with us. And we are no longer producing content for Gannett. Jay agreed to speak to me about what went wrong, his first extensive interview since his deal with Gannett blew up.

I wanted to understand why he built this technology in the first place, how it's supposed to work, and whether this disaster had shaken his belief in the potential of AI in journalism.

He told me that his team began building and using Lead AI in his own newsroom at Richland Source a few years ago, after they learned that they could draw on high school sports results from a service called ScoreStream, which collects game results often recorded by fans. Let's just take the state of Ohio, for example.

We look at all of the high-confidence games in the state of Ohio, and then we're analyzing the box score. So if we're looking at a football game, we're trying to figure out, was it a close game? Was it in overtime? Was it a blowout? Was it a come-from-behind win in the fourth quarter? We've kind of grouped those different outcomes into scenarios. And then we're going to pull from a library of pre-written templates, plug those variables into those pre-written templates, and choose a headline. They're all pre-written and they have variables that get plugged in. Typically with our clients, we connect to their CMS via an API, an application programming interface. We go in and then programmatically create that asset on behalf of the customer: title it, assign an author, tag it for SEO.

And then we can either publish those assets in draft status, or we can actually automatically publish them for the customer. How many customers do you have? About 20 individual newsrooms around the country. The goal was that you could basically be offering, let's say, fairly rudimentary coverage of high school sports all across Ohio or wherever that your writers and editors wouldn't necessarily have to be solely on the hook for producing, and then they could go and do more meaningful coverage.

We're a small newsroom. There's only 10 of us, and there's only one full-time sports reporter. There's well over 20 high schools in our region. So we can use Lead AI to cover six games on that particular night. What this lets us do is be able to provide coverage to communities that we wouldn't have been able to be at that game at all.

Our sports reporter covers the A game or the number one game. We'll cover the B and the C game with our two other reporters. And then Lead AI will be in to write the briefs for us for those other three games. From that standpoint, our editor that's on the desk that night can call coaches, flesh out that Lead AI story, combining the technology that Lead AI provides with the actual journalism our newsroom provides.

And how do you communicate to readers that what they're reading was not written by a human? Every single story that publishes on Richland Source has an author, and that author is called Auto Newsdesk. And if you click on Auto Newsdesk, it identifies itself as an AI tool right out of the gate. At the bottom of the article, we are disclosing that it's an AI tool that we're using. We're actually linking to Lead AI's website.

We have a feedback form that publishes with every piece of content that we publish. How do people react? In general, the readers understand it's information. It's not journalism. Of course, a lot of times readers want the content to be longer and then to include player names and photos and video. They want it to be a reported article. Exactly. So there's a certain sense of we're fulfilling the information need, but not necessarily the information want.

There's a couple of things about that, that there's a big gap. Number one, AI is not in a place where you should trust it as a reporter. And number two, there really, at least in the high school sports zone, isn't a data set that would allow us to be able to confidently report things like player names and video and photo and to be able to accurately identify all of those things. You need humans for that. Yeah.

So how exactly do you make Lead AI produce human-sounding articles? I mean, I know that with some of these large language models, they require lots of data. And this has led to a lot of controversy around AI startups scraping enormous parts of the web, including books, entire news outlets, entire forums like Reddit. So explain to me how you feed language and templates to Lead AI.

Every single word, every comma, every semicolon in our database has been written by a person. And then it's been checked by another person and checked by a person after that.

It's what allows us to be confident in all cases that if we're using our standard data set, that the content that we're producing is accurate as long as the data is accurate. And it's very accurate.

Okay. That's interesting because in late August, people on social media began posting some of the really awkward phrases that Lead AI has put into some local news sources. The one that caught a lot of attention on Twitter, for instance, was a piece in the Columbus Dispatch and some other Gannett-owned papers. Readers were finding examples of Lead AI using phrases like "close encounters of the athletic kind." There were a lot of articles referring to "high school sports action" or how one team "took victory away from" another team. You know, these are phrases that most human journalists would consider ranging from awkward to poor writing. So how did that happen? Yeah.

I knew you were going to go there and I'm glad you did. In mid-August, our technology powered a really big launch with Gannett across, I think, six or seven major markets in the US. We had written some custom code for that particular customer and the code had bugs in it, Micah.

It had not been tested as well as we would normally have tested it because we had internalized a deadline that made it very, very important to us to kick off right with opening night of high school football season. And some of those things that showed up in those Gannett articles, especially the errors, were the result of a small company working really, really hard to get ready for a launch with a very big company.

As far as the awkwardness of the phrasing and the now infamous "close encounters of the athletic kind," a human being wrote that, Micah. A person wrote that. Good is a subjective measure. And we got called out on a few phrases and they are no longer in our database.

It was as simple as taking them out. It was strange to become a meme, honestly. Yeah. But we're in a very early stage of deployment in the local news industry. And I wish that we did not have a part in a deployment that didn't work.

We learned a lot from it.

I was curious to know if this was a feature or a bug. And so I actually just searched some of these phrases on Richland Source. And I counted over 140 articles on Richland Source from this year that featured the phrase "close encounters of the athletic kind" or similar phrases, including 50 articles from this year that featured the phrase "close encounter of the winning kind" in the headline. So I don't really buy that it was just a fluke that happened with launching a new service with Gannett. You have been publishing these sentences for years.

No, and I appreciate you calling that out because that phrase has been in our code for years.

The things that were unique to the Gannett launch were some other great, he says sarcastically, some other unfortunate stuff. For example, there were a couple of leads that published in some of the papers where we had plugged in a variable where there should have been a mascot name. There were instances where we published two very similar lead paragraphs that said exactly the same thing in terms of factual information, but they said it slightly differently. And those were bugs that were built into that custom code. But those awkward phrases that the internet called out, that's been there for years.

I guess, but see, I guess this is what sort of sends a shiver down the spine of, you know, media critics and journalists and editors, because we're talking about high school sports. Like, this is not the highest stakes beat in all of journalism. But it seems like it does speak to the risk of automation, where one small mistake, when automated, becomes 150 small mistakes all across the country.

Yes, absolutely. One of the things that we learned from this process was that scale should be something that is approached cautiously. And if I had it to do over again in our biggest launch ever, I would have launched in one site and checked every single piece of content up and down. And we would have found the exact same errors and awkward phrasing, and those objections would have rolled through. And then we would have been able to change those and fix those bugs and then launch in two sites a week later. And when those two sites were okay, we could have added a third and a fourth.
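The two safeguards discussed above, catching a variable that was never replaced with a mascot name and widening a launch only after the live sites come back clean, can be sketched in a few lines. This is a hypothetical illustration, not a description of how the company's system works: the placeholder syntax (`[[NAME]]` and `{name}`), the function names, and the one-site-at-a-time step are all assumptions.

```python
import re

# Assumed placeholder syntaxes: [[UPPER_CASE]] tokens and {lower_case} template fields.
PLACEHOLDER = re.compile(r"\[\[[A-Z_]+\]\]|\{[a-z_]+\}")


def unfilled_placeholders(text: str) -> list[str]:
    """Return every template variable that was left in the text instead of being filled."""
    return PLACEHOLDER.findall(text)


def next_live_sites(all_sites: list[str], live: list[str], problems: list[str]) -> list[str]:
    """Staged rollout: hold at the current sites if any problems were found,
    otherwise add one more site from the launch list."""
    if problems:
        return list(live)
    remaining = [site for site in all_sites if site not in live]
    return list(live) + remaining[:1]


# Made-up sentence and site names for illustration.
draft = ("The Worthington Christian [[WINNING_TEAM_MASCOT]] defeated "
         "the Westerville North [[LOSING_TEAM_MASCOT]] 2-1.")
problems = unfilled_placeholders(draft)

sites = ["site_a", "site_b", "site_c"]
after_bad_week = next_live_sites(sites, ["site_a"], problems)  # stays at one site
after_good_week = next_live_sites(sites, ["site_a"], [])       # adds a second site
```

A check like this would block the draft above before publication, and a single bad article would keep the launch at one site rather than repeating the mistake across every market at once.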

High school sports are a low risk beat. What if this had been crime reporting? Yeah. What if these had been arrest reports? Real harm could have been done. As leaders in the industry, I think it should give us all pause. It's why I'm having this conversation with you.

And I appreciate that. I appreciate your vulnerability and your openness to introspection. Are you at all concerned that local newsrooms would see the promise of Lead AI, maybe think that it's capable of doing more than it is, and kill entry-level jobs?

I think about that every single day.

In three years of talking to news leaders around the country, I've never once heard one of them say, I'm super excited for AI because I get to reduce my headcount. Well, no one says those things. You're right. They say we would like to be more efficient. We speak in euphemism. I agree with you. Getting people to not say things like reducing headcount is a good first start.

I don't know, though, how large public news organizations are going to choose to deploy this technology. Yeah, and I guess speaking of large public news organizations, BuzzFeed laid off its news staff and then said, we're going to be pivoting to AI. MSN, owned by Microsoft, more or less did a similar thing a couple years ago. What do we do about some of these larger organizations that are frantic, fully bought in to AI, and believe or may want to believe that it can replace journalism?

I think that the organizations that believe that are not news organizations, as you and I understand them, Micah.

I'm not a spokesperson for any of them. I can only speak to the conversations that I've had. But up and down the spectrum in terms of size, they're trying to figure out how to implement this technology so that they can do better journalism because they have cut their organizations in many cases to a point where there's not a lot of fat left to trim, to try to create with this technology tools by which their journalists can do more impactful reporting while not having to chase informational reporting that audiences want, but they simply don't have the people to do anymore. And with all of those things said, I still lose sleep over it at night.

What do you lose sleep over?

Is the intention, to use your euphemism, to find efficiency and to do that through less people? Or is the intention to create more value for consumers so that we can get the nose of this airplane pointed up, and we can start to create a future where local news entrepreneurs can think of local news as a good small business?

At the same time, as you're expressing fear, I also sense a kind of belief that this technology can help and that it will grow and work. I mean, how do you hold those two together?

No one was harmed by our awkward phrases and poorly placed variables inside of our high school sports reporting.

But had we been doing arrest reports, there could have been incredible harm. The reality is we're all going to be dealing with this. I don't want anybody to go through this shit, Micah. Right.

And I think that there's lessons to be learned here and we can grow as an industry and get better because the reality is this stuff, it's not coming. It's here.

That's what I've heard in some of your answers, is still this implicit belief that the kind of rushing river of technology is coming no matter what. And I wonder if this is a moment in time to say, like, there might be some uses for AI, but we don't just have to see it to its logical conclusion just because technology is great, bro.

Yeah, I agree with... I think we should use tech like Lead AI to report unreported stories that would never go reported otherwise. We should interrogate that technology vigorously and make sure that it can be trusted and be accurate.

And I know that there are ways to do that. You've been forthcoming about the mistakes your team made and the limitations of the technology. Do you feel that any of the backlash to AI within the media has been unfair? Like, has any of it you think kind of missed the mark? I think that our industry has a tendency to respond to stuff like AI from a very defensive position. It's super understandable.

Our industry has done nothing but cut newsrooms to the bone for going on two decades now. I wish we could get into spaces where we understood that we were more all in this together and that we are trying to figure it out. I think we as an industry need to be able to hold multiple things to be true at the same time, which is: malevolent deployment of AI inside of our industry is going to hurt our industry; intentional, thoughtful deployment for the benefit of readers and communities and reporters can benefit our industry. Both things might happen. I hope it's the second. That's going to be the work I continue to do.

Jay, thank you very much.

Thank you, Micah. I was glad to be invited onto your program. Thanks.

Jay Allred is the CEO of Source Media Properties, which includes companies like Lead AI and Richland Source, a local news organization in Ohio. That's it for the Midweek Podcast. You don't want to miss this week's big show. We're debuting a brand new series that we've co-produced with ProPublica. Check your feed around dinnertime on Friday. I'm Micah Loewinger.