Meet the Latest AI Darling: Reddit

2024/12/12
WSJ Tech News Briefing

People
Liz Young
WSJ reporter who covered Amazon's new automated warehouse for this episode.
Sarah Needleman
Topics
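Liz Young: Amazon's newest highly automated warehouse brings in large numbers of robots for steps such as sorting and transport to raise efficiency and cut costs, but it still needs many human workers. Today's robots cannot yet fully handle tasks that require fine manipulation, judgment, and adaptability, such as identifying and handling items that vary in shape, size, and fragility, or loading and unloading trucks. Amazon's warehouse automation assists workers more than it replaces them. Automation can lower costs, raise efficiency, and improve safety, but it cannot fully replace people in handling complex and unpredictable situations. Amazon's new Louisiana warehouse is one example: it covers more than 3 million square feet and will eventually employ 2,500 people, with 1,400 hired so far. The automation shows up mainly in robots that bring goods to workers at an ergonomic height, reducing repetitive motions such as bending over.

Sarah Needleman: Reddit's data is very valuable to AI companies because it holds a huge, varied, and high-quality body of user-generated content. That content is filtered by user votes (upvotes and downvotes), which ensures a certain level of quality, and it covers almost every topic you can think of. Reddit's anonymity also makes users more candid when they post, which is crucial for training AI models that can converse in natural language. Reddit earns substantial revenue by licensing data to AI companies; that revenue is still smaller than its advertising revenue, but it is growing fast and is very attractive to investors. However, Reddit's data is not fully reliable: it comes from ordinary users rather than experts, so it may contain bias and misinformation. AI companies need to filter and process the data and attach the necessary caveats to their models' outputs to avoid misleading users.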

Deep Dive

Key Insights

Why does Amazon's new automated warehouse still rely heavily on human workers?

Robots cannot yet perform tasks requiring fine motor skills like identifying and picking items from bins, loading and unloading trucks, or handling a wide variety of products with different sizes, weights, and fragility. These tasks remain challenging for automation.

What are some tasks robots can perform in Amazon's automated warehouse?

Robots can carry totes to workers at ergonomic heights, reducing physical strain. They are also used for repetitive tasks like lifting heavy objects, which helps improve safety and efficiency.

How much has Amazon's new warehouse reduced fulfillment costs and sped up operations?

The warehouse has cut fulfillment costs by 25% and increased order fulfillment speed by 25% compared to less automated sites.

Why is Reddit's data valuable for AI companies like OpenAI and Google?

Reddit's extensive, text-heavy content, spanning over 19 years and 100,000 subreddits, provides a wide variety of high-quality, conversational data. Its upvote and downvote system also helps AI companies identify quality content.

How much revenue has Reddit generated from data licensing deals with AI companies?

Reddit's revenue from data licensing grew to $81.6 million in the first nine months of this year, up from $12.3 million a year earlier. While still small compared to advertising revenue, this category has seen significant growth.

How much content do users post on Reddit?

In the first half of this year, users posted over 5.3 billion pieces of content, a 20.5% increase from the second half of 2023.

What are the potential downsides of using Reddit's data for AI training?

Reddit's data is user-generated and not always high quality or free from bias. AI companies must filter and correct flawed or biased content, and users should not take AI outputs as definitive truths.

Shownotes Transcript

You want a straightforward path to your goals, but at Merrill, we know things may get in the way.

Or new opportunities can put you at a crossroads. With the bull at your back, you get a personalized plan and a clear path forward. Go to ml.com slash bullish to learn more. Merrill, a Bank of America company. What would you like the power to do? Investing involves risk. Merrill Lynch Pierce Fenner & Smith Incorporated. Registered broker dealer. Registered investment advisor. Member SIPC. A wholly owned subsidiary of Bank of America Corp.

Welcome to Tech News Briefing. It's Thursday, December 12th. I'm Belle Lin for The Wall Street Journal. Amazon just opened its most automated warehouse yet. But underneath all the robotics and artificial intelligence, the facility will still rely on thousands of people. We'll find out why the e-commerce behemoth can't do without humans yet.

And then AI companies were one of Reddit's biggest frustrations last year. Now they're a key source of growth for the social media platform. Our reporter Sarah Needleman tells us why OpenAI and Google are hankering for Reddit's data.

But first, Amazon said its new warehouse in Shreveport, Louisiana is its first to use automation and AI at every step of the fulfillment process. Yet it still needs thousands of people to keep it running smoothly, especially in the midst of the frantic holiday retail season that pushes millions of added shipments through fulfillment networks. So what can't Amazon's robots do?

WSJ reporter Liz Young has been following the story, and she joins us now with more.

Liz, why does Amazon need humans at an automated factory that's filled with robots?

Basically, Amazon has been able to incorporate more automation throughout their warehouse. So they're able to do that to help speed up fulfillment, get packages to you at your house faster. But in the process, there's still a lot of things that robots can't do. It's difficult for robots, for example, to do what a human hand does, to reach into a bin full of items and both identify what it's looking for and then correctly pick up that item. Same with loading and unloading trucks. These are tasks that just haven't been able to be fully automated yet. When you think about it, Amazon sells, they say, more than 400 million products worldwide. And those range in size, weight, and fragility. So, you know, you have everything from like a soft dog toy to a toaster oven that's really big, heavy, but also easily breakable. So it's really challenging to then teach that robot, here's how you pick up a dog toy and here's how you pick up a toaster oven.

And the amount of time and training data required to teach it on those things, 400 million items. That's got to be huge.

Yeah, absolutely. Another part of this is that they're working on training the AI so that it can identify damage, for example. They want it to be able to look at an item and say, okay, is this broken, in the way that a human can quickly look at something and say, oh, that dog toy is falling apart. A robot is still learning how to do that reliably every single time, to say, we can't send that to a customer.

All right, Liz, give us a sense of this new Amazon warehouse in Louisiana. How big is this facility?

Yeah, absolutely. It's more than 3 million square feet. And the building is going to employ eventually 2,500 people picking orders, loading and unloading trucks, managing the robotics, all that good stuff. So far, they've been open a little over two months. They've hired about 1,400 people. So they're still ramping up operations there.

And the idea here is that they want to incorporate more automation to help save on labor costs, speed up operations, and to make warehouses safer.

So for doing things like lifting heavy objects repetitively, are those what robots are being used for now?

Yeah. So there's some automation at every different point that serves different kinds of functions. But one of the things they have is these robots that are able to roll across the ground kind of like a Roomba. And what they carry is a stack of totes directly over to workers. And Amazon says that the robot is able to provide a tote to a worker at an ergonomic height so that people don't have to bend over, for example.

Liz, do we have any idea how much of a bottom-line impact these robots can have? And especially during the upcoming holiday season, can they simply pack millions more of these boxes?

Amazon has not disclosed what a difference this makes, but they do say that it helps speed things up. Amazon said that this building has cut fulfillment costs 25% and is fulfilling orders 25% faster than some of its less automated sites.

That was our reporter, Liz Young. Coming up, it turns out years upon years of Reddit posts are excellent fodder for training AI models. We'll find out why and what that means for Reddit's revenue after the break.

AI companies need data so their apps can respond to users' questions and prompts with accurate results and in a conversational tone. Enter Reddit, whose text-heavy platform and growing collection of online human interactions fit the bill.

Plus, Reddit's willingness to sell its data to AI companies makes it stand out because there's only so much data out there to gobble up for free or purchase. Reddit recently reported its first quarterly profit as a publicly traded company, thanks partly to data licensing deals it made in the past year with OpenAI and Google.

For more on Reddit's newfound wealth and the limits of its data, we're joined by WSJ reporter Sarah Needleman. Sarah, we know there's a lot of text on Reddit, but why is that data so interesting and valuable to AI companies?

Well, there are a number of reasons. The platform has been around for about 19 years. And during that time, people posted a lot of content and comments. And what's interesting about Reddit, or at least what makes it a little bit different than some of its peers, is that users can respond to those comments with an upvote or downvote. They can also accumulate what's called karma, which are essentially like points that show you've been a good contributor to the platform.

And that, you know, with the up and down vote, is different than what you might see on other platforms where the content is organized by an algorithm. When posts on Reddit get a lot of upvotes, those are the ones that you're more likely to see than the ones that are voted down because they are considered higher in quality. And that's the sort of thing that AI companies look for. They want high-quality information.

They're looking for a wide variety of information, too. So Reddit is divided into more than 100,000 so-called subreddits. And these are dedicated to all sorts of topics, just about everything you can possibly think of. And they can be very specific or they can be general. And that's really helpful for AI because it really covers a lot of ground, whereas other social platforms may cater to a specific group of people. So, for example, Discord is known for being popular among video game enthusiasts. Strava is popular for fitness fans. But Reddit is for just about everybody on Earth and its demographics are really spread out.

There is one other thing about Reddit that makes it attractive for the AI companies. And that is that because users are mostly pseudonymous, they just go by like a fake username, they tend to be more candid in what they post. And that's, again, something that's helpful for an AI tool that wants to be able to take users' queries and respond with accurate information and in a conversational tone, the way that we talk, not like a robot.

For Reddit's business, this seems to be pretty great. How much money has Reddit made from some of these deals?

Reddit has made a good chunk of change from these deals. The amount of money that comes in from the data licensing is still much smaller than their main bread and butter, which is advertising sales. But that being said, in the first nine months of this year, Reddit's revenue category that includes the licensing deals we're talking about grew to $81.6 million, and that's up from $12.3 million a year earlier. So this is like a category that's blown up. And so even though collectively, it's still small compared to the broader picture of Reddit's revenue pie, the fast growth is really interesting. And it's really exciting for investors, because this could be a long-term opportunity. We don't know exactly how long these deals are for. But the companies are looking for more than just what people have written in those past 19 years. They also want what's coming in right now, day after day. It's ongoing; their thirst can never be quenched. They're constantly drinking from this fountain that is Reddit. And we're talking about a relatively high-margin business. So they didn't have to put out a lot of costs to make this happen.

Yeah, absolutely. Can we quantify how much data Reddit really has?

Reddit did say that in the first half of this year, people posted more than 5.3 billion pieces of content to Reddit. And that's a 20.5% increase from the second half of 2023. And that seems like a pretty beefy amount. I will say, however, that private messaging chats, that content is not shared with the AI companies. So we wouldn't be counting that. And it does account for a large part of the 5.3 billion pieces of content I just mentioned. Also, keep in mind, Reddit is not as big as some of its other social media peers. So while that number does seem big, and it is big, keep in mind, Reddit has something like 97 million daily users. Snapchat, for example, had 443 million daily users. In both cases, we're talking about as of the end of September.

Are there any downsides for OpenAI to be training on Reddit's data or for any AI company to be training on Reddit's data?

Reddit's data is based on what regular people are posting day in and day out. And these aren't necessarily experts. They have opinions that are wide ranging. And just because something is voted up as being very popular doesn't necessarily mean it's high in quality. Oftentimes that is the case. It is a logical conclusion, but that isn't always a guarantee. And so it's possible that some of the data that's being trained on is just flawed or biased, and that is true of a lot of content on the internet. And so there's some judgment calls or some editing they may have to make along the way to correct that. But again, we have the signals, the upvotes and downvotes, the karma. These are all signals that help AI companies get a sense of what is high quality. And a lot of the AI companies, when they spit out results, they have some sort of language that warns readers not to necessarily take this verbatim. But anyone with common sense should not take anything verbatim from the internet and assume it to be true.

That was our reporter, Sarah Needleman. And that's it for Tech News Briefing. Today's show was produced by Julie Chang with supervising producer Katherine Milsop. Logging off, I'm Belle Lin for The Wall Street Journal. We'll sign back in this afternoon with TNB Tech Minute. Thanks for listening.