People
Eli Schaefer
Topics
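Eli Schaefer: News organizations are accusing Perplexity's AI news engine of plagiarism and web scraping. They argue that when Perplexity summarizes news articles, it copies the original text word for word. This has sparked a debate over whether Perplexity should be held responsible when users make use of plagiarized content generated by its AI. I think this is a complex issue with technical, legal, and ethical dimensions. On the one hand, Perplexity is an AI tool whose job is to provide information in response to user requests; it does not itself go out and crawl the web or plagiarize. On the other hand, the summaries Perplexity generates contain copyrighted text, which does raise copyright and ethical concerns. Perplexity's response is that summarizing a URL is different from web crawling: Perplexity only provides information at a user's request rather than actively crawling for data, so responsibility for any plagiarism lies with the user, not with Perplexity. This is similar to how Google Search works; the results Google displays include text from various websites, but that does not mean Google is plagiarizing. However, this argument does not fully resolve people's concerns. Because Perplexity's summaries contain a large amount of the original text, they make it easy for users to plagiarize. Perplexity therefore needs to bear some responsibility, for example by improving its AI model so that summaries do not reproduce too much of the original text, or by clearly citing sources when it generates a summary. AI image and text generation face similar copyright issues, because these systems are trained on existing copyrighted material. This calls for rethinking the relationship between AI technology and copyright, and for more comprehensive laws and regulations governing how AI technology is used.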


Transcript


Hello everyone, welcome back to another episode of the AI podcast, where we talk about recent developments in artificial intelligence and how it might evolve in the future. I'm your host, Eli Schaefer, and I'm so excited to be here with you today. We have an exciting episode, so let's get into it.

Okay, so today we're going to talk about an AI search engine called Perplexity. If you don't know what Perplexity is, it's pretty much an AI tool that you can ask questions, and it will give you an answer. This is how they describe it themselves: "Perplexity is a free AI-powered answer engine that provides accurate, trusted, and real-time answers to any question." That's just a brief summary of what the service is.

Perplexity launched in 2022, and it has made a very big impact on the AI space and kept growing ever since, for the last two years, which is crazy. What is happening now is kind of insane. It's something we've seen a little bit with other large language models as well, and it's not really different with Perplexity.

News outlets are accusing them of two things. The first is plagiarism, and the second is web scraping. Essentially, web scraping is when a robot goes through the internet, visits websites, collects data from them, and puts it in an index, similar to what Google uses so that websites can show up in its search results. So that's kind of what web scraping is.
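The crawl-and-index loop described above can be sketched in a few lines. This is a minimal illustration, assuming a tiny in-memory "web" (the hypothetical `FAKE_WEB` dictionary) in place of real HTTP requests; the names and URLs are made up, and real search-engine crawlers are far more involved.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to (page text, outgoing links).
# A real crawler would download pages over HTTP; this stand-in keeps the sketch offline.
FAKE_WEB = {
    "https://news.example/": ("Front page", ["https://news.example/a", "https://news.example/b"]),
    "https://news.example/a": ("Article A text", ["https://news.example/b"]),
    "https://news.example/b": ("Article B text", ["https://news.example/"]),
}


def crawl(seed: str, max_pages: int = 100) -> dict[str, str]:
    """Breadth-first crawl starting at `seed`; returns an index of URL -> page text."""
    index: dict[str, str] = {}
    frontier = deque([seed])  # URLs discovered but not yet visited
    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        if url in index or url not in FAKE_WEB:
            continue  # skip pages already indexed or that don't exist
        text, links = FAKE_WEB[url]  # "go on the website and get data"
        index[url] = text  # "put it in an index"
        frontier.extend(links)  # follow links on its own; no human asked for these pages
    return index


if __name__ == "__main__":
    for url, text in crawl("https://news.example/").items():
        print(url, "->", text)
```

After the seed page, every URL the loop visits comes from a link it found itself rather than from a person, and that self-directed behavior is what sets crawling apart from fetching a single page on request.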

The reason they are being accused of this is that some research has been done in which people from different news outlets give Perplexity a link to one of their articles and ask it to summarize the article, and some of the text it outputs is word for word what the article says. Obviously this is a big deal, especially for these news outlets.
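A claim like "word for word" can be checked by finding the longest run of consecutive words that a summary shares with the original article. The sketch below is a hypothetical illustration, not the method the news outlets actually used; the function names and sample texts are made up.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def longest_shared_run(article: str, summary: str) -> list[str]:
    """Return the longest sequence of consecutive words that appears in both texts.

    This is longest-common-substring dynamic programming over words:
    run[i][j] is the length of the shared run ending at article word i-1
    and summary word j-1.
    """
    a, s = words(article), words(summary)
    run = [[0] * (len(s) + 1) for _ in range(len(a) + 1)]
    best_len, best_end = 0, 0  # best_end is an exclusive index into `a`
    for i in range(1, len(a) + 1):
        for j in range(1, len(s) + 1):
            if a[i - 1] == s[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
                if run[i][j] > best_len:
                    best_len, best_end = run[i][j], i
    return a[best_end - best_len:best_end]


if __name__ == "__main__":
    # Made-up sample texts, for illustration only.
    article = "The council voted on Tuesday to approve the new budget after a long debate."
    summary = "After a long debate, the council voted on Tuesday to approve the new budget."
    shared = longest_shared_run(article, summary)
    print(len(shared), "words copied verbatim:", " ".join(shared))
```

A shared run spanning a whole sentence or more is a strong sign of verbatim copying, while short shared phrases such as names or common wording are expected even in an honest summary.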

Plagiarism is something they want to stop, and they want to protect their work. Also, because Perplexity is a big enough company, it makes sense that they would want to go after it, simply because they could probably make a good amount of money if they caught it doing something unethical or illegal. So it's a pretty big deal.

As for web scraping, although that is what they've been accused of, Perplexity's head of business, Dmitry (I hope I'm saying that right), said that summarizing a URL isn't the same thing as crawling. In his words, "Crawling is when you're just going around sucking up information and adding it to your index." He noted that Perplexity's IP might show up as a visitor to a website that otherwise prohibits robots only when a user puts a URL into a query, which doesn't meet the definition of crawling.

So what he's saying is that, although it could look like web scraping or web crawling, because an IP address from the company is going to those websites and collecting data, the AI is really just fulfilling a request from a human. It's not the same as an automated robot going through the web and doing this 24/7. It's a robot that fulfills a single request: it goes to a website, gets data from it, and brings it back to the user.

Now, it's up to the user whether they use that data. If they use that plagiarized content, then it's on them; it's not really Perplexity's fault. That's the stance they're taking. It's kind of like how, if you search for something on Google, text from the actual website shows up under that website's result. That doesn't mean Google is plagiarizing it; it just means Google is showing you content that comes from the website.
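Websites can state which automated agents may visit them in a robots.txt file, which Python's standard-library `urllib.robotparser` can read. The sketch below contrasts the two behaviors as Perplexity describes them: a crawler that picks its own URLs and, if well-behaved, checks robots.txt first, versus a fetch that visits only the one URL a user pasted in. The site, the robots.txt contents, the agent name, and the function names are all hypothetical; this is not Perplexity's code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site that tells every automated agent to stay out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def robots_allows(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def crawler_visits(discovered: list[str], user_agent: str = "ExampleCrawler") -> list[str]:
    """A crawler chooses its own URLs; a well-behaved one checks robots.txt first."""
    return [url for url in discovered if robots_allows(user_agent, url)]


def user_fetch_visits(pasted_url: str) -> list[str]:
    """A user-triggered fetch visits only the single URL a person pasted into a query.

    This models the distinction Perplexity draws; it does not decide whether such
    a fetch should also honor robots.txt, which is exactly what is in dispute.
    """
    return [pasted_url]


if __name__ == "__main__":
    urls = ["https://news.example/a", "https://news.example/b"]
    print("crawler visits:", crawler_visits(urls))
    print("user fetch visits:", user_fetch_visits("https://news.example/a"))
```

Under this robots.txt the crawler visits nothing, while the user-triggered fetch still reaches the one page it was asked for, which is why the site's logs could show Perplexity's IP even though it claims not to crawl.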

Really interesting. I'm curious what your thoughts are on this, because it has been a concern for a while. Even with AI image generation, people have accused it of using copyrighted material, because it's trained on actual artwork that people have made and then generates images of its own that are kind of inspired by that artwork. Text generation is the same thing.

Yeah, I'm curious what your thoughts are, so definitely let me know. That's it for today's episode. Thank you so much for tuning in. This has been a really good discussion, and I've certainly learned a lot about artificial intelligence. I'm definitely going to be a little more cautious when I use it. Don't forget to subscribe, and I'll see you in the next episode.