cover of episode How Anthropic Pushes AI to Its Limits in the Name of Safety

How Anthropic Pushes AI to Its Limits in the Name of Safety

2024/12/18
logo of podcast WSJ Tech News Briefing

WSJ Tech News Briefing

People
S
Sam Schechner
S
Stephen Rosenbush
Topics
Stephen Rosenbush: 芝加哥正在积极发展量子计算产业,利用其现有资源和基础设施优势,吸引了IBM和PsyQuantum等公司参与,预计将带来巨大的经济效益。该项目得到了政府的大力支持,并计划在2025年初破土动工,在2028年运行大型量子计算机。虽然该模式并非完全可复制,但其核心原则——根据自身优势发展特定领域——具有借鉴意义。 Sam Schechner: Anthropic公司致力于AI安全,其红队测试旨在发现AI模型的潜在风险,例如被用于制造生物武器或发动网络攻击。测试方法包括建立风险模型,提出针对性问题,设置自动化挑战等。Anthropic作为一家公益公司,建立了一套治理机制,以平衡商业利益和安全责任,并承诺在发布AI模型前采取一系列措施降低风险。许多主要的AI实验室都进行类似的安全评估,但目前尚无强制性规定。

Deep Dive

Key Insights

Why is Chicago investing in becoming a quantum computing hub?

Chicago aims to revitalize its economy by leveraging its existing infrastructure, universities, and research institutions to become a leader in quantum computing, a cutting-edge technology with potential economic gains of tens of billions of dollars.

How much is being invested in Chicago's quantum computing hub?

The state of Illinois has allocated around $500 million for the development of a former steel mill on the South Side of Chicago, with additional private sector investments from companies like IBM and PsyQuantum.

When is the Chicago quantum computing hub expected to open?

The project is expected to break ground in early 2025, with PsyQuantum planning to have a large-scale quantum computer operational by 2028.

What is Anthropic's Frontier Red Team and what do they do?

Anthropic's Frontier Red Team is an internal group tasked with pushing AI models to their limits by testing them for dangerous behaviors, such as hacking or creating bioweapons, to identify and mitigate potential risks before public release.

What are some of the risks Anthropic is concerned about with AI?

Anthropic worries about AI being used by terrorists to create bioweapons, hackers launching cyber attacks, or AI reprogramming itself to escape data centers and cause widespread harm.

How does Anthropic's Frontier Red Team test AI models?

The team uses a combination of expert-led questioning, automated challenges, and third-party testing, such as hiring Griffin Scientific to ask detailed questions about bioweapon creation, to push models to their limits and identify vulnerabilities.

What measures does Anthropic take if a safety issue is identified in an AI model?

Anthropic implements filters to block dangerous queries, enhances cybersecurity protocols to prevent misuse, and follows a responsible scaling policy that outlines specific actions to be taken before releasing models with higher risks.

What is Anthropic's governance structure regarding AI safety?

Anthropic is a public benefit corporation with a governance structure that prioritizes public interest over profit. The company promises to increase the proportion of board members focused on public safety over time.

Shownotes Transcript

Translations:
中文

Amazon Q Business is the generative AI assistant from AWS, because business can be slow, like wading through mud. But Amazon Q helps streamline work, so tasks like summarizing monthly results can be done in no time. Learn what Amazon Q Business can do for you at aws.com slash learn more. Welcome to Tech News Briefing. It's Wednesday, December 18th. I'm Danny Lewis for The Wall Street Journal.

Chicago, Illinois, home to deep dish pizza, the bears, and maybe soon quantum computing? We'll hear how the region's business and political leaders are laying the groundwork to make the Windy City a hub for developing this cutting edge technology. And artificial intelligence startup Anthropic was founded in part with an eye on making AI safe. But in order to do that, the company tasks an internal team to push its models towards dangerous behavior.

We'll find out how they do it and how AI companies assess risk as the tech continues to advance. But first, an old steel mill on Chicago's South Side could one day become the Silicon Valley of quantum computing, a technology that relies on quantum mechanics to solve complex problems that regular computers struggle with. Instead of using bits, which only ever have two states, zero or one, quantum computers use qubits, which can be zero, one, or even both at the same time.

Researchers have been working on quantum computers for years, but Illinois leaders like Governor J.B. Pritzker are making a multi-million dollar bet that Chicago will be the center for this technology. And WSJ Pro Enterprise Technology Bureau Chief Stephen Rosenbush says companies from IBM to startup PsyQuantum are signing on. He spoke to my colleague, Bell Lin, about how this research hub is coming together.

You write that Chicago is kind of diving headfirst into quantum computing. Why is it doing that? The answer may be someone has to. Why not Chicago? Chicago has played such a huge role in the economy for many decades, but the industries that sort of

Built it in the 19th century, the first part of the 20th century, aren't what they used to be, but there's a lot of real estate there. And it occurred to the leadership in the city and in the state that there was a lot of quantum infrastructure that could be used as sort of a springboard to develop a much greater infrastructure.

technology ecosystem around quantum computing, rather than trying to catch up in areas where other parts of the economy or parts of the world are really super well established. And what about the money involved? How much money do you need to build up something like the Silicon Valley of quantum computing? Well, in round numbers, you need a lot. The state has directed something on the order of half a billion dollars into the development of this former steel mill on the south side of Chicago. There's

Private sector money being invested, both PsyQuantum and now IBM are planning on establishing a presence at this site, which is right on the shore of Lake Michigan. When does the city expect that this will open? The project is making its way through the final stages of city approval right now. The developers expect to break ground on this project in early 2025. And PsyQuantum expects to have a large-scale project

quantum computer on the site up and running sometime in 2028. Economically, how much does Chicago stand to gain from building a really thriving quantum infrastructure? BCG looked at this. There could be tens of billions of dollars in economic growth generated over the foreseeable future. So

It translates into massive gain economically, if it works. If it works. Yep, that's the big question. Chicago has kind of laid out this blueprint for revitalizing its economy and harnessing a lot of its resources into a really exciting area of technology. Do you think it's likely that we'll see other cities follow suit?

This model is particularly well suited for Chicago. It has the physical infrastructure and it also has the network of universities and research institutions and corporations that can supply expertise, that can supply potential workforce down the road. There's very well developed financial industry in Chicago as well.

So they're building on what they have. They're not trying to duplicate something that another region has already developed. They're trying to do something new that suits them. And they're also going about it in a pretty targeted way. So you couldn't necessarily translate this specific model to other regions, but the underlying principles

figuring out what is it that a particular region is well suited for, that is something that could be deployed elsewhere. That was WSJ Pro Enterprise Technology Bureau Chief Stephen Rosenbusch speaking with Bell Lin. Coming up, how close is AI to building a catastrophic bioweapon or causing superhuman harm? Just ahead, we'll hear about the team at Anthropic working to reduce the danger of AI by pushing models to their limits. That's after the break.

Amazon Q Business is the new generative AI assistant from AWS because many tasks can make business slow, as if wading through mud. Uh, help? Luckily, there's a faster, easier, less messy choice. Amazon Q can securely understand your business data and use that knowledge to streamline tasks. Now you can summarize quarterly results or do complex analysis in no time. Q got this. Learn what Amazon Q Business can do for you at aws.com slash learn more.

Anthropic is the AI startup behind the Claude chatbot. But before it releases models to the public, its Frontier Red team tries to break them in ways that could be dangerous. Like asking the AI to hack into a computer, or to provide instructions on how to make a biological weapon. WSJ tech reporter Sam Schechner looked at how this team tries to make AI break bad in order to make the models safer. He joins us now.

So, Sam, when Anthropic talks about danger from AI and making models safe, what exactly do they mean? Nobody thinks that today's models are currently capable of being like HAL 9000 in 2001 and trying to kill a human or take control of a spaceship. But the question is, what will they be capable of?

And are we going to be able to figure that out before they are capable of it? For instance, one of the risks that they're worried about is could a terrorist use it to learn how to make a bioweapon? Or could a malicious hacker use it to launch millions of simultaneous cyber attacks? Or, and this is a little bit more esoteric, could an AI eventually learn how to reprogram itself and

and escape from the data center that it's in and reproduce and run amok in the wild. Right. And so you were reporting on Anthropic's Frontier Red Team, but first...

Really briefly, what is red teaming? Well, red teaming didn't start with AI. It's actually a pretty common practice for cybersecurity in computers. You essentially set a red team to try to attack your server, your system, and see if they can break it. And that's a way of testing your defenses. And then you try to improve the defenses and set the red team at it again. In this case, they're setting the red team at AI.

these new AI models that they've just put out to see just how bad they can make them be. You could red team them to see if you can get them to say really offensive things or get them to spew Nazi nonsense. In this case, they're red teaming them to see if they can get them to show some of the capabilities that would be necessary to cause what they call catastrophic harm.

So how does Anthropix Frontier Red Team then test artificial intelligence models? Like, what are they looking for and how do they go about pushing these to the limits? Well, it starts with figuring out what risks they're actually interested in. They actually have to come up with what they call a risk model, a very specific model of

a particular danger that the AI could present. Like, okay, you have somebody who maybe has access to certain things that you'd need to create a specific bioweapon, but they don't have the wet lab skills. So can the AI give you

accurate advice about how to manipulate a virus in a lab. And you start to red team it and you set, you know, and they actually hired an outside company called Griffin Scientific, which is now owned by Deloitte, to ask it lots of questions. And they had, you know, both experts in bioweapons do this because they already knew the answers. And they also had smart novices, PhDs in other areas trying to see if they could get more information than you could get from Google.

In other realms, it amounts to a lot of automated questions or questions

Automated challenges that you give it, capture the flag challenges, for instance, are what they use in cybersecurity, where you have a flag on a target system and they have to somehow break into that system and get the flag, which would be a string of text like, it's a me, flaggio, or something that you wouldn't find online. That actually is the flag that they had it find on their target systems. Say Anthropic's Frontier Red team has concerns about a new AI model the company's developing. What happens next?

There's a lot of people who are concerned that companies that are for-profit companies, like if they discover a safety issue, what are the incentives here? Anthropic actually was founded in part by people who thought that other AI companies weren't taking safety seriously enough. And so they have a lot of governance mechanisms built in.

to kind of try to rebalance those incentives. And so right now, it basically amounts to a promise. But with these governance mechanisms, more and more of their board over time will be controlled by basically people who have the public interest in mind as opposed to necessarily their profit. And they're a public benefit corporation, which allows them to take into account other criteria besides just government.

return to shareholders. And so what they have is this thing that they call a responsible scaling policy that they've promised that they'll follow, which basically says that if an AI shows specific skills

then they promise that they will do a list of things before releasing it. Like the skill right now is to give you a big leg up in building a bioweapon. So they're going to put in place filters that will not allow you to ask those questions or block the answers. They also promise to put in place better and verifiable cybersecurity protocols to make sure that the model can't be stolen by some hackers and then misused without those filters put in place.

And then for the next safety level, once they get even more dangerous, they're going to have to come up with

a list of things that they'll do that's even more advanced. They haven't yet come up with that list that's part of the promises that they've made. And for now, we're basically taking these promises at face value. And there's no reason to think that this isn't all in good faith. But we have yet to hit the moment where the profit motive and the safety motive are really in conflict. And most, if not all of the major AI labs, depending on how you define them, do this kind of testing that I'm

called evals or safety evals, short for evaluation. So OpenAI does them. Google DeepMind does them. There's no requirement to do it, but they've all pledged to do it. And they do it and report with varying levels of detail the results that they get. And they've also pledged to mitigate the risks that they uncover in one way or another. That was our reporter, Sam Schechner. And that's it for Tech News Briefing. Today's show was produced by Julie Chang with supervising producer, Catherine Millsap.

I'm Danny Lewis for The Wall Street Journal. We'll be back this afternoon with TMB Tech Minute. Thanks for listening.

Help? Luckily, there's a faster, easier, less messy choice. Amazon Q can securely understand your business data and use that knowledge to streamline tasks. Now you can summarize quarterly results or do complex analysis in no time. Q got this. Learn what Amazon Q business can do for you at aws.com slash learn more.