
How Anthropic Pushes AI to Its Limits in the Name of Safety

2024/12/18

WSJ Tech News Briefing

People
Sam Schechner
Stephen Rosenbush
Topics
Stephen Rosenbush: Chicago is actively building a quantum computing industry, leveraging its existing resources and infrastructure and drawing in companies such as IBM and PsiQuantum, with the project expected to deliver major economic benefits. The effort has strong government backing; groundbreaking is planned for early 2025, with a large-scale quantum computer slated to be operational in 2028. The model is not fully replicable elsewhere, but its core principle of developing a specific sector based on a region's own strengths offers useful lessons.

Sam Schechner: Anthropic is focused on AI safety. Its red-team testing aims to uncover potential risks in AI models, such as their misuse to create bioweapons or launch cyberattacks. Testing methods include building risk models, asking targeted questions, and setting up automated challenges. As a public benefit corporation, Anthropic has established a governance structure to balance commercial interests with safety responsibilities, and it commits to a series of risk-mitigation measures before releasing AI models. Many major AI labs conduct similar safety evaluations, but there are currently no mandatory requirements.

Deep Dive

Key Insights

Why is Chicago investing in becoming a quantum computing hub?

Chicago aims to revitalize its economy by leveraging its existing infrastructure, universities, and research institutions to become a leader in quantum computing, a cutting-edge technology with potential economic gains of tens of billions of dollars.

How much is being invested in Chicago's quantum computing hub?

The state of Illinois has allocated around $500 million for the development of a former steel mill site on the South Side of Chicago, with additional private-sector investments from companies like IBM and PsiQuantum.

When is the Chicago quantum computing hub expected to open?

The project is expected to break ground in early 2025, with PsiQuantum planning to have a large-scale quantum computer operational by 2028.

What is Anthropic's Frontier Red Team and what do they do?

Anthropic's Frontier Red Team is an internal group tasked with pushing AI models to their limits by testing them for dangerous behaviors, such as hacking or creating bioweapons, to identify and mitigate potential risks before public release.

What are some of the risks Anthropic is concerned about with AI?

Anthropic worries about AI being used by terrorists to create bioweapons, hackers launching cyber attacks, or AI reprogramming itself to escape data centers and cause widespread harm.

How does Anthropic's Frontier Red Team test AI models?

The team uses a combination of expert-led questioning, automated challenges, and third-party testing, such as hiring Gryphon Scientific to ask detailed questions about bioweapon creation, to push models to their limits and identify vulnerabilities.

What measures does Anthropic take if a safety issue is identified in an AI model?

Anthropic implements filters to block dangerous queries, enhances cybersecurity protocols to prevent misuse, and follows a responsible scaling policy that outlines specific actions to be taken before releasing models with higher risks.

What is Anthropic's governance structure regarding AI safety?

Anthropic is a public benefit corporation with a governance structure that prioritizes public interest over profit. The company promises to increase the proportion of board members focused on public safety over time.

Shownotes Transcript

In order to guard against bad actors using its artificial intelligence models to take over computers or to create bioweapons, startup Anthropic has a team of researchers push its chatbots to their limits. WSJ tech reporter Sam Schechner explains how Anthropic's Frontier Red Team aims to make AI safer by asking it to do dangerous things. Plus, Chicago is trying to become the Silicon Valley of quantum computing with a new tech hub on the city's South Side. Danny Lewis hosts.

Sign up for the WSJ's free Technology newsletter.

Learn more about your ad choices. Visit megaphone.fm/adchoices