The three biggest AI companies — Anthropic), OpenAI), and DeepMind) — have now all released policies designed to make their AI models less likely to go rogue or cause catastrophic damage as they approach, and eventually exceed, human capabilities. Are they good enough?
That’s what host Rob Wiblin tries to hash out in this interview (recorded May 30) with Nick Joseph — one of the original cofounders of Anthropic, its current head of training, and a big fan of Anthropic’s “responsible scaling policy” (or “RSP”). Anthropic is the most safety focused of the AI companies, known for a culture that treats the risks of its work as deadly serious.
Links to learn more, highlights, video, and full transcript.)
As Nick explains, these scaling policies commit companies to dig into what new dangerous things a model can do — after it’s trained, but before it’s in wide use. The companies then promise to put in place safeguards they think are sufficient to tackle those capabilities before availability is extended further. For instance, if a model could significantly help design a deadly bioweapon, then its weights need to be properly secured) so they can’t be stolen by terrorists interested in using it that way.
As capabilities grow further — for example, if testing shows that a model could exfiltrate itself and spread autonomously in the wild — then new measures would need to be put in place to make that impossible, or demonstrate that such a goal can never arise.
Nick points out what he sees as the biggest virtues of the RSP approach, and then Rob pushes him on some of the best objections he’s found to RSPs being up to the task of keeping AI safe and beneficial. The two also discuss whether it's essential to eventually hand over operation of responsible scaling policies to external auditors or regulatory bodies, if those policies are going to be able to hold up against the intense commercial pressures that might end up arrayed against them.
In addition to all of that, Nick and Rob talk about:
And as a reminder, if you want to let us know your reaction to this interview, or send any other feedback, our inbox is always open at [email protected]).
Chapters:
Producer and editor: Keiran HarrisAudio engineering by Ben Cordell, Milo McGuire, Simon Monsour, and Dominic ArmstrongVideo engineering: Simon MonsourTranscriptions: Katy Moore