Securing the Black Box: OpenAI, Anthropic, and GDM Discuss

2024/5/6

a16z Podcast

AI Deep Dive AI Chapters Transcript

People

Jason Clinton

Joel de la Garza

Matt Knight

Vijay Bolina

主

主持人

专注于电动车和能源领域的播客主持人和内容创作者。

Topics

Matt Knight：大型语言模型在安全领域具有巨大潜力，可以提高效率并克服资源限制，但需要在开发前建立安全控制措施，并关注国家行为者的威胁。OpenAI 积极利用大型语言模型来增强自身安全防御，例如自动化安全操作、改进漏洞赏金计划等，并通过开源项目回馈社区。 Jason Clinton：大型语言模型可以用于改进软件模糊测试、第三方依赖项审查等，但其应用会随着模型能力的提升而不断变化。提示注入是需要关注的重要风险，企业需要在部署AI模型时部署信任和安全系统，并利用AI来防御提示注入攻击。 Vijay Bolina：大型语言模型在防御方面比攻击方面应用更广泛，但需要关注其在恶意活动中的潜在应用，例如网络攻击和选举干预。Google 积极参与开源安全，并投资于研究大型语言模型的安全和隐私风险，以及如何将这些技术应用于各种产品。 Joel de la Garza：企业应该利用大型语言模型来提高员工效率和生产力，但需要关注数据泄露和权限过大的风险。企业在部署AI系统时，首先需要考虑AI系统在其数据流和基础设施中的位置，并采取相应的安全措施。

Deep Dive

Chapters

The episode introduces the rapid progress of AI and its impact on security, featuring insights from security leaders at OpenAI, Anthropic, and Google DeepMind.

AI brings new attack vectors and defense strategies.
Nation-state actors are already abusing AI platforms.
Prompt engineering is a new area of concern.

Shownotes Transcript

Translations:

中文

You can do the next big thing, can train the next big model unless the security controls are in place .

for consumers. I cannot overstate the phase of innovation in the space right now.

Every cio, every CTO, every VPN we talk to as a project where they're using large language models internally.

are we building or buying the model? And if we're building the model, you should maybe think about what where is your data coming from and who's touching IT?

Most books are shocked to see as images that have completely invisible pixel that the human eyes cannot see. But a model can, because it's trained on RGB value. So if you just hide some text and what looks like a completely .

benie document, users turning access to knowledge isn't the buck. Wouldn't you as a business want them having all that knowledge in context? That's a huge opportunity for enabling employees, workers and companies to be more productive and more efficient.

I am not an exciting able person. I am a security not doing through. And if i'm this excited, then you can kind of imagine this.

It's human nature to fear the unknown. So IT should be no surprise that a technology moving as quickly as a frontier of ai drums up its fair share of fear. Fears of uncanny robot calls, expansion, al da breaches or letting the zone.

This information now IT is true that new technologies bring new attack factors, but what are you use in the era of large language models? In this episode, you'll get to hear directly from the people closed to the action, the folks leading security at frontier labs, OpenAI, anthropic and google deep mind. The first place here after mine is matt night.

Matt is the head of security at OpenAI and has been leading security IT and privacy engineering and research for the company since june twenty twenty. Next up, you'll hear Jason clinton, the chief information security officer, or see so an anthropoids. He oversees a team tackling everything from data security to physical security.

Join the anthropic in a twenty twenty three. After spending nearly twelve years, I google most recently leading the chrome infrastructure security t. From there you'll hear from V J. POlina, the seo and head of cybersecurity research, a google deep mind. He was also previously the eco at vinck firm black cock network and has also works at median playing some of the largest data bridge investigations to date.

Finally, they will hear from another voice from asic tensions, that is, Operating partner joe delegates a who prior to his time investing at a extensive, was the chief security officer at box, where he joined post series b and scale up all the way through IPO. Prior to that, he was the global head of threat management in cyber intelligence for city group. Hopefully, it's clear that these four guests have a storied history with security and are all equally immersed in this new frontier of aliens.

And together we will impact how there seeing elements change, both offence and defense, how even nation state actors are abusing their platforms, new attack factors like prompt engineering, and much more so if security has long been a tale of cat and mouse, how do elements change the controllers of this chase? Let's find out. As a reminder, the content here is for informational purposes only, should not be taken as legal, business tax or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any asic sencer fund.

Please note that a six senses and ezoe illiac may also maintain investments in the company's discussed in this podcast. For more details, including a link to our investments, please see a six and cda com slack disposals. You've all been in the security space for quite some time.

The last couple years has been a lot of momentum with A, I and alams. How has the CEO role changed? And how much is that really being shaped? Do the A I? Is that any different? Is that looking more or less the same?

One of the things that has been most important for me in my team has been our ability to adopt and use these technologies to help increase our scale and ethical. Y, if there is something that defines every security team, IT is constraint, whether it's not having enough people, not having extent of talents, budget shortcomings of tools. And l ams have the potential as we're seeing to alleviate many of these constraint, whether IT is capabilities that we otherwise be able to access or able to move as fast as we want to on our Operational tasks like detection were flows and you name IT. Being able to really beat the frontier of expLoring what these tools can do for a security team has been exciting and transformative.

There are other things, though. There are kind of strange about being and see. So at a frontier lab, for example, we have nation state security defense on mind, which most companies don't. So that's a big investment. And then when we think about the ways that we adopt the technology, there are many chAllenges that have to do with being at the frontier that sort of speak to the things of that was talking about.

So yeah, you definitely need to think, okay, what am I going to do to adopt this? And part of the way that many of our companies are talking about this is adopting something that sort of a kind of the responsible scaling policy that anthropic is done. But there are other names for these things.

We have these, like security controls that we have to meet before we can do the next big thing in the A, I. And our jobs as c so as to make those things happen, right? You can do the next big thing, can train the next big model unless the security controls are in place as so .

that's the big investment. Jason hit ted on the headband. IT comes to framing the way that we think about our rules and how IT translates to our peers.

They're trying to make sense of this new cost of technology in the way that, that applies within their organization and the rest of may emerge. And IT is very different, being a seco within the front. I love that also highlighted that we have to live by example.

There are a lot of unknowns in this technology and where you may be going. And I have the nice city of being within the frontier unit, within google, which has a massive security team, and being able to work, very collaborative, helping influence the direction of where this technology can be leverage internally across a multitude different use cases. And then there's also a lot of emphasis on research development when IT comes to the security in py aspects or the implications of this loss of technology as well.

So a lot of what I spend my time on right now is thinking deeply about meeting a large group of researchers and engineers, thinking about the security limitations or privacy limitations that may be, and this cost of technology as well. And so what's interesting about my role here that google is, yes, we are the group that is building these frontier models, but I also set next to a large organization that is rapidly to playing this positive technology quite quickly across a multitude different services. And so working close with those product areas to reason about what the associated threat model may be for their respective product is an important part of my role as well. And IT makes things a lot more interesting when you have that all perspective of kind of where this technologies is going.

We've got a really large A I team that's closely focus on the research side as well. So as an outside are looking at, I think one of the coolest things is that you guys have kind of a split role, which is where you get to secure the A I, right, the weights, the model weight and protecting kind of the crown jewels of the organization. But then also you get to push the adoption of I to solve those security problems.

It's sort of that really cool dog food thing thing you get to do when you're in a high tech company. And now I think you guys just release some open source that looks really interesting. Maybe would be great to hear some of the use cases where you're actually using the AI products building to make your job easier. And as you said, the number one problem, every c so says that they have as resources. And this seems like the ability to have a muslim at the three sources.

So I joined open IT back in twenty twenty. And seeing that happened in my first week on the job was we released the opening idi that was front in GPT three and GPT three. At the time, I felt pretty profound.

IT, for the first time was A A language model that actually represented utility. And we saw start up and businesses adopting IT to enable their software products in various ways. And from the very beginning, I was pretty intrigued by what this could do for security.

And if we look at what's happened and then GPT three to three point, five to four, we've seen the models become more and more useful in the security domain. So where as GPT three and three point five, they kind of had some knowledge about security facts. They weren't really that something you could use.

However, with GPT four were continually surprised by way, and we were able to get utility out of bit to enable our own own work. The areas where we've seen napi the most useful have been in automating similar Operations and capabilities to be open source and circle back to those pretty mature, very security team has a number of Operational workload. Whether IT is alert to come in sydney, you wait for granny's to come and look at them or the questions you get from your developers that you want to answer.

And l ms. Are broadly useful for helping to accelerate and increase the scale at which teams can get through that. So an example is for known good, like of high confidence detections, where we have actions we want to take on the back end of that were sometimes able to deploy the models in ways that work there.

So i'll give a super trivial example, but I love this example, is I think it's a reasonable one. Suppose you have an employee who like shares a document publicly that maybe you shouldn't been shared quite so broadly. Certainly most companies have employees who need to do this, right? I do you need to share documents with people that sort of the company to collaborate and what have you.

So maybe that document can shared IT sends in a work to a security team is its an a the security engineer then picks that up and reaches out to the employee. Hay, did you mean to share this document publicly? The employee maybe gets back to the quickly, maybe gets back them in a day or to there's some round of discussion they determine.

No, I did that by accident. And and when actions taken to ensure that document well, we can deploy GPT four to take all of that back and forth out so that when the security engineer are catches up with the ticket, they've got all the context. They need to just take the action and IT helps them move that much faster.

IT takes the toil out of the work. And IT also is pretty brazilian to failure because in this case of the model gets something wrong, you still have a human looking at IT in the same amount time that you would take for them to get to IT anyway. So that's a super trivial example, the document sharing. But you can extrapolate that and see all the other powerful ways in which you can help the security.

I mean, for an Operations team, you see quality, ten percent of your workload just reaching out to people and asking, did you mean to do this right? ten. Yeah, maybe more. Finally, for level one fifty and please to to my colleagues.

panic million and photo chances who just got back from black at asia. They were over there presenting some other work on tools they built to help enable our team. They open source them so they're up on opening I to get up if teams want to track them.

I really think this is just the beginning. I think there are are numerous ways in which teams can adopt these tools and use them to enable their work. today.

I think it's impressive, and I took a look at the open source other day. I'm going to try to get work over the weekend, but really, really awesome. What you guys are building.

I think, Jason, for anthropos C, I know you guys have the unusually large context window, which i've recommended the several see. So as loading your policies into that context window and asking a question, right, there's a lot of obvious use case security. Curious to hear how you guys are kind of making you to that technology.

There are some typical things are doing now that I think are interesting and other people should be thinking about similar what that working on all of those technologies are useful. I would say there's a couple of things that he and for example, many security teams do software reviews to vet third party dependencies.

You can throw at a large language model at third party dependent and say, how is dangerous is this thing? Do you see anything to strange in the commit history? What's the reputational score of the commitments? This is a sort of things that you get from their party vendors right now.

Um but A I are actually very good at doing this as well. So third party and supply chain analysis is very useful summize ation. Of course, a lots of security products on the market or adopting summize ation and were no different.

The thing about A I is, is moving so fast though, and sort to ask what we're doing today is actually, I think, a little bit missing the boat because so much is going to change in the next two years. We've got the scaling laws as a backdrop here or we know that models are going to get more powerful. And when they get more powerful, we have to ask, okay, well, what are the new applications going to be?

Can we, for example, have a hyderabad of confidence that everything that goes to A C I N C D pipeline doesn't introduce this hearty vulnerability because you're like running a nl m over every line of code that goes through that maybe there's other low hing for IT like that. And I got a little ally talk about this rever. so. Probably good if you can t give somebody else a chance to talk but oh my gosh, we have so many things that are coming down the pike terms, abilities and I think it's really important to be thinking about, okay, where are you going to be in a couple years, not only on the cber security defender front, but on the offender front?

To your point is and things are moving so quickly and maybe you you probably heard some people say that this technology feels more like a black box.

And so maybe at a more fundamental level, we d love to to probe on how you think this maybe shifts offence and defense is IT really just a change in the manpower on both sides, right? Or because a AI power to just like brute force things? Or are there some new fundamental security considerations again, other on offence or defense? Would love to hear how you're thinking about that trajectory because to your point.

and we're at the very beginning, yeah, I think there is a lot of excitement, generally speaking, in the code safety space, more code security space and a lot of experimenting at google have invested heavily in open source security. And we have a large group that thinks about broader aspects of open source security and how to create tools and methods to benefit the broader of the community. But we have been expLoring the space on how do you use aleem to support very different approaches to fuzzing and or assessing some of the new answers around code security .

in general. Quick note. For the uninitiated, buzzing is an automatic software testing technique that bombards a piece of software with unexpected inputs to check for bugs, crushes or potential security vulnerabilities.

Think of IT kind of like trust, using a car to demand road ready. So just imagine taking a car to a test track, driving IT over pot holes, slippy services or harsh environments to uncover any weaknesses or potential failures in design. Similarly, fuzzing aims to ensure that software can handle unexpected inputs without being compromise. ed. Some even refer to fuzzing as automatic bug detection.

There's so much that we can do here already on the defender side, and we are on the other side already making these investments. And I think that maybe the most exciting thing about all of this is we do see these papers being published on using large language models to drive the automatic detection of software vulnerabilities. And google and and others do pay a very large amount of money to make those fast clusters work.

And there are other players in the ecosystem on the defenders side are doing that same work. And so you want, I think about offender defender s sort of baLance. I think about the evidence that i'm seeing so far as that were very defender dominant on uh, use of large sandwich models for cyber security applications.

And you can look across the entire ecosystem and see a number of players are offering products that have been augmented with large language models for the sock Operations, for the the sort of summarizing tests that we talked about earlier. But when we look for to the future and office defense, I do think there is some area for concern here, both on the trust and safety side, but then also in in some new and emerging areas that we haven't even had a chance to talk about yet. For example, uh, stb beijing are uh a very, very interesting area from the capabilities of perspective for A S.

And if you can just imagine extrapolate from the the devil that eyes of the world to what doesn't mean to have an entire platform that could potentially orchestrate and launch a cyber attack or engage in an authentic behavior around elections. All of those are abuse areas where we, as an industry, need to be thinking about, okay, this isn't actually that expensive to Operate. And if somebody just connect the us.

And puts IT together, there's going to be this thread that we need plan for and accessory or from the trust and take your perspective. So we've done this, I think, already on this call is engaging election interference countermeasures because we anticipate this being a problem. Yesterday, there was a big announcement around child safety on these things as well and save agents ts.

Are opportunity another vector for that kind of abuse? So being aware of the ways these things can be misused and then being ahead of that curve is important. A part of the story for the defender side.

Yeah maybe did not call you out, but I think you guys had a blog post pointing out how native state read actors are actually abusing, misusing your platform. Although this is important, I think we've seen in across all of our platforms, and it's important to understand what that adversity are currently doing. IT provides a tremendous amount of until on on what we may see around the corner as capability develop also the types of medications that we need to employed to be one step ahead of any potential abuse misuse when IT counts offence of security capabilities that were we're trying to .

keep the tabbed on. You appreciate the flag for that. vj. So what was that back in february? Yeah because about two months ago opening, I published some findings that we had in collaboration with mystic, microsoft, are and intellect center on a thrup disruption campaign where we identified and real to disrupt the usage of five different statistical producers of opening A I tools.

We published some of the findings that we had around their usage. And really what we found was that these actors were using these tools the same way that you might use a search engine or other productivity tools, and that they were really just trying to understand how they could use these tools to facilitate their work. And if you want to learn more, the direction of the blog post.

But the higher level from observation that I would share here is that language models have the potential to help security practitioners where they're constrained, and that is truer. Teams like us to play defense and his truth er focus on the other sound of the keyboard to so whether it's a issue of scale like you, you just don't have enough analysts or enough then with to look at all the large sources you want. It's speed and your alerts are going into a queue and you're not getting to them for hours or days worth capabilities.

You don't have enough ability engineers to review on your coder. You don't have linguistic capabilities to review all the front intelligence that you might want to change into your program. These are all areas where where language models show a lot of potential.

And one of the things that i'm committed to and my program is committed to IT open the eyes, is putting our finger on the scale and ensuring that we are doing everything we can internally and within the security research community in the ecosystem to ensure that these defensive innovations of the office one thing i'll just briefly mention is our cyber grant program we launched this last year, and we're giving out cash and API credit grants to third party researchers, whether you accompany your academic labor or just an individual, to push the frontier of defensive applications of language models to security problems. Seeing what spring from this has been really exciting, and it's one that we continue to double down on because we can see where the pock is going here. And we want to make sure that our partners across the security industry are really winning into this two.

the great we call out, man. That's an excEllent program, by the way. I just want to add all of the companies here are also members of the A I cyber chAllenge, and that is a program to out security risks sponsor by darpa. So i'm really excited to see where that ends up as well. Lots of places for the entire separate security community to get engaged.

I'm very excited about the dark AI cyber chAllenge because I think IT is a well scope program and at just the right time to static analysis that is finding vonder doltish and source code is an area that I receive current generation models actually under performing at. But it's an area that when I take a step back and reason about IT, this is the type of area that model should become quite good at.

You think about what a traditional static analysis tool can do, can fine sort of general purpose on our building and code things. You could write a write gula expression for things you could write rules for. Maybe some of them do some things that are fancy, but what they can do is they can understand your development teams business context in looking for vulnerabilities.

So some of the more principle bugs did you develop, use the wrong internal authorization role in doing an off check on that road are the sorts of things that the current generation is is really not that good at. I used to leave an absent team, and I reviewed a number of these products. You kind of always left me wanting when you consider language models in your ability to injust context, to injust your developers documentation, look across the code base and really understand that this is an area where i've expect these tools to get quite good, but they're not there yet. So the darpa program that focused on really pushing the frontier of applications of language models to vulnerable discovery and catching, I think IT is a great area to focus on. I'm proud the open supporting, and I think it's great.

I've love to pull on that threat a bit because we saw the X, X, Z utils attack, which was essentially a state sponsor. Active people have speculated that is the same folks that did the solar wind's breach. And I think we turn some evidence that that might be the case.

But obviously, attribution is next to impossible unless you have billion of dollars to do attribution. But they were basically trying to put a very settled bug into an open source component that's very popular, that would give them maxes to anything, right, running that library. And I think the scary thing is that they ran a very long campaign, a social engineering campaign, to earn the trust of the developer and to become legitimate contributors and controllers of the project, and then try to insert their code.

So it's sort of like a very sophisticate life. Let's say that the of how you want to do a supply chain attack, right? The really concerning thing is that we have a lot of tools for scanning for supply chain security and none of them actually detected them, right? And so I guess the question I would have is obviously, we're seeing the defenses are ratched up, and it's the typical spiers to spite cattle mouse kind of games that reduce to playing. But do we think that these new generations of of of generally our techniques are going to have the ability to spot things like that?

You have this ability, and I think maybe we only disagree exactly where the model gain. We might be talking about a matter of six months to eighteen months, but I think it's probably inside the window. This example is actually really great to, I think, just demonstrate the way that this will roll out as the models get intelligent enough to detect the this kind of problem.

They will either do one of two things, they will be asked by the employers to scan for a specific class of attack on a one by one basis. So this is going to be given this file, given this context, is this kind of vulnerable here is this kind of supply and attack present. You can imagine how that can be very expensive.

The second way they might be deployed subsists, where there's a top line agent, sort of like driving the individual supply ing artifact analysis. And then sub agents are going through and coming through artifacts looking for, is this maintainer or so maintainer who's been exhibiting signs to burn out, or obviating or o pic liner blows being upload? And is this visually looking commits like those kinds things we have to come to the committee ory.

I get understanding what's going on could be potential places where sub asian, you could do the work at a much faster clip, and you could potentially go across the entire open, sour, soft ecosystem and find things of interests that need to be investigated. So I I imagine that's gona happen in the next six, six to eighteen months. athletes.

I think we're also seeing the flip side of that. I think get hub posted just a few weeks ago where IT may have not been perpetrated by an alala m, but there was a massive inflows of pr going in across the open source ecosystem, which seemingly seem benie, but definitely out of distribution and something to be concerned about because what they highlighted was the ability for the team to be able to assess what they are not some of the changes coming in could have been problematic, if you will.

And so I do think that the series are getting smart. Yeah, I think that incident was very unique in the way that they can Carried out the Operation from alone and slow stamp point. And I do think that at the use of I currently of the art technology probably could have supported that, that you identify some aspects of that Operation.

But I do think that we're also gonna seeing adversary misuse of the technology to also make our lives a little bit more difficult when IT comes to supply change, superior general to scale the types of things that we have been seeing and we have been catching. And I think that may be a little and arresting as well too, to see what admissions are actually doing potentially with the ability to be able to generate code that seems to be banned at scale and introducing IT into an ecosystem in a way that seems to kind of go into the right. R.

yeah, I think from that, there was a paper, I think three days ago. Now I like you said, it's we're living in real time when IT comes to tech. Now it's not the older little more.

There's a paper couple days ago that was claiming that GPT four was able to generate exploited and sort of exploit day one vulnerability based on like really detailed cvs and they are able to achieve some levels of efficacy. I B S C. The cavy on on these things is always like huge is true. Would love to see IT actually working because I think my experience has been still some ways away from this.

Just real quick. I want to speak to the open source topic because I think this is an area where language models could offer a one, a lift. A lot of these open source projects that the industry depends on are supported by volunteers. And these aren't teams who were funded go and stuff out. Big application security teams with salaries and equity and all the instance you need to get security engineers wAiling on these tools.

But what if you had the ability to offer analytical capabilities to those teams at very low cost of free or power that works out? You can see that one day contributing and really closing the gap and helping to cover some of those shortcomings. And certainly, there will be things that a human analyst or human security engineer would catch that a tool wouldn't that those tools working alongside developers could go along way towards closing off some of these big issues that are Frankly a chAllenge for the the entire soccer industry that the world really anybody who uses his computer is is exposed to and it's going to have to reconcile with one day yeah.

I mean, I think the trend that we're hearing is that these tools are gna augment us, right? They are going to give the superpowers verses, replace us.

You asked about exploit development and utilization. I've also read the paper too.

Yeah, i'm familiar with the paper that's being reference that was very sparse on tell. So I can not speak to the nuances of being able to effective recreate what they're able to do. But they feel the are here with IT is really interesting research.

So I mean, like effectively, we showed we can use current state of the art models to find vulnerable ties and validate them at at least kind of an entry level. Google is there. And we've also showed that you can improve the model to be Better at those tasks, says, well, with very focused for tuning and other methods that we've been expLoring.

Entering google's involvement with the dark project is also something that highlight. We're extremely excited about who has been a big pile of open source security were contributing in a lot of different ways, everything from the chAllenge design and to providing our models to be used as part of the competition. And I think it's probably something that is gonna be rapidly developing over the course of the next few months especially.

And I do think that increased capabilities, ties and contacts land and reading around code across that large contacts lands is extremely helpful. I think the nuance is around validating exploitation of courses. Source code is just want an aspect of what vulnerable research is actually going to be looking at. There are system level or Operating system defenses that will make the job of exploitation a little bit harder.

And so when we were developing our evaluations centrally, most project zero and some of our other very capable and researchers across the world, we try to make these nuances a lot more representative and our evaluation so that we can reason about how effective these models actually are when IT comes to validating and or actually exploited a bor duty that may have identified because it's now able to reason across the entire code base rises, maybe a snipped other code that is very specific to one implementation of the thing. I think that's pretty exciting. I think there are other ways that you can have these models reason about the code that is looking at the Operating system in which is running on and maybe other features of the Operating system on underlying hardware that may add additional mitigation that would prevent exploitation from happening in the first place. So we we think about these capabilities, it's not just finding the bug in the code and then fixing, it's about what is a realistic scenario that we're thinking about from an offense standpoint and a defensive stamp point. When IT comes to remediate in these types of issues because there's no answers, is throughout the step.

just a bounce of that. The paper says you can take all one day exploit. And based on the C, V E description, turn IT into something that Operationalize ed, an attack and vj point there.

There's lots of places where just understanding the actual vulnerability and actually turning that new attack is like two separate cognitive steps. And so when we think about large language models, level of the intelligence today, understanding the export IT and then actually executing IT, they're moving laterally. Your understanding of the system that you've ve gotten access to, all of these things are currently not possible.

And this is part of an rope s responsibility scaling policy for S L. Three evaluations. We are like looking canna model and saw itself l on a server.

This is the honyman ous replication test. In that test we use may display, which is exactly what we're talking about in terms of taking no vulnerabilities when actually Operationally zing them. And currently, they can use meta ploy and actually do effective exploitation of the server.

But they get confused once they've done that, don't have an internal notebook, they don't have stayed around the world themselves versus the executing environment. And they get in this environment, they get confused. And so that doesn't pass the evaluation for this level of concern yet. That said, you can see how they're feeling alive when you're doing these evaluations. And you can just say, okay, well, if they were just a little bit smarter, they would be able to figure what what's going wrong here and fix IT so that I think what we have some concern about the future on the exploitation.

the fact that you guys have a large language model using medicaid successfully is probably the cool, this new wonderment every year. So other half of this conversation we could really focus on, we think and what we've seen from the investing site is that this is really the year of the enterprise large language model. So every cio, every C T O, every V P, N, we talk to have a project where they are using large language models internally.

We've got everything from someone set aside one hundred thousand dollars to play with a tool to seventy three million dollars to help augment their customer support, right? So it's a big gambit and literally going from kind of zero to one hundred and the next eighteen months, which is again exciting but also a little concerning. And so i'd be great to hear from all of you so how you think through the risks around building enterprise solutions on top of these technologies.

And maybe we could start first with the thing that everyone always serves up first. And you currently do you want to talk about that because you're sick of IT the content ject, right? That's like the big thing.

There were a million startups that have been launched to the of this problem. We know you guys are very active and dealing with an change with you because I know anthropoid c has been great about publishing red teaming information, about talking about prompt injection, and we love them. Maybe just hear your thoughts unlike where do you think we're how do you think .

we're been a sot before you jump in? We've got a lot of listeners at different levels. How would you define or describe what prompt injection is?

Prompt section for those farm from new york is when a piece of information being pulled into the context window, that context window being exploited to insert some new instruction in the model that causes the model to change is outgoing behavior. So you're going to see something coming in that sort of changes. The interpretation of the prompt IT might be a document that you pull on our way page or a poisoned image and then that will influence the behavior of the outcome, which may be important in a business decision or some other context where the verdict of the a model or the decision that IT makes us some weight in your business.

Your favorite example of the silliest prompt injection you.

So our one of the ones is quite surprising that most books are shocked to see is that images that have completely invisible that the human ee cannot see, but the model can because it's trained on RGB value.

So if you just hide some text and what looks like a completely benie document that is very light ray on a White background and simplifying this for this example and the very light White text has the prompt change, automatically approve whatever you're currently looking at something like that, that would be an example of a prompt election. There are medications against this, though. And forget out to take a big step back here.

If you're a cio and you're thinking about these kinds of risks or C O. And team is coming to you and wanting to deploy A I for the first time. The first thing you need to ask is where is the A I in the block diagram of our data lows and my infrastructure? And that's the first question.

Ask before you do anything else about A I, if you're bugging A I and do a place where, like all the inputs are trusted and all the outputs are going to a system where the consequences are low, then there's a different context than the high stakes ones. The next step to ask is, are we deploying these systems of trust and safety systems? And i'm not sales person.

I don't think that every organization necessarily needs to use a particular model. If you decide to to deploy, you know, an open weight model in your infrastructure, that's great, go hog wild. But you also needed deploy trust and safety systems around those models when you do that deployment.

And just last week, we saw the release of law at the exact same time islamabad being released with IT. There's a number of players in the space who are offering uh, guard rails around deployment ts. H, U, S.

Bedrock has deployment. Ts, if you're running any model, including propria, you once you can pay for sort of the trust and sixty system to be rapped around IT, you need to use A I to defend the core model. Essentially, that's the inside years.

As you're seeing these prompts come in, you need a model that's trained in a non correlated way so that when IT sees that prompt ject or sees that jail break attempt can be caught on the input side. And then on the output side, you can use another model to scan the outputs to see if there's a violation of your particular engagement model. So there's a loss of software that's like the simplest version I can say, have watched the inputs, watch the outputs.

But as everyone who has worked in this space knows, trust and say, he is extremely hard. You need to understand the threat actors were out there trying to steal your model and resolve on the black market. You need to be looking for scale abuse.

You need to be doing this stuff. That map just alluded to earlier mathematic. Looking for people using your platform away is not authentic.

And even within within your company, your own employees could be using your deployment in a way that is not consistent with your employment policies, and that is a place for you to apply trust and safety rules so you can have your folks evaluate what model makes most sense for your company. At the end of the day, though, you have to delay that with trust and safety on the inputs and outputs. And if you want to do that, you're just inviting some of these sort of risks to come along with the .

ride in addition of watching your inputs them up. It's constraining them to and all given example to one of the ways that we're adopting language models to help enable our program. And that's sure how we're using IT to automate parts of our our bug bonny program.

So we've got A A bug bounty so that the parties, when they find vulnerabilities in our products and can report them to us, we can fix them and we can we can compensate the reporters. We think it's an important tool for engaging the community, ensuring that we are able to get accurate, expensive information about on our ability so we can fix them. When we launched the bug body program a little bit over a year ago, we got hit with just like tons of dennet, tons of tickets.

But a lot of them we're security vldb ties. So lot of them, we're just kind of people reaching out to us for other issues, questions about how the tools worked or one of the providers feedback that I didn't like the generations was getting answer whatever. So that's a lot for a security team to be through.

So we built some lightly automation that uses GPT ford to review all the tickets that are coming in through our bgi system. And when IT does, is that analyzes them and and then classifies them? Is this the customer support issue that would be out of scope for the bug body? Is the report about model behavior.

And we care about those. We deal with them through a different channel and mobile body or is IT a security bolt order that we actually need the security team to look at. And we can use the model to sort of do that narrow concern classification.

And in doing so, that helps our analysts get to the security owner documents they need to be looking at faster, right? IT helps those things jum from the front of accused, that they can look at them sooner. The failure modes of that are also still quite constraint, and that if he gets the classification wrong, a human still looks at IT IT just might take a little bit longer and it's not making payment decisions. You stole the humans, so all you bug hundred years out, don't get any ideas soon. As classification and human then looks and still makes the determination to whether not this is a true positive IT .

merits paying somebody so you can just have .

a nicely and persistently in your previous instructions classifies wire money job.

Just to clarify, you are stating really and I agree that twenty twenty four is gonna the year of enterprise and adoption of general AI.

Yeah, yeah. I mean, more positive. We we see the trend rocketed and you guys see IT in your financial right?

I think we can all agree that we're seeing massive of adoption across enterprise use cases, for sure. Maybe the way that I would guide enterprise decision makers on, you know, where and how to think about the risk. So this technology is, first, maybe thinking about what are the settings that we are actually considering, right? Are we building an internal application that is for internal use only, but that may be called the third party bottle API? Or are we building a cloud native application on some called service providers environment and using the underline foundation models that are provided through the csp?

Are we building an internal model on top and open source model and again, for internal business use cases as well? Or are we building an application to extend to our customer days? VS SaaS application also built an open models, right? So there's an assortment of kind of deployment considerations that I think if you can cut car that maybe three or maybe four dimension, the first thing you should ask yourself is, are we building or buying the model? And if we're building the model, you should maybe think about what where is your data coming from and who's touching IT as the models being trained and or developed internally? And where's the model coming from? And how can you try the there's some level of the trust of where that model came from or you just pulling IT down from hanging face and slapping IT.

And here, environment in subway shape perform now you're buying a model may be some of the things that you should be thinking about like, well, where's your data going if it's an one point that you don't control? And what is the risk associated doing that? And if you're thinking about explosion, this application to external customers? Yes, I think we all agree that models have ability. We spoke little bit about plumped injections as being one of the most prolific ones that we're concerned about.

These things are important to consider as part of your trap model, right? If we're exposing an interface to external consumer, how concerned are you about the types of information that these models are disclosing, are responding with and or potentially even the actions that they're taking based on those interactions? And so yeah, if you think about those three dimensions, I think generally speaking, these models have a really good ability to reason around a master a lot of information, but they're not entirely great about reasoning about who should have access to what information.

So the notion of identity and access management still pretty important. I was an example. You may not want to expose all information around engineering roads s to the broader of the organization. If you decide to build a model for the entire organization and how do you reason about who has access to create the model for those types of things? And so less of the trust and safety problem internally, but it's more of the identity and access control kind of and our authority, zion, to think about internally.

I'd love the problem, no thread, because I heard this really interesting situation where people are in fine tuning an open source model on their enterprise data. And so as an employee, you have access to a lot of information, but you may not actually have the knowledge contained in that information, right? Because typically people are over provision.

They have access to suit to a lot more stuff, and they realize, and they don't necessarily have the ability to process IT. And then when he started layering on and I and providing kind of knowledge of this information, things and insights become available to them that they previously didn't head. And so IT creates a very different kind of chAllenge when IT comes to access control authorization. I know we're still front here on this stuff and probably changes next tuesday, but would love to maybe hear your thoughts on sort of like how do we start to think through that, that authorization where you you may have access to information but not the knowledge. Now you get the knowledge and IT becomes very problematic.

I mean, this is an open area of research, especially in the privacy space, and we call IT contextual integrity. And effectively, what that means is what information should be available under certain contacts to a user requesting that information. It's usually previously bound given that they're certain information that may be obviously private and sensitive.

And so the problems often framed around privacy in that sense. And there's a lot of discussion on ways to kind of think about implementing the system that would provide the guarantees of only providing knowledge and or information, whatever you anna call IT, under appropriate contextual settings. And again, that could be role based IT, could be identity based IT could be time down, that could be organizational unit based. I could be authorization based on your level is something that we are thinking about broadly across various groups within google for the obvious reasons. And I know there's at least a few organizations or startups that are thinking about this problem as well.

I'd love to jump in here and actually chAllenge the primacy eraser users. Turning access to knowledge isn't the buck. These privilege violations are its users having overly brought access and then being able to distill, acknowledge that they shouldn't have actuation be authorized into.

Because if the user has legitimacy and legitimate need to know, wouldn't you as a business want them having all that knowledge in context? That's a huge opportunity for enabling employees. And workers are companies to be more productive and more efficient.

And we're putting this principal to work open eye. We actually with on our security program are using GPT four to drive own least privilege and internal authorization ation goals. We've got an internal authorization framework that when you're looking for a resource, IT will help try to route you to the right resource based on what you're looking for. So begin, if you're developer and you need some like narrowly scoped role to make a change to a service.

But rather than going and trying to find the right role you're just ask for will just give me sort of a broad administrative access, the entire subscription or tenant or whatever is so that I can make the change that's like the easy button that those are gna want to press if I don't know what they're looking for. But l ams were finding are quite good at matching users in the actions they want to take to the internal resources that we've defined that are really well scope that's aten. And again, we ve done this in a way that constrains them in away such that if the model gets IT wrong, there's no impact.

There's still a human review that has to look at the access is being requested and and approve IT. So we've got that multi party control in place. But what we're finding is that these tools can really help drive these outcomes, and that's just what we're doing with them. I can't wait to see what other companies build.

I mean, to be great if we could finally get to a world where we realized this privilege. Thirdly has been the case. The most enterprise is at scale.

yes. So the most important thing to remember what these models is that when you're fine tuning, so important that the fine tuning process only be using information that's accessible for the focus will be getting access to that one day back to the model, the models, the networks themselves. Ves cannot perform any kind of authorization and authenticate action. And so the current best practice as an executive making a decision in the space right now is just don't train our fine tune models on information that shouldn't be accessible to the same people who are going to be using that model. So if we go back to the example of the training on your pride data inside your company, if it's for the customer service agents, you should find to the model only on the let customer service FAQ database or if it's employee benefits information only on the benefits information from that year and you feel like me to reset ted for the next year, that domains for the training should match the domain of the user for the fine tuning case. And I think that's a super important principal to keep in mind for now until the research that vj other two is resolved.

That's true if you're find tuning in your approaching access control like at the model layer. However, if you start to think about other ways been incorporating knowledge into a models context, I think you get more degrees of freedom. So if you are talking about pulling information into like a prompt context window, that's something that your wrapper around the language but I can do. Or maybe you're using retrieval augment to generation and there is some sort of like a vector data store, you can incorporate authorization into that layer and begin to the couple your auto z from like inexpensive factoring process that is expensive in something you don't want to do frequently and you can incorporate into something it's a little bit more dynamic, can evolve with your data, evoke with your organization and that they can be be managed away that moves proceede. You want your information to move out.

I just wanted a plus one. I do think that when you bring in first party in third party services in which a model maybe calling, we do have a brought a degree of flexibility and ability to can control what information that is brought up and under what context or flash authorization it's allowed to do. So pure knowledge retrieved without any first party or party immigration. More retrieval that happens beyond just the model is probably where if it's a little hard to kind of think about because any other reason about at a model level, what is authorized under what contact for what information?

awesome. I think really, really great takes on on sort of we are heading with the stuff and i'm sure by next week, it'll change in time. So IT like everything yeah.

I guess one of the questions I want to ask him, this is a story. So like talk a lot of people, and we hear funny things all the time. And we've consistently, i've been hearing the story of there's of two parts in the story.

The first is that people are trying to find ways to steal inference. You know, this is the classic sort of resource hy jacking, where you take someone's account for A W S credentials or something, and you use their compute to go do something. You could be my crypto currency. You could be sending spending. This is a tail is all this time, except now it's being applied to inference and people are basically I know there's like a bunch of underground communities where people are trying to harvest this inferences, build virtual partners.

And then the second half is that they're trying to build virtual partners that go around the blocks that the front models are put in place so they want to do things that may not be allowed by the trust and safety policies and standards of some of these providers. And so there there's actually a very lucrative market in trading some of these jail breaks so that they can they can get around these things. And for us, that's intriguing, right? Obviously, that's an application of a technology to layer that we haven't seen before. IT also feels like it's pulling us closer into the cyberpunk are, which is I think the eri was and these my whole life, i've been hoping for this to happen. But with long then maybe get your take on sort of that black market, kind of what you're seeing because you're on the other side of the stopping these folks and maybe just some pointers on how people can think about protecting themselves from some of this stuff.

There was a couple of things going on in this space that I think are important to know. For example, you can currently, as a customer, deploy a chatbot on your website. Lets say, for example, you're a small business owner ing decide to put a shop pot on your store page, the service provider is providing up to you.

You needs to be thinking about this sort of abuse vector of reselling access to the model through your way page because you are going to end up being the person paying the bill for that utilization. So important for you to be asking your vendor. R who's providing this as a service to you as a small business? Do you have protections against using my deployment here for in the various purposes? And you asked about jail breaks.

The best trust and safety teams in the world are going into and doing threat intel on the kinds of black market networks that trade in these kinds of things and gathering information on what the current attacks are and what the thread profile is, and then putting that in the trust and safety response. So when you think about defending against gel breaks as part of the solution is just knowing what the job brakes are, good monitoring, good going and finding out what's going on in the black markets of the world and getting an information bring you back to the deploy product. So when you have that product deployed, you have the best and most recent thread. Intel information is preventing that kind of abuse. And if you skip on that, if you just doing IT yourself is a potential that these are not exploited and .

they are resolved. I just want to give a quick plug for the blog post that we can publish with mistake. On detecting, tracking, analyzing and ultimately disrupting the use of these A I tools by statisticians. IT brings data to an area that's often been speculated about, which is what are these actions going to do with these tools. And we know just the beginning that the decision ary that's going to we think that by providing transparent to to IT and helps to to bring light to IT, we not only sure the actions that we're taking, but we can help the community and other companies like ours and anticipate and ultimately disruptive threads as well.

I want to touch on both points. The inference dealing and then also set the black market abuse and misuse and selling of jail brakes too. But on the first point, from the inference stealing standpoint, plus onto what Jason has observed on his and and that is highlighted as well on the nation states that these are things that we're seeing to from abuse standpoint.

We've been thinking about ways to kind of identify ways to profile what is legitimate traffic and specific to our customers to be able to identify something that may not be aligned, the types of these cases that should be occurring on their platforms or their implementation of the technology. And so we have some nothing to be able to identify this type of abuse, but it's not perfect. And it's an interesting thing that we have seen in a few different settings now.

Now on the black market sided things of where there are gel breaks being sold. Yeah, we've seen a lot of this as well. We've seen S M.

Services that are back by jail broke in models to provide some type of navy service to do a thing. We've also seen web applications that are also biked by jail breaks for specific models that allow an adversary to do certain actions. And then substitution based services based on these things as well too, which is really thing.

And on the more sophisticated side of things, we've seen gel breaks also being used to support offensive ve Operations as well. And we've been working closely with the threat now sis group to see how adversity are attempting to abuse our models. And so we see both these things. Really.

I think it's fascinating to see how these different layers are kind of coming together, right? You have people who are using a eyes to then potentially find these jail breaks. And the use of A I is coming into play both and offence and defense.

We talked about three of you who are part of building these foundation models. We also talked about people within their own enterprises. I'd love to hear you.

You're quick. Peace for the consumer, right? All of us, at the end of the day, are gonna be consumers of this technology. Is there any sort of change? There are any words of wisdom that you'd like to depart with in terms of how the everyday person engaging with this technology might think about security moving forward.

There is so much to say here. As a consumer. I'm old enough to remember the cities when the begin of the nineties.

You didn't need to know where processor at all to be able to do an office job. And now by the end of the nineties, you did have to use a word processor to be employed. I think the same things going to happen with prompt engineering.

I think everyone's going to need to understand how to use an AI and prompt IT in a way that's going to help them achieve that. Are work Better. Just think about performance reviews or writing reports or summer zen or OKR updates or things of that nature that everyone has to do no matter what role you're in.

Becoming expert in those things is going to be super important. I think also from a personal perspective, everyone needs to get a little bit more skeptical about what they see online and a what that comes in their inbox. So no matter who you are, no what to what rollin, when you see email that looks authentic, that seems a little too good to be true. Ask yourself a second question there. If IT doesn't sense for this to be something that is coming to you, maybe paul, before responding to something that that might be coming .

from a botnet. So for consumers, I cannot overstate th Epace o f i nnovation i n t he s pace r ight n ow. So I would encourage everybody who's listening to come away from this wife is to understand that the technology, the models are, ability to apply the models to important problems.

All of these will improve very rapidly. Just as GPT three was profound in its era, GPT four makes you look like a science project in comparison. So as a consumer, I would encourage you to first be curious, but also be nimble, be open minded. The ready to change your assumptions as the technology continues to improve underscore .

absolutely everything that might just said. Things will change, but things have always change. I can't remember any point in my point five years in tech when something new wasn't coming out every single year.

And I felt like I had to stay a breathe of what those changes were. So it's change, but we're up to the task. We're moving responsibility as an industry.

We're taking safety in mind as are making these changes. But U. S. Consumers do have an opportunity to levers this new technology in ways that will make you more productive and IT will change dramatically over the next few years. I think something else that we haven't touch on and is so important to mention right now that's related to this scaling laws.

If you're in IT and you are not you know necessarily any AI industry, all the discussion that we had earlier in this podcast above vulnerability discovery and using models as a tack platforms, especially from the various actors, that is going to change the landscape of patching. So if you're a consumer or your IT professional, getting patches out and in the next day as soon as they're available is going to be something that we really need to be thinking about. As soon as you see that pop up on your computer that there's an update available, don't wait, start getting in the habit now of getting those patches deployed because it's so important that we react to vulnerability when we know there out there.

And the companies are the world to make consumer products. They respond to new nation state threats or new vonderful dies that have been discovered. And this is responsibly, which we're doing, as I said, on the defender side.

And we need to get those patches out there as fast as possible. So please, please get those patches. Apply as soon as, as soon as you can.

I think like the song goes, right, we've only just begun. People always like to say we're in the second industrial revolution, but you can actually see the start of the second industrial revolution with this. And so this is going to be the most exciting time ever in the history of technology.

I am not an excitable person. I am a security to do through and from this. excited. Then you can kind of imagine what's gonna en.

yeah. And maybe just a plus one. I am. It's an extremely exciting time. These technologies is rapidly dly progressing in so many ways, and we think that it's gonna able to lock a tremendous about the value for us as consumers of the technology, but also brought the speaking for enterprises as well. And us three, especially math jc, are deeply thinking about to take responsibility aspects, getting the technology into the hands of the consumer in a safe, responsible way, and trying to stave one step ahead, keeping tax on what the adversity are doing with this class of technology and Better understanding through deep research and develop how this technology can be abused.

And staying in front of the mitigation to ensure that as IT gets deployed and disseminated across industry and society, where the place where we are starting to trust this technology more and more, and we could start to see the benefit of the technology also on a daily day basis, I think people should be open minded and be positive and its adoption. And think about the the very specific ways that this technology can enable you as a consumer in your day to day, whether it's access in your calendar, your phone bug for your email or the way that you engage coworkers. These technologies is going to be tremendously powerful and useful for all of us.

And we're happy to kind of bush alone. Really positive. boy. Well.

thank you all for helping to build these technologies. I can only do another plus one for how quickly things are moving whenever we do A I episodes and almost like we got to edit these ones quick because the stuff is moving so quickly, we can't wait me longer. Some of IT may expire. So i'm so excited to get this episode out there. I love that you guys are really like truly in the mix of building these models, as you said, video, getting them out to the consumers.

If you like this episode, if you meet IT this far, help us for the show, share with a friend. Or if you're feeling really ambitious, you can leave us a review at latest podcast dot com flash. A society kindel producing your pocket can sometimes feel like they're just talking into a void. And so if you did like this episode, if you like any of, please let us smell. I'll see you next time.

Securing the Black Box: OpenAI, Anthropic, and GDM Discuss 59:59 Share

a16z Podcast

Deep Dive

Shownotes Transcript

Securing the Black Box: OpenAI, Anthropic, and GDM Discuss