
The path towards trustworthy AI

2024/10/29

Practical AI: Machine Learning, Data Science, LLM

People
Elham Tabassi
Topics
Elham Tabassi: NIST's mission is to advance U.S. innovation and industrial competitiveness by developing measurement science and standards that cultivate trust in technology, and it is now applying this to AI. NIST works through open, transparent collaboration with stakeholders to develop tools, guidelines, frameworks, metrics, and standards that support industry and technology. To understand what constitutes trust in AI systems, NIST collaborates both with the communities that develop the technology and with those that study its impact, including economists, sociologists, psychologists, and cognitive scientists. NIST's AI Risk Management Framework (AI RMF) is a voluntary framework for managing AI risk in a flexible, structured, and measurable way, developed in collaboration with the AI community. The AI RMF defines trustworthy AI systems as those that are valid and reliable, accountable and transparent, safe, secure and resilient, explainable and interpretable, privacy-enhanced, and fair with harmful bias managed. NIST explains the concept of trust in AI through examples (such as AI systems in healthcare), emphasizing key factors like validity, reliability, safety, and privacy. The White House executive order on the safe, secure, and trustworthy development and use of AI reinforced NIST's efforts to cultivate trust in AI, accelerating its work to develop evaluation, safety, and cybersecurity guidelines, promote consensus standards, and provide test environments for evaluating AI systems. NIST delivered on the executive order through requests for information and public comment, publishing documents including the AI RMF profile for generative AI. That generative AI profile describes risks that are unique to or exacerbated by generative AI, including information manipulation, harmful content, data privacy risks, and environmental impacts. NIST promotes trust in AI systems by identifying, measuring, and managing AI risk, which includes defining trustworthiness characteristics, measurement methods, and risk-mitigation strategies. The AI RMF's recommendations are organized into four functions (govern, map, measure, and manage), providing a structured approach to risk management. Organizations can begin implementing AI risk management by reading the AI RMF and using its companion playbook, starting with the govern and map functions and phasing in other recommendations as their context requires; they need not implement every recommendation in full. The playbook provides suggested actions, informative documents, and transparency recommendations for each subcategory. Organizations should prioritize implementation according to their resources and expertise and continuously monitor and manage risk. The AI RMF also offers sector- and technology-specific profiles to help organizations apply risk management in context, and some entities are already building tools for implementing it; NIST is working with the community on operationalization. Looking ahead, NIST is focused on improving AI evaluation, building a stronger scientific foundation, and developing clear, technically sound standards to enable global interoperability in AI evaluation, assurance, and governance. NIST hopes to see AI used as a tool for scientific discovery, advancing areas such as precision medicine, personalized education, and climate research; this requires better understanding of how AI models work and reliable evaluation methods to ensure systems are dependable and effective. Chris Benson: As an AI practitioner, he appreciates the guidance NIST provides and highlights its role as an interface among government, industry, and the public.


Chapters
Elham Tabassi introduces NIST's mission and its role in advancing AI technology through multi-stakeholder collaborations, emphasizing the importance of trust and risk management in AI systems.
  • NIST is a non-regulatory agency under the Department of Commerce focused on advancing U.S. innovation and industrial competitiveness.
  • The agency cultivates trust in technology by advancing measurement science and standards.
  • NIST's AI Risk Management Framework (AI RMF) is developed through open, transparent collaborations with diverse experts.

Shownotes Transcript


Welcome to Practical AI. If you work in artificial intelligence, aspire to, or are curious how AI-related tech is changing the world, this is the show for you. Thank you to our partners at Fly.io. Fly transforms containers into micro VMs that run on their hardware in thirty-plus regions on six continents, so you can launch your app near your users. Learn more at fly.io.

Hey friends, I'm here with a new friend of ours over at Timescale, Avthar. So Avthar, help our listeners understand: what exactly is Timescale?

So Timescale is a Postgres company. We build tools in the cloud and in the open source ecosystem that allow developers to do more with Postgres, so using it for things like time series and, more recently, AI applications like RAG, search, and agents.

Okay, if our listeners were trying to get started with Postgres, Timescale, AI application development, what would you tell them? What's a good on-ramp?

Right, so if you're a developer out there, you're either getting tasked with building an AI application, or you're interested: you're seeing all the innovation going on in the space and want to get involved yourself. And the good news is that any developer today can become an AI engineer using tools that they already know and love.

And so the work that we've been doing at Timescale with the pgai project is allowing developers to build AI applications with the tools and with the database that they already know, that being Postgres. What this means is that you can actually level up your career. You can build new, interesting projects.

You can add more skills without learning a whole new set of technologies. And the best part is it's all open source. pgai and pgvectorscale are open source; you can go and spin them up on your local machine via Docker, follow one of the tutorials on the Timescale blog, and build cutting-edge applications like RAG without having to learn ten different new technologies.

And you're just using Postgres and the SQL query language that you probably already know and are familiar with. So yeah, to get started today, it's the pgai project: just go to either of the Timescale GitHub repos, the pgai one or the pgvectorscale one, and follow one of the tutorials to get started with becoming an AI engineer just using Postgres.

Okay, just use Postgres, and just use Postgres to get started with AI development. Build RAG, search, AI agents; it's open source. Go to timescale.com/ai, play with pgai, play with pgvectorscale, all locally on your desktop. It's open source. Once again: timescale.com/ai.

Welcome to another episode of the Practical AI podcast. I am Chris Benson, a principal AI research engineer at Lockheed Martin, and unfortunately my cohost Daniel is not with us today, but it is my pleasure to introduce Elham Tabassi, who is the Chief AI Advisor at NIST, the National Institute of Standards and Technology.

Welcome to the show, Elham.

Thanks for having me.

You guys are doing so much in this area in terms of AI, kind of setting the stage. And I was wondering, for those of us in the audience who may not be familiar with it, if you could start out by just telling us a little bit about NIST, what you do both in AI and maybe outside, to give a little context, and give us a little intro into what NIST is doing in AI and your role in that.

Yeah, happy to.

NIST, or the National Institute of Standards and Technology, is a non-regulatory agency under the Department of Commerce, established in 1901, and our mission has not changed since then. That mission is to advance U.S.

innovation and industrial competitiveness. At NIST we have a very broad portfolio of research, from building the most accurate atomic clocks to modeling the behavior of wildfires. But most importantly, we have a long tradition of cultivating trust in technology. We do that by advancing measurement science and standards, measurement science and standards that make technology more reliable, secure, and private; in other words, more trustworthy. And that's exactly what we are doing in the space of AI.

As I mentioned, NIST was established in 1901 to fix the standards of weights and measures. Our predecessors created and advanced the standards of measures, basic things such as length, mass, temperature, time, and electricity, all those that were essential for technological innovation and competitiveness. At the turn of the twenty-first century, we are following the same course, working with and engaging the whole community in figuring out proper standards and measurement science for the advanced technology of our time, which is artificial intelligence. And the way we do it is exactly, or maybe an improved version of, what we have been doing in the past century. NIST's day-to-day work is focused on helping industry develop valid, scientifically rigorous methods.

And one thing that I want to emphasize is that we do this through multi-stakeholder, open, transparent collaborations. While we have a lot of really good experts and expertise at NIST, we also know that we don't have all of the answers, and it's really important and wiser for us to foster consensus and buy-in across the whole community. So what we do is listen and engage: we get the input, we distill it down, we develop the methods of measurement to build up the scientific underpinnings.

And then we develop tools, guidelines, frameworks, metrics, standards, et cetera, to support industry and technology. We have done that for the development of the AI Risk Management Framework. We have done that for cybersecurity. And we are continuing to do that for improving methods and measures for risk management and trustworthiness of AI systems.

That's a great introduction. I'm curious: you talked about a couple of things there, including collaboration. You seem to be right at the center, sort of an interface between government interest in these technologies and the issues around them, and industry. And I know you work with a number of different organizations as NIST does these different things you've talked about, and you specifically called out trust. I was wondering if you could talk a little bit about how those different collaborations work, how trust in technology can evolve, and how NIST goes about that process, in AI and in adjacent technologies, that it has been doing for so long.

Yeah, thank you for that question. As I said, I think the magic sauce for us is to do stakeholder engagement: to work with the community and ask for their input, leverage the knowledge of the community, and build on the really good work the community has done, and, by working with all of the experts, strengthen the scientific underpinning, building the right technical building blocks that are needed for the development of scientifically valid guidelines and standards.

In terms of the engagements, particularly in the space of AI: we all know that AI is multidisciplinary, and in order to understand the concept of trust, what makes AI systems trustworthy and what constitutes trust, that was one of the main questions as we were developing the AI Risk Management Framework. In the engagement that we were doing with the community early on, we recognized that as much as we need the input from the community that develops the technology, the community with expertise in math, statistics, and computer science, we also need input from the community that studies the impact of the technology: economists, sociologists, psychologists, cognitive scientists.

And we need to bring all of them together, because AI systems are more than just data, compute, and algorithms. They are complex interactions of data, compute, and algorithms with the environment and with humans: the humans that operate these systems and the humans that can impact and be impacted by them. So that engagement with a very broad set of actors in the community, bringing different expertise and backgrounds, became really important in answering your question about trust and what constitutes trust.

As I said, that was one of the important and central questions in the development of the AI RMF, the AI Risk Management Framework. Very briefly, it was directed by a congressional mandate and is a voluntary framework for managing the risks of AI in a flexible, structured, and measurable way. It was, as we do with everything else, developed in close collaboration with the AI community, engaging diverse groups with different backgrounds, expertise, and perspectives, to particularly home in on the concept of trust and trustworthiness. So on the question of trust: what makes AI systems trustworthy?

When we started the process, there had been very good, highly valuable, values-based documents that talk about AI systems being non-discriminatory and ethical, and there have been a lot of other papers and publications. Basically, there are many different views about what makes an AI technology, an AI system, trustworthy, and these views were not all aligned, not on the same page. So it's not a property that can be defined with perfect rigor, but based on the collaborations, engagement, and consultations that we did with the community, we learned that there are well-established key characteristics of trustworthy systems. With the help and consultation of the community, the AI RMF describes trustworthy AI systems as those that are valid and reliable, accountable and transparent, safe, secure and resilient, explainable and interpretable, privacy-enhanced, and fair with harmful bias managed. It takes it a step further, and for each of these characteristics provides a sort of definition, bringing the community to a shared understanding of the expectations for each of them, and also talks about how these characteristics interrelate and the tradeoffs involved in decisions about how safe is safe enough, how private is private enough, or how to enhance interpretability or transparency while at the same time, for example, preserving privacy or ensuring the security and resilience of the AI systems.

I'm curious, and as we go we'll certainly dive into those topics. But one of the things around trust is, for folks like us who are in this industry and are living and working in and around and developing AI every day, these are kind of our work topics that we're going through, and the guidance that NIST provides is invaluable, especially being part of that process of developing it as you described.

But before we surge into all that: there are so many people out there that are not in this line of work as we are. And there are those that are curious about AI; they see it in the news every day and they're trying to understand what these technologies we're working on are. And many people in the audience for this podcast are what we would describe as kind of AI-curious, as opposed to practitioners.

We are practitioners, but we also have AI-curious listeners who are trying to understand how it fits into their lives. And I was wondering if you'd take a moment and talk about the context of trust and AI for those who are not in this industry in a direct way like that. How does NIST try to frame it for the larger population? Or is it more for practitioners? How do you see that

for the larger world?

Yeah, thanks for that question. So let me try to answer that with an example. We are seeing enormous advancements in AI technology, and just in the past year we saw a lot of releases of powerful models.

We are also seeing that these technologies, these AI systems, are being incorporated into a lot of the functions of society and into the way we do our work. I want to explain the concept of trust with an example: the use of AI systems in the health domain, when we go in for medical imaging. I come from computer vision;

that's where my training was, and that's the field where I feel comfortable. So say we do some medical imaging, some sort of imaging of the brain, and the question is: is there a tumor there or not? An algorithm can be employed to help the physicians make that decision.

So first, for that system, for that algorithm, to talk about the AI RMF trustworthiness characteristics: we want it to be valid and reliable. We want to make sure that it has a certain level of accuracy.

So the false positive and false negative rates should be low, because you don't want to scare a patient by saying that yes, there was a tumor when there was none, or, which is worse, because of the errors of the system a tumor goes unrecognized. So we want the system to function as intended.
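The false positive and false negative rates mentioned here can be made concrete with a minimal sketch. The labels and predictions below are invented for illustration; this is not a NIST-specified metric implementation, just the standard textbook definitions:

```python
# Hedged sketch: false positive / false negative rates for a
# hypothetical binary tumor classifier. All data is made up.

def error_rates(y_true, y_pred):
    """Return (false_positive_rate, false_negative_rate).

    y_true, y_pred: sequences of 0 (no tumor) / 1 (tumor).
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr

# Illustrative ground truth and model predictions:
# one false alarm (scaring a patient), one missed tumor.
truth = [0, 0, 0, 0, 1, 1, 1, 1]
preds = [0, 1, 0, 0, 1, 1, 0, 1]
fpr, fnr = error_rates(truth, preds)
print(f"FPR={fpr:.2f}  FNR={fnr:.2f}")  # FPR=0.25  FNR=0.25
```

In the medical-imaging context described above, the two error types carry very different costs, which is why the framework treats "valid and reliable" as context-dependent rather than a single accuracy number.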

We want it to be valid and its results to be reliable. On top of that, we also want the system to be secure and resilient, because if it's not, and the system gets hacked, there is a lot of personal information that can get into the hands of unfriendly users. Talking about that, we want the system to be privacy-enhanced. We have heard and read that large language models in particular have a tendency to memorize their training data; even before large language models, there were papers showing that with a certain level of expertise, training data can be inferred from AI systems. So if the system has been trained on real patient data, you don't want any hole that can give access to that private information. Then there is explainability and interpretability.

So if it comes back and says that yes, there is a tumor, we expect it to give some reasoning, some sort of explanation of why it decided there is a tumor. And there is a lot of nuance in there too, because that explanation, if it's being given to a physician versus a technician versus a patient, is going to have a different level of technicality and a different level of information being shared. And of course, we want it to be fair. We don't want AI systems that are more accurate for certain demographics than for others; this usually happens if the training data is uneven. So all of this, at the end, is so we can build confidence that this technology works, and that the results, predictions, and recommendations the system provides are for better decision-making.

In this case, analyzing a scan of the brain to see if there is any tumor. So all of these things serve the end goal: AI technology has a lot of promise; these are very powerful tools; they can transform the way we work for the better. But we want to make sure that at the same time it uplifts all of us, and that we get the maximum benefits and the minimum negative consequences of the technology.

Hey friends, I'm here in the breaks with David Hsu, founder and CEO of Retool. So David, Retool has definitely cornered the market on internal tool software development. So help me out: what's the big idea? What is the big idea with internal software?

Yes, so we started Retool at this point seven years ago. The core idea was that internal software is a giant, giant category that no one really thinks about. And what's surprising to most people is that internal software represents something like fifty to sixty percent of all the code written in the world, which might sound pretty surprising.

But if you think about it, most of us in Silicon Valley work at software companies, whether it's an Airbnb, Google, or Meta; these are all companies that are software companies selling software. But if you think about most software engineers in the world, they actually don't work at these software companies. There aren't that many of them, maybe ten or twenty big ones. Most of the companies in the world are non-software companies.

So think about a company like an LVMH, for example, or a Coca-Cola, for example. They have a lot of software engineers, and all they do, day in and day out, is basically build internal software. So that's one reason we started Retool. The second reason we started Retool is that if you look at all this internal software people are building, it is remarkably similar.

So if you take a look at, say, a Zara, for example, versus a Coca-Cola: two very different companies, obviously, one a clothing company, one a beverage company. But if you look at the software they're building internally to run their operations, it is remarkably similar.

It's basically forms, buttons, tables, all these pretty common building blocks, basically, that come together in different ways. And if you think about not just the UI but also the logic behind a lot of this stuff: pretty much every app has API endpoints and a database, you care about authentication, you care about authorization. There are a lot of common building blocks across people's internal tools.

So that was the first insight: wow, internal software is a giant, enormous category; it's also similar; and developers hate building it. So could we create a sort of higher-level framework, if you will, for building all this software? That would be really cool.

That would be really cool, okay? So listeners, Retool is built for everyone: built for enterprise, built for scale, built for developers. And if you find yourself nodding your head, then check out Retool at retool.com/changelog. It is the way to build internal software. Do yourself a favor: get a demo or start for free today. Again, retool.com/changelog.

So I know in the early part of 2023, NIST issued the AI Risk Management Framework that we've been talking about. But a few months later, almost exactly a year ago as we're talking, in late October, the White House issued its executive order on the safe, secure, and trustworthy development and use of artificial intelligence. So I was wanting to understand how the issuing of the executive order might have altered, accelerated, or changed any of the work that NIST was already doing. You guys were already very much involved in artificial intelligence through the framework and other activities. Could you describe the impact of the executive order on the work you are doing?

Absolutely. In answering your question, if I can just go back from the release of the AI RMF in January 2023 to the release of the executive order at the end of October, October 30, 2023. So the AI RMF was released in January 2023; in March of that year, we released the AI Resource Center.

This is a one-stop shop of knowledge, data, and tools for AI risk management. It houses the AI RMF and its playbook in an interactive, searchable, filterable manner. And by the way, the AI Resource Center is definitely a work in progress; we want to keep adding to it, adding additional capabilities, things such as a standards hub and a repository for metrics.

We want it to be really a one-stop shop for all of the information, but also a place for engagement across the different experts. In June of 2023... just for a bit of context: ChatGPT was released in November 2022, a month or so before the release of the AI RMF, and GPT-4 was released at the beginning of March, a month or so after the release of the AI RMF. So in response to all of these new developments in the technology, we put together a generative AI public working group, where more than two thousand volunteers helped us study and understand the risks of generative AI.

And then in October, as you said, we received our assignments under the executive order on safe, secure, and trustworthy AI. This executive order really builds on the foundational work that we had been doing, from the AI RMF to the playbook to the Resource Center to the generative AI public working group, and supercharged our effort to cultivate trust in AI, mostly by giving us tight timeframes for the things to deliver. The EO specifically directed NIST to develop evaluation, red-teaming, safety, and cybersecurity guidelines; facilitate the development of consensus-based standards; and provide testing environments for the evaluation of AI systems. All of these guidelines and infrastructure, true to the nature of NIST, will be voluntary resources for use by the AI community to support trustworthy development and responsible use of AI.

We approached delivering on the EO the same way we do all of our work: going to the community. We put a request for information out to receive input; based on the input we received, we put draft documents out for public comment; and based on the comments we received, we developed the final documents. We were very pleased that all of them were released by the deadline of July 26 that the EO had given us. To give a quick overview of the things we put out:

One of them was a document on a profile of the AI RMF for generative AI. The document number (we like to refer to everything by number) is NIST AI 600-1; it's a cross-sectoral profile, a companion resource to the AI Risk Management Framework, based on the input and discussions we had with the generative AI public working group and the responses to the RFI that we received. I think the main contribution of that document, if I want to summarize it, is its description of the risks that are novel to or exacerbated by generative AI technologies. These risks span from CBRN information or capabilities (easier access to synthesis of material or nefarious information that can lead to design capabilities for CBRN threats), to confabulation; dangerous, violent, or hateful content; data privacy risks; environmental impacts; harmful bias; human-AI configuration; information integrity; information security; intellectual property; obscene, degrading, or abusive content; and the concept of the value chain and component integration. With generative AI, we are moving away from the binary developer-deployer set of actors and dynamics; now we have upstream third-party components, including data, that are part of this value chain. So one of the things we are working on, continuing that work, is to work with the community to get a better understanding of the technology stack, the AI stack if you will, and to understand the roles of the different AI actors involved, so we can do better risk management.

Yes, as you're talking about that, could you describe a little bit... and this is just a question in my mind: when we're talking about AI risks, a set of risks, and we talk about that effort to create trust in the technology, how do you tie those together in this process? You have identified these risks, and you just enumerated them, with the purpose of ultimately helping people get to a point of trust and being able to implement the technologies productively. How do you approach getting to trust through mitigation of risk? I'm not sure if the question makes sense or not.

It certainly makes sense. I'll try to answer the way I've wrestled with this. So AI systems are not inherently risky; it is often the context that determines whether negative impacts will occur, and also what those impacts are. The example that I usually use is: if I use face recognition to unlock my phone, versus face recognition in the airport (now our faces are boarding passes to get on the plane), versus face recognition in the context of law enforcement. It is the same technology, but in different contexts there are different risks and different levels of assurance that we want to have that the system works in a trustworthy manner.

So what we have been trying to do, as part of our work in approaching trust and trustworthy AI: the first step was to unpack the concept, to get to the characteristics that make a system trustworthy. That helps answer the question of what to measure: if I want to know whether it's trustworthy or not, what are the measurements I need to do? So I listed the seven characteristics: valid and reliable, safe, secure and resilient, and so on. That gives a more systematic and structured approach to what the dimensions are, what the characteristics are that together can make a system trustworthy. By the way, the AI RMF talks about this: no one of them by itself makes a system trustworthy. You can have a system that is very secure but not valid or accurate; that's not going to be trustworthy. And a system that is a hundred percent accurate but not secure is also not trustworthy. So that gives you, again, a more structured approach on what to measure.
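The point that no single characteristic suffices can be sketched as a simple conjunction over the AI RMF's seven characteristics. This is an illustrative toy, not a NIST metric: real assessments are context-specific measurements and tradeoffs, not booleans.

```python
# Hedged sketch: trustworthiness as a conjunction of the AI RMF's
# seven characteristics. Boolean "scores" are placeholder assumptions.

CHARACTERISTICS = [
    "valid_and_reliable",
    "safe",
    "secure_and_resilient",
    "accountable_and_transparent",
    "explainable_and_interpretable",
    "privacy_enhanced",
    "fair_with_harmful_bias_managed",
]

def trustworthy(assessment: dict) -> bool:
    """A system is only trustworthy if every characteristic holds."""
    return all(assessment.get(c, False) for c in CHARACTERISTICS)

# A very secure but inaccurate system is not trustworthy...
secure_but_invalid = {c: True for c in CHARACTERISTICS}
secure_but_invalid["valid_and_reliable"] = False
print(trustworthy(secure_but_invalid))  # False

# ...and neither is an accurate but insecure one.
accurate_but_insecure = {c: True for c in CHARACTERISTICS}
accurate_but_insecure["secure_and_resilient"] = False
print(trustworthy(accurate_but_insecure))  # False
```

The `all(...)` check mirrors the point made in the conversation: security without validity fails, and validity without security fails, so trustworthiness has to be assessed across all of the characteristics together.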

Then the next step is how to measure: methods and metrics for the measurement. Those types of measurement give information about the limits and capabilities of the systems, the types of risk that can occur, and the magnitude of the impact if they occur. And then, based on this information, we can come up with mitigations and management of the risk. So the AI RMF's recommendations are really categorized into four functions: govern, map, measure, and manage. Govern gives recommendations on the procedures and processes, roles and responsibilities, that we want organizations to have in place to do effective risk management.

So what is the accountability line? What are the roles and responsibilities involved? The map function provides recommendations on understanding the context of use: going back to the example of face recognition, understanding the environment the AI system is operating in, understanding the community that can be impacted by it, identifying the risks in this particular context, and understanding the laws, regulations, and policies that are in effect in this context of use. The measure function provides recommendations on how to measure all of the risks identified in map: quantitative or qualitative recommendations on how to measure them, and how to take into account the tradeoffs between all of those trust characteristics. And all of this information is used during the manage function: the recommendations can go from safeguards and mitigations that can be put in place to mitigate risk, to cases where we cannot just mitigate and the risk must either be accepted or transferred, or the system is too risky and should not be developed or deployed. So that is how the process in the AI RMF works.

There's a lot of your personal information out there on the internet, and you know this. Anyone can see this stuff. There's more than you think, though: your name, your contact info, your social security number, your home address, your various addresses, your past addresses.

There's even information about your family members, maybe even the name of your cat, all being compiled by data brokers and sold. Now, these data brokers make a profit off your data, obviously; that's why they do it. Your data is a commodity.

And anyone on the web can buy your private details. They can steal your identity. They can phish you. They can attempt to phish you. They can harass you. They can send you unwanted spam. They can call you nonstop. And this is something I get a lot.

But now you're able to protect your privacy online with DeleteMe. As a person who has existed publicly for some time now, and especially as someone who shares their opinions online quite frequently, I'm hyper-aware of safety and security, and I take this seriously. It's easier than ever to find personal information about anyone online. Really, all this data just hanging out on the internet can have actual consequences in the real world. That's why I was excited about finding this solution.

And a sponsor of the show: DeleteMe. DeleteMe is a subscription service that removes your personal information from hundreds of data brokers online. When you sign up, you provide DeleteMe with exactly what information you want deleted, and their experts take it from there.

They send you regular personalized privacy reports showing what information they found on the internet about you, where they found it, and what they removed. And DeleteMe isn't just a one-time service; they are always working for you, constantly monitoring and removing the personal information that you don't want on the internet. To put it simply, DeleteMe does all the hard work of wiping your data, your family's personal information, and all these things you don't want out there from those data broker websites. Now, the next step is to take control of your personal data and keep it private forever by signing up for DeleteMe.

Now at a special discount rate for our listeners. Of course, this is awesome: get twenty percent off your DeleteMe plan by texting PRACTICAL to 64000. Again, text the word PRACTICAL to 64000.

And of course, you may know this already, but message and data rates may apply. Check the terms and all the good stuff. Once again, text the word PRACTICAL to 64000 and get twenty percent off DeleteMe. Enjoy.

So that was very useful for me in terms of trying to frame and understand what you're relaying here in terms of govern, map, measure, manage. And you said something a moment ago that was really interesting, in this sense: you have these characteristics of trustworthiness that you're trying to measure, but it's not just one, and it's not just a black-or-white issue.

You have a collection of them, and they vary across different types of use cases, so you kind of have these characteristic profiles, in a sense. How do you think about it if you are out there as a consumer of the guidance that you're providing from NIST? Maybe you are in a small company that's doing some work in AI, you're trying to implement the guidance from NIST, and you're evaluating your own profile of characteristics through that govern, map, measure, manage process. How does one frame that if you're just getting into this and trying to implement the guidance? Could you talk a little bit about how an organization that maybe had not done this before might go about implementing it for whatever their use case is? How do they get started in the process? What would your recommendation be there?

The first thing I will say is that you don't need to implement all of the recommendations in the AI RMF to have a complete risk management practice. Our recommendation is to start by looking at and reading the AI RMF. It's not a very long document. I forget exactly,

I think it's somewhere between thirty and thirty-five pages, so get a holistic understanding of it, and then check out the playbook in the AI Resource Center. The AI RMF is organized into four high-level functions; each function is divided into categories, and each category into subcategories.

So the approach is increasingly granular: we give high-level recommendations on what to do toward the goal, and then for each of those we get into more granular recommendations. For each of the subcategories (there are about seventy of them), the playbook provides recommendations on suggested actions, informative documents that you can go read to get more information, and also suggestions about transparency and documentation for implementing that subcategory.

So we often suggest: get a better understanding of the AI RMF, spend some time in the playbook to get a better understanding of the types of things that can be done, and then, based on the use case and exactly what you want to do, start by implementing a small number of recommendations in the AI RMF. The govern or map functions are useful starting points. Govern provides recommendations about the setup that you need for successful AI risk management.

So it can give organizations ideas about the resources that are needed and the teams that are needed, so they can align those with their own resources and the teams that they have. And the map function, as we discussed, gives recommendations for better understanding the context and getting answers to what needs to be measured. I will also add that there is no required order for the functions (govern, map, measure, manage). It depends on the use case and on what needs to be done.

The starting point can be recommendations from any of the functions. We usually recommend starting with govern and map, and then with a small number of the subcategories or recommendations that the resources and expertise of the entity allow them to implement, prioritized, of course, in terms of their own risk management. The last thing I will add is: be mindful that risk management is not a one-time practice that you do once and then say you're done with the risks of your AI system. There's data drift, there's model drift, and these are models that can change based on interactions with users and with the environment. So we suggest continuous monitoring and risk management. I think one of the recommendations in map or govern is to come up with a cadence for repeating the assessments of the risks.
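As a rough illustration of the structure described here (functions broken into categories and subcategories, with an organization picking a small starting subset from govern and map and prioritizing from there), here is a minimal sketch. All of the IDs, names, and descriptions below are invented placeholders for illustration, not actual AI RMF content:

```python
from dataclasses import dataclass

# Invented placeholder model of the AI RMF hierarchy: four functions
# (Govern, Map, Measure, Manage), each divided into categories and
# then subcategories. IDs/descriptions are NOT quotations from NIST.

@dataclass
class Subcategory:
    id: str
    description: str

@dataclass
class Category:
    id: str
    subcategories: list

@dataclass
class Function:
    name: str  # "Govern", "Map", "Measure", or "Manage"
    categories: list

def starting_subset(functions, start_with=("Govern", "Map"), limit=5):
    """Collect a small number of subcategories from the suggested
    starting functions; an organization would then prioritize these
    against its own resources, expertise, and risk tolerance."""
    picked = []
    for fn in functions:
        if fn.name in start_with:
            for cat in fn.categories:
                picked.extend(cat.subcategories)
    return picked[:limit]

rmf = [
    Function("Govern", [Category("GOVERN-1", [
        Subcategory("GOVERN-1.1", "Placeholder: risk management policies are in place."),
        Subcategory("GOVERN-1.2", "Placeholder: trustworthiness characteristics inform policy."),
    ])]),
    Function("Map", [Category("MAP-1", [
        Subcategory("MAP-1.1", "Placeholder: intended purpose and context of use are documented."),
    ])]),
    Function("Measure", [Category("MEASURE-1", [
        Subcategory("MEASURE-1.1", "Placeholder: appropriate metrics are identified."),
    ])]),
]

print([s.id for s in starting_subset(rmf)])
# prints ['GOVERN-1.1', 'GOVERN-1.2', 'MAP-1.1']
```

Because risk management is continuous rather than one-time, a real implementation would also attach a review cadence to each selected subcategory rather than treating the selection as done.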

So those would be my recommendations. I mentioned the AIRC, the AI Resource Center, and I mentioned the playbook. The AI RMF also talks about profiles. I keep emphasizing the context of use, and the importance of context in AI system development, deployment, and risk management. At the same time, the AI RMF by design tries to be sector-agnostic and technology-agnostic.

We try to come up with the foundations, the common set of practices that one needs to be aware of, or that are suggested, for risk management. But we also have a section on AI RMF profiles and recommendations on building verticals. These profiles are instantiations of the AI RMF for a particular use case, domain of use, or technology domain, so that each of the subcategories can be tailored to, or aligned with, that scope.

So there can be a profile of the AI RMF, for example, for use in medical image recognition, and we can imagine a profile of the AI RMF for the financial sector. That's something we have been asked to work on with the community. That was a very long intro to say that there are a couple of profiles posted on the AI Resource Center: one that the Department of Labor did for inclusive hiring, and another that the Department of State did for human rights in AI. Those can give some sort of window into, or idea about, how organizations can start. In addition to the profiles, we have also posted a few use cases, and we will post more, showing how different organizations are using the AI RMF; those can hopefully serve as more practical examples of how to use the AI RMF.

No, that's a fantastic set of suggestions right there. And I'd actually like to ask a follow-up to that. As a prelude to my follow-up, if I'm understanding: go to the AI RMF and read that core document; it's not very long, it's very consumable. Then go to the playbook and look at the subcategories (I believe you said there were about seventy of them), which have suggested actions and references to other docs.

Then start to bite off simple, small chunks in terms of how you're going to approach the functions that you mention, starting with govern and map, then putting together resources and teams, and then cycling back with a cadence of repeated assessments that are also specific to the vertical that you're in. As you describe it, it feels really practical to me, and we are Practical AI, so that appeals.

I'd like to ask: are there now, or do you expect, tooling around this? If you look outside of AI, the software industry at large is kind of a predecessor here: as standards, workflows, and best practices arose in software development, lots of tooling arose around how to do agile methodology and, you name it, the many different approaches to software development. Are you expecting tooling, or have you been thinking about what kind of tooling might help AI development teams, as they're building these teams and their resources, so that they can be productive over time? How are you seeing that evolve going forward? Do you think there will be a cottage industry forming around this, the way we've seen in software and other areas where there is a lot of tool support?

Yes, we have already started seeing some of that. There are entities that are putting out tools for implementation of the AI RMF, and dashboards and so on, that they have developed and have on their websites. If I can just go back: thank you for your excellent summary of my very long, winding answers.

No, it's very good. I'm learning a lot here.

And I'd ask your listeners to start with the AI Resource Center; the URL is airc.nist.gov. The AI RMF is there, and the playbook is there in an interactive, filterable form. So if a business is only, you know, if they are developers, they can first filter, out of the roughly seventy recommendations, the ones that are applicable to developers, so they're not overwhelmed with all of it.

Or if they only care about deployment and the issue of bias in deployment, they can filter from the AI actors for deployment, and from the characteristics for bias, and that gives them just the relevant guidance and saves them some time. So that is the way to get this information from our website, and some hints about how we have made it available in a more usable way. And yes, entities have already started putting more tooling in place, including around 600-1, which was the cross-sectoral profile of the AI RMF for generative AI. In the work that we are doing with the community, we are focusing on, to use the word, operationalization: the tools that are needed for operationalizing and implementing the AI RMF. And going back to and emphasizing community engagement, and the role that input from the community plays in all of these things: some of the tools can be developed by us, but the majority of the tools are being developed and shared by the community, and we hope to see more of that.
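The kind of interactive filtering described here, narrowing the playbook's subcategory entries by AI actor and by trustworthiness characteristic, could be sketched roughly like this. The entry IDs, tags, and field names below are invented for illustration, not taken from the actual playbook:

```python
# Invented placeholder entries: each playbook item is tagged with the
# AI actors it applies to (developer, deployer, ...) and the
# trustworthiness characteristics it addresses (bias, privacy, ...).
entries = [
    {"id": "MEASURE-2.x", "actors": {"developer", "deployer"}, "characteristics": {"bias"}},
    {"id": "MANAGE-4.x",  "actors": {"deployer"},              "characteristics": {"bias", "safety"}},
    {"id": "GOVERN-6.x",  "actors": {"developer"},             "characteristics": {"privacy"}},
]

def filter_entries(entries, actor=None, characteristic=None):
    """Return only the entries relevant to the given actor and/or
    trustworthiness characteristic; None means 'do not filter'."""
    out = entries
    if actor is not None:
        out = [e for e in out if actor in e["actors"]]
    if characteristic is not None:
        out = [e for e in out if characteristic in e["characteristics"]]
    return out

# A deployer who only cares about bias sees just the relevant entries:
print([e["id"] for e in filter_entries(entries, actor="deployer", characteristic="bias")])
# prints ['MEASURE-2.x', 'MANAGE-4.x']
```

The point of the filtering is exactly what Elham describes: a team sees only the handful of recommendations that match its role and concerns, instead of all seventy entries at once.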

I hope so too. It's fascinating. I love the framework that you've given us here: it can be applied in so many different verticals and in so many different ways, and yet it is flexible in its guidance. As we wind up here: we have seen so much advancement in the development of AI, both as a technology and in the industries around it, and you are sitting there in the nerve center of where this guidance and these standards come together, bridging both government and industry.

As you look forward, what are some of the things, when you're not in a particular meeting and you're just winding down and thinking creatively about where things are going, that come to mind about the future of this, both for this role and for the industry and the technology at large? Where are we going? Because it's moving at such a rate; it's so fast, and it's fascinating, changing the face of business and changing how we are as humans in terms of the tools that are available to us. I'd really love your insights into where you think all of this is going in the days and years ahead.

I think the goal for me, what I am hoping to see a lot of, is the use of this powerful technology as a sort of scientific discovery tool, in a way that changes how we do science and discovery.

I think that is where we are going to see a lot of really important advancements: in precision medicine, individualized education, climate change, anything that is going to make life a lot better for all of us. I will have to say that my heart was warmed by the Nobel Prizes for things such as AlphaFold; I have thought for a long time that that work needed a lot more recognition.

But with all of the recognition that AI got through those prizes, we are also very aware of very important things that NIST can do and that the community needs to do. I think we all agree that there is a lot we don't know about how these models work, and we ought to do something about that. We need to have a better understanding of how these models work,

their capabilities and their limits. That gets me to the important topic of evaluations and testing. We talked about it at the beginning of this podcast: it's important to unpack the concept of trust into the things that need to be measured. But at the end of the day, we need reliable measurements for assurance that the systems are trustworthy.

As a measurement science agency, we are big fans of the quote from Lord Kelvin that if you cannot measure it, you cannot improve it. So if we want to improve the trustworthiness and reliability of these systems, we need to have a good handle on how to test them and how to evaluate them for reliability, for validity, for trustworthiness; and our knowledge of how to test AI systems is very limited. We need better evaluations.

As we can see, benchmarks are too easy; they get saturated very quickly. We need a better understanding of how these models work. That gets to the assurance that can build trust in the technology and give users, and everybody else, confidence that the systems work. And the third item I will add: once we have built that knowledge base, once we have good scientific foundations, when through research and work with the community we have built the technical building blocks, let's develop clear, understandable, technically robust standards that can help with global interoperability of AI evaluations, AI assurance, and AI governance.

Fantastic. Well, thank you so much for coming on the Practical AI podcast. This was very, very instructive in terms of how to frame this, and certainly information that I'm going to be using going forward. I really appreciate you taking the time to talk with us today.

I appreciate the opportunity to be here and talk, and I really enjoyed the conversation. Thanks.

Alright, that is Practical AI for this week. Subscribe now if you haven't already; head to practicalai.fm for all the ways, and join our free Slack team, where you can hang out with Daniel, Chris, and the entire Changelog community. Sign up today at practicalai.fm/community. Thanks again to our partners at Fly.io, to Breakmaster Cylinder for the beats, and to you for listening. We appreciate you spending time with us. That's all for now. We'll see you next time.