Al Jazeera Podcasts. Today, how AI could save Indigenous languages from extinction. By 2050, only around 20 Indigenous languages may be left in the United States. But in the wrong hands, could AI do more harm than good? I'm Kevin Hurtin, and this is The Take.
Hello, my name is Danielle Boyer. I'm an Indigenous robotics educator. My Indigenous community is Anishinaabe from the Sault Tribe, which is in the Upper Peninsula of Michigan. But today I'm calling from San Diego, California. Danielle, maybe you can start by just telling us a little bit about yourself, where you grew up and how you became an inventor.
I grew up in Michigan, kind of all over the place. And I grew up not being able to afford things like science and tech education and being under the poverty line. I got started in science and tech education because of my little sister, Bree.
But the local programs that existed were not accessible or affordable to our family. So I got started teaching at age 10 because of her, wanting to bring science and tech affordably and accessibly to my community. And that kind of snowballed into the work that I do today, where I founded a youth-led charity called the STEAM Connection, where we bring free science and tech education to Indigenous youth, especially through robots. And we do it all for free.
Yeah, and your community is, you said, the Upper Peninsula of Michigan. So just for people in our international audience, that's around Lake Superior, the largest of the Great Lakes. So you're from an Indigenous community, and Indigenous communities have been warning for a long time that their languages are disappearing.
But there has been some excitement in the last couple of years about the possibility of preserving them using generative AI and large language models like ChatGPT. I know you're a bit more skeptical. I wonder if you could first just tell me the optimistic theory of the case, how that might work, and then why you think it's falling short.
I think the first thing I'm going to say for context about Indigenous languages, especially in North America, is that there are hundreds of them. Something that people don't often know is that there's not just one or two Indigenous communities or First Peoples communities in Canada, the United States, and Mexico. There are hundreds and hundreds of recognized tribes. So within our recognized tribes and communities, there are different cultures, there are different languages, different regalia, foods, everything like that. And so within that, we have different languages. We have different dialects. Even within my own community, we have different dialects.
And so, prefacing with that, the way that we approach language varies from community to community. So with artificial intelligence, there is a lot of hopefulness that people have surrounding it, especially when it comes to preserving Indigenous languages. I feel a lot of the time people see the capacity of artificial intelligence and think, oh my goodness, we could fill in patches for missing parts of, quote, dead languages. We could preserve different verbal elements of languages. We could have chatbots being able to communicate in Indigenous languages. There's a lot of hype and hope that people have surrounding large language models and also resources like ChatGPT.
For me, I'm a lot more skeptical of these resources and tools because of the way that the data sets are actually being trained. And they're being trained right now especially through resources made by companies like OpenAI, such as their speech recognition program, Whisper.
They're being trained on inaccurate data sets of Indigenous languages. So the resources that do exist now from places like OpenAI or Duolingo, a lot of them are pretty inaccurate. And so that gives me a lot of worry that the way we're approaching Indigenous languages is actually without the Indigenous languages themselves.
Right. And there are a few things that are specific to indigenous languages that make them very difficult for AI to parse and understand. Can you give me a few of those examples?
In most contexts, our languages are best preserved and learned in spoken form. That's not to say that there aren't written forms of our languages, or ways that we have preserved our language over time in a written capacity. Even my own language has written elements to it. But...
To take those elements and feed them into a computer and try to replicate things that should only be replicated by a human is where I get worried, especially when a lot of our languages are very context-based. Like, for example, in my language, which is called Anishinaabemowin, the word for blueberry pie is super, super long. Like, I can't even say it. It's practically over 20 words long, because it is literally an ingredient list of what is inside blueberry pie. So it's a very context-based language where we're literally describing everything that's within something and everything that's around us. And so when it comes to Indigenous languages, you can't miss those contextual and explanatory elements of the language. Otherwise, you're not able to communicate anything.
So ChatGPT, which struggles with English, as anyone who's used it knows, comes across a word like blueberry pie in your language and gets an entire ingredient list. It just has no idea what to do with that information. Is that right? Yeah. And so ChatGPT wouldn't be able to efficiently communicate the context behind the word and what it means, because it's going to lack a lot of authenticity, and it's going to lack a lot of what you're able to get from being in person with someone, or even from a recording with someone. Hearing a recording from an elder or a community member is going to give you way more context to the language than anything that would be manufactured with AI.
And that's why I think it can be dangerous when we see a lot of new language tools being created at a very rapid rate, manufacturing new languages, manufacturing new phrases, and training with different data sets from the internet that may or may not be accurate, because often they're pulling from places like Google. How are we to ensure that the words they're feeding into these programs are actually correct when a lot of our language is meant to be spoken person to person?
The models that do exist currently, especially from large corporations, are being trained in the wrong way. They're being trained by data scraping and pulling information from the internet without regard or context to its accuracy. And I think that that's a dangerous thing. There are ways to go about it where the language is quote-unquote accurate, right? You might be able to, you know, ask, hey, what's the Indigenous word for, or specifically in my language, the Anishinaabe word for hello, which is boozhoo. I could ask it that and it could tell me that, but I feel like even that lacks a lot of depth that is important from hearing it spoken or actually communicating it in person. And so even if it's 100% correct, I still worry that it might lack important context.
Yeah. Okay. So why don't we talk about just how we got here, broaden the lens a little bit. Why are so many indigenous languages on the verge of extinction and just how urgent is this problem?
Yeah, so with Indigenous languages in North America, we have faced a rapid decline. And estimates suggest that by 2050, only around 20 Indigenous languages are going to be left in the United States specifically. That's a very small number when, as I said before, there are hundreds and hundreds of communities and tribes.
And so seeing that rapid decline is a very concerning and very scary thing, especially because we saw an even steeper decline after COVID-19. With COVID-19, we saw a lot of our elders and community members who spoke the language pass away. And so when that happens, you have a very strong hit to the culture and to the language, because the people who bear our knowledge are no longer with us.
And so seeing that happen, and also the impact of things like colonization and residential schools, we have seen our language get lost at a very rapid rate, especially from generation to generation. Even within my own family, there has been that rapid decline of language. And so it's a scary thing: if we do not learn our languages soon, what do we have to pass on to our children and to our grandchildren?
Yeah, absolutely. I mean, and there is a question of funding. You know, some of this takes money and it takes resources, and those resources aren't being made available. Yeah, so a lot of the issues that we face with language preservation are funding and resources and things like that, especially because sometimes you'll get funding for one dialect and then not another, or a lot of the burden of preserving a language is placed on one person, and that, you know, can be a lot of pressure and stress.
But we're also seeing issues where with our language resources, a lot of white and non-Indigenous owned companies will come into our communities and actually steal our language. An example of that is a nonprofit group founded by two non-Indigenous people called the Lakota Language Consortium.
And they came onto a Lakota reservation, I think it was the Rosebud Sioux Tribe, and they started recording resources and preserving language resources with community members.
The community members thought it would be a collective and shared effort, but the nonprofit actually ended up copyrighting resources and allegedly selling them back to community members, or at least attempting to. Wow. And so they actually got banned from the reservation and from the community where the Rosebud Sioux Tribe is. And they're also getting sued by a grandson of one of the language educators. And the tribe is also suing the nonprofit. Wow.
That's really depressing to find that white people may be finding new and creative ways to steal from Indigenous cultures. Oh my goodness. It's also because a lot of these organizations see a lot of value in Indigenous data, but don't always want...
it to be connected to Indigenous people. And that's something that is tried and true of colonization, where they want parts of us, but they don't want us. And so it can be a very scary thing, especially when we see stuff like OpenAI and their use of Whisper, because to me, that feels like the same thing, just with the use of AI. And so we're seeing an issue where, time and time again, our information and our data are getting stolen. What is the information they're interested in? Is it just the personal data or is it something deeper? So a lot of the information that I think especially corporations are interested in is just sacred Indigenous knowledge in general.
Like in California, we have traditional methods of keeping the wildfires down from the tribes out here on the West Coast. And for a long time, it was illegal for tribes to burn a layer of the ground first so that the rest of it doesn't catch fire.
And you don't have massive wildfires. But that was illegal to practice up until recently. And then, of course, the government comes in and is like, hey, we actually want to use your data and your information. And we want to practice that now, but without you. And so we're seeing an issue time and time again where people are seeing value, especially governments and corporations see value in indigenous knowledge and information for either capitalistic purposes or things like that. But they don't want us to be a part of it.
I even had a top 10 global company with a smart speaker reach out to me and ask to put their smart speaker into one of my children's educational robots to do data collection on indigenous children. And they offered me nearly a million dollars to do it. More with Danielle after a quick break.
This week on True Crime Reports: It's 2015 and we're in the tropical forests of the Democratic Republic of the Congo. A man, a member of a local indigenous community, enters the forest with his son in search of medicinal herbs. They come across a group of eco-guards who've been placed here to protect the area from poachers.
The guards open fire and the man's son is shot dead. So how far are Western groups willing to go in the name of conservation? True Crime Reports, a new global crime show from Al Jazeera. Subscribe and listen wherever you get your podcasts.
Okay, Danielle, let's try to turn the page to something a little bit more positive. And that is the amazing things that you have been inventing. One of your creations is called the SkoBot. And I absolutely love these. They're like robot parrots that sit on your shoulder and you dress them up, and it's just great. So tell me about the SkoBot. What does it do? How did you come up with this?
Yeah, so a few years ago when I was 20, I invented a robot called the SkoBot. And Sko is reservation slang for let's go. And so it's let's go bot. And it's a robot that sits on your shoulder and actually speaks indigenous languages. And it uses ethical AI. So basically what it does is it listens to you speak in English or Spanish.
and it recognizes the words that you're speaking, and then it matches the words that you're saying in either English or Spanish to an Indigenous audio recording. So basically I can say, listen, hello, and then it's going to say boozhoo, which is hello in my language, back,
and it's going to have an audio recording. So that's kind of how we communicate with the robot, where AI is not touching the Indigenous language or the recordings or things like that. And it's meant to supplement everyday language education. It's using ethical AI, where it's listening to what you're saying and it's interactive, but it's an internally based system, meaning we're not pulling stuff from the internet. Everything is created by us.
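To make that "internally based" idea concrete, here is a minimal sketch of that kind of offline pipeline, assuming a hypothetical on-device transcription step and a local playback function; the names and file layout are illustrative stand-ins, not the SkoBot's actual code.

# A minimal sketch of the offline keyword-to-recording pipeline described
# above: the robot hears English or Spanish, matches a known trigger word,
# and plays back a pre-recorded clip of a community speaker.
# All names here (transcribe_on_device, play_recording, the file layout)
# are hypothetical stand-ins, not the SkoBot's actual implementation.

from pathlib import Path

# Recordings live locally on the device; nothing is fetched from the
# internet and no language data leaves the robot.
RECORDINGS_DIR = Path("recordings")

# Map recognized English/Spanish trigger words to local audio files
# recorded by elders and community members.
TRIGGERS = {
    "hello": "boozhoo.wav",
    "hola": "boozhoo.wav",
}

def transcribe_on_device(audio_chunk: bytes) -> str:
    """Stub for an on-device speech-to-text step (assumed, not specified in the episode)."""
    raise NotImplementedError

def play_recording(path: Path) -> None:
    """Stub for playing a clip through the robot's speaker."""
    raise NotImplementedError

def respond(audio_chunk: bytes) -> None:
    heard = transcribe_on_device(audio_chunk).lower()
    for trigger, clip in TRIGGERS.items():
        if trigger in heard:
            # The AI only matches the English/Spanish trigger; the Indigenous
            # word itself is never generated, only replayed as recorded.
            play_recording(RECORDINGS_DIR / clip)
            return

The design point in this sketch is that nothing is generated: the model's only job is recognizing the trigger, and the language itself stays in the recorded voices of community members.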
And that's something we created and have kind of perfected over many, many years now, thinking as a group, as my organization, and also with my mentors who come from different Indigenous communities in Canada and the US, about what ethical AI looks like and how we're able to benefit and uplift our communities using technology, especially a robot meant for kids, kind of as a class pet. And it also speaks my language, Anishinaabemowin, and it speaks it in children's voices.
But for us, I think the main thing we were thinking about is: what do everyday Anishinaabe toys look like 100 years from now? What do Indigenous toys look like? And so we designed the robots to actually represent and look like, you know, our youth, and to have elements of our regalia and elements of the children's personalities reflected in what we're doing.
And, as I will always say when I bring up these robots, they are only brought into communities where they are requested. We never are like, hey, we're going to advertise this robot, you must have AI, you must have robots. Right. We bring it into the communities where it's accepted, and we work on training them for years, making sure that everyone ethically consents to having the tech in their communities if they're interested. Right.
It does sound like this addresses all of the problems that you flagged earlier, right? The funding is there, it's spoken language, and it can recreate some of what was lost when speakers passed away during COVID: somebody to actually teach you the language in an ethical way. Yeah.
Yeah, and I think it's something that is exciting for kids, because a lot of the time, especially when you're young, it can be difficult to learn the language, and more classroom resources are something that teachers, especially language educators, are requesting. And so we really wanted to fill that gap and make something that appealed to the youth and was exciting and encouraged them to learn our languages. Do you have one behind you? Is that a SkoBot behind you? Yeah.
I do actually have a SkoBot behind me, this little guy right here, and they're my babies. And you know what? As much as kids love their grandmothers, I would much rather play with a SkoBot than my grandmother and learn the language. No, both at the same time. I actually have the cutest picture of me and my grandma with the SkoBot. Well, there you go. Perfect. And she's the one who inspired me to build this and helped me create the robots too. But yeah, we actually have a student-decorated one back here. It's a powwow princess. And so kids are actually able to decorate them and create them and build them into how they see themselves.
I love this. Absolutely love it. Okay, just as we wrap up, this is such an interesting issue because it forces us to question, you know, the very nature of language itself. AI might be able to save a language from dying, but it can't keep it alive, right? A language needs to be spoken, as you said, to avoid becoming an artifact of history as opposed to this living expression of culture. Yes. I mean, that's a bold challenge to take on, but it sounds like you're really making progress.
Yeah, I think it's something that I really want people to think about in connection to the living aspect of Indigenous languages. When we have AI tools or we have things that remove the community aspect and the human aspect of our language, what's the point? You know, what is the point of actually speaking it at all if not for a fun fact?
It's so important to actually preserve our language and to preserve our ways, to speak it with community members, to, you know, have it as a way of community, right? And so, yeah, I want people to see Indigenous language in that capacity. And I also want people to see that we're not just peoples of Western movies and of the past, but we're peoples of the present and also the future and people creating things like robots.
And that's The Take. This episode was produced by Chloe K. Lee, Marcos Bartolome, and Tamara Kandaker, with Manny Panaritos, Duhom El-Sad, Hajar Salah, Khalid Sultan, and me, Kevin Hurtin, in for Malika Bilal. It was edited by Alexander Locke. Our sound designer is Alex Roldan. Our video editor is Hisham Abusalah.
Alexander Locke is the take's executive producer. And Ney Alvarez is Al Jazeera's head of audio. We'll be back tomorrow.