Amazon Q Business is the generative AI assistant from AWS. Because business can be slow, like wading through mud, but Amazon Q helps streamline work, so tasks like summarizing monthly results can be done in no time. Learn what Amazon Q Business can do for you at aws.com/learn-more.
Future of Everything listeners, a quick note before we start today's episode. We've made a change to bring you even more of the original reporting and interviews you've come to expect. Check us out weekly on Fridays, and let us know what you think. Drop us a line at foepodcast@wsj.com. Also, just a heads-up that we say the names of some common voice assistants several times in the episode. If it activates one in your home,
we apologize.
Michael Cash lives in Tel Aviv, Israel.
I'm forty-nine years old, though I grew up in England.
He says he's forty-nine years old and grew up in England. You might have trouble understanding him, because he has cerebral palsy.
He says it's how he's always spoken. Though his friends and family have always understood him, he says he needed many years of speech therapy to feel confident in speaking with everyone else, so that they could understand him and his sense of humor. Digital tools can be useful for everyone, but for Cash, they mean being able to make himself understood. He works as a product specialist at Voiceitt, a company that makes voice tools for people with nonstandard speech, including one that helps him communicate with a set vocabulary in one-on-one conversations.
It even integrates with Amazon Alexa to control the TV and air conditioning around his home. But when it comes to tools built into voice assistants like Alexa or Apple's Siri, he's found using them frustrating.
It's been about a decade since AI voice assistants became commonplace. They're baked into cell phones and smart speakers. And in that time, people have figured out how to put them to good use.
Hey, Google, play ocean sounds.
Alexa, call brother.
Siri, play the radio.
Alexa, volume down. Volume down. Alexa, stop. But these voice assistants are far from perfect. Many of the big companies, such as Google, say that the error rates of their automatic speech recognition tools can be less than ten percent.
But for people whose voices are affected by neurological conditions like Parkinson's, or voice-specific conditions, that error rate can be far higher, as much as fifty or even ninety percent. I'm sorry, can you say that again? But what if AI voice assistants simply worked better for everyone right out of the box, no training required? From The Wall Street Journal,
this is The Future of Everything. I'm Alex Ossola. Today, we're talking about how academics and tech companies like Amazon and Google are working to make AI voice assistants better for people with atypical voices. If they can make that happen, AI voice assistants could work better for all of us. Stick around.
Amazon Q Business is the new generative AI assistant from AWS. Because many tasks can make business slow, as if wading through mud. Ugh... help! Luckily, there's a faster, easier, less messy choice. Amazon Q can securely understand your business data and use that knowledge to streamline tasks. Now you can summarize quarterly results or do complex analysis in no time. Q, got this. Learn what Amazon Q Business can do for you at aws.com/learn-more.
How do you make a voice assistant, anyway? Mark Hasegawa-Johnson, a professor of electrical and computer engineering at the University of Illinois Urbana-Champaign, says it's changed a lot in the past decade or so. Making the models used to rely on audio data that's matched up with transcripts, called labeled data. But these days, you need about a thousand hours of audio without any associated transcripts. This is called, you guessed it, unlabeled data.
And you have a neural network, which is an algorithm whose relationship between its input and its output is determined by a lot of numbers. And so we want to learn all of those numbers in a way that this neural network is able to predict the speech sounds that are coming up, given the speech sounds that have come before.
It takes a lot of computing power to train up the algorithm,
Hasegawa-Johnson says. The last time that we tried that here at the university, it took three weeks to do that training run. So that's quite a lot of computer time.
But once that's done, the researchers can test it so they can fine-tune the algorithm.
We feed it a bunch of labeled data, see how that does, and then figure out what we can do in order to make it do better, and keep adjusting it to make it do better and better.
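To make that two-stage recipe concrete, here's a minimal sketch in PyTorch of the general idea: pretrain a small network on unlabeled audio by having it predict the next frame from the frames that came before, then fine-tune the same network on a much smaller labeled set. The model, the sizes and the random stand-in data are all illustrative assumptions, not the actual pipeline used at the University of Illinois.

```python
# A minimal sketch (not Hasegawa-Johnson's actual system) of self-supervised
# pretraining on unlabeled audio, followed by supervised fine-tuning.
import torch
import torch.nn as nn

FRAME_DIM, HIDDEN = 40, 128   # e.g., 40 filterbank features per audio frame
NUM_PHONES = 50               # toy label inventory for the fine-tuning stage

encoder = nn.GRU(FRAME_DIM, HIDDEN, batch_first=True)
predict_next = nn.Linear(HIDDEN, FRAME_DIM)   # pretraining head
classify = nn.Linear(HIDDEN, NUM_PHONES)      # fine-tuning head

# Stage 1: pretrain on unlabeled audio (no transcripts needed) by
# predicting each upcoming frame from the frames before it.
unlabeled = torch.randn(32, 100, FRAME_DIM)   # random stand-in for real audio
opt = torch.optim.Adam(list(encoder.parameters()) + list(predict_next.parameters()))
for step in range(100):
    hidden, _ = encoder(unlabeled[:, :-1])    # encode frames 0..T-1
    loss = nn.functional.mse_loss(predict_next(hidden), unlabeled[:, 1:])
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: fine-tune on a much smaller labeled set, checking how it does
# and adjusting it to do better.
labeled_audio = torch.randn(8, 100, FRAME_DIM)
labels = torch.randint(0, NUM_PHONES, (8, 100))
opt = torch.optim.Adam(list(encoder.parameters()) + list(classify.parameters()))
for step in range(100):
    hidden, _ = encoder(labeled_audio)
    logits = classify(hidden)                 # (batch, time, phones)
    loss = nn.functional.cross_entropy(logits.transpose(1, 2), labels)
    opt.zero_grad(); loss.backward(); opt.step()
```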
Sounds simple enough, right? Well, making these voice recognition models wasn't possible until relatively recently. In fact, Hasegawa-Johnson knows exactly when.
I would say that really, the key moment was probably December 2014, when the first commercially viable end-to-end neural network automatic speech recognizer was published.
In other words, some researchers made an algorithm that could recognize the sounds we make when we speak and transcribe them. What they didn't have was a ton of data to train their algorithm on. Coincidentally, a few months after that, a revolutionary database went online: LibriSpeech, which had hundreds of hours of amateur audiobook recordings. Another group of researchers, from Johns Hopkins University, correlated those audiobooks with written words in the online free library Project Gutenberg, and suddenly they had a dataset to work with.
I believe that the existence of that data made the deep learning automatic speech recognition revolution possible.
Hasegawa-Johnson says that database was such a big deal because researchers finally had enough training data for their AI voice assistants to be actually usable, and that had the potential to make them commercially viable.
The much larger datasets and the end-to-end neural network training systems that became available made it possible for companies to create products that would actually be used by real people.
But those error rates didn't drop quite so much for everyone. Because of the data the models were initially trained on, they were not so good at understanding people who spoke English as a second language, or spoke with an atypical voice.
Maybe not surprisingly, the algorithms tend to have difficulty with the same kinds of speech patterns that humans have difficulty with. So people with severe dysarthria, because of cerebral palsy or ALS, for example, have a great deal of difficulty being understood by speech recognition algorithms.
And in the years since, though algorithms have gotten better, they haven't always gotten better for these groups. So companies have had to find their own ways to make their AI voice assistants more accessible to more people. One of the people working on this is Josh Miele. He's a principal accessibility researcher for devices at Amazon. In 2021, Miele was awarded a MacArthur genius grant for his work on accessibility in tech.
My work is really related to making sure that the products that Amazon produces are accessible to as many people with disabilities as possible.
He also has a personal stake in his work.
Yeah, so I'm a principal researcher, but I'm also blind. Not only do I know quite a bit about how to conduct research and development of products for people with disabilities, but of course, I'm pretty deeply ingrained in multiple communities of people with disabilities.
One of the products he works on is Alexa, Amazon's voice assistant. Alexa is now in a number of Amazon products, including its Echo smart speakers.
You can shop with your voice. You can read books with your voice. You can watch videos with your voice. For all of Amazon's customers, Alexa is super cool. But for customers with disabilities, Alexa can be transformative, because it provides access to these services, movies, books, shopping, that are all really frictionless for people with disabilities.
Miele says Amazon's approach to accessibility features comes from its desire to make its products available to as many customers as possible. That's led in some interesting directions, including a feature that doesn't require speech at all.
A customer who couldn't speak let us know that she really wanted to be able to use Alexa in her home. And we came up with Tap to Alexa, which basically uses one of the Echo devices and allows you to set up any number of different types of interactions that are triggered by tapping on a particular tile.
Then there's another company, called Voiceitt. Remember when I mentioned Voiceitt earlier, where Michael Cash works? Well, it was founded on principles of accessibility.
Our core focus is how can speech recognition be used to help people with disabilities, illnesses, medical situations affecting speech, to communicate and be understood using their voice.
That's Sara Smolley, co-founder of Voiceitt. In 2020, Voiceitt launched an integration with Amazon Alexa with a voice-commands capability. It allowed people to program commands for the voice assistant to follow, so that they could do things like turn on the lights in the living room. But that had limitations.
For many people who wanted to speak more fluently and spontaneously, we learned that this could be limiting for them.
So there was room for improvement. In August 2023, Voiceitt released its newest product, called Voiceitt 2. It's a browser-based application. And yes, users still have to train it, but once they do, the upgrade's new functionalities unlock the digital world.
These functionalities and features include transcription and dictation, so writing notes, documents and emails using their voices, as well as interacting with AI productivity tools. So being able to interact with ChatGPT by voice is extremely empowering for these individuals.
But Smolley says the most significant capability is that Voiceitt can create real-time transcripts of what people are saying and integrate them into common workplace software like Webex.
The way I like to describe it sometimes is: what a ramp is to an office building, Voiceitt is to today's remote workplace. If a person cannot join and interact, participate in a video call, then they can't really fulfill their potential and communicate at work.
I had the opportunity to try this tech in my interview with Michael Cash, who we met earlier and who works at Voiceitt. We were using Zoom, and the real-time transcription popped up like captions under his video box. It was pretty accurate, though not perfect, and it was helpful.
It was also a little awkward for someone like me who's used to a quick back-and-forth in conversation. It was uncomfortable to wait a few seconds for the transcription to catch up to Cash's words. You can hear it in this bit of our conversation. Here, I was waiting a whole three seconds for the transcript to come up.
So you had a clear need, and the company was able to address it.
Well, some companies are working to unlock the internet for people with atypical speech. Others are working to expand voice accessibility in the physical realm. More on these tools after the break.
Amazon Q Business is the new generative AI assistant from AWS. Because many tasks can make business slow, as if wading through mud. Ugh... help! Luckily, there's a faster, easier, less messy choice. Amazon Q can securely understand your business data and use that knowledge to streamline tasks. Now you can summarize quarterly results or do complex analysis in no time. Q, got this. Learn what Amazon Q Business can do for you at aws.com/learn-more.
Companies are getting creative in figuring out new ways to make AI voice assistants more accessible to more people. Another big tech company working on this: Google. About five years ago, people inside Google started talking about how voice assistants can't reliably understand people with atypical speech. The result: an effort called Project Euphonia.
The goal was that anyone who struggles to be understood by other people, due to any kind of condition, could participate, record a number of phrases, and then in exchange for their contributions, we would give them a Visa gift card.
That's Julie Cattiau, a product manager at Google Research.
Fast-forward to today: people have recorded over one million speech samples. Around two thousand people have participated, and they have a very broad range of different disabilities. And so the dataset has allowed us to study how to improve speech recognition for these people.
That data has allowed Google to launch a separate app, called Project Relate, available to the public on the Google Play Store.
The way it works is that users download the app. They are asked to record five hundred simple phrases, and then we have a backend that automatically trains a personalized speech recognition model. And then users can use this model in the app. It can be used either to transcribe what they're saying, or we have a repeat feature that directly repeats what you've just said in a clear, computerized voice.
Google says several thousand people have downloaded Project Relate, though it wouldn't share exact numbers. Here's an example of how Relate works, from a demo video on Google's website. It shows a person with muscular dystrophy ordering at a coffee shop. Hot chocolate. Hot chocolate.
Relate has a few limitations. It's only available in English, and in certain countries. And though training a personalized model reduces the error rate for many users, that's not the case for everyone.
We still do see sometimes people who have a severe speech impairment. And in this case, the app might actually be quite useless, to be frank, because even if we have data from them, it might not be enough to get a signal to actually understand their voice.
Google has conducted tests to see if the data collected from people with disabilities could be used to better train its speech recognition system, called its Universal Speech Model, which can transcribe speech in more than one hundred languages. Right now, it's only in use on YouTube, and a Google spokesperson said the company currently has nothing to share on integrating it into other products.
You can imagine that for people who want speech recognition to work really well, or maybe they have quite a strong speech impairment, they might find personalization more useful. But for others, maybe we can make speech recognition simply work out of the box for them.
Cattiau says Google's goal is speech recognition that works seamlessly for everyone right from the start, no user training required. And it's not the only company shooting for this. Voiceitt, the company working with Amazon, is aiming for this too. And it's part of what Mark Hasegawa-Johnson, the researcher we heard from earlier, is trying to do with the Speech Accessibility Project.
I like to say that if this works, the only thing people will notice is that the mistakes their system was making before, it's not making as many of those mistakes anymore.
The AI systems that power voice assistants like Amazon's Alexa are trained on data. To make these systems better at recognizing atypical voices, you have to feed them a lot more data from people with atypical voices. Seems logical, right? But getting this data isn't always so easy. Hasegawa-Johnson is the principal investigator on the Speech Accessibility Project.
Our goal really is to make a database of speech from people with disabilities that's large enough and diverse enough that it can make it possible for tech companies and for universities to have an accessible speech recognizer that will work for somebody with disabilities out of the box, where they don't have to personalize it further, because it already has a really good model of the range of ways in which disability can affect speech.
The Speech Accessibility Project started in 2022, with a grant to run through 2024. Some pretty big names are involved: Amazon, Google, Meta and Microsoft are funding the project and providing guidance and feedback.
The project is aiming to compile the voices of a total of two thousand people whose speech is affected by Parkinson's, cerebral palsy, Down syndrome, stroke and ALS. So far, Hasegawa-Johnson says, it's going well. They started recruiting patients with Parkinson's in April of 2023, and when I spoke to him in November, he said eight hundred people had signed up, though only two hundred eighty of them had speech affected enough by the disease to be useful for the project.
People have told us that it's fun to participate. They seem to especially like the spontaneous speech prompts. We'll ask people things like, who's your favorite musician? Or how do you make breakfast for a group of four people? All kinds of crazy questions. And people have had fun responding to some of those.
Participants, people with one of those five conditions who meet the researchers' criteria, answer prompts into the computer. If they complete it, they receive a total of one hundred eighty dollars each. Hasegawa-Johnson says the project has built-in privacy protections around the data it collects. Any university or company that wants to use that data has to adhere to terms that the project sets.
For example, if a participant ever decides that they don't want their data in the dataset anymore, the researchers using the data will delete that person's voice from the dataset.
The companies I spoke with hit some similar points about the privacy of the data that they themselves collect. An Amazon spokesperson said it trains Alexa on real-world requests. Customers have the option to delete their recordings and transcripts, but the company may retain other records of their Alexa interactions. Cattiau said Google's Project Euphonia allows participants to ask that their data be deleted, or they can ask for a copy of their data if they want. And Voiceitt's Smolley says that it uses customer data to train its algorithm, but that the company follows, quote, the highest standards possible for data privacy, security and storage for training speech and language models. Hasegawa-Johnson says the dataset the project is assembling already seems promising. In one experiment, researchers tested an out-of-the-box speech recognizer on voices from people with Parkinson's. They say initially the error rate was twenty percent, but by training the algorithm on the data the team had collected, they say that error rate went down by half.
I think there are a lot more gains to be had from this speech data, with more clever innovations in algorithms. But even just that very simple step has already reduced the error rate by a factor of two.
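For reference, the "error rate" quoted throughout this episode is usually word error rate, which counts the insertions, deletions and substitutions needed to turn what the recognizer heard into what was actually said. Here's a minimal sketch of how it's typically computed, via edit distance; the example phrases are made up.

```python
# A minimal sketch of word error rate (WER), computed as word-level
# edit distance divided by the length of the reference transcript.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One word wrong out of five is a twenty percent error rate, like the figure
# the researchers saw before fine-tuning; halving the mistakes means ten percent.
print(wer("alexa turn on the lights", "alexa turn off the lights"))  # 0.2
```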
Hasegawa-Johnson says he's hopeful that through efforts like the Speech Accessibility Project, getting out-of-the-box voice assistants to work well for people with atypical speech could happen in the next few years. And beyond that, he says, voice assistants could just become a lot more useful, maybe in places beyond our cell phones and smart speakers.
You might have an AI that's able to move between your devices, whatever devices you give it permission to move between. And with that voice assistant with you everywhere, I'm hoping that people will be able to use it to do a lot of the things that we use laptops for right now.
But Amazon's Josh Miele says that even though there's more room for voice in the way we interact with our technology, accessibility means never relying too heavily on just one method.
Yes, voice is getting better. Voice should get better. But it should never be the only way. And I think that we're in the process of finding and exploring and really developing all of those other ways, in addition to voice, that we want to interact with our devices.
The Future of Everything is a production of The Wall Street Journal. Stefanie Ilgenfritz is the editorial director of The Future of Everything. This episode was produced by me, Alex Ossola. Our fact-checker is Aparna Nathan. Michael LaValle and Jessica Fenton are our sound designers, and they wrote our theme music. Kateri Jochum is our supervising producer.
Aisha Al-Muslim is our development producer. Scot Calvert and Chris Zinsli are the deputy editors, and Philana Patterson is the head of news audio for The Wall Street Journal. Like the show? Tell your friends, and leave us a five-star review on your favorite platform. Thanks for listening.
Amazon Q Business is the new generative AI assistant from AWS. Because many tasks can make business slow, as if wading through mud. Ugh... help! Luckily, there's a faster, easier, less messy choice. Amazon Q can securely understand your business data and use that knowledge to streamline tasks. Now you can summarize quarterly results or do complex analysis in no time. Q, got this. Learn what Amazon Q Business can do for you at aws.com/learn-more.