S7E9 [EN] 科技周会-AI Training, Data, Energy

2024/5/21

创意玩具@小宇宙| AI/科技/娱乐

Chapters

The discussion explores the challenges of developing AI without sufficient data, including the potential use of high-quality data and curriculum learning methods.

Hi there, I'm Maggie. We've got something really cool and we're looking for you to join us: our gratitude journal project. Now, I know you're thinking, another journal? But hold up. Whether you are looking to reduce stress, put your mind at ease, or just find a moment of peace in your daily life,

this AI-powered gratitude journal is like having a best friend who's always there to remind you of the good stuff. If you're someone who's into tech, loves the idea of self-growth, or just wants to chill out with some AI magic, then you are in the right place. We are in beta, so you are not just a user, you are a pioneer. You are going to shape the future of this project and tell us what works, what doesn't, and what you'd love to see next.

Join us by emailing [email protected] and let's get this conversation started. We can't wait to have you on board. Alright, so now we can shift gears a little bit to the next section, which is all AI-oriented. So I feel like the few news sources we found here,

stories are actually kind of intertwined. So I guess we'll just start chatting, and if we find anything worth mentioning, we'll just mention it. So no need to go through story by story here. So I guess one of the common topics we found interesting is just the need for training data and

also energy in the next few years or decades for AI. I know, Tracy, well, you are definitely a professional working in this field right now. So what are your thoughts on AI, especially regarding input data here? Yeah, so it did not occur to me until I read the Wall Street Journal. It says that we are running out of data to train large language models.

So I remember there is a law to calculate how much data you need based on the size of your model. And so, like, several years ago, when we thought about internet-scale data, we thought that was huge. But now we are going to run out of data in two or three years. So how can we develop better AI without data? Any ideas?
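The "law" mentioned here is not named in the conversation. One commonly cited rule of thumb of this kind is the Chinchilla-style heuristic of roughly 20 training tokens per model parameter. The sketch below assumes that ratio, plus the common approximation of about 6 floating-point operations per parameter per token for training compute. The function names and the model sizes in the loop are illustrative only, not figures from the episode.

```python
# Rough sketch of a "how much data for a given model size" rule of thumb.
# ASSUMPTION: ~20 tokens per parameter (Chinchilla-style heuristic); the
# episode does not say which law or ratio the speaker has in mind.
TOKENS_PER_PARAM = 20

def compute_optimal_tokens(n_params: float,
                           tokens_per_param: float = TOKENS_PER_PARAM) -> float:
    """Approximate number of training tokens for a model with n_params parameters."""
    return n_params * tokens_per_param

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute, using the common 6 * N * D estimate."""
    return 6 * n_params * n_tokens

# Illustrative model sizes (hypothetical, chosen only to show the scaling).
for n_params in (1e9, 7e9, 70e9):
    n_tokens = compute_optimal_tokens(n_params)
    flops = training_flops(n_params, n_tokens)
    print(f"{n_params:.0e} params -> {n_tokens:.1e} tokens, {flops:.1e} FLOPs")
```

Under this assumed ratio, data needs grow linearly with model size, which is why ever larger models would eventually outgrow the available high-quality text.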

Develop better AIs without data? Wow. Just imagine you're a student and you don't have any more books to read. How can you become a better student? Oh, that's a very good question. I know, like, some companies, like,

they've already run into this problem. I think they're starting to use data generated by their AI models to retrain their models. But that's probably not a good approach to this, right? Yeah, because it may cause model collapse if a model is trained on its own output. So the article mentions that several companies are trying to change the way they train the AI.

So perhaps you have heard about the famous dataset Common Crawl. It crawls most of the public websites, but actually the quality is pretty low. So if you can somehow just use high-quality data, you can achieve the same performance with much smaller datasets.

And there is also another method called curriculum learning. So imagine you have a teacher and you design a curriculum, and the students will learn more effectively. But I think the performance gain from this method is not validated yet. So it's still an open-ended question. And I think another piece of news relevant to this is that Scale AI

has reached a new high in valuation. It's $13 billion now. So Scale AI is trying to help companies annotate their data. They pay people, like real human beings, to annotate the data. But because they employ people in third-world countries, it's pretty cheap. And a lot of companies use their service. Even the US Army also uses their service. But

I'm not sure if we can generate more data from crowdsourcing, because high-quality data requires people who are highly skilled. So, for example, if you want to train an AI to write code, then you need data like Stack Overflow, which has a community full of software engineers. So I'm thinking about, do you think in the future, some people's job will be to create data for AI?

But instead of unskilled workers like those on Amazon Mechanical Turk, they are more educated, like some college grads in third-world countries. Wow, yeah, I've definitely read stories related to this topic. Yeah, as you just mentioned, I think I have learned that some countries, I can't remember exactly, maybe like

Kenya in Africa. At least it's those few countries in Africa where one of the official languages is English, because, you know, there's no cost of adaptation. So a lot of young educated people there do business, you know, annotating, especially English written materials, for AI companies. They also do some outsourced

document writing work. I think back in China, because, you know, China is a little bit ahead in terms of surveillance cameras and computer vision, I think one of the rising industries is that people just hire, I think, a lot of stay-at-home

moms. They're not necessarily highly educated, but they are educated enough to at least, you know, draw boxes around objects in those surveillance camera videos and, yeah, produce some good-quality data, because, you know, those are actually, I thought, valuable for, I guess, computer vision training.

Yeah, but I imagine we don't need much education to draw bounding boxes. But we may need a lot more education to write software or write documents. Yeah, I think data is definitely one of the pressing issues here. Another one I've learned is, I guess before we move to energy, I would also like to talk about input data in terms of

music AI. Have you heard about Suno AI recently? No, I haven't. Tell me what it does. Yeah, suno.ai, that's their web link. So they are a music-generating AI. Yeah, I actually just bought their monthly membership. I think it's about $10. It's pretty amazing. Yeah, I can send you a link to some of the works I generated. It's basically, you type

a one-sentence prompt and it generates, like, a two-minute clip of music with or without lyrics. And for the lyrics, you can choose to either have the AI auto-generate them, or you can just type the lyrics and have it sing them.

I would say the quality is good enough to be used as background music for YouTube, or like a song with lyrics played at a shopping mall. So it has definitely passed the qualifying score there. Wow, does it compensate the artists?

That's actually the big issue I've been researching. It seems like there's been an article in Rolling Stone, the music entertainment magazine. They tried to interview the founders, but they wouldn't disclose whether they actually paid for their training music or not. So I would guess the answer is most likely no, or they just used it with the,

I guess, licensed legally, but they should have paid the artists more. So yeah, they're definitely, like, ripping off the artists in some way. Yeah, I just feel what they do to artists is like what OpenAI does to journalists. So yeah, but I really want to try that, because I don't know how to write a song, and it'd be so special to have a song about your personal stories, personal memories.

Yeah, the quality is pretty amazing. It also supports, I think, most of the main languages in the world, like Russian, English, Chinese, Japanese, Korean, pretty much all the major languages. Yeah, I guess the last topic, let's spend about two minutes on this: just the need for a great deal of energy to train on the data.

Here's my other recommendation of podcasts. So this one is called BG Squared, I believe. It's hosted by two venture capitalists.

Well, one of them is pretty famous, like Bill Gurley. I think he's the main investor and cultivator of Uber. So yeah, it's a pretty new podcast. They've just been talking about all sorts of random tech topics. The most recent one I found interesting is that

They found out that the US is probably behind in terms of energy build-out compared to some other major players like China or even probably Russia in terms of new nuclear plants because you definitely need those stable energy sources to provide electricity to all the data centers.

Yeah, I've also done some research around my own living area. So Virginia is, I guess, slightly above average in terms of renewable energy. So they spent five years, I believe, building two wind turbines, and then there are over 100 more offshore wind turbines to follow. So I guess in total it would be a little bit over two

gigawatts of power. I've also done some research on neighboring areas like, I think, Maryland. I think they are also a little bit ahead in terms of wind power. But I think for nuclear power, it's definitely slow. There are only, like, over 10 nuclear plants planned for the next few decades right now. Yeah. So what do you think? Do you believe in nuclear fusion? I think Sam Altman has invested in companies in this area.

True, nuclear. I know Bill Gates is also the other big investor in nuclear energy. He actually got elected to the Chinese Academy of Engineering, not for his accomplishments in computer science, but for his accomplishments in nuclear engineering, or his investment. Nuclear, especially fusion, I think it's at least still, like, decades away from us.

Technically, I guess solar energy is a good, cheap way to utilize the nuclear energy from the sun. So I think solar is probably the way to go in the next 10, 20 years. But I know there's been some beef between the US and China right now, especially, I think, Janet Yellen, the US Secretary of

finance, I believe. Yeah, she's been, like, accusing China of kind of building cheap EVs and solar panels recently. So yeah, it's been an interesting story going on recently. Yeah, so for the world, cheaper solar panels are better for renewable energy development, but for individual countries, you want to protect your own companies. True, true.

Yeah, just to wrap up the whole episode here. Yes, speaking of protection, yeah, I know the CEO of TikTok, who's been questioned at the US Congress for the past few months. He was actually also the CFO of Xiaomi. So he's basically a Singaporean, and he helped both Xiaomi and,

yes, ByteDance approach an IPO in his whole career. So, pretty amazing backstory there. But is TikTok going public? That has been, yeah, a plan for the past few years, but now it's probably less likely. Let's see. Yeah, all right, so this is our

main recap of all the March and early April tech news. Yeah, so I guess when we meet again, likely in May, we will probably be somewhere sunny, I don't know, California or somewhere else. So we're excited to talk to our audience again then. So yeah, thanks to everyone for tuning in. If you have any questions, feel free to

write an email to us. I will leave my email address in the show notes if you have any interesting stories or questions to share. That's all. Thanks again to my amazing co-host, Tracy. Thank you. Thanks, Roger. Thank you. Goodbye, everyone. Goodbye.