
EP16: Deep Big Data Learning (DBDL) a theory and a question

2024/10/22

Geopolitics Unplugged



Summary: In this episode we explore the concept of Deep Big Data Learning (DBDL), a proposed new form of artificial intelligence (AI) that would use large datasets and advanced algorithms to uncover hidden relationships and patterns. We begin by outlining the evolution of AI from supervised learning to the current "sequence transduction" approach exemplified by Google's "transformer" architecture and OpenAI's GPT models. These systems, trained on vast amounts of text data, have shown remarkable capabilities in language generation and other tasks. We then propose DBDL as a further advancement, suggesting that applying similar techniques to Big Data, which encompasses a far broader range of data types, could lead to transformative discoveries, particularly in healthcare. While acknowledging the potential benefits, we also highlight the ethical implications of this powerful new technology. Questions to consider as you read/listen:

  • What are the key distinctions between Deep Big Data Learning and traditional supervised learning?
  • How could Deep Big Data Learning be used in healthcare to advance disease prevention and treatment?
  • What are the potential ethical concerns associated with the development and application of Deep Big Data Learning?

  Long format: Deep Big Data Learning (DBDL).

I in no way mean to be rude, but before I discuss what I mean by Deep Big Data Learning (DBDL; N.B., I had to change the name from what I originally wrote, "Deep Data Learning," as that concept already exists and is not what I was looking to imagine or discuss), it makes sense to make sure we are on the same page about the most prevalent current means of machine learning.

First there was the old-school approach, supervised learning, which requires the computer to be given the correct answer for each and every training example. It is very labor intensive, very slow, and restricted to one domain at a time. The alternative is "sequence transduction," a sort of self-supervised, non-human-labeled learning that requires a tremendous amount of data from different sources and an astronomical amount of computing power.

In 2017, with the introduction of the Google "transformer," we had the birth, so to speak, of this second form. It trained on huge quantities of text available on the internet. Keeping things simple, this is what we mean by deep language learning (DLL). After Google came OpenAI's GPT (GPT stands for generative pre-trained transformer). GPT-3 was trained on roughly 45 terabytes of text, which would take 500,000 human lifetimes to read. Since that initial training in 2020, the scale of these models has expanded roughly tenfold every year, adding capabilities at an unbelievable exponential pace. GPT-3 ended up as a gigantic model with some 175 billion parameters.
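The core mechanism inside the transformer is scaled dot-product self-attention: every position in a sequence scores its similarity to every other position and mixes their values accordingly. A minimal single-head sketch in plain Python (the toy vectors in the usage note are hypothetical, not from any real model):

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys
    and returns a softmax-weighted mix of the value vectors."""
    d = len(keys[0])  # dimension of each key vector
    outputs = []
    for q in queries:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        # numerically stable softmax turns scores into weights summing to 1
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # output is the weighted average of the value vectors
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs
```

With a query aligned to the first of two one-hot keys, the output leans toward the first value vector, which is the "who should I pay attention to?" behavior that lets these models learn from raw text without human labels.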

If this sounds amazing to you, wait until you consider what is now going on with the Generative Adversarial Network (GAN), a deep learning architecture that takes all of the above but trains two neural networks to compete against each other: one generates new data resembling a given training dataset, while the other tries to tell real from generated. Training continues round after round, each network improving against the other, and on and on. Some suggest the process is evolutionary. GAN is thought of as an important step toward the Holy Grail of artificial general intelligence (AGI). So yes, as Mr. Kilgore writes with an exclamation mark, it is truly amazing.
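The "competition" between the two networks can be made concrete as the standard GAN losses. A minimal sketch, assuming the discriminator outputs probabilities between 0 and 1 (the numbers in the test values are illustrative, not from any trained model):

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: the discriminator wants to score real
    samples near 1 and generated (fake) samples near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants the
    discriminator to score its fakes near 1, i.e., to be fooled."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)
```

As the generator improves and its fakes start scoring closer to "real," the generator's loss shrinks while the discriminator's grows; alternating gradient steps on these two losses is the adversarial pressure the paragraph describes.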

The DLL with GAN is largely built on datasets drawn from the public internet.

What is not necessarily, or wholly, on the internet? Big Data.

I am wondering what would happen if this type of learning, meaning sequence transduction (unsupervised) together with GAN, were trained not on text from the internet but instead on the very intimate datasets that exist in Big Data collections. This is what I call Deep Big Data Learning.

GAN and sequence transduction, coupled with today's processing and chip power, could uncover relationships the likes of which we humans could never discover on our own. The applications go far beyond advertising; think of the possibilities especially in healthcare. Disease prevention. Finding sources of carcinogens that are heretofore unknown. Genomics. Therapies. Pretty wild stuff.

Like all new technology, it has potential for abuse. When humans discovered fire, it was both good and bad. Good: heat and light. Bad: fire that burns down houses and harms people. Same with AI. I just got to thinking about DBDL. Thanks for going down the rabbit hole with me. All substantive comments welcome. Cheers and high fives. Get full access to GeopoliticsUnplugged Substack at geopoliticsunplugged.substack.com/subscribe