Happy Llama 3 day, folks! After a lot of rumors, speculation, and apparently pressure from the big Zuck himself, we can finally call April 18th, 2024, Llama 3 day!
I am writing this from the lobby of the Marriott hotel in SF, where our annual conference, Fully Connected, is happening, and I recorded today's episode from my hotel room. I really wanna shout out how awesome it was to meet folks who listen to the ThursdAI pod and subscribe to the newsletter, participate in the events, and give high fives.
During our conference, we had the pleasure of having Joe Spisak, the Product Director of Llama at Meta, actually announce Llama 3 on stage! It was so exhilarating; I was sitting in the front row, and then had a good chat with Joe outside of the show 😊
The first part of the show was, of course, Llama 3 focused. We had such a great time chatting about the amazing new 8B and 70B models we got, and salivating over the announced but not yet released 400B model of Llama 3 😮
We also covered a BUNCH of other news from a week that was already packed with tons of releases and AI news, and I was happy to share my experience running a workshop the day before our conference, focused on LLM evaluations. (If there's interest, I can share my notebooks and maybe even record a video walkthrough; let me know in the comments)
Ok, let's dive in 👇
Happy Llama 3 day 🔥
The technical details
Meta has finally given us what we've all been waiting for: incredibly expensive (trained on two clusters of 24K H100s over 15 trillion tokens) open-weights models, a smaller 8B one and a larger 70B one.
We got both instruction-finetuned and base models, which is great for finetuners, and it's worth mentioning that these are dense models (not a mixture of experts; all parameters are active during inference).
It is REALLY good on benchmarks, with the 8B model beating the previous generation (Llama 2 70B) on pretty much all of them, and the new 70B closing in on much bigger releases from the past month or two, like Claude 3 Haiku and even Sonnet!
The only downsides are the 8K context window and the lack of multimodality, but both are coming according to Joe Spisak, who announced Llama 3 on stage at our conference, Fully Connected 🔥
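If you want to kick the tires yourself, here's a minimal sketch of running the 8B instruct model locally with Hugging Face transformers (the hub id is the one Meta published at launch, but the repo is gated behind the license, and the generation settings here are just illustrative):

```python
# Minimal sketch: chat with Llama 3 8B Instruct via transformers (~16GB of GPU memory in bf16).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated; accept the license on the Hub first
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain RoPE in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```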
I was sitting in the front row and was very excited to ask him questions later!
By the way, Joe did go into details they haven't yet talked about publicly (see? I told you to come to our conference! and some of you did!) and I've been live-tweeting his whole talk + the chat outside, with the "extra" spicy questions and Joe's winks haha; you can read that thread here
The additional info
Meta has also partnered with both Google and Bing (take that, OpenAI) and inserted Llama 3 into the search boxes of Facebook, Instagram, Messenger, and WhatsApp, plus deployed it to a new product called meta.ai (you can try it there now). It is now serving Llama 3 to more than 4 billion people across all of those apps; talk about compute cost!
Llama 3 also has a new tokenizer (which Joe encouraged us to "not sleep on") and a bunch of new security tools like Purple Llama and Llama Guard. TorchTune, the PyTorch team's recently released finetuning library, supports Llama 3 finetuning natively out of the box as well (and integrates W&B as its first-party experiment tracking tool).
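Since Joe called it out, here's a quick sketch of why the tokenizer matters, comparing vocab sizes and token counts against Llama 2 (hub ids are the launch ones, but both repos are gated, so treat this as an assumption-laden sketch):

```python
# Sketch: the new ~128K-vocab tokenizer encodes the same text in fewer tokens
# than Llama 2's 32K-vocab one, which stretches the effective context window.
from transformers import AutoTokenizer

llama3_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")   # gated repo
llama2_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")     # gated repo

text = "Tokenizer efficiency directly affects inference cost and context usage."
print(len(llama3_tok), len(llama2_tok))                             # vocab sizes
print(len(llama3_tok.encode(text)), len(llama2_tok.encode(text)))   # tokens used
```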
If you'd like more details directly from Joe, I was live-tweeting his whole talk, and I'm working on getting the slides from our team. We'll likely have a recording as well; I will post it as soon as we have it.
Here's a TL;DR (with my notes, for the first time) of everything else we talked about, but given that today is Llama day and I still have to do Fully Connected demos, I will "open source" my notes and refer you to the podcast episode to hear more details about everything else that happened today 🫡
TL;DR of all topics covered:
Meta releases Llama 3 - 8B, 70B, and later 400B (Announcement, Models, Try it, Run locally)
Open Source LLMs
Meta Llama 3 8B, 70B, and later 400B (X, Blog)
Trained on 15T tokens!
70B and 8B models released + instruction finetunes
8K context length, not multimodal
70B gets 82% on MMLU and 81.7% on HumanEval
128K vocab tokenizer
Dense model, not MoE
Both instruction-tuned on human-annotated datasets
Open Access
The model already uses RoPE (toy sketch below)
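Since RoPE gets name-dropped a lot, here's a toy sketch of the idea (my own illustration, not Meta's implementation): it encodes position by rotating pairs of query/key channels by position-dependent angles, so attention scores end up depending on relative distance between tokens.

```python
# Toy sketch of rotary position embeddings (RoPE); illustration only.
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply RoPE to x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)          # (seq_len, 1)
    freqs = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)  # (dim/2,)
    angles = pos * freqs                                                   # (seq_len, dim/2)
    x1, x2 = x[:, 0::2], x[:, 1::2]                                        # channel pairs
    rotated = torch.stack(
        [x1 * angles.cos() - x2 * angles.sin(),                            # 2D rotation of each pair
         x1 * angles.sin() + x2 * angles.cos()],
        dim=-1,
    )
    return rotated.reshape(seq_len, dim)

q = torch.randn(16, 64)
print(rope(q).shape)  # torch.Size([16, 64])
```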
Bigxtral (Mixtral 8x22B) Instruct v0.1 (Blog, Try it)
The instruct version of the best Apache 2 base model around
Released a comparison chart that everyone started "fixing"
🤖 Mixtral 8x22B is Mistral AI's latest open model, with unmatched performance and efficiency
🗣 It is fluent in 5 languages: English, French, Italian, German, Spanish
🧮 Has strong math and coding capabilities
🧠 Uses only 39B active parameters out of 141B total, very cost-efficient
📚 Can recall info from large documents thanks to a 64K-token context window
🔓 Released under a permissive open source license for anyone to use
🏆 Outperforms other open models on reasoning, knowledge and language benchmarks
🌍 Has strong multilingual abilities, outperforming others in 4 languages
🧪 Excellent basis for customization through fine-tuning
New tokenizer from Mistral (Docs)
Focused on tool use with dedicated control tokens 🔥 (see the sketch below)
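Here's what that looks like in practice - a hedged sketch assuming the mistral-common package and its v3 tokenizer (the function schema below is made up for illustration):

```python
# Hedged sketch: tool definitions get rendered with dedicated control tokens
# (e.g. [AVAILABLE_TOOLS]) by the v3 tokenizer. Assumes the mistral-common package.
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.v3()

request = ChatCompletionRequest(
    tools=[Tool(function=Function(
        name="get_weather",  # hypothetical tool, for illustration only
        description="Get the current weather for a city",
        parameters={"type": "object", "properties": {"city": {"type": "string"}}},
    ))],
    messages=[UserMessage(content="What's the weather in Paris?")],
)

tokenized = tokenizer.encode_chat_completion(request)
print(tokenized.text)  # the rendered prompt, tool-use control tokens included
```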
WizardLM-2 8x22B, 70B and 7B (X, HF)
Released, then pulled back from HF and GitHub because it hadn't passed Microsoft's required toxicity testing
Big CO LLMs + APIs
OpenAI gives us Batch API + Assistants API v2 (see the sketch after this list)
Batch jobs cost 50% less - win, win, win
Assistants API v2 - new RAG stack
new file search tool
up to 10,000 files per assistant
new vector store
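A hedged sketch of both launches, assuming a recent openai Python client with the v2 beta surface (file names and the model choice are placeholders):

```python
# Hedged sketch: Batch API + Assistants v2 file_search, assuming openai>=1.21.
from openai import OpenAI

client = OpenAI()

# Batch API: upload a JSONL of requests and run them at ~50% of the usual price.
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # results land within 24 hours
)

# Assistants v2: the new RAG stack - vector stores + the file_search tool.
doc = client.files.create(file=open("notes.pdf", "rb"), purpose="assistants")
store = client.beta.vector_stores.create(name="thursdai-notes")
client.beta.vector_stores.files.create(vector_store_id=store.id, file_id=doc.id)

assistant = client.beta.assistants.create(
    model="gpt-4-turbo",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [store.id]}},
)
```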
Reka gives us Reka Core (X, Try)
Multimodal model that understands video as well
Built by a 20-person team
Video understanding is very close to Gemini
128K context
Core has strong reasoning abilities, including language, math and complex analysis.
Supports 32 languages
HuggingFace has an iOS chat bot now
This week's Buzz
My team and I led a workshop the day before the conference (Workshop Thread)
Fully Connected in SF was an incredible success, with over 1,000 AI attendees + the Meta AI announcement on stage 🔥
PyTorch's new TorchTune finetuning library ships with first-class W&B support (X) - see the sketch below
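A hedged sketch of poking at the Llama 3 support from Python - the import path and recipe/config names are assumptions based on the launch announcement, and the blessed entry point is really the `tune` CLI:

```python
# Hedged sketch, assuming torchtune's initial-release layout.
# The supported workflow is the CLI, e.g.:
#   tune run lora_finetune_single_device --config llama3/8B_lora_single_device
# with W&B logging enabled via the recipe config's metric_logger section (assumption).
from torchtune.models.llama3 import llama3_8b

model = llama3_8b()  # builds the 8B architecture; weights load separately via a checkpointer
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.1f}B parameters")  # heavy: materializes ~8B params in memory
```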
Vision & Video
Microsoft VASA-1 animated avatars (X, Blog)
Amazing level of animation from one picture + sound
Harry Potter portraits are here
They likely won't release this during an election year
Looks very good, close to EMO, but no code
📺 Videos show faces speaking naturally with head movements and lip sync
🔬 Researchers are exploring applications in education, accessibility and more
HuggingFace updates IDEFICS2 8B VLM (X, HF)
Apache 2 license
Competitive with 30B models
12 point increase in VQAv2, 30 point increase in TextVQA (compared to Idefics 1)
10x fewer parameters than Idefics 1
Supports image resolutions up to 980 x 980
Better OCR capabilities (thanks to more than 6TB of OCR pre-training data)
Adobe shows Firefly video + Sora support (X)
Voice & Audio
Rewind AI is now Limitless (X)
New service & brand name
Transcribes your meetings and conversations
Sleek-looking hardware device
100 hours of battery
Privacy support in the cloud
AI Art & Diffusion & 3D
Stability - Stable Diffusion 3 is here
Available via API only (see the sketch after this list)
Partnered with Fireworks AI for the release
Requires a paid Stability AI membership to use/access $$
Big step up in composition and on notorious issues like hands, "AI faces," etc.
Seems to prefer simpler prompts.
Way more copyright-friendly. It's hard to get any kind of brands/logos.
Text rendering is amazing.
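For those with API access, here's a hedged sketch of calling it - the endpoint path, field names, and the multipart trick are assumptions based on my reading of the launch docs, so check the official reference:

```python
# Hedged sketch: generate an image with SD3 via Stability's API (v2beta endpoint assumed).
import requests

resp = requests.post(
    "https://api.stability.ai/v2beta/stable-image/generate/sd3",
    headers={"authorization": "Bearer YOUR_API_KEY", "accept": "image/*"},
    files={"none": ""},  # forces multipart/form-data encoding
    data={"prompt": "macro photo of a hand holding a glass prism, film grain", "model": "sd3"},
)
resp.raise_for_status()
with open("sd3_output.png", "wb") as f:
    f.write(resp.content)  # raw image bytes when accept is image/*
```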
Others
The new Airchat with amazing transcription is out; come join us in our AI corner there
The Humane AI Pin was almost killed by MKBHD's review
Rabbit reviews incoming
That's all for this week, next week we have an amazing guest, see you then! 🫡 This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe