Hey hey everyone, how are you this fine ThursdAI? 👋 I’m gud thanks for asking!
I’m continuing my experiment of spilling the beans and telling you about everything we talked about up front, both on the pod and in the newsletter, so let me know if this is the right way to go or not; for the busy ones, it seems that it is. If you don’t have an hour and 15 minutes, here’s a short video recap of everything we chatted about:
ThursdAI - Jan 11 2024 TL;DR
TL;DR of all topics covered + Show notes
Open Source LLMs
🔥 Bagel from Jon Durbin is now top of the LLM leaderboard (X, HF, Wolfram's deep dive and scoring)
OpenChat January Update - Best open source 7B LLM (X, Hugging Face)
Our friends at NousResearch announce a $5.2M seed round as their models pass 1.1 million downloads (X)
Argilla improved (Distilabel-ed?) the DPO-enhanced NeuralHermes with higher quality DPO pairs (X)
New MoEs are coming out like hotcakes - Phixtral and DeepSeek MoE (X, Omar's thread, Phixtral thread)
Microsoft makes Phi MIT licensed 👏
Big CO LLMs + APIs
OpenAI adds personalization & team tiers (Teams announcement)
OpenAI launches GPT store (Store announcement, Store link)
Mistral Medium tops the LMSys human evaluation arena and is the best LLM overall after GPT-4 👏 (X)
Hardware
Rabbit R1 is announced at $200, no subscription, and everybody has a take (X)
This week's Buzz from Weights & Biases
Hackathon with Together, LangChain and WandB (and ME!) this weekend at AGI House (X, Signup)
Video
ByteDance releases MagicVideo-V2, a video gen model that looks great and surpasses Pika Labs in human tests (X)
AI Art & Diffusion & 3D
Luma launched their online version of Genie and it's coming to the API (X)
Show notes and links mentioned
MergeKit (GitHub)
Jon Durbin's Contextual DPO dataset (HuggingFace)
Phixtral from Maxime Labonne (X, HuggingFace)
WandGPT - our custom Weights & Biases GPT (GPT store)
Visual Weather GPT by me - https://chatg.pt/artweather
Ask OpenAI to not train on your chats - https://privacy.openai.com/policies
AI Hardware
It seems the X conversation had a new topic this week: the AI hardware startup Rabbit showcased their new $200 device (no subscription!) at CES, and everyone and their mom had an opinion! We had quite a long conversation about it with a guest (his first time on ThursdAI 👏), as we both pre-ordered one. However, there were quite a few red flags; for example, GPUs are costly, so how can an AI device that runs its AI in the cloud cost just a one-time 200 bucks?
There were other interesting things they showed during the demo, and I'll let you watch the full 30 minutes; if you want to read more, here's a great deeper dive into this.
UPDATE: As I’m writing this, the CEO of Rabbit (who's also on the board of Teenage Engineering, the amazing company that designed this device) tweeted that they sold out the initial first AND second batches of 10K units, netting a nice $2M in hardware sales in 48 hours!
Open Source LLMs
Mixtral paper dropped (ArXiv, Morgan's take)
Mistral finally published the paper on Mixtral of Experts, the MoE that's the absolute best open source model right now, and it's quite the paper. Nisten did a full paper reading with explanations on an X space, which I co-hosted, and we had almost 3K people tune in to listen. Here's the link to the live reading X space by Nisten.
And here are some notes, courtesy of Morgan McGuire (who's my boss at WandB btw 🙌):
Strong retrieval across the entire context window
Mixtral achieves a 100% retrieval accuracy regardless of the context length or the position of passkey in the sequence.
Experts don't seem to activate based on topic (see the toy routing sketch after these notes)
Surprisingly, we do not observe obvious patterns in the assignment of experts based on the topic. For instance, at all layers, the distribution of expert assignment is very similar for ArXiv papers (written in Latex), for biology (PubMed Abstracts), and for Philosophy (PhilPapers) documents.
However...
The selection of experts appears to be more aligned with the syntax rather than the domain
**Datasets** - No info was provided about which datasets Mistral used to pretrain these incredible models 😭
Upsampled multilingual data
Compared to Mistral 7B, we significantly upsample the proportion of multilingual data during pretraining. The extra capacity allows Mixtral to perform well on multilingual benchmarks while maintaining a high accuracy in English
Mixtral Instruct Training
*We train Mixtral – Instruct using supervised fine-tuning (SFT) on an instruction dataset followed by Direct Preference Optimization (DPO) on a paired feedback dataset* - and it was trained on @CoreWeave.
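To make the expert-routing notes above a bit more concrete, here is a minimal toy sketch of the kind of top-2 token routing a Mixtral-style MoE layer performs. This is plain PyTorch with made-up dimensions, not Mistral's actual implementation, so treat it purely as an illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy Mixtral-style layer: a router picks the top-2 of 8 expert MLPs per token."""

    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)  # gating network
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
             for _ in range(n_experts)]
        )

    def forward(self, x):
        # x: (n_tokens, d_model)
        logits = self.router(x)                             # (n_tokens, n_experts)
        weights, chosen = logits.topk(self.top_k, dim=-1)   # top-2 experts per token
        weights = F.softmax(weights, dim=-1)                # renormalize over the 2 winners
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                 # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out, chosen                                  # `chosen` is what the paper inspects per domain

tokens = torch.randn(16, 64)          # 16 fake token embeddings
layer = ToyMoELayer()
mixed, chosen_experts = layer(tokens)
print(chosen_experts)                 # which 2 of the 8 experts each token was routed to
```

The paper's observation is about those `chosen` indices: the distribution of picked experts looks similar whether the tokens come from ArXiv, PubMed or PhilPapers, but consecutive tokens often route to the same expert, which is why the routing appears to track syntax more than topic.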
Jon Durbin's Bagel is the 🤴 of open source this week
6 of the top 10 are Bagel-based models or merges of it. If you remember Airoboros, Bagel includes that dataset, and there are two varieties: the DPO and non-DPO versions of Bagel, plus two merges from Cloudyu, which are non-trained merges made with MergeKit based on Bagel (a minimal sketch of what a non-trained weight merge means follows the dataset list below). Jon's pro tip for selecting DPO vs non-DPO models:
FYI, the DPO version is more factual, truthful, better at math, etc., but is not great for RP, creative writing, etc. Use non-DPO for those tasks!
Bagel includes an impressive number of datasets mixed together, which are all linked from the model card, but here they are:
"ai2_arc, airoboros, apps, belebele, bluemoon, boolq, capybara, cinematika, drop, emobank, gutenberg, lmsys_chat_1m, mathinstruct, mmlu, natural_instructions, openbookqa, pippa, piqa, python_alpaca, rosetta_code, slimorca, spider, squad_v2, synthia, winogrande, airoboros 3.1 vs airoboros 2.2.1, helpsteer, orca_dpo_pairs"
Jon also shared his end-of-year WandB report and has trained a whopping 917 models this year, for a total of ~2,500 hours, putting him in the top 10% of most active users (among 800K or so users).
I didn't know that Jon was going to join, but I was so happy he jumped on the live recording that we ended up chatting for 20 minutes. There were so many nuggets in that conversation, about how to prepare DPO datasets, which other ones Jon has been releasing, and just a bunch more gold, that I decided to cut it out and post it as a separate special deep-dive episode, to be released as the Sunday special. Stay tuned for that!
Nous Research announces a $5.2 million seed funding round as they cross 1.1 million model downloads on the hub
Congrats to Karan, Emozilla, Teknium, Bowen, Shivani and the rest of the Nous team on this great news! 👏 We expect to hear more from them in the coming year, given their consistent commitment to open source, their pledge to keep open sourcing the best models, and the upcoming Forge news!
With investors like Balaji, OSS Capital and Vipul from Together, Nous completes the $5.2M seed round, and we had Karan (one of the co-founders of Nous) on the pod to chat with us about what they are planning to do with that money and their continued commitment to open source!
In addition, they just recently passed 1.1 million downloads on the hub with Nous-Hermes-2-34B being their best model! 🤴
OpenChat Jan update becomes the leading open source 7B model (X, Hugging Face)
This update mainly enhances training methodology, in-context learning & coding skills, outperforming the previous 1210 release on 7 out of 8 benchmarks! It scores **71.3** on HumanEval and 65.8% on MMLU 👏
The previous version of OpenChat trails just behind OpenHermes in the human evals on the LMSys arena, but both are incredible 7B models.
Argilla
Argilla used their Distilabel tool to build a preference dataset from ratings and critiques of AI response pairs, taking around 3 hours
The original dataset assumed the GPT-4/3.5 responses were always best, but Argilla found this was not always the case
In their re-rated dataset, 4,000 pairs had the same rating (ties), 7,000 pairs were unchanged, and **in 2,000 cases the rejected response was actually the preferred one**
Improving existing DPO datasets with higher quality pairs is important for model fine-tuning
They are releasing an improved version of the popular Orca Pairs DPO dataset from Intel, and a new OpenHermes model outperforming baselines with 54% fewer DPO pairs (a minimal sketch of this kind of pair clean-up follows below)
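To ground what "higher quality DPO pairs" means: a DPO dataset is just rows of (prompt, chosen, rejected), and Argilla's improvement boils down to re-rating both responses and fixing rows where the original labels were wrong. Below is a minimal sketch of that clean-up step; the `rate` function is a stand-in placeholder (in practice it would be an LLM judge or human ratings via distilabel), not Argilla's actual API:

```python
# Minimal sketch of re-labeling a DPO preference dataset.
# `rate` is a placeholder judge; Argilla used distilabel with ratings and critiques instead.

def rate(prompt: str, response: str) -> float:
    """Stand-in quality score. Replace with an LLM judge or human ratings."""
    return float(len(response))  # dummy heuristic just so the sketch runs end to end

def clean_pairs(pairs: list[dict]) -> list[dict]:
    """Each row has 'prompt', 'chosen', 'rejected'. Fix or drop mislabeled rows."""
    cleaned = []
    for row in pairs:
        score_chosen = rate(row["prompt"], row["chosen"])
        score_rejected = rate(row["prompt"], row["rejected"])
        if score_chosen == score_rejected:
            continue                      # tie: no preference signal, drop the pair
        if score_rejected > score_chosen:
            # the original labels were wrong, so swap chosen and rejected
            row["chosen"], row["rejected"] = row["rejected"], row["chosen"]
        cleaned.append(row)
    return cleaned

demo = [
    {"prompt": "What is DPO?",
     "chosen": "Direct Preference Optimization.",
     "rejected": "Direct Preference Optimization, a method that fine-tunes a model directly on preference pairs."},
]
print(clean_pairs(demo))  # with the dummy judge, the longer answer becomes 'chosen'
```

Whether you drop ties or keep them is a judgment call; the sketch drops them because a pair with no preference signal doesn't teach DPO anything.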
Big CO LLMs + APIs
OpenAI has a big week, launches GPTs store and team pro accounts (Blog)
Things of note about the store:
My GPTs are getting feedback and crossed 10K chats; one was #6 in Lifestyle and then disappeared, but it has gained 2x more chats in the 24 hours since the store launched!
Discoverability is great, trending GPTs are shown clearly, and folks are getting a lot of exposure
Copycats have already started copying a bunch of the great GPTs; see this example of what happens when you search for Gymstreak, most of the top results are already copy-cats.
Team accounts:
$25/mo per user on annual plans, with a minimum of 2 seats
The biggest confusion was from folks who didn't realize that OpenAI trains on Pro conversations, and that there's an option to opt out!
This week's Buzz (What I learned with WandB this week)
Weights & Biases (and ME!) are going to AGI House to lead a RAG vs fine-tune hackathon with cool prizes!
There's still time to RSVP, and there will be incredible guest speakers. This hackathon is organized together with LangChain, Together Compute and AGI House. If you're in the SF area and you wanna hack on some cool RAG things, get awesome prizes (and meet me!), join the waitlist here: https://partiful.com/e/AlntdLtxh9Jh1J6Pcsma
Vision & Video
Luma released Genie on web and iOS. If you remember, we covered the Genie text-to-3D model when they first released it on Discord a while ago; now it's incorporated into the Luma website and produces significantly higher quality 3D assets.
The generations are free for now, and they look awesome! Here are some of mine; I created a Bee holding a Wand (get it? WandB? 😆) and a Polish bear (internal joke), and they look so cool!
Friend of the pod and recent Luma hire Arthur Islamov jumped on and told us that this is coming to the API, so developers will be able to automate asset creation, generate tons of 3D objects programmatically, and maybe use cool prompting techniques to make them a bit better every time. Great news!
AI Art & Diffusion
ByteDance announces MagicVideo-V2 (ArXiv, Project)
We didn't get anything besides quite a few cherry picked videos and a paper, so we can't use this yet, but wow some of these videos look incredible!
From the paper: MagicVideo-V2 integrates the text-to-image model, video motion generator, reference image embedding module and frame interpolation module into an end-to-end video generation pipeline. Benefiting from these architecture designs, MagicVideo-V2 can generate an aesthetically pleasing, high-resolution video with remarkable fidelity and smoothness. It demonstrates superior performance over leading Text-to-Video systems such as Runway, Pika 1.0, Morph, Moon Valley and Stable Video Diffusion via large-scale user evaluation.
Lastly, I had the greatest time interviewing my new friend João Moura, the creator of CrewAI, which has been popping off: it was #1 trending on GitHub and #2 product of the day on Product Hunt, and it's essentially an AI framework that lets you create a crew of AI agents to do tasks for you. I will be polishing up that conversation and posting it together with the deep dive with Jon, so stay tuned, but here's a sneak preview of how cool this is, and expect that episode to drop soon!
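To give you a taste of why CrewAI caught on, here is a tiny sketch of the role/task/crew pattern it's built around. This is based on my reading of the project's README around this time; parameter names may differ between versions, so check the repo before copying:

```python
# Illustrative sketch of the CrewAI pattern (pip install crewai).
# Class and parameter names follow the README of the time and may have changed since.
from crewai import Agent, Task, Crew

researcher = Agent(
    role="AI news researcher",
    goal="Find this week's most notable open source LLM releases",
    backstory="You scan Hugging Face, arXiv and X for fresh model news.",
)
writer = Agent(
    role="Newsletter writer",
    goal="Turn the research notes into a short, punchy recap",
    backstory="You write concise, accurate AI newsletters.",
)

research_task = Task(
    description="List the notable open source LLM releases from the past week.",
    agent=researcher,
)
recap_task = Task(
    description="Write a three-paragraph recap based on the researcher's notes.",
    agent=writer,
)

crew = Crew(agents=[researcher, writer], tasks=[research_task, recap_task])
result = crew.kickoff()  # runs the tasks in order, handing each output along as context
print(result)
```

The appeal is that the orchestration (handing one agent's output to the next, tool use, and so on) lives in the framework, so you mostly just describe who your agents are and what they should do.

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe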