In this episode, Robert and Haley explore a recent breakthrough in multimodal AI: Leopard, a model built to tackle complex, text-rich image tasks. Developed by researchers from the University of Notre Dame, Tencent AI Seattle Lab, and UIUC, Leopard is among the first models designed specifically to understand and reason across multiple text-heavy images, like presentation slides, web snapshots, and scanned documents.
Join us as we break down how Leopard's adaptive high-resolution multi-image encoding and pixel shuffling set it apart from traditional models. Unlike predecessors that shrink images or drop visual tokens to fit long inputs, Leopard preserves high-resolution detail across many images without sacrificing accuracy, making it well suited to real-world uses like analyzing multi-page reports, data charts, and visual presentations.
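For listeners who want to peek under the hood: here is a minimal sketch of the pixel-shuffle idea as it is commonly used in vision-language models to compress visual tokens. The function name, tensor shapes, and scale factor below are illustrative assumptions, not Leopard's actual implementation.

```python
import torch

def pixel_shuffle_compress(features: torch.Tensor, scale: int = 2) -> torch.Tensor:
    """Fold each (scale x scale) block of spatial patch features into the
    channel dimension. This is a lossless rearrangement: the visual token
    count drops by scale**2 while every original value is preserved.

    features: (batch, height, width, channels) patch features from a
    vision encoder; height and width must be divisible by scale.
    """
    b, h, w, c = features.shape
    assert h % scale == 0 and w % scale == 0, "grid must divide evenly"
    # (b, h/s, s, w/s, s, c) -> (b, h/s, w/s, s, s, c) -> (b, h/s, w/s, s*s*c)
    x = features.reshape(b, h // scale, scale, w // scale, scale, c)
    x = x.permute(0, 1, 3, 2, 4, 5).contiguous()
    return x.reshape(b, h // scale, w // scale, scale * scale * c)

# A 32x32 grid of patch features (1,024 visual tokens) becomes a
# 16x16 grid (256 tokens) with 4x wider feature vectors.
feats = torch.randn(1, 32, 32, 1024)
print(pixel_shuffle_compress(feats).shape)  # torch.Size([1, 16, 16, 4096])
```

The trade-off is fewer, wider tokens: the language model can then attend over many high-resolution images within a fixed sequence budget, which is exactly the bottleneck that text-rich multi-image tasks run into.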
Get ready to dive into how this model reshapes the landscape for AI in business, education, and research. Leopard just might be the game-changer multimodal AI has been waiting for!