
ColPali: Document Retrieval with Vision-Language Models only (with Manuel Faysse)

2024/9/27

Neural Search Talks — Zeta Alpha


Shownotes

In this episode of Neural Search Talks, we're chatting with Manuel Faysse, a second-year PhD student at CentraleSupélec & Illuin Technology and first author of the paper "ColPali: Efficient Document Retrieval with Vision Language Models". ColPali is making waves in the IR community as a simple but effective new take on embedding documents via their image patches, combined with the late-interaction paradigm popularized by ColBERT. Tune in to learn how Manu conceptualized ColPali, his methodology for tackling new research ideas, and why this approach outperforms classic multimodal embedding models on visually rich document retrieval. A must-watch episode!
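
For context, here is a minimal sketch of the ColBERT-style late-interaction (MaxSim) scoring that ColPali applies to image-patch embeddings. The function names, array shapes, and embedding dimensions below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch: ColBERT-style late interaction (MaxSim) between a query's
# token embeddings and a document page's image-patch embeddings, as in ColPali.
import numpy as np

def late_interaction_score(query_embs: np.ndarray, patch_embs: np.ndarray) -> float:
    """For each query token, take its best-matching patch, then sum those maxima.

    query_embs: (num_query_tokens, dim) L2-normalized query token embeddings
    patch_embs: (num_patches, dim)      L2-normalized document patch embeddings
    """
    sims = query_embs @ patch_embs.T      # (num_query_tokens, num_patches) similarities
    return float(sims.max(axis=1).sum())  # max over patches, summed over query tokens

# Toy usage: rank two document pages for one query (random embeddings for illustration).
rng = np.random.default_rng(0)
def normed(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

query = normed(rng.normal(size=(8, 128)))        # 8 query tokens, assumed dim 128
pages = [normed(rng.normal(size=(1030, 128)))    # assumed ~1030 patch embeddings per page
         for _ in range(2)]
scores = [late_interaction_score(query, p) for p in pages]
print("Best page:", int(np.argmax(scores)))
```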

Timestamps:
0:00 Introduction with Jakub & Manu
4:09 The "Aha!" moment that led to ColPali
7:06 Challenges that had to be solved
9:16 The main idea behind ColPali
13:20 How ColPali simplifies the IR pipeline
15:54 The ViDoRe benchmark
18:23 Why ColPali is superior to CLIP-based retrievers
20:41 The training setup used for ColPali
24:00 Optimizations to make ColPali more efficient
29:00 How ColPali could work with text-only datasets
31:21 Outro: The next steps for this line of research