This podcast introduces MMSEARCH, a new benchmark designed to evaluate how well large multimodal models (LMMs) can function as AI-powered search engines that understand both text and images. The authors argue that existing AI search engines are limited by their focus on text-only settings, neglecting the wealth of information found in images and the way text and images are interleaved on websites. To address this, they created MMSEARCH, a dataset of 300 diverse search queries spanning 14 subfields, curated so that the answers do not appear in the training data of current LMMs. They also propose MMSEARCH-ENGINE, a pipeline that allows any LMM to be evaluated on three key tasks involved in searching: reformulating user queries into search-engine-friendly form, ranking the relevance of retrieved websites, and summarizing the answer from the most relevant webpage.
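
To make the three-stage idea concrete, here is a minimal sketch of how such a requery-rerank-summarize pipeline might be wired together. This is not the authors' implementation; the function names, prompts, and the `call_lmm`, `search_api`, and `fetch_page` callables are all assumptions for illustration.

```python
# Hypothetical sketch of a three-stage LMM search pipeline (requery -> rerank -> summarize).
# `call_lmm` stands in for any multimodal model API; all prompts and helper names are assumptions.

from typing import Callable, List


def requery(call_lmm: Callable[[str, List[str]], str], query: str, images: List[str]) -> str:
    """Step 1: rewrite the user's (possibly image-grounded) query into a search-friendly text query."""
    prompt = f"Rewrite this query so a text search engine can handle it: {query}"
    return call_lmm(prompt, images)


def rerank(call_lmm, query: str, images: List[str], results: List[dict]) -> dict:
    """Step 2: ask the model to pick the most relevant result among the retrieved candidates."""
    listing = "\n".join(f"[{i}] {r['title']} - {r['snippet']}" for i, r in enumerate(results))
    choice = call_lmm(
        f"Query: {query}\nCandidates:\n{listing}\nReply with only the index of the best candidate.",
        images,
    )
    return results[int(choice.strip())]


def summarize(call_lmm, query: str, images: List[str], page_text: str) -> str:
    """Step 3: extract the final answer from the chosen webpage's content."""
    return call_lmm(f"Using this page, answer the query '{query}':\n{page_text}", images)


def search_pipeline(call_lmm, search_api, fetch_page, query: str, images: List[str]) -> str:
    """End-to-end run: requery, retrieve, rerank, then summarize the top page."""
    text_query = requery(call_lmm, query, images)
    candidates = search_api(text_query)  # assumed to return a list of {"title", "snippet", "url"} dicts
    best = rerank(call_lmm, query, images, candidates)
    return summarize(call_lmm, query, images, fetch_page(best["url"]))
```

Because each stage is exposed as its own function, a benchmark built this way can score a model on requerying, reranking, and summarization separately as well as end to end, which matches the task breakdown described above.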