SQL Meets Vector Search with Linpeng Tang of MyScale

2024/4/2
Software Huddle

Welcome back to an episode where we're talking vectors, vector databases, and AI with Linpeng Tang, CTO and co-founder of MyScale. MyScale is a super interesting technology: it combines the best of OLAP databases with vector search. The project started back in 2019, when the team forked ClickHouse and adapted it to support vector storage, indexing, and search.

The really unique and cool thing is that you get the familiarity and usability of SQL along with the power to compare pieces of unstructured data by similarity.

We think this opens up fascinating use cases for analytics, well beyond what we're seeing with other vector database technology, which is mostly restricted to building RAG applications for LLMs. Also, because it's built on ClickHouse, MyScale is massively scalable, an area where many of the dedicated vector databases actually struggle.

We cover a lot about how vector databases work, why they decided to build off of ClickHouse, and how they plan to open source the database.

Timestamps

02:29 Introduction

06:22 Value of a Vector Database

12:40 Forking ClickHouse

18:53 Transforming ClickHouse into a SQL vector database

32:08 Data modeling

32:56 What data can be Vectorized

38:37 Indexing

43:35 Achieving Scale

46:35 Bottlenecks

48:41 MyScale vs other dedicated Vector Databases

51:38 Going Open Source

56:04 Closing thoughts