Generating Training Data with Large Language Models w/ Special Guest Marzieh Fadaee

2022/12/13

Neural Search Talks — Zeta Alpha

Marzieh Fadaee, NLP Research Lead at Zeta Alpha, joins Andrew Yates and Sergi Castella to discuss her work on using large language models like GPT-3 to generate domain-specific training data for retrieval models with little to no human input. The two papers discussed are "InPars: Data Augmentation for Information Retrieval using Large Language Models" and "Promptagator: Few-shot Dense Retrieval From 8 Examples".

InPars: https://arxiv.org/abs/2202.05144

Promptagator: https://arxiv.org/abs/2209.11755

Timestamps:

00:00 Introduction

02:00 Background and journey of Marzieh Fadaee

03:10 Challenges of leveraging Large LMs in Information Retrieval

05:20 InPars, motivation and method

14:30 Vanilla vs GBQ prompting

24:40 Evaluation and Benchmark

26:30 Baselines

27:40 Main results and takeaways (Table 1, InPars)

35:40 Ablations: prompting, in-domain vs. MSMARCO input documents

40:40 Promptagator overview and main differences with InPars

48:40 Retriever training and filtering in Promptagator

54:37 Main Results (Table 2, Promptagator)

1:02:30 Ablations on consistency filtering (Figure 2, Promptagator)

1:07:39 Is this the magic black-box pipeline for neural retrieval on any documents?

1:11:14 Limitations of using LMs for synthetic data

1:13:00 Future directions for this line of research

Episode length: 01:16:14