Papers Read on AI

ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases

2024/11/1

Enabling large language models to utilize real-world tools effectively is crucial for achieving embo

Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities

2024/10/31

GPT-4o, an all-encompassing model, represents a milestone in the development of large multi-modal la

Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation

2024/10/30

Recent advances in latent diffusion-based generative models for portrait image animation, such as Ha

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

2024/10/18

This paper introduces F5-TTS, a fully non-autoregressive text-to-speech system based on flow matchin

LightRAG: Simple and Fast Retrieval-Augmented Generation

2024/10/17

Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating ext

Aria: An Open Multimodal Native Mixture-of-Experts Model

2024/10/16

Information comes in diverse modalities. Multimodal native AI models are essential to integrate real

AgentKit: Structured LLM Reasoning with Dynamic Graphs

2024/10/15

We propose an intuitive LLM prompting framework (AgentKit) for multifunctional agents. AgentKit offe

PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

2024/10/14

Document understanding is a challenging task to process and comprehend large amounts of textual and

Diffusion Models are Evolutionary Algorithms

2024/10/10

In a convergence of machine learning and biology, we reveal that diffusion models are evolutionary a

Is Safer Better? The Impact of Guardrails on the Argumentative Strength of LLMs in Hate Speech Countering

2024/10/9

The potential effectiveness of counterspeech as a hate speech mitigation strategy is attracting incr

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

2024/10/8

Large language models (LLMs) often produce errors, including factual inaccuracies, biases, and reaso

Internal Consistency and Self-Feedback in Large Language Models: A Survey

2024/10/7

Large language models (LLMs) often exhibit deficient reasoning or generate hallucinations. To addres

On the Diagram of Thought

2024/10/2

We introduce Diagram of Thought (DoT), a framework that models iterative reasoning in large language

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

2024/10/1

The increasing demand for high-quality 3D assets across various industries necessitates efficient an

StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation

2024/9/30

Tuning-free personalized image generation methods have achieved significant success in maintaining f

On the limits of agency in agent-based models

2024/9/24

Agent-based modeling (ABM) seeks to understand the behavior of complex systems by simulating a colle

Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization

2024/9/23

In many modern LLM applications, such as retrieval augmented generation, prompts have become program

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

2024/9/23

We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method

MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery

2024/9/21

Retrieval-Augmented Generation (RAG) leverages retrieval tools to access external databases, thereby

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

2024/9/20

We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method
