Enabling large language models to utilize real-world tools effectively is crucial for achieving embo
GPT-4o, an all-encompassing model, represents a milestone in the development of large multi-modal la
Recent advances in latent diffusion-based generative models for portrait image animation, such as Ha
This paper introduces F5-TTS, a fully non-autoregressive text-to-speech system based on flow matchin
Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating ext
Information comes in diverse modalities. Multimodal native AI models are essential to integrate real
We propose an intuitive LLM prompting framework (AgentKit) for multifunctional agents. AgentKit offe
Document understanding is a challenging task that requires processing and comprehending large amounts of textual and
In a convergence of machine learning and biology, we reveal that diffusion models are evolutionary a
The potential effectiveness of counterspeech as a hate speech mitigation strategy is attracting incr
Large language models (LLMs) often produce errors, including factual inaccuracies, biases, and reaso
Large language models (LLMs) often exhibit deficient reasoning or generate hallucinations. To addres
We introduce Diagram of Thought (DoT), a framework that models iterative reasoning in large language
The increasing demand for high-quality 3D assets across various industries necessitates efficient an
Tuning-free personalized image generation methods have achieved significant success in maintaining f
Agent-based modeling (ABM) seeks to understand the behavior of complex systems by simulating a colle
In many modern LLM applications, such as retrieval augmented generation, prompts have become program
We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method
Retrieval-Augmented Generation (RAG) leverages retrieval tools to access external databases, thereby