CIIR Talk Series: Danqi Chen (Princeton University), From RAG to Long-context Models: Benchmarks and Model Developments
Speaker
Danqi Chen (Princeton University)
Title
From RAG to Long-context Models: Benchmarks and Model Developments
Abstract
Retrieval-augmented generation (RAG) is crucial for enhancing language models by incorporating external knowledge, enabling more accurate and up-to-date responses. Effective RAG systems require long-context models that can efficiently process and synthesize retrieved information. In this talk, I will present our recent research on: (1) benchmarking the long-context capabilities of LLMs, including RAG as a core application (ALCE, HELMET, BRIGHT) and the ability to follow structured procedures and generate long, coherent outputs (LongProc); and (2) developing long-context models through careful data engineering and evaluation (ProLong), which achieves state-of-the-art performance with significantly lower computational cost than industry standards. I will share insights from these studies and discuss key challenges and open research questions in this space.
Bio
Danqi Chen is an Assistant Professor of Computer Science at Princeton University and co-leads the Princeton NLP group. She is also an Associate Director of Princeton Language and Intelligence. Her recent research focuses on training, adapting, and understanding large language models, especially with the goal of making them more accessible to academia. Before joining Princeton, Danqi was a visiting scientist at Facebook AI Research. She received her Ph.D. from Stanford University (2018) and her B.E. from Tsinghua University (2012), both in Computer Science. Her research was recognized by a Sloan Fellowship, an NSF CAREER award, a Samsung AI Researcher of the Year award, and outstanding paper awards from ACL and EMNLP.
About
The CIIR Talk Series is an initiative for researchers and practitioners working on information retrieval and related disciplines to present their work.
To receive Zoom link and passcode notifications, subscribe to the mailing list by sending an email to ciir-talks-request [at] cs [dot] umass [dot] edu with "subscribe" as the email subject (without the quotation marks). Alternatively, to join with the Zoom link, reach out to Hamed Zamani at zamani [at] cs [dot] umass [dot] edu (subject: CIIR Talks Passcode) for the passcode.