Information Retrieval Meets Large Language Models Workshop
at The Web Conference 2024 (WWW '24)
Singapore, Monday 13 - Friday 17 May 2024

Summary

The advent of large language models (LLMs) presents both opportunities and challenges for the information retrieval (IR) community. On the one hand, LLMs will revolutionize how people access information, while retrieval techniques can play a crucial role in addressing many of LLMs' inherent limitations. On the other hand, open problems remain regarding the collaboration of retrieval and generation, the potential risks of misinformation, and concerns about cost-effectiveness. Seizing this critical moment for development calls for a joint effort from academia and industry on many key issues, including the identification of new research problems, the proposal of new techniques, and the creation of new evaluation protocols. Since the launch of ChatGPT in November 2022, the entire community has been undergoing a profound transformation in techniques. This workshop is therefore a timely venue to exchange ideas and forge collaborations.

Schedule

Welcome to our workshop! We are pleased to host four distinguished researchers who will deliver keynote presentations on cutting-edge technologies in large language models and information retrieval. In addition, the authors of eight accepted papers will give oral presentations. We look forward to your participation as we explore the development and future prospects of information retrieval technology in the era of large language models. Join us for an engaging discussion of this dynamic field! The full schedule is available on the official website of WWW 2024.

Keynote

  • Nearest Neighbor Search on High-Dimensional Vector Data
    • Time: May 13, 2024, 9:30 AM – 10:00 AM
    • Speaker: Cheng Long, Associate Professor, Nanyang Technological University
    • Abstract: Vector data is becoming ubiquitous in the era of deep learning and generative AI. A fundamental operator on vector data is the approximate K nearest neighbor (AKNN) search, which retrieves from a database of vectors those that are close to a query vector. AKNN has also been used as a popular information retrieval method, and many algorithms have been developed for it. We observe that in high-dimensional space, the time consumption of nearly all AKNN algorithms is dominated by that of the distance comparison operations (DCOs). Each such operation scans all dimensions of an object and thus runs in time linear in the dimensionality. In this talk, I will introduce a new randomized algorithm named ADSampling, which runs in time logarithmic in the dimensionality for the majority of DCOs and succeeds with high probability (a toy sketch of this sampling idea follows the keynote list). In addition, I will introduce one general and two algorithm-specific techniques based on ADSampling, which can be used as plugins to enhance existing AKNN algorithms. I will also briefly introduce a new quantization method for high-dimensional vectors, which can be used to improve the efficiency of AKNN, and finally discuss some future research directions for AKNN.
    • Bio: Cheng Long is an Associate Professor at the School of Computer Science and Engineering (SCSE), Nanyang Technological University (NTU). He earned his Ph.D. degree from Hong Kong University of Science and Technology (HKUST) in 2015 and his Bachelor's degree from South China University of Technology (SCUT) in 2010. His research interests lie broadly in data management and data mining. Specifically, he works on high-dimensional vector data management (and its applications in large models, such as retrieval-augmented generative AI), spatial data management with machine learning-based techniques, spatial data mining in the urban domain (e.g., traffic and mobility analysis), and graph data mining (including dense subgraph mining and graphlet mining). His work has garnered recognition and accolades, including the prestigious "Best Research Award" from ACM-Hong Kong, the "Fulbright-RGC Research Award" granted by the Research Grants Council (Hong Kong), the "PG Paper Contest Award" bestowed by IEEE-HK, and the "Overseas Research Award" received from HKUST.
  • Understanding and Patching LLMs from a Compositional Reasoning Perspective
    • Time: May 13, 2024, 10:00 AM – 10:30 AM
    • Speaker: Ying Wei, Assistant Professor, Nanyang Technological University
    • Abstract: LLMs have marked a revolutionary shift, yet they falter when faced with compositional reasoning tasks. This talk introduces our empirical findings that (1) most compositional reasoning failures of LLMs stem from improperly generated or leveraged implicit reasoning results, (2) implicit reasoning results indeed surface within the middle layers and play a causative role in shaping the final explicit reasoning results, and (3) the multi-head self-attention (MHSA) modules within these layers emerge as the linchpins of the accurate generation and leveraging of implicit reasoning results. The second part of this talk presents our proposed method, CREME, a lightweight approach grounded in the above findings that patches errors in compositional reasoning by editing the located MHSA modules. CREME paves the way for autonomously and continuously enhancing compositional reasoning capabilities in language models.
    • Bio: Ying Wei is a Nanyang Assistant Professor in the School of Computer Science and Engineering at Nanyang Technological University. Prior to that, she was an Assistant Professor in the Department of Computer Science, City University of Hong Kong, and a Senior Researcher at the Machine Learning Center of Tencent AI Lab. She works on machine learning and is especially interested in solving challenges in transfer, meta, and continual learning through compositionality. She received her Ph.D. degree from Hong Kong University of Science and Technology in 2017 with the support of the Hong Kong PhD Fellowship. She has published over 55 papers in conferences such as ICML and NeurIPS, and received a Best Paper nomination at SIGKDD'14. She has served as an action editor for TMLR, an area chair for ICML and NeurIPS, a senior program committee member for AAAI, and a committee member for conferences such as ICLR and ACL.
  • Coffee Break
    • Time: May 13, 2024, 10:30 AM – 11:00 AM
  • Efficient Multimodal Large Language Model
    • Time: May 13, 2024, 11:00 AM – 11:30 AM
    • Speaker: Ao Zhang, Ph.D. Student, National University of Singapore
    • Abstract: Recent years have witnessed a great rise of multimodal large language models (MLLMs), ushering in human-like artificial intelligence. However, constructing MLLMs typically incurs heavy computational costs. In this talk, we will discuss efficient MLLM construction from several angles: What is an efficient MLLM architecture for achieving high performance? How should data be chosen and organized to build a powerful MLLM? Are there training strategies for building new MLLMs or extending their functional scope efficiently?
    • Bio: Ao Zhang is currently a Ph.D. student at the School of Computing, National University of Singapore. His research interests mainly lie in multimodal large language models, multimodal prompt learning, and structured scene understanding. He has published several papers at top-tier conferences including ICCV, ECCV, ACL, EMNLP, AAAI, and NeurIPS.
  • Towards Generative Search and Recommendation in the Era of Large Language Models
    • Time: May 13, 2024, 11:30 AM – 12:00 PM
    • Speaker: Wenjie Wang, Research Fellow, National University of Singapore
    • Abstract: With the information explosion on the Web, search and recommendation are foundational infrastructures for satisfying users' information needs. As two sides of the same coin, both revolve around the same core research problem: matching queries with documents, or users with items. In recent decades, search and recommendation have experienced synchronous technological paradigm shifts, including machine learning-based and deep learning-based paradigms. Recently, generative large language models have sparked a new paradigm in search and recommendation: generative search (retrieval) and recommendation, which aims to address the matching problem in a generative manner. In this talk, we provide a comprehensive survey of this emerging generative paradigm and summarize developments in generative search and recommendation from a unified perspective. We further distinguish generative search from generative recommendation through their unique challenges, identify open problems and future directions, and envision the next information-seeking paradigm.
    • Bio: Wenjie Wang is a research fellow at the National University of Singapore, working with Prof. Chua Tat-Seng and Prof. Ng See-Kiong. He received his Ph.D. degree from the School of Computing, National University of Singapore in 2023, supervised by Prof. Chua Tat-Seng. His research interests cover LLM-based search and recommendation, causal inference, and multimedia. He has over 40 publications in top conferences and journals such as SIGIR, KDD, WWW, TIP, and TOIS. Moreover, he has served as a guest editor, PC member, and reviewer for top conferences and journals including TOIS, TKDE, SIGIR, KDD, WWW, and WSDM.
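
The first abstract above describes ADSampling only at a high level; the following is a minimal, illustrative Python sketch of the underlying idea, not the speaker's actual algorithm. It assumes the vectors have been preprocessed with a random orthogonal transform (so that any prefix of dimensions behaves like a random sample), and the constant `epsilon0` and the batch size are illustrative choices rather than values from the paper.

```python
import numpy as np

def dco_with_sampling(query, candidate, threshold, epsilon0=2.1, batch=32):
    """Distance comparison operation (DCO) with incremental dimension sampling.

    Decides whether ||query - candidate|| exceeds `threshold` by scanning
    dimensions in batches and stopping early once a scaled partial distance
    clears a confidence bound, instead of always scanning all d dimensions.
    Returns (pruned, squared_distance_estimate_or_exact).
    """
    d = query.shape[0]
    diff = query - candidate
    partial, seen = 0.0, 0
    while seen < d:
        step = min(batch, d - seen)
        partial += float(np.sum(diff[seen:seen + step] ** 2))
        seen += step
        if seen < d:
            # Scale the partial sum up to a full-dimension estimate and test it
            # against an inflated threshold; a candidate failing this test is,
            # with high probability, truly farther away than `threshold`.
            estimate = partial * d / seen
            bound = threshold ** 2 * (1.0 + epsilon0 / np.sqrt(seen)) ** 2
            if estimate > bound:
                return True, estimate  # pruned without scanning every dimension
    return False, partial  # scanned all d dimensions: exact squared distance

# Toy usage: a random orthogonal transform makes prefix dimensions a fair sample.
rng = np.random.default_rng(0)
d = 512
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
x, y = Q @ rng.standard_normal(d), Q @ rng.standard_normal(d)
print(dco_with_sampling(x, y, threshold=1.0))
```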

Oral Presentation

  • ConvSDG: Session Data Generation for Conversational Search
    • Time: May 13, 2024, 2:00 PM – 2:15 PM
    • Speaker: Fengran Mo, Department of Computer Science and Operations Research, Université de Montréal
  • Heterogeneous Knowledge Grounding for Medical Question Answering with Retrieval Augmented Large Language Model
    • Time: May 13, 2024, 2:15 PM – 2:30 PM
    • Speaker: Wenting Zhao, University of Illinois Chicago
  • Weakly Supervised Video Moment Retrieval via Location-irrelevant Proposal Learning
    • Time: May 13, 2024, 2:30 PM – 2:45 PM
    • Speaker: Wei Ji, National University of Singapore
  • One-step Reach: LLM-based Keyword Generation for Sponsored Search Advertising
    • Time: May 13, 2024, 2:45 PM – 3:00 PM
    • Speaker: Zheyi Sha, Baidu Inc.
  • Coffee Break
    • Time: May 13, 2024, 3:00 PM – 3:30 PM
  • Neural Retrievers are Biased Towards LLM-Generated Content
    • Time: May 13, 2024, 3:30 PM – 3:45 PM
    • Speaker: Liang Pang, Institute of Computing Technology, Chinese Academy of Sciences (ICT)
  • A Case Study of Enhancing Sparse Retrieval using LLMs
    • Time: May 13, 2024, 3:45 PM – 4:00 PM
    • Speaker: Michael Ayoub, University of Copenhagen
  • LLM Driven Web Profile Extraction for Identical Names
    • Time: May 13, 2024, 4:00 PM – 4:15 PM
    • Speaker: Prateek Sancheti, IIIT Hyderabad
  • Bi-CAT: Improving Robustness of LLM-based Text Rankers to Conditional Distribution Shifts
    • Time: May 13, 2024, 4:15 PM – 4:30 PM
    • Speaker: Sriram Srinivasan & Rishabh Deshmukh, Amazon

Call for Papers

At the heart of the "Information Retrieval Meets Large Language Models Workshop" lies the ambition to pioneer research that bridges the gap between information retrieval and large language models (LLMs). This workshop is dedicated to exploring how LLMs can enhance information retrieval algorithms, ushering in a new era of data processing and analysis. We aim to delve into the potential of generative models, particularly in creating AI-generated content (AIGC), to supplement and diversify the information available, catering to a broader array of user preferences and information needs. A significant focus will be on the transformative potential of LLMs in reshaping user interactions with information retrieval systems, harnessing the latest advancements in conversational AI and user experience design. In parallel, the workshop will address the critical issues of trust and ethics in AI-driven information retrieval. This involves scrutinizing content authenticity, mitigating algorithmic biases, and ensuring adherence to evolving ethical and legal standards.

Furthermore, the workshop endeavors to promote the development and adoption of innovative evaluation methodologies. These new approaches, including advanced metrics and human-centered evaluation techniques, are essential for assessing the effectiveness and impact of LLM-enhanced information retrieval systems. Through these multifaceted objectives, the workshop aspires to set a new benchmark in the integration of information retrieval and large language models, paving the way for future innovations in the field.

Topics of Interest

  • LLMs in Query Understanding and Reformulation:
    • Exploring the use of LLMs for interpreting and rephrasing ambiguous queries, including query expansion with semantic understanding and contextual query reformulation.
  • LLMs in Understanding User Behavior:
    • Utilizing LLMs to predict user satisfaction in search sessions and personalize search results based on analysis of historical user data.
  • Personalized Search Techniques Using LLMs:
    • Developing user profiles with the aid of LLMs to improve search relevance and constructing personalized knowledge graphs.
  • Conversational Search Powered by LLMs:
    • Innovations in dialogue systems for search and continuous learning from user interactions in conversational information retrieval.
  • LLM-driven Indexing Strategies:
    • Implementing LLM-based models for generative retrieval, index pruning, optimization, and creating abstract document representations.
  • Ranking and Matching with LLMs:
    • Using LLMs for contextual ranking, semantic query-document matching, and multi-modal search result ranking.
  • LLMs in Evaluation Metrics for Information Retrieval:
    • Developing new IR evaluation metrics leveraging LLM language understanding, automating relevance judgment, and emulating user satisfaction testing.
  • Data Augmentation for IR with LLMs:
    • Generating synthetic queries and enhancing IR corpora diversity using LLM-generated content.
  • Incorporating IR Techniques in LLM Pre-training:
    • Merging traditional IR methods with LLM pre-training for domain adaptation, retrieval-enhanced strategies, and impact analysis.
  • Retrieval Adapters for Enhancing LLMs:
    • Creating modular retrieval adapters for specific IR tasks and customizable IR features within LLMs, improving transfer learning.
  • Knowledge-Enriched LLMs for IR:
    • Integrating external knowledge bases with LLMs, using IR for real-time data feeding, and enhancing factual accuracy with dynamic retrieval.
  • Retrieval Augmented Generation for LLMs:
    • Leveraging document retrieval to enrich LLM responses, comparing RAG with end-to-end models, and examining complex reasoning strategies; a minimal retrieve-then-generate sketch follows this list.
  • Hybrid Models of LLMs and Classic IR:
    • Evaluating hybrid models in specialized domains, enhancing classic IR models' features with LLMs, and maintaining system interpretability.
  • Training and Reasoning Strategies for LLMs in IR:
    • Implementing feedback loops, multi-task learning, meta-learning, few-shot learning, explainable AI, transfer learning, and scalability in LLM training for IR.
  • Extensions in Multi-Lingual and Multi-Modal Scenarios:
    • Investigating LLMs in cross-lingual retrieval, enhancing multi-lingual corpora, interpreting and indexing multi-modal data, and integrating LLMs with other modalities for unified search platforms.
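
To make the retrieval-augmented generation topic above concrete, here is a minimal, framework-agnostic Python sketch of the basic retrieve-then-generate loop. Everything in it is an illustrative assumption: the bag-of-words scoring stands in for a real dense retriever, and `llm_generate` is a placeholder for any text-generation call rather than a specific API.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real system would use a dense encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=2):
    # Score every document against the query and keep the top-k as evidence.
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

def rag_answer(query, corpus, llm_generate):
    # Retrieve evidence first, then condition the generator on it, so the
    # answer is grounded in retrieved text rather than parametric memory alone.
    context = "\n".join(retrieve(query, corpus))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return llm_generate(prompt)
```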

Submission Guidelines

Submissions must be in a single PDF file, formatted according to the ACM WWW 2024 template. Papers may range from 4 to 8 pages, with up to 2 additional pages for references and appendices. Authors may choose the length of their paper, as no distinction will be made between long and short papers. All submissions will undergo a double-blind review process and will be evaluated for relevance, scientific novelty, and technical quality by expert reviewers.

Submission site: https://easychair.org/conferences/?conf=thewebconf2024_workshops

Important Dates

Paper Submission Deadline: February 5, 2024
Acceptance Notification: March 4, 2024
Workshop Date: May 13, 2024

Workshop Organizers

 

  • Zheng Liu, Researcher, Beijing Academy of Artificial Intelligence
  • Yujia Zhou, Ph.D. Candidate, Renmin University of China
  • Yutao Zhu, Postdoc, Renmin University of China
  • Jianxun Lian, Researcher, Microsoft Research Asia
  • Chaozhuo Li, Researcher, Microsoft Research Asia
  • Zhicheng Dou, Professor, Renmin University of China
  • Defu Lian, Professor, University of Science and Technology of China
  • Jian-Yun Nie, Professor, University of Montreal

Contact

If you have any questions, please contact Zheng Liu, Yujia Zhou, or Yutao Zhu via email.