MMIR Workshop

Welcome to the 1st MMIR Workshop, co-located with ACM Multimedia 2023!

Information retrieval (IR) is a fundamental technique for acquiring information from collections of documents, web pages, or other sources. While traditional text-based IR has achieved great success, the under-utilization of data sources across diverse modalities (e.g., text, images, audio, and video) prevents IR techniques from reaching their full potential and limits their real-world applications. In recent years, the rapid development of deep multimodal learning has paved the way for advancing IR with multimodality. By drawing on a variety of data types and modalities, recent prominent models such as CLIP, ChatGPT, and GPT-4 have shown great promise for multimodal learning and IR. In the context of IR, deep multimodal learning has shown prominent potential to improve the performance of retrieval systems by enabling them to better understand and process the diverse types of data they encounter. Despite the great potential of multimodal-empowered IR, unsolved challenges and open questions still remain in related directions. With this workshop, we aim to provide a platform for discussing multimodal IR among scholars, practitioners, and other interested parties.


Call for Papers

In this workshop, we welcome three types of submissions:

  1. Position or perspective papers (4 to 8 pages in length, plus unlimited pages for references): original ideas, perspectives, research visions, and open challenges on the topics of the workshop;
  2. Featured papers (title and abstract of the paper, plus the original paper): already published papers, or papers summarizing existing publications in leading conferences and high-impact journals, that are relevant to the topics of the workshop;
  3. Demonstration papers (up to 2 pages in length, plus unlimited pages for references): original or already published prototypes and operational evaluation approaches on the topics of the workshop.

A Best Paper Award will be selected from the accepted papers and announced during the workshop.

Topics and Themes

Topics of interest include, but are not limited to:

  • Image-text Multimodal Learning and Retrieval, such as
    - Vision-language Alignment Analysis
    - Multimodal Fusion and Embeddings
    - Vision-language Pre-training
    - Structured Vision-language Learning
    - Commonsense-aware Vision-language Learning
  • Video-text Understanding and Retrieval, such as
    - Video-text Retrieval
    - Video (Corpus) Moment Retrieval
    - Video Relation Detection
    - Video Question Answering
    - Video Dialogue
  • Dialogue Multimodal Retrieval, such as
    - Multimedia Pre-training in Dialogue
    - Multimedia Search and Recommendation
    - Multimodal Response Generation
    - User-centered Dialogue Retrieval
    - New Applications on ChatGPT & Visual-GPT and Beyond
  • Reliable Multimodal Retrieval, such as
    - Explainable Multimodal Retrieval
    - Typical Failures of ChatGPT and Other Large Models
    - Adversarial Attack and Defense
    - New Evaluation Metrics
  • Multimedia Retrieval Applications, such as
    - Multimodal-based Reasoning
    - Unpaired Image Captioning
    - Multimodal Information Extraction
    - Multimodal Translation
    - Opinion/Sentiment-oriented Multimodal Analysis for IR

Submission Instructions

Page limits include diagrams and appendices. Submissions should be written in English and formatted according to the current ACM two-column conference format. Authors are responsible for anonymizing their submissions. Suitable LaTeX, Word, and Overleaf templates are available from the ACM website (use the “sigconf” proceedings template for LaTeX and the Interim Template for Word).

Review Process

All submissions will be peer-reviewed by at least two expert reviewers in the field. The reviewing process will be two-way anonymized. Acceptance will depend on relevance to the workshop topics, scientific novelty, and technical quality. Accepted workshop papers will be published in the ACM Digital Library.

Important Dates

  • Paper Submission: July 21, 2023 (AoE)
  • Notification of Acceptance: July 30, 2023 (AoE)
  • Camera-ready Submission: August 6, 2023 (AoE)
  • Workshop dates: October 28, 2023 - November 3, 2023 (AoE)

Workshop Schedule




Invited Speakers



Program Committee

Long Chen

The Hong Kong University of Science and Technology

Xun Yang

University of Science and Technology of China

Chenliang Li

Wuhan University

Yixin Cao

Singapore Management University

Peiguang Jing

Tianjin University

Junyu Gao

Chinese Academy of Sciences

Zheng Wang

Wuhan University

Ping Liu


Qingji Guan

Beijing Jiaotong University

Weijian Deng

Australian National University

Jieming Zhu

Huawei Noah’s Ark Lab


Workshop Organizers

Wei Ji

National University of Singapore

Yinwei Wei

Monash University

Zhedong Zheng

National University of Singapore

Hao Fei

National University of Singapore

Tat-Seng Chua

National University of Singapore


Contact us

Join and post at our Google Group!
Email the organizers at mmir23@googlegroups.com.