About
MMIR Workshop
Welcome to the 1st MMIR Workshop, co-located with ACM Multimedia 2023!
Information retrieval (IR) is a fundamental technique that aims to acquire information from a collection of documents, web pages, or other sources. While traditional text-based IR has achieved great success, the under-utilization of data sources in other modalities (e.g., images, audio, and video) keeps IR techniques from reaching their full potential and thus limits their real-world applications. In recent years, the rapid development of deep multimodal learning has paved the way for advancing IR with multimodality. Benefiting from a variety of data types and modalities, recent prevailing techniques such as CLIP, ChatGPT, and GPT-4 have greatly facilitated multimodal learning and IR. In the context of IR, deep multimodal learning has shown prominent potential to improve the performance of retrieval systems by enabling them to better understand and process the diverse types of data they encounter. Despite the great potential of multimodal-empowered IR, there remain unsolved challenges and open questions in the related directions. With this workshop, we aim to provide a platform for discussion of multimodal IR among scholars, practitioners, and other interested parties.
Calls
Call for Papers
In this workshop, we welcome three types of submissions:
- Position or perspective papers (4~8 pages in length, plus unlimited pages for references): original ideas, perspectives, research vision, and open challenges in the topics of the workshop;
- Featured papers (title and abstract of the paper, plus the original paper): already published papers, or papers summarizing existing publications in leading conferences and high-impact journals, that are relevant to the topics of the workshop;
- Demonstration papers (up to 2 pages in length, plus unlimited pages for references): original or already published prototypes and operational evaluation approaches in the topics of the workshop.
We will select a Best Paper Award from the accepted papers, which will be announced during the workshop.
Topics and Themes
Topics of interest include, but are not limited to:
- Image-text Multimodal Learning and Retrieval, such as
- - Vision-language Alignment Analysis
- - Multimodal Fusion and Embeddings
- - Vision-language Pre-training
- - Structured Vision-language Learning
- - Commonsense-aware Vision-language Learning
- Video-text Understanding and Retrieval, such as
- - Video-text Retrieval
- - Video (Corpus) Moment Retrieval
- - Video Relation Detection
- - Video Question Answering
- - Video Dialogue
- Dialogue Multimodal Retrieval, such as
- - Multimedia Pre-training in Dialogue
- - Multimedia Search and Recommendation
- - Multimodal Response Generation
- - User-centered Dialogue Retrieval
- - New Applications of ChatGPT & Visual-GPT and Beyond
- Reliable Multimodal Retrieval, such as
- - Explainable Multimodal Retrieval
- - Typical Failures of ChatGPT and other Large Models
- - Adversarial Attack and Defense
- - New Evaluation Metrics
- Multimedia Retrieval Applications, such as
- - Multimodal-based Reasoning
- - Unpaired Image Captioning
- - Multimodal Information Extraction
- - Multimodal Translation
- - Opinion/Sentiment-oriented Multimodal Analysis for IR
Submission Instructions
Page limits include diagrams and appendices. Submissions should be written in English, and formatted according to the current ACM two-column conference format. Authors are responsible for anonymizing the submissions. Suitable LaTeX, Word, and Overleaf templates are available from the ACM Website (use “sigconf” proceedings template for LaTeX and the Interim Template for Word).
Review Process
All submissions will be peer-reviewed by at least two expert reviewers in the field. The reviewing process will be two-way anonymized. Acceptance will depend on relevance to the workshop topics, scientific novelty, and technical quality. The accepted workshop papers will be published in the ACM Digital Library.
Important Dates
- Paper Submission: July 29, 2023 (AoE)
- Notification of Acceptance: August 7, 2023 (AoE)
- Camera-ready Submission: August 22, 2023 (AoE)
- Workshop dates: October 28, 2023 - November 3, 2023 (AoE)
Papers
Accepted Papers
- Self-Distilled Dynamic Network for Language-based Fashion Retrieval. Hangfei Li, Yiming Wu, Fangfang Wang
- Video Referring Expression Comprehension via Transformer with Content-conditioned Query. Jiang Ji, Meng Cao, Tengtao Song, Long Chen, Yi Wang, Yuexian Zou
- Boon: A Neural Search Engine for Cross-Modal Information Retrieval. Yan Gong, Georgina Cosma
- On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis. Daniele Malitesta, Giandomenico Cornacchia, Claudio Pomo, Tommaso Di Noia
- TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content. Avinash Anand, Raj Shivprakash Poonam Jaiswal, Pijush Bhuyan, Mohit Gupta, Siddhesh Bangar, Md Modassir Imam, Rajiv Ratn Shah, Shin'ichi Satoh
- Metaverse Retrieval: Finding the Best Metaverse Environment via Language. Ali Abdari, Alex Falcon, Giuseppe Serra
- Prescription Recommendation based on Intention Retrieval Network and Multimodal Medical Indicator. Feng Gao, Yao Chen, Maofu Liu
Workshop Schedule
Program
Date: November 2, 2023 (full day). Room: Provinces 1. Please note that the schedule is in the Ottawa time zone. The program at a glance can be downloaded here.
You can also join online via Zoom Meeting, ID: 335 825 3206, Passcode: 118404
09:00 - 09:10 | Welcome Message from the Chairs
09:10 - 09:40 | Keynote 1: Hypergraphs for Multimedia Retrieval and Insight, by Prof. Marcel Worring
09:40 - 10:10 | Keynote 2: Revisiting Pseudo Relevance Feedback: New Developments and Applications, by Prof. Shin’ichi Satoh
10:10 - 10:40 | Coffee Break
10:40 - 11:00 | Presentation 1: Video Referring Expression Comprehension via Transformer with Content-conditioned Query
11:00 - 11:20 | Presentation 2: Prescription Recommendation based on Intention Retrieval Network and Multimodal Medical Indicator
11:20 - 11:40 | Presentation 3: Self-Distilled Dynamic Network for Language-based Fashion Retrieval
11:40 - 12:00 | Presentation 4: Metaverse Retrieval: Finding the Best Metaverse Environment via Language
12:00 - 12:30 | Keynote 3: Task Focused IR in the Era of Generative AI, by Prof. Chirag Shah
12:30 - 15:00 | Lunch Break
15:00 - 15:20 | Presentation 5: On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis
15:20 - 15:40 | Presentation 6: Boon: A Neural Search Engine for Cross-Modal Information Retrieval
15:40 - 16:10 | Coffee Break
16:10 - 16:30 | Presentation 7: TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content
16:30 - 16:50 | Presentation 8: Zero-Shot Composed Image Retrieval with Textual Inversion
16:50 | Workshop Closing
Talks
Invited Speakers
Marcel Worring, University of Amsterdam
Shin'ichi Satoh, National Institute of Informatics
Chirag Shah, University of Washington
Committee
Program Committee
Long Chen, The Hong Kong University of Science and Technology
Xun Yang, University of Science and Technology of China
Chenliang Li, Wuhan University
Yixin Cao, Singapore Management University
Peiguang Jing, Tianjin University
Junyu Gao, Chinese Academy of Sciences
Zheng Wang, Wuhan University
Ping Liu, A*STAR
Qingji Guan, Beijing Jiaotong University
Weijian Deng, Australian National University
Jieming Zhu, Huawei Noah’s Ark Lab
Organization
Workshop Organizers
Wei Ji, National University of Singapore
Yinwei Wei, Monash University
Zhedong Zheng, National University of Singapore
Hao Fei, National University of Singapore
Tat-seng Chua, National University of Singapore
Contact
Contact us