About
MMIR Workshop
Welcome to the 1st MMIR Workshop, co-located with ACM Multimedia 2023!
Information retrieval (IR) is a fundamental technique that aims to acquire information from a collection of documents, web pages, or other sources. While traditional text-based IR has achieved great success, the under-utilization of data sources in other modalities (e.g., images, audio, and video) keeps IR techniques from reaching their full potential and thus limits their real-world applications. In recent years, the rapid development of deep multimodal learning has paved the way for advancing IR with multimodality. Benefiting from a variety of data types and modalities, recent prevailing techniques such as CLIP, ChatGPT, and GPT-4 have greatly facilitated multimodal learning and IR. In the context of IR, deep multimodal learning has shown prominent potential to improve the performance of retrieval systems by enabling them to better understand and process the diverse types of data they encounter. Despite the great potential of multimodal-empowered IR, there remain unsolved challenges and open questions in the related directions. With this workshop, we aim to provide a platform for discussion of multimodal IR among scholars, practitioners, and other interested parties.
Calls
Call for Papers
In this workshop, we welcome three types of submissions:
- Position or perspective papers (4~8 pages in length, plus unlimited pages for references): original ideas, perspectives, research vision, and open challenges in the topics of the workshop;
- Featured papers (title and abstract of the paper, plus the original paper): already published papers, or papers summarizing existing publications in leading conferences and high-impact journals, that are relevant to the topics of the workshop;
- Demonstration papers (up to 2 pages in length, plus unlimited pages for references): original or already published prototypes and operational evaluation approaches in the topics of the workshop.
We will select a Best Paper Award from the accepted papers, which will be announced during the workshop.
Topics and Themes
Topics of interest include, but are not limited to:
- Image-text Multimodal Learning and Retrieval, such as
- - Vision-language Alignment Analysis
- - Multimodal Fusion and Embeddings
- - Vision-language Pre-training
- - Structured Vision-language Learning
- - Commonsense-aware Vision-language Learning
- Video-text Understanding and Retrieval, such as
- - Video-text Retrieval
- - Video (Corpus) Moment Retrieval
- - Video Relation Detection
- - Video Question Answering
- - Video Dialogue
- Dialogue Multimodal Retrieval, such as
- - Multimedia Pre-training in Dialogue
- - Multimedia Search and Recommendation
- - Multimodal Response Generation
- - User-centered Dialogue Retrieval
- - New Applications of ChatGPT & Visual-GPT and Beyond
- Reliable Multimodal Retrieval, such as
- - Explainable Multimodal Retrieval
- - Typical Failures of ChatGPT and other Large Models
- - Adversarial Attack and Defense
- - New Evaluation Metrics
- Multimedia Retrieval Applications, such as
- - Multimodal-based Reasoning
- - Unpaired Image Captioning
- - Multimodal Information Extraction
- - Multimodal Translation
- - Opinion/Sentiment-oriented Multimodal Analysis for IR
Submission Instructions
Page limits include diagrams and appendices. Submissions should be written in English, and formatted according to the current ACM two-column conference format. Authors are responsible for anonymizing the submissions. Suitable LaTeX, Word, and Overleaf templates are available from the ACM Website (use “sigconf” proceedings template for LaTeX and the Interim Template for Word).
Review Process
All submissions will be peer-reviewed by at least two expert reviewers in the field. The reviewing process will be two-way anonymized. Acceptance will depend on relevance to the workshop topics, scientific novelty, and technical quality. The accepted workshop papers will be published in the ACM Digital Library.
Important Dates
- Paper Submission: July 29, 2023 (AoE)
- Notification of Acceptance: August 7, 2023 (AoE)
- Camera-ready Submission: August 22, 2023 (AoE)
- Workshop dates: October 28, 2023 - November 3, 2023 (AoE)
Papers
Accepted Papers
- Self-Distilled Dynamic Network for Language-based Fashion Retrieval. Hangfei Li, Yiming Wu, Fangfang Wang
- Video Referring Expression Comprehension via Transformer with Content-conditioned Query. Jiang Ji, Meng Cao, Tengtao Song, Long Chen, Yi Wang, Yuexian Zou
- Boon: A Neural Search Engine for Cross-Modal Information Retrieval. Yan Gong, Georgina Cosma
- On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis. Daniele Malitesta, Giandomenico Cornacchia, Claudio Pomo, Tommaso Di Noia
- TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content. Avinash Anand, Raj Shivprakash Poonam Jaiswal, Pijush Bhuyan, Mohit Gupta, Siddhesh Bangar, Md Modassir Imam, Rajiv Ratn Shah, Shin'ichi Satoh
- Metaverse Retrieval: Finding the Best Metaverse Environment via Language. Ali Abdari, Alex Falcon, Giuseppe Serra
- Prescription Recommendation based on Intention Retrieval Network and Multimodal Medical Indicator. Feng Gao, Yao Chen, Maofu Liu
Workshop Schedule
Program
Date: November 2, 2023 (full day). Room: Provinces 1. Please note that the schedule is in the Ottawa time zone. The program at a glance can be downloaded here.
You can also join online via Zoom Meeting, ID: 335 825 3206, Passcode: 118404
09:00 - 09:10 | Welcome Message from the Chairs
09:10 - 09:40 | Keynote 1: Hypergraphs for Multimedia Retrieval and Insight, by Prof. Marcel Worring
09:40 - 10:10 | Keynote 2: Revisiting Pseudo Relevance Feedback: New Developments and Applications, by Prof. Shin’ichi Satoh
10:10 - 10:40 | Coffee Break
10:40 - 11:00 | Presentation 1: Video Referring Expression Comprehension via Transformer with Content-conditioned Query
11:00 - 11:20 | Presentation 2: Prescription Recommendation based on Intention Retrieval Network and Multimodal Medical Indicator
11:20 - 11:40 | Presentation 3: Self-Distilled Dynamic Network for Language-based Fashion Retrieval
11:40 - 12:00 | Presentation 4: Metaverse Retrieval: Finding the Best Metaverse Environment via Language
12:00 - 12:30 | Keynote 3: Task Focused IR in the Era of Generative AI, by Prof. Chirag Shah
12:30 - 15:00 | Lunch Break
15:00 - 15:20 | Presentation 5: On Popularity Bias of Multimodal-aware Recommender Systems: a Modalities-driven Analysis
15:20 - 15:40 | Presentation 6: Boon: A Neural Search Engine for Cross-Modal Information Retrieval
15:40 - 16:10 | Coffee Break
16:10 - 16:30 | Presentation 7: TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content
16:30 - 16:50 | Presentation 8: Zero-Shot Composed Image Retrieval with Textual Inversion
16:50 | Workshop Closing
Talks
Invited Speakers
Marcel Worring, University of Amsterdam
Shin'ichi Satoh, National Institute of Informatics
Chirag Shah, University of Washington
Committee
Program Committee
Long Chen, The Hong Kong University of Science and Technology
Xun Yang, University of Science and Technology of China
Chenliang Li, Wuhan University
Yixin Cao, Singapore Management University
Peiguang Jing, Tianjin University
Junyu Gao, Chinese Academy of Sciences
Zheng Wang, Wuhan University
Ping Liu, A*STAR
Qingji Guan, Beijing Jiaotong University
Weijian Deng, Australian National University
Jieming Zhu, Huawei Noah’s Ark Lab
Organization
Workshop Organizers
Wei Ji, National University of Singapore
Yinwei Wei, Monash University
Zhedong Zheng, National University of Singapore
Hao Fei, National University of Singapore
Tat-seng Chua, National University of Singapore
Contact
Contact us