MMGR25@ACM-MM2025

About

MMGR Workshop

Welcome to 3^rd MMGR Workshop co-located with ACM Multimedia 2025!

Information generation (IG) and information retrieval (IR) are two key representative approaches of information acquisition, i.e., producing content either via generation or via retrieval. While traditional IG and IR have achieved great success within the scope of languages, the under-utilization of varied data sources in different modalities (i.e., text, images, video, touch, 3D point cloud, EEG signals, and more) would hinder IG and IR techniques from giving the full advances and thus limits the applications in the real world. Knowing the fact that our world is replete with multimedia information, this special issue encourages the development of deep multimodal learning for the research of IG and IR. Benefiting from a variety of data types and modalities, some latest prevailing techniques are extensively invented to show great facilitation in multimodal IG and IR learning, such as DALL-E, Stable Diffusion, GPT4, Sora, etc. Given the great potential shown by multimodal-empowered IG and IR, there can be still unsolved challenges and open questions in these directions. With this workshop, we aim to encourage more explorations in Deep Multimodal Generation and Retrieval, providing a platform for researchers to share insights and advancements in this rapidly evolving domain.

Calls

Call for Papers

In this workshop, we welcome three types of submissions:

Position or perspective papers (The same format & template as the main conference, but the manuscript’s length is limited to one of the two options: a) 4 pages plus 1-page reference; or b) 8 pages plus up to 2-page reference.): original ideas, perspectives, research vision, and open challenges in the topics of the workshop;
Featured papers (title and abstract of the paper, plus the original paper): already published papers or papers summarizing existing publications in leading conferences and high-impact journals that are relevant for the topics of the workshop;
Demonstration papers (up to 2 pages in length, plus unlimited pages for references): original or already published prototypes and operational evaluation approaches in the topics of the workshop.

All the accepted papers will be archived in the ACM MM proceedings. Authors of accepted papers will be presented at the workshop. Also, high-quality papers can be recommended to ACM ToMM Special Issue of MMGR.

We will select from the accepted papers the Best Paper Award, which will be announced during the workshop.

Topics and Themes

Topics of interests include but not limited to:

Multimodal Semantics Understanding, such as
- - Vision-Language Alignment Analysis
- - Multimodal Fusion and Embeddings
- - Large-scale Vision-Language Pre-training
- - Structured Vision-Language Learning
- - Visually Grounded Interaction of Language Modeling
- - Commonsense-aware Vision-Language Learning
- - Visually Grounded Language Parsing
- - Semantic-aware Vision-Language Discovery
- - Large Multimodal Models
Generative Models for Image/Video Synthesis, such as
- - Text-free/conditioned Image Synthesis
- - Text-free/conditioned Video Synthesis
- - Temporal Coherence in Video Generation
- - Image/Video Editing/Inpainting
- - Visual Style Transfer
- - Image/Video Dialogue
- - Panoramic Scene Generation
- - Multimodal Dialogue Response Generation
- - LLM-empowered Multimodal Generation
Multimodal Information Retrieval, such as
- - Image/Video-Text Compositional Retrieval
- - Image/Video Moment Retrieval
- - Image/Video Captioning
- - Image/Video Relation Detection
- - Image/Video Question Answering
- - Multimodal Retrieval with MLLMs
- - Hybrid Synthesis with Retrieval and Generation
Explainable and Reliable Multimodal Learning, such as

- Explainable Multimodal Retrieval
- Relieve Hallucination of LLMs
- Adversarial Attack and Defense
- Multimodal Learning for Social Good
- Multimodal-based Reasoning
- Multimodal Instruction Tuning
- Efficient Learning of MLLMs

Multimodal LLM, such as
- - Multimodal Foundation Model
- - Continue Learning for Multimodal LLM
- - Unified Architectures for Multimodal LLM
- - Vision-Language Reasoning and Chain-of-thought
- - Retrieval-augmented Multimodal LLMs
- - Parameter-efficient Adaptation and Fine-tuning
- - Multi-agent and Compositional Approaches
- - Safety, Fairness and Hallucination in Multimodal LLMs
- - Evaluation Protocols and Benchmarks

Submission Instructions

Page limits include diagrams and appendices. Submissions should be written in English, and formatted according to the current ACM two-column conference format. Authors are responsible for anonymizing the submissions. Suitable LaTeX, Word, and Overleaf templates are available from the ACM Website (use “sigconf” proceedings template for LaTeX and the Interim Template for Word).

Review Process

All submissions will be peer-reviewed by at least two reviewers of experts in the field. The reviewing process will be two-way anonymized. Acceptance will be dependent on the relevance to the workshop topics, scientific novelty, and technical quality. The accepted workshop papers will be published in the ACM Digital Library.

Important Dates

Paper Submission: July 11, 2025 (AoE)
Notification of Acceptance: August 1, 2025 (AoE)
Camera-ready Submission: August 11, 2025 (AoE) [Firm Deadline]
Workshop dates: 27 October, 2025 (AoE)

Papers

Accepted Papers

DSS:Implicit Representation-Based Face Restoration with Diffusion Prior
Gang He, YingFu Zhang, SiqiWang, Kepeng Xu
Beyond Nearest Neighbors: Semantic Compression and Graph-Augmented Retrieval for Enhanced Vector Search
Rahul Raja, Arpita Vats
StgcDiff: Spatial-Temporal Graph Condition Diffusion for Sign Language Transition Generation
Jiashu He, Jiayi He, Shengeng Tang, Huixia Ben, Lechao Cheng, Richang Hong
AMR-CSI: Adaptive Multimodal RAG for Cold Start Indexing
Siva Prasad, Shreya Saxena, Mukkamala Venkata Sai Prakash, Zishan Ahmad, Vishal Vaddina
LViCAR: Diffusion Models for Perceptual Quality Enhancement in Video Compression Artifact Reduction
Shiv Gehlot, Guan-Ming Su
Prospective Analysis of Semantic Image Retrieval: Comparing Scene Graph, Visual Features, and Captions
Takahiro Komamizu
Mamba-Based Multimodal Continual Learning for Audio-Visual Classification with Prototype-Enhanced Anti-Forgetting Mechanism
Jingyang Lin, Xinru Ying, Jiaqi Mo, Lina Wei, Fangfang Wang, Canghong Jin, Guanlin Chen
RenderTXT: High-Fidelity Text Rendering in Images with LLM
Shengqiong Wu, Bobo Li, Meishan Zhang, Jun Yu, Min Zhang, Tat-Seng Chua

Workshop Schedule

Program

Date: October 27, 2025 (All times are in Dublin, Winter Time, UTC+0). Meeting Room: Hyatt / Dean Swift 2.

13:30–13:40	\|	Welcome Message from the Chairs
13:40-14:25	\|	Keynote 1: Where Learned Data Structures Meet Computer Vision, by Prof. Yusuke Matsui
14:25-14:40	\|	Presentation 1: AMR-CSI: Adaptive Multimodal RAG for Cold Start Indexing
14:40-14:55	\|	Presentation 2: LViCAR: Diffusion Models for Perceptual Quality Enhancement in Video Compression Artifact Reduction
14:55-15:10	\|	Presentation 3: DSS:Implicit Representation-Based Face Restoration with Diffusion Prior
15:10-15:25	\|	Presentation 4: Beyond Nearest Neighbors: Semantic Compression and Graph-Augmented Retrieval for Enhanced Vector Search
15:25-15:35	\|	Coffee Break
15:35-16:20	\|	Keynote 2: Bridging Representation and Retrieval: Deep Learning Advances in Multimodal IG and IR, by Prof. Jialie Shen
16:20-16:35	\|	Presentation 5: StgcDiff: Spatial-Temporal Graph Condition Diffusion for Sign Language Transition Generation
16:35-16:50	\|	Presentation 6: Prospective Analysis of Semantic Image Retrieval: Comparing Scene Graph, Visual Features, and Captions
16:50-17:05	\|	Presentation 7: Mamba-Based Multimodal Continual Learning for Audio-Visual Classification with Prototype-Enhanced Anti-Forgetting Mechanism
17:05-17:20	\|	Presentation 8: RenderTXT: High-Fidelity Text Rendering in Images with LLM
17:20-17:30	\|	Workshop Closing

Talks

Invited Speakers

Organization

Workshop Organizers

Welcome to 2025 Workshop of 3^rd MMGR
The 3rd International Workshop on Deep Multimodal Generation and Retrieval

About

Calls

Topics and Themes

Submission Instructions

Review Process

Important Dates

Papers

Workshop Schedule

Talks

Organization

Wei Ji

Hong Liu

Lizi Liao

Yuchong Sun

Yadan Luo

Xin Wang

Shin'ichi Satoh

Contact

Join and post at our Google Group!

Email the organziers at mmgr25@googlegroups.com .

Welcome to 2025 Workshop of 3rd MMGR The 3rd International Workshop on Deep Multimodal Generation and Retrieval

About

Calls

Topics and Themes

Submission Instructions

Review Process

Important Dates

Papers

Workshop Schedule

Talks

Organization

Wei Ji

Hong Liu

Lizi Liao

Yuchong Sun

Yadan Luo

Xin Wang

Shin'ichi Satoh

Contact

Join and post at our Google Group!

Email the organziers at mmgr25@googlegroups.com .

Welcome to 2025 Workshop of 3^rd MMGR
The 3rd International Workshop on Deep Multimodal Generation and Retrieval