Accepted Workshops

 

Workshop 1

Title: (ACMMM CL ’25) 2nd International Workshop on Continual Learning meets Multimodal Foundation Models: Fundamentals and Advances

URL: https://acmmm-cl.github.io/2025/

Paper submission deadline: 11 July, 2025

Workshop Date: 28 October, 2025

Venue: Hyatt / Weavers room

Abstract: In recent years, advances in multimodal foundation models (MMFMs) have spurred growing interest in enhancing their generalization abilities through continual learning (CL), so that they can process diverse data types, from text to visuals, and continuously update their capabilities based on real-time inputs. Despite significant progress in both the theory and applications of continual learning, the community still faces serious challenges. Our workshop aims to provide a venue where academic researchers and industry practitioners can come together to discuss the principles, limitations, and applications of multimodal foundation models in continual learning for multimedia, and to promote understanding of multimodal foundation models in continual learning, innovative algorithms, and research on new multimodal technologies and applications.

 

Workshop 2

Title: (I2M-MM ’25) International Workshop on Intelligent Immersification in the Metaverse: AI-Driven Immersive Multimedia

URL: https://sites.google.com/view/i2m-mm25/home

Paper submission deadline: 11 July, 2025

Workshop Date: 27 October, 2025

Venue: Hyatt / Distillers

Abstract: This workshop is organized by the ACM I2M Chapter, consisting of industry members (such as NVIDIA and Meta) and academic members (such as the Singapore Institute of Technology and the National University of Singapore). The rapid convergence of Artificial Intelligence (AI), Human-Computer Interaction (HCI), and immersive multimedia is redefining the landscape of intelligent and adaptive digital experiences. As ACM Multimedia 2025 emphasizes cutting-edge multimedia systems, this workshop directly contributes to its vision by exploring AI’s transformative role in immersive media. Through AI-driven multimedia interaction, adaptive virtual environments, and intelligent content generation, this workshop will showcase how AI is enhancing the creation and experience of digital worlds.

 

Workshop 3

Title: (MUCG ’25) The 1st International Workshop on MLLM for Unified Comprehension and Generation

URL: https://mllm-mucg.github.io/MM2025

Paper submission deadline: 11 July, 2025

Workshop Date: 27 October, 2025

Venue: DRCC / Higgins 2

Abstract: As Multimodal Large Language Models (MLLMs) continue to advance, there is a growing need to bridge the gap between their comprehension and generation capabilities within unified frameworks. This workshop, MLLM for Unified Comprehension and Generation (MUCG), aims to explore and address the fundamental challenges in developing truly integrated MLLMs that can seamlessly understand and create multimodal content. We focus on three interconnected areas: (1) sophisticated multimodal comprehension, targeting robust understanding of complex visual content and semantic relationships; (2) controllable content generation, addressing challenges in high-fidelity synthesis and cross-modal consistency; and (3) unified frameworks that enable semantic alignment between understanding and generation tasks. Unlike previous approaches that treat these capabilities separately, our workshop specifically targets their integration through MLLMs, fostering focused discussions on shared architectures, bidirectional knowledge transfer, and end-to-end training strategies. The MUCG workshop will bring together researchers from multi-modal communities to explore novel methodologies and establish theoretical frameworks for next-generation multimodal systems. Our website is at https://MUGC.github.io/.

 

Workshop 4

Title: (CogMAEC ’25) The 1st International Workshop on Cognition-oriented Multimodal Affective and Empathetic Computing

URL: https://CogMAEC.github.io/MM2025/

Paper submission deadline: 30 June, 2025

Workshop Date: 27 October, 2025

Venue: DRCC / Higgins 2

Abstract: In this workshop, we aim to reflect on the achievements of the multimodal affective computing community thus far, while also charting the path forward, especially in the context of the ongoing revolution of large language models. We envision a future where emotional and empathetic computing can go beyond simple recognition, aiming to: i) not only identify an individual’s emotions but also provide a comprehensive understanding of the underlying causes, offering detailed rationales; and ii) empower computational models to mimic human-like reasoning, gradually analyzing and constructing a complete emotional landscape based on the given context, rather than merely performing one-off predictions.

 

Workshop 5

Title: (AMM ’25) International Workshop on Automotive and Medical Multimedia: Bridging the Gap Between Mobility and Healthcare

URL: https://sites.google.com/e-uvt.ro/amm/home

Paper submission deadline: 11 June, 2025

Workshop Date: 27 October, 2025

Venue: DRCC / Field 1 & 2

Abstract: The workshop aims to explore the intersection of multimedia technologies in automotive and medical fields, focusing on innovative applications, challenges, and future directions. It will bring together researchers, industry professionals, and students to discuss advancements in areas such as in-vehicle infotainment systems, medical imaging, telemedicine, AI-driven solutions, and cross-domain applications like mobility for the elderly and disabled.

 

Workshop 6

Title: (AIQAM ’25) The 2nd ACM Workshop in AI-powered Question & Answering Systems

URL: https://sites.google.com/view/aiqam25

Paper submission deadline: 11 July, 2025

Workshop Date: 27 October, 2025

Venue: DRCC / Field 1 & 2

Abstract: Research into Question-Answering (QA) tasks has become increasingly popular thanks to the development of large language models (LLMs). LLMs can now answer questions in many areas, such as economics and mathematics. However, the knowledge of an LLM is mainly embedded in its parameters, making it difficult to explain. Enhancing the model with external multimedia data can improve the accuracy of its answers. On top of that, constructing a QA model enhanced by multimedia data may encounter diverse challenges. For example, the use of different data types, e.g., text, images, audio, and video, requires sophisticated algorithms and techniques capable of processing and retrieving meaningful information, in many cases across media. In addition, ensuring the quality and accuracy of retrieved data raises a significant challenge for the QA model. More research is also needed in this field to improve the trustworthiness of QA models by indicating specific data sources.

This workshop aims to explore the essential role of AI-powered Q&A systems using multimedia data. Participants will present novel findings and engage in comprehensive discussions about state-of-the-art methodologies and technologies enabling the retrieval, analysis, and utilisation of multimedia data for providing valuable answers and feedback, along with application case studies across sectors such as education, finance and lifelogging.

 

Workshop 7

Title: (MuSe ’25) The 6th International Multimodal Sentiment Analysis Workshop

URL: https://www.muse-challenge.org/

Paper submission deadline: 11 July, 2025

Workshop Date: See CogMAEC’25 – Workshop 4

Abstract: MuSe 2025 advances multimodal sentiment analysis, emphasizing empathetic and naturalistic human-AI interaction. This year highlights the role of affect in text-to-speech synthesis and the ability of LLMs to interpret and respond to emotional cues across modalities. Key topics include multimodal emotion recognition, humor, non-verbal communication (gestures, facial expressions, and vocal prosody), social perception (trust, dominance, and deception detection), interpretability, and emotional adaptability. Additionally, we explore multimodal synergies, continual learning, and practical challenges such as efficiency, robustness, and deployment on edge devices. MuSe 2025 brings together researchers and industry leaders to push the boundaries of affective computing and human-AI collaboration.

 

Workshop 8

Title: (DFF ’25) 1st Deepfake Forensics Workshop: Detection, Attribution, Recognition, and Adversarial Challenges in the Era of AI-Generated Media

URL: https://iplab.dmi.unict.it/mfs/acm-dff-ws-2025/

Paper submission deadline: 11 July, 2025

Workshop Date: 27 October, 2025

Venue: DRCC / Goldsmiths 2

Abstract: The rapid advancements in deep learning, particularly in generative models such as Generative Adversarial Networks (GANs) and Diffusion Models (DMs), have significantly improved the quality and realism of synthetic media, commonly referred to as deepfakes. While these technologies unlock creative possibilities, they simultaneously raise critical concerns regarding digital content authenticity. Deepfake generation and detection are now at the core of multimedia forensics, requiring robust and generalizable methods to identify manipulated content effectively.
This workshop aims to bring together researchers and practitioners from diverse fields, including computer vision, multimedia forensics and adversarial machine learning, to explore emerging challenges and solutions in deepfake detection, attribution, recognition and counter-forensic strategies. Specifically, it will address the limitations of current detection models in generalizing to real-world scenarios, the interpretability of forensic results, and the risks posed by synthetic content. Additionally, the workshop will promote discussions on dataset biases, multimodal deepfake analysis, the forensic ballistics of synthetic media, and the legal and ethical implications of deepfake technology, including regulatory challenges and forensic admissibility in court.

 

Workshop 9 (MERGED WITH LAVA’25 – Workshop 18)

Title: (AOCV ’25) 1st International Workshop on Ambiguous Object Analysis in Computer Vision

URL: https://selab.hcmus.edu.vn/events/aocv

Paper submission deadline: 11 July, 2025

Workshop Date: 27 October, 2025

Venue: See LAVA’25 – Workshop 18

Abstract: Ambiguous objects, such as camouflaged, transparent, or in-mirror objects, are challenging to detect due to factors like lighting, occlusion, and size variance. Limited datasets further complicate research in this area. However, advancements in recognizing and segmenting such objects have significant applications in wildlife conservation, camouflage materials, search-and-rescue missions, and medical imaging. This workshop aims to advance multimedia research on detecting and segmenting ambiguous objects, with applications in forensics, AR/VR, video analysis, and content moderation.

 

Workshop 10

Title: (MA-LLM ’25) ACM MM 2025 Workshop on Multimedia Analytics with Multimodal Large Language Models

URL: https://ma-llm25.github.io/

Paper submission deadline: 27 June, 2025 (ACMMM Fast-track: 11 July, 2025)

Workshop Date: 27 October, 2025

Venue: Hyatt / Weavers

Abstract: The First Workshop on Multimedia Analytics with Multimodal Large Language Models at ACM Multimedia 2025 aims to explore the potential and pitfalls of bringing Multimodal Large Language Models into multimedia analytics, and the new forms of interaction between systems and experts that emerge from this.

 

Workshop 11

Title: (MMGR ’25) 3rd International Workshop on Deep Multimodal Generation and Retrieval

URL: https://videorelation.nextcenter.org/MMGR2025/

Paper submission deadline: 11 July, 2025

Workshop Date: 27 October, 2025

Venue: Hyatt / Dean Swift 2

Abstract: Information generation (IG) and information retrieval (IR) are two key approaches to information acquisition, i.e., producing content either via generation or via retrieval. While traditional IG and IR have achieved great success within the scope of language, the under-utilization of varied data sources in different modalities (i.e., text, images, audio, and video) hinders IG and IR techniques from realizing their full potential and thus limits real-world applications. This workshop on Deep Multimodal Generation and Retrieval (MMGR) encourages participants to dive deep into these topics.

 

Workshop 12

Title: (LGM3A ’25) 3rd International Workshop on Large Generative Models Meet Multimodal Applications

URL: https://lgm3a.github.io/LGM3A2025/

Paper submission deadline: 11 July, 2025

Workshop Date: 27 October, 2025

Venue: Hyatt / Tanners

Abstract: This workshop aims to explore the potential of large generative models to revolutionize the way we interact with multimodal information. A Large Language Model (LLM) represents a sophisticated form of artificial intelligence engineered to comprehend and produce natural language text, exemplified by technologies such as GPT, LLaMA, Flan-T5, ChatGLM, and Qwen. These models undergo training on extensive text datasets, exhibiting commendable attributes including robust language generation, zero-shot transfer capabilities, and In-Context Learning (ICL). With the recent surge in multimodal content, encompassing images, videos, audio, and 3D models, Large MultiModal Models (LMMs) have seen significant enhancements. These improvements enable the augmentation of conventional LLMs to accommodate multimodal inputs or outputs, as seen in BLIP, Flamingo, KOSMOS, LLaVA, Gemini, and GPT-4. Concurrently, certain research initiatives have delved into generating specific modalities, with Kosmos2 and MiniGPT-5 focusing on image generation, and SpeechGPT on speech production. There are also endeavors to integrate LLMs with external tools to achieve a near ‘any-to-any’ multimodal comprehension and generation capacity, illustrated by projects like Visual-ChatGPT, ViperGPT, MMREACT, HuggingGPT, and AudioGPT. Collectively, these models, spanning not only text and image generation but also other modalities, are referred to as large generative models. This workshop will provide an opportunity for researchers, practitioners, and industry professionals to explore the latest trends and best practices in the field of multimodal applications of large generative models. We also remark that submissions are not limited to the use of such models. The workshop will also focus on exploring the challenges and opportunities of integrating large language models with other AI technologies such as computer vision and speech recognition.
Additionally, the workshop will provide a platform for participants to present their research, share their experiences, and discuss potential collaborations.

 

Workshop 13

Title: (MRAC ’25) 3rd International Workshop on Multimodal and Responsible Affective Computing

URL: https://react-ws.github.io/2025_mrac/

Paper submission deadline: 25 July, 2025

Workshop Date: 27 October, 2025

Venue: DRCC / Goldsmiths 3

Abstract: Affective Computing involves the creation, evaluation, and deployment of Emotion AI and affective technologies to make people’s lives better. The creation, evaluation, and deployment stages of Emotion AI models require large amounts of multimodal data, from RGB images to video, audio, text, and physiological signals. In principle, the development of any AI system must be guided by a concern for its human impact: the aim should be to augment and enhance humans, not replace them, while safely taking inspiration from human intelligence. To this end, the MRAC’25 workshop aims to transfer these concepts from a small-scale, lab-based environment to a real-world, large-scale corpus enhanced with responsibility. The workshop also aims to bring to the attention of researchers and industry professionals the potential implications of generative technology, along with its ethical consequences.

 

Workshop 14

Title: (McGE ’25) The 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice

URL: https://cjinfdu.github.io/workshop

Paper submission deadline: 05 July, 2025

Workshop Date: 27 October, 2025

Venue: DRCC / Goldsmiths 1

Abstract: The workshop aims to bring together researchers and practitioners to discuss state-of-the-art research, novel techniques, and practical applications in multimedia content generation, quality assessment, and dataset construction.

 

Workshop 15

Title: (SVC ’25) 1st International Workshop & Challenge on Subtle Visual Computing

URL: https://sites.google.com/view/svc-mm25

Paper submission deadline: 11 July, 2025

Workshop Date: 27 October, 2025

Venue: DRCC / Swift 1 & 2

Abstract: Subtle visual signals, though often imperceptible to the human eye, carry crucial information that can reveal hidden patterns within visual data. By applying advanced computer vision and representation learning techniques, we can unlock the potential of these signals to better understand and interpret complex environments. The ability to detect and analyze subtle signals has profound implications across various fields: 1) in medicine, where early identification of minute anomalies in medical imaging can lead to life-saving interventions; 2) in industry, where spotting micro-defects in production lines can prevent costly failures; and 3) in affective computing, where understanding micro-expressions and micro-gestures in human interaction scenarios can benefit deception detection. In an era overwhelmed by information, the capacity to detect and decode these ‘subtle visual signals’ offers a novel and powerful approach to anticipating trends, identifying emerging threats, and discovering new opportunities. These signals, often ignored or overlooked, may hold key insights into future developments across different societal contexts.

Although recent advances in subtle visual computing have demonstrated significant potential, several challenges persist regarding effectiveness, robustness, and generalization. Specifically, these challenges include: 1) limited representation of subtle visual signals; 2) insufficient generalization ability; and 3) limited performance in multi-task and multimodal scenarios. This workshop seeks to develop innovative representation learning models specifically designed to capture and interpret subtle visual signals. By doing so, it will provide new ways of perceiving and acting on visual information, empowering decision-making in fields such as healthcare, industrial processes, and affective computing. Ultimately, this workshop aspires to demonstrate how hidden visual cues, when properly decoded, can offer critical foresight and actionable insights in an increasingly complex and interconnected world.

 

Workshop 16

Title: (SUMAC ’25) The 7th International Workshop on analySis, Understanding and proMotion of heritAge Contents

URL: https://sumac-workshops.github.io/2025/

Paper submission deadline: 11 July, 2025

Workshop Date: 27 October, 2025

Venue: Hyatt / Dean Swift 1

Abstract: The ambition of SUMAC is to bring together researchers and practitioners from different disciplines to share ideas and methods on current trends in the analysis, understanding and promotion of heritage contents. These challenges are reflected in the corresponding sub-fields of machine learning, signal processing, multi-modal techniques and human-machine interaction. We welcome research contributions for the following (but not limited to) topics:
• Information retrieval for multimedia heritage
• Automated archaeology and heritage data processing
• Multi-modal deep learning, generative modeling
• Time series analysis for heritage data
• Heritage visualization, virtualization and narratives
• Smart digitization and reconstruction of heritage data
• Open heritage data and benchmarking
The scope of targeted applications is extensive and includes:
• Analysis, archaeometry of artifacts
• Diagnosis and monitoring for restoration and preventive conservation
• Geosciences / Geomatics for cultural heritage
• Inclusive education
• Smart and sustainable tourism
• Urban planning
• Digital twins

 

Workshop 17

Title: (UAVM ’25) 3rd International Workshop on UAVs in Multimedia: Capturing the World from a New Perspective

URL: https://www.zdzheng.xyz/ACMMM2025Workshop-UAV/

Paper submission deadline: 07 July, 2025

Workshop Date: 27 October, 2025

Venue: Hyatt / Dean Swift 2

Abstract: Unmanned Aerial Vehicles (UAVs), also known as drones, have become increasingly popular in recent years due to their ability to capture high-quality multimedia data from the sky. With the rise of multimedia applications, such as aerial photography, cinematography, and mapping, UAVs have emerged as a powerful tool for gathering rich and diverse multimedia content. This workshop aims to bring together researchers, practitioners, and enthusiasts interested in UAV multimedia to explore the latest advancements, challenges, and opportunities in this exciting field. The workshop will cover various topics related to UAV multimedia, including aerial image and video processing, machine learning for UAV data analysis, UAV swarm technology, and UAV-based multimedia applications. In the context of the ACM Multimedia conference, this workshop is highly relevant as multimedia data from UAVs is becoming an increasingly important source of content for many multimedia applications. The workshop will provide a platform for researchers to share their work and discuss potential collaborations, as well as an opportunity for practitioners to learn about the latest developments in UAV multimedia technology. Overall, this workshop will provide a unique opportunity to explore the exciting and rapidly evolving field of UAV multimedia and its potential impact on the wider multimedia community.

 

Workshop 18

Title: (LAVA ’25) The Second International Workshop on Large Vision-Language Model Learning and Applications

URL: https://lava-workshop.github.io/workshop

Paper submission deadline: 11 July, 2025

Workshop Date: 28 October, 2025

Venue: Hyatt / Distillers

Abstract: The primary objective of this workshop is to unleash the full potential of research in large vision-language models (LVLMs) by emphasizing the convergence of diverse modalities, including text, images, and video. Furthermore, the workshop provides a platform for delving into the practical applications of LVLMs across a broad spectrum of domains, such as healthcare, education, entertainment, transportation, finance, etc.

 

Workshop 19

Title: (APP3DV ’25) International Workshop on Application-driven Point Cloud Processing and 3D Vision

URL: https://mm2025-app3dv-workshop.github.io/

Paper submission deadline: 11 July, 2025

Workshop Date: 28 October, 2025

Venue: DRCC / Swift 1 & 2

Abstract: A point cloud is an explicit representation of 3D objects and scenes, providing precise coordinates and attributes to depict the real world. It lays a solid foundation for wide-ranging applications, such as digital entertainment, autonomous driving, remote sensing, robotics, and unmanned aerial vehicles. The huge data volume of point clouds motivates the research and development of efficient compression algorithms to relieve the burden of data transmission and storage. Therefore, international industry companies, research institutions, and standardization organizations have become very interested in devising better compression methods, as well as powerful standards. Moreover, point cloud enhancement algorithms are also very popular, including upsampling, denoising, completion, compression artifact removal, and frame interpolation, which play a critical role in elevating point cloud quality. Besides quality evaluation and enhancement for the human visual experience, researchers also pay much attention to those for machine analysis tasks. Due to increased efforts on immersive media, embodied artificial intelligence (AI), and unmanned-system applications, 3D vision analysis based on point clouds or point cloud-based multi-modalities is becoming a very interesting research trend. Point cloud technologies can boost advancements in both traditional analysis tasks, including classification, segmentation, detection, and tracking, and emerging analysis tasks, including captioning, grounding, task decomposition, question answering, and navigation. The 3D reconstruction and rendering of point clouds have promoted the emergence of 3D Gaussian splatting, which has raised new explorations of 3D visual data compression, reconstruction, generation, rendering, and representation.
Advanced algorithms for point cloud compression, enhancement, and analysis are solicited, and contributions that advance the state of the art in 3D visual data compression, reconstruction, generation, rendering, and representation are also welcomed.

 

Workshop 20

Title: (RoboSoft ’25) 1st International Workshop on Vision-Language in Soft Robot

URL: https://buaa-colalab.github.io/Robo-Soft-25/

Paper submission deadline: 11 July, 2025

Workshop Date: 28 October, 2025

Venue: DRCC / Higgins 2

Abstract: Embodied intelligence enables cognition and decision-making through interaction between robots and their environments. Its development has evolved from rule-based control to autonomous systems powered by deep learning and reinforcement learning. Currently, research in embodied intelligence predominantly focuses on rigid-bodied robots. However, the inherent characteristics of rigid materials limit flexibility, increase collision risk, and reduce adaptability in unstructured and constrained environments. To overcome these limitations, researchers have drawn inspiration from the biological properties of soft-bodied organisms and incorporated flexible materials into robot design, fostering the advancement of embodied intelligence centered around soft-bodied platforms. Due to their deformable nature, soft robots offer highly adaptive and safe solutions, particularly suited for human-robot collaboration and tasks in complex environments. Nevertheless, their underactuated structure and strongly nonlinear dynamics pose significant challenges for the design of intelligent systems.
 
This workshop focuses on multimodal perception and decision-making in soft robotics, aiming to explore cutting-edge technologies and bring together researchers from diverse fields to address emerging challenges and solutions, including multimodal embodied navigation, multimodal embodied manipulation, and embodied perception and control methods for soft robots. Participants and speakers will present novel research findings and engage in in-depth discussions on state-of-the-art methodologies and technologies.

 

Workshop 21

Title: (MSMA ’25) 1st International Workshop on Multi-Sensorial Media and Applications

URL: https://msma2025.github.io/

Paper submission deadline: 11 July, 2025

Workshop Date: 28 October, 2025

Venue: DRCC / Higgins 2

Abstract: Nowadays, increasingly immersive multimedia systems call for emerging multi-sensorial media (e.g., text, audio, video, haptics, olfaction, human motion, as well as other new media types). With all these new media signals, effectively processing and seamlessly incorporating them into state-of-the-art multimedia systems will be critical. This necessitates advancements in the entire multi-sensorial media system, as well as in human factors and ergonomics to support its implementation. The 1st International Workshop on Multi-Sensorial Media and Applications (MSMA), collocated with ACM Multimedia Conference 2025, aims to attract contributions relating to multi-sensorial media systems, including system design and evaluation, coding and delivery, analysis and interpretation, multi-modal interaction, human factors and ergonomics, applications, and so on. We hope our workshop can provide a new platform to bridge existing research in this area, inspire new ideas, and expand the boundaries of multimedia research.

 

Workshop 22

Title: (MCHM ’25) 2nd International Workshop on Multimedia Computing for Health and Medicine

URL: https://weizhou-geek.github.io/workshop/MM2025.html

Paper submission deadline: 11 July, 2025

Workshop Date: 28 October, 2025

Venue: DRCC / Field 1 & 2

Abstract: In health and medicine, an immense amount of data is being generated by distributed sensors and cameras, as well as multimodal digital health platforms that support multimedia, such as audio, video, image, 3D geometry, and text. The availability of such multimedia data from medical devices and digital record systems has greatly increased the potential for automated diagnosis. The past several years have witnessed an explosion of interest, and a dizzyingly fast development, in computer-aided medical investigations using MRI, CT, X-rays, images, point clouds, etc. This proposed workshop focuses on various multimedia computing techniques (including mobile solutions and hardware solutions) for health and medicine, which targets real-world data/problems in healthcare, involves a large number of stakeholders, and is closely connected with people’s health.

 

Workshop 23

Title: (PILM ’25) International Workshop on Personalized Incremental Learning in Medicine

URL: https://acmmm-pilm-2025.github.io/

Paper submission deadline: 11 July, 2025

Workshop Date: 28 October, 2025

Venue: DRCC / Field 1 & 2

Abstract: The Personalized Incremental Learning in Medicine (PILM) workshop at ACM Multimedia 2025 focuses on integrating incremental learning with multimedia applications in healthcare. It aims to discuss advancements that enhance personalized medicine, addressing challenges such as catastrophic forgetting, few-shot learning, and domain shifts in diverse medical data. The workshop encourages interdisciplinary collaboration among researchers in machine learning, multimedia, and clinical practice, emphasizing data privacy and scalable solutions. We welcome submissions that contribute to these themes and connect theory with practical applications in personalized medicine.

 

Workshop 24

Title: (MFMSI ’25) The 1st International Workshop on Multimodal Foundation Models for Spatial Intelligence

URL: https://sites.google.com/view/mm25-spatial

Paper submission deadline: 11 July, 2025

Workshop Date: 28 October, 2025

Venue: DRCC / Swift 1 & 2

Abstract: Multimodal foundation models have transformed artificial intelligence by enabling scalable and transferable representations across diverse modalities, facilitating applications in vision-language understanding, text-to-image/video generation, and AI-driven assistants. However, their reliance on predominantly linguistic and 2D visual representations limits their ability to interact effectively with the physical world, where deep 3D spatial reasoning is crucial. Spatial intelligence, which encompasses perception, comprehension, and reasoning about spatial relationships and 3D structures, is essential for advancing AI models beyond static, task-specific functions toward embodied capabilities. Achieving robust spatial intelligence is critical for applications such as autonomous systems, robotics, augmented reality, and digital twins.
This workshop seeks to bring together researchers and practitioners from multimedia and related communities to discuss Multimodal Foundation Models for Spatial Intelligence. Many open problems remain to be explored, spanning multimedia data and benchmarks, framework designs, training techniques, and trustworthy algorithms. By uniting insights from researchers with diverse backgrounds, we aim to reshape the future of spatially-aware foundation models and pave the way for next-generation AI systems capable of perceiving, reasoning, and acting in complex 3D environments.

 

Workshop 25

Title: (MMFood ‘25) 1st International Workshop on Multi-modal Food Computing

URL: https://mm-food.github.io/2025/

Paper submission deadline: 11 July, 2025

Workshop Date: 28 October, 2025

Venue: DRCC / Goldsmiths 2

Abstract: Human perception of food is influenced by a blend of sensory experiences and cognitive associations, making it inherently multimodal. The way we perceive and choose food is shaped by a combination of sensory inputs – such as sight, smell, taste, and touch – alongside language, memory, and social factors. Beyond individual preferences, food systems and dietary habits are shaped by geographic, cultural, and socio-economic influences, which are further reinforced by digital communities, social media trends, and AI-driven personalization. Advances in artificial intelligence (AI), computer vision, natural language processing, and sensory modeling have enabled new ways to recognize, retrieve, recommend, predict, and monitor food, addressing key challenges in health, nutrition, sustainability, and food culture. Yet, food computing remains underrepresented in the multimedia community, despite its vast potential to leverage multimodal intelligence.
This workshop intends to explore multimodal innovations centered on food, a few examples of which are given below:
- Computer Vision applications targeting food recognition, portion estimation, visual appeal analysis, etc.
- Natural Language Processing (NLP) and Large Language Models (LLMs) aimed at the synthesis and analysis of food descriptions and opinions, recipe generation, dietary guidelines, and so on.
- Machine Learning & AI Planning tasks such as generating personalized food recommendations, assessing dietary adherence, and addressing problems in the food supply chain and associated logistics.
- Sensory Science & Multisensory AI aimed at perception modeling of taste, smell, texture, and other dimensions of food.
- Behavioral & Social Computing studies on the analysis of food trends, the growth and spread of digital food communities, cultural influences, and the detection of early warning signals of food safety and security for risk analysis.
The workshop will also address technical and ethical challenges in food computing, including:
- Data standardization across diverse modalities such as images, text, and nutrition data, and the use of biometrics for information collection.
- Interpretability of AI-driven food recommendations and their impact.
- Inclusivity of AI-powered food solutions across populations, dietary needs, and eating cultures.
The ambition of this workshop is to bring together researchers and practitioners working on a wide range of problems related to food computing using different types of food data. It will provide a platform to brainstorm on critical global food-related problems and exchange ideas on how multimodal information can be leveraged to solve them. 

 

Workshop 26

Title: (MMSports ’25) 8th International ACM Workshop on Multimedia Content Analysis in Sports

URL: http://mmsports.multimedia-computing.de/mmsports2025/index.html

Paper submission deadline: 11 July, 2025

Workshop Date: 28 October, 2025

Venue: DRCC / Goldsmiths 1

Abstract: Athletic endeavors, at both amateur and professional levels, have a tremendous economic, political, and cultural influence on our society. At the same time, rapidly developing technologies have changed the way we sense, participate in, watch, analyze, understand, and research sports. For example, television broadcasts augment live video footage in real time with computer vision-based and social media-based graphics to emphasize different aspects of a game or performance and to help viewers focus and understand. Wearables play a pivotal role in how we pursue and evaluate our personal training goals. In a professional setting, coaches and training scientists benefit directly from the latest technological research and sensors, reshaping the way we think about improving the performance and technique of athletes, understanding sports injuries, and enhancing the qualitative and quantitative analyses of performances.

While research fields like computer vision, sensor technology, machine learning and data-driven approaches have recently made huge advancements and massively influenced many aspects of sports, the joint assessment of multiple modalities for sport technologies offers appealing innovations to advance the field. For example, audio-visual cues are used to classify different sports types or perform crowd sentiment analyses. Computer vision systems using high-speed camera arrays generate performance coefficients and perform 3D technical game analyses, while force predictions from force plates and wearable sensors can be utilized to predict impending injuries. The ambition of this workshop is to bring together researchers and practitioners from many different disciplines to share ideas and methods on current multimedia/multimodal content analysis research in sports.

 

Workshop 27

Title: (RichMediaGAI ’25) 3rd International Workshop on Rich Media with Generative AI

URL: https://richmediagai.github.io/

Paper submission deadline: 11 July, 2025

Workshop Date: 28 October, 2025

Venue: Hyatt / Dean Swift 1

Abstract: This workshop aims to showcase the latest developments in generative AI (GAI) for creating, editing, restoring, and compressing rich media data such as images, videos, and 3D content. GAI enables anyone to design and generate synthetic yet realistic content without professional artistic or technical skills. This empowers immeasurable market growth in gaming and entertainment, and has even more profound impact in providing crucial simulated data for training embodied AI agents. The workshop also features a competition with four tracks focusing on media generation and transmission with GAI: Tracks 1-3 target reducing computation and transmission for efficient media delivery, and Track 4 targets controlled novel content creation.

 

Workshop 28

Title: (GENEA ’25) International Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents

URL: https://genea-workshop.github.io/2025/workshop/

Paper submission deadline: 11 July, 2025

Workshop Date: 28 October, 2025

Venue: Hyatt / Tanners

Abstract: Embodied social artificial intelligence, in the form of conversational virtual humans and social robots, is becoming a key aspect of human-machine interaction. For several decades, researchers from fields such as human-computer interaction and robotics have been proposing methods and models to generate non-verbal behaviour for conversational agents in the form of facial expressions, gestures, and gaze. This workshop aims to bring these researchers together, to stimulate discussion on how to improve both generation methods and the evaluation of their results, and to spark an exchange of ideas and possible collaborations.

 

Workshop 29

Title: (DHOW ’25) International Workshop on Diffusion of Harmful Content on Online Web

URL: https://dhow-workshop.github.io

Paper submission deadline: 11 July, 2025

Workshop Date: 28 October, 2025

Venue: DRCC / Goldsmiths 3

Abstract: With the advancement of digital technologies and gadgets, online content has become easily accessible; at the same time, harmful content spreads widely. Harmful content of many types is present on various platforms and in multiple languages, and the topic is broad, covering multiple research directions, all of which affect platform users. In research, the different forms are mostly analysed separately, e.g. misinformation, cyber-bullying, and hate speech, and most studies have been conducted for a single platform, a monolingual setting, or a particular issue. Counter-measures such as blocking or down-ranking can make spreaders of harmful content switch platforms and languages to continue reaching a user base. Harmful content appears not only on social media but also in news media: spreaders share it in posts, news articles, comments, and hyperlinks. There is therefore a great need to study harmful content across platforms, languages, and topics.
We plan to bring research on harmful content under one umbrella so that different approaches and novel methods can be shared. The workshop will also cover currently ongoing issues such as wars and elections. The theme of the DHOW workshop is centered on the development of robust methods and approaches for mitigating misinformation in AI-generated multimodal content, and so brings together research on different topics of harmful content. We expect the workshop to generate insights and discussions that will help advance societal artificial intelligence (AI) for a safer internet. In addition to attracting high-quality research contributions, one aim of the workshop is to mobilise researchers working in related areas to form a community.

 

Workshop 30

Title: (MUWS ’25) The 4th International Workshop on Multimodal Human Understanding for the Web and Social Media

URL: https://muws-workshop.github.io/

Paper submission deadline: 11 July, 2025

Workshop Date: 28 October, 2025

Venue: Hyatt / Dean Swift 2

Abstract: Multimodal human understanding and analysis are emerging research areas that cut across several disciplines, including computer vision (CV), natural language processing (NLP), speech processing, human-computer interaction (HCI), and multimedia. Several multimodal learning techniques have recently shown the benefit of combining multiple modalities in image-text, audio-visual, and video representation learning and in various downstream multimodal tasks. At their core, these methods model the modalities and their complex interactions using large amounts of data, different loss functions, and deep neural network architectures. However, many Web and social media applications require modelling human behaviour and perception, which calls for interdisciplinary approaches that draw on the social sciences and psychology.
The core goal is to understand various cross-modal relations, quantify biases such as social biases, and assess the applicability of models to real-world problems. Interdisciplinary theories such as semiotics or gestalt psychology can provide additional insights into perceptual understanding through signs and symbols across multiple modalities. In general, these theories offer a compelling view of multimodality and perception that can further expand computational research and multimedia applications on the Web and social media.

 

Workshop 31

Title: (IXR ’25) 3rd International Workshop on Interactive eXtended Reality

URL: https://ixr-workshop.github.io/2025/

Paper submission deadline: 11 July, 2025

Workshop Date: 27 October, 2025

Venue: DRCC / Higgins 1

Abstract: The goal of the workshop is to advance multimedia, networks, and end-user infrastructures to enable the next generation of interactive Extended Reality (XR) applications and services. While XR has significantly enhanced remote communication by fostering presence and interactivity, most applications remain local and individual experiences. IXR addresses key challenges hindering fully immersive remote interactions, including content realism, motion-to-photon latency, and human-centric quality assessment. By exploring novel solutions across the end-to-end transmission chain, it aims to bridge the gap between current XR technologies and truly interconnected, interactive experiences. Researchers from academia and industry are invited to contribute innovative work tackling these critical barriers.