This is the official repository of the multi-modal learning via multi-objective optimization (MIMO) algorithm, designed for efficient mitigation of modality imbalance (see the paper "Mitigating Modality Imbalance in Multi-modal Learning via Multi-objective Optimization").
Multi-modal learning (MML) aims to integrate information from multiple modalities, which is expected to lead to superior performance over single-modality learning. However, recent studies have shown that MML can underperform, even compared to single-modality approaches, due to imbalanced learning across modalities. Methods have been proposed to alleviate this imbalance issue using different heuristics, which often lead to computationally intensive subroutines. In this paper, we reformulate the MML problem as a multi-objective optimization (MOO) problem that overcomes the imbalanced learning issue among modalities and propose a gradient-based algorithm to solve the modified MML problem. The resulting algorithm shows improved performance on popular MML benchmarks compared to existing baselines, while demonstrating up to ∼20× reduction in subroutine computation time.
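To make the multi-objective view concrete, below is a minimal, self-contained sketch (assuming PyTorch) of a generic MGDA-style min-norm combination of two per-modality gradients. It only illustrates the idea of treating each modality's loss as a separate objective; it is not the exact MIMO update from the paper, and the function combined_gradient and the toy tensors are introduced here purely for illustration.

# Hypothetical illustration (not the MIMO update from the paper): a min-norm
# convex combination of two per-modality gradient vectors, as used in generic
# MGDA-style multi-objective optimization.
import torch

def combined_gradient(g_audio: torch.Tensor, g_visual: torch.Tensor) -> torch.Tensor:
    """Return the min-norm convex combination gamma*g_audio + (1-gamma)*g_visual."""
    diff = g_audio - g_visual
    denom = diff.dot(diff).clamp_min(1e-12)
    # Closed-form minimizer of ||gamma*g_a + (1-gamma)*g_v||^2 over gamma in [0, 1]
    gamma = ((g_visual - g_audio).dot(g_visual) / denom).clamp(0.0, 1.0)
    return gamma * g_audio + (1.0 - gamma) * g_visual

# Toy usage: random vectors stand in for the flattened per-modality gradients
# of the shared parameters obtained from two separate backward passes.
g_a, g_v = torch.randn(10), torch.randn(10)
shared_update_direction = combined_gradient(g_a, g_v)

In MGDA-style methods, the negative of this min-norm combination is a common descent direction for both objectives whenever the current point is not Pareto-stationary, which is what allows both modalities to keep improving.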
Below we give some representative results; for complete results, please see the paper. The left figure compares the training and testing performance of MIMO with vanilla MML (joint training with sum fusion) on the CREMA-D dataset. The middle and right figures compare the loss landscapes of vanilla MML and MIMO after 1500 iterations on CREMA-D. The black contours denote the multi-modal training loss, the yellow dashed contours denote the multi-modal testing loss, and the red star marks the convergence point of each method. The heatmap color encodes the difference between the uni-modal training accuracies at each point of the loss landscape: blue means the audio modality dominates, green means the visual modality dominates, and higher color intensity indicates a larger accuracy gap. As illustrated by the training curves and loss landscapes, MIMO achieves lower multi-modal test loss (i.e., better generalization) by balancing the learning of the two modalities.
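For readers curious how such contour plots are typically produced, here is a hypothetical sketch (assuming PyTorch and matplotlib) that evaluates a toy model's loss over a grid spanned by two random directions around a fixed parameter vector. It does not reproduce the paper's figures (in particular, the uni-modal accuracy heatmap is omitted), and the data, model, and direction choices are placeholders.

# Hypothetical sketch of a 2-D loss-landscape contour plot: evaluate the loss
# over a grid spanned by two random directions around a (toy) trained model.
# Illustrative only; this is not the plotting code used for the paper's figures.
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

torch.manual_seed(0)
X = torch.randn(256, 8)                # dummy features standing in for fused modality features
y = torch.randint(0, 2, (256,))        # dummy binary labels
w = torch.randn(8, 2)                  # weights of a toy linear classifier ("convergent point")
d1, d2 = torch.randn_like(w), torch.randn_like(w)   # two random plotting directions

def loss_at(a, b):
    """Cross-entropy loss of the toy classifier displaced by a*d1 + b*d2."""
    return F.cross_entropy(X @ (w + a * d1 + b * d2), y).item()

alphas = torch.linspace(-1.0, 1.0, 25)
grid = [[loss_at(a.item(), b.item()) for a in alphas] for b in alphas]
plt.contour(alphas.numpy(), alphas.numpy(), grid, levels=20, colors="black")
plt.xlabel("direction 1")
plt.ylabel("direction 2")
plt.savefig("toy_loss_landscape.png")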
Please follow the instructions in the README file of each codebase environment in the src folder to set up the required datasets and environments for running the experiments. We use src/agm_base for experiments on the CREMA-D, UR-Funny, AV-MNIST, CMU-MOSEI, and AVE datasets, and src/ogm_ge_base for experiments on the VGGSound and Kinetics-Sound datasets.
We demonstrate training with MIMO using the CREMA-D dataset. An example experiment parameter configuration is given below:
dir=cremad_logs
data_root=data/cremad
dataset=CREMAD
epochs=100
seed=1000
cuda=0
methods=MTL-MIMO
device=cuda:$cuda
lr="1e-3"
lambd_mimo=100.0
mu_mimo=0.01
modulation_ends=$epochs
fusion_type=late_fusion
modality=Multimodal

Then, run the following commands to launch the experiment and log the results:
cd src/agm_base
mkdir -p $dir
logname=$dir/$dataset-$methods-lr-$lr-epochs-$epochs-$fusion_type-$modality-$seed.out
echo "python -u main.py --data_root $data_root --dataset $dataset --device $device --methods $methods --lambd_mimo $lambd_mimo --mu_mimo $mu_mimo --modality $modality --fusion_type $fusion_type --random_seed $seed --expt_dir checkpoint --expt_name test --batch_size 64 --EPOCHS $epochs --modulation_ends $modulation_ends --learning_rate $lr --lr_decay_ratio 0.9 > $logname 2>&1 "
python -u main.py --data_root $data_root --dataset $dataset --device $device --methods $methods --lambd_mimo $lambd_mimo --mu_mimo $mu_mimo --modality $modality --fusion_type $fusion_type --random_seed $seed --expt_dir checkpoint --expt_name test --batch_size 64 --EPOCHS $epochs --modulation_ends $modulation_ends --learning_rate $lr --lr_decay_ratio 0.9 > $logname 2>&1
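To repeat the run over several random seeds, a small launcher such as the hypothetical Python sketch below can be used from inside src/agm_base. It only reuses the flags shown above; the seed list and hard-coded values are illustrative choices, not recommendations from the paper.

# Hypothetical convenience launcher (not part of the released codebase): repeat
# the CREMA-D command above for several random seeds, writing one log per run.
# Run from inside src/agm_base; seeds and file names are illustrative only.
import os
import subprocess

os.makedirs("cremad_logs", exist_ok=True)
for seed in [1000, 2000, 3000]:  # arbitrary example seeds
    logname = f"cremad_logs/CREMAD-MTL-MIMO-lr-1e-3-epochs-100-late_fusion-Multimodal-{seed}.out"
    cmd = [
        "python", "-u", "main.py",
        "--data_root", "data/cremad", "--dataset", "CREMAD",
        "--device", "cuda:0", "--methods", "MTL-MIMO",
        "--lambd_mimo", "100.0", "--mu_mimo", "0.01",
        "--modality", "Multimodal", "--fusion_type", "late_fusion",
        "--random_seed", str(seed), "--expt_dir", "checkpoint",
        "--expt_name", "test", "--batch_size", "64", "--EPOCHS", "100",
        "--modulation_ends", "100", "--learning_rate", "1e-3",
        "--lr_decay_ratio", "0.9",
    ]
    with open(logname, "w") as log:  # mirrors the "> $logname 2>&1" redirection above
        subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=True)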
We would like to thank the authors of the OGM-GE_CVPR2022 and AGM codebases, upon which this codebase is built!