A novel framework for micro-video recommendation that leverages user co-occurrence graphs to collaboratively mine personalized multi-modal fusion patterns and overcome modality-missing challenges.
Qifan Wang1, Yinwei Wei2*, Jianhua Yin1*, Jianlong Wu1, Xuemeng Song1, Liqiang Nie1
1 College of Computer Science and Technology, Shandong University, Qingdao, China
2 School of Computing, National University of Singapore, Singapore
* Corresponding authors
- Paper: IEEE Transactions on Multimedia 2023
- Code Repository: GitHub
- Introduction
- Highlights
- Method / Framework
- Project Structure
- Installation
- Dataset / Benchmark
- Citation
- Acknowledgement
This is the official implementation of the paper DualGNN: Dual Graph Neural Network for Multimedia Recommendation.
Existing micro-video recommender systems often fuse multi-modal user preferences (visual, acoustic, textual) in a unified manner, ignoring the fact that users tend to place different emphasis on different modalities. Furthermore, modality-missing is ubiquitous in real-world scenarios, which negatively affects fusion operations. To overcome these disadvantages, we propose DualGNN, a novel framework built upon the user-microvideo bipartite graph and user co-occurrence graph. It leverages correlations between users to collaboratively mine the particular fusion pattern for each user and inductively learn the multi-modal preference.
- Personalized Fusion Patterns: Replaces unified multi-modal fusion operations by explicitly modeling users' individual attentions over different modalities.
- Robust to Modality-Missing: Effectively reduces the negative impact of missing multi-modal information by capturing co-occurrence relationships and injecting them into user nodes.
- State-of-the-Art Performance: Significantly outperforms existing models like MMGCN, LR-GCCF, and LightGCN on real-world micro-video benchmarks.
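The co-occurrence relationships mentioned above are built from raw user-item interactions: two users are linked when they have interacted with the same micro-video. A minimal sketch of this construction (function and variable names here are illustrative, not the repository's actual API):

```python
from collections import defaultdict
from itertools import combinations

def build_user_cooccurrence(interactions):
    """Link two users with a weighted edge when they interacted with the same item.

    interactions: iterable of (user_id, item_id) pairs.
    Returns {(u, v): weight} with u < v, where weight counts the shared items.
    """
    # Invert the interactions: item -> set of users who interacted with it.
    item_to_users = defaultdict(set)
    for user, item in interactions:
        item_to_users[item].add(user)

    # Every pair of users sharing an item gains one unit of edge weight.
    edges = defaultdict(int)
    for users in item_to_users.values():
        for u, v in combinations(sorted(users), 2):
            edges[(u, v)] += 1
    return dict(edges)

print(build_user_cooccurrence([(0, "a"), (1, "a"), (0, "b"), (1, "b"), (2, "b")]))
# → {(0, 1): 2, (0, 2): 1, (1, 2): 1}
```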
The DualGNN framework consists of three main components:
- Single-Modal Representation Learning Module: Performs simplified graph operations on the user-microvideo graph in each modality to capture single-modal user preferences.
- Multi-Modal Representation Learning Module: Disentangles the learning process into information construction and aggregation operations to inductively learn multi-modal user preferences on the user co-occurrence graph.
- Prediction Module: Computes the inner product between the final representations of users and micro-videos to rank potential candidates.
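The prediction step reduces to an inner product between embeddings followed by sorting. A minimal NumPy sketch of this ranking (a simplification for illustration, not the repository's implementation):

```python
import numpy as np

def rank_candidates(user_emb: np.ndarray, item_embs: np.ndarray, top_k: int = 10) -> np.ndarray:
    """Score candidate micro-videos by inner product and return top-k item indices."""
    scores = item_embs @ user_emb          # one inner product per candidate item
    return np.argsort(-scores)[:top_k]     # indices of the highest-scoring items

# Toy example: one user and 5 candidate items, 4-dimensional embeddings.
rng = np.random.default_rng(0)
user = rng.normal(size=4)
items = rng.normal(size=(5, 4))
print(rank_candidates(user, items, top_k=3))
```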
.
├── train_DualGNN.py # Standard pre-training and fine-tuning entry point
├── parse.py # Configuration and hyper-parameter settings
├── evaluation.py # Evaluation metrics (Precision, Recall, NDCG)
├── Preprocess.py # Data preprocessing and graph construction (user/item graphs)
├── DataLoad.py # Dataset loading and batching (imported in training)
├── Model.py # Core implementation of the proposed DualGNN model
├── Model_LightGCN.py # Baseline implementation: LightGCN
├── Model_LRGCCF.py # Baseline implementation: LRGCCF
├── Model_MMGCN.py # Baseline implementation: MMGCN
├── Model_VBPR.py # Baseline implementation: VBPR
├── Model_stargcn.py # Baseline implementation: Star-GCN
└── results/ # Directory for saving training logs and embeddings (auto-generated)
Note: The dataset files are expected to be located in a ../Data/ directory relative to the project root (e.g., ../Data/Movielens/, ../Data/tiktok_new/).
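The metrics listed for evaluation.py follow their standard top-k definitions. A self-contained sketch of Recall@K and (binary-relevance) NDCG@K, written independently of the repository's code:

```python
import math

def recall_at_k(ranked: list, relevant: set, k: int) -> float:
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def ndcg_at_k(ranked: list, relevant: set, k: int) -> float:
    """DCG of the ranking divided by the ideal DCG, with binary relevance."""
    dcg = sum(1.0 / math.log2(i + 2) for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

print(recall_at_k([3, 1, 5, 2], {1, 2}, k=3))  # → 0.5 (item 1 hit, item 2 missed)
```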
git clone https://github.com/iLearn-Lab/TMM23-DualGNN
cd TMM23-DualGNN

The code requires Python 3.x and relies heavily on PyTorch and PyTorch Geometric. We recommend setting up a virtual environment (e.g., via Conda):
conda create -n dualgnn python=3.8
conda activate dualgnn

Install the required dependencies:
# Core framework
pip install torch torchvision torchaudio
# Graph neural network libraries
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric
# Utilities and metrics
pip install numpy scipy tqdm networkx matplotlib tensorboard

Note: Please ensure the versions of torch-scatter, torch-sparse, and torch-geometric match your specific CUDA version.
Before training, construct the user and item graphs and adjacency matrices from the raw interaction data:
python Preprocess.py

You can train the DualGNN model or any of the baselines by running train_DualGNN.py and specifying the required arguments (defined in parse.py).
Example command to run DualGNN on the Movielens dataset:
python train_DualGNN.py \
--model_name DualGNN \
--dataset Movielens \
--l_r 1e-4 \
--weight_decay 0.1 \
--batch_size 8192 \
--num_epoch 1000 \
--sampling 40

To run the baseline models (e.g., LightGCN, MMGCN), simply change the --model_name parameter:
python train_DualGNN.py --model_name LightGCN --dataset Tiktok
python train_DualGNN.py --model_name MMGCN --dataset Movielens

We conducted extensive experiments on two public datasets designed for micro-video recommendation: Tiktok and Movielens.
Due to copyright restrictions, we cannot release the full datasets directly; the complete versions are available via the official Tiktok and Movielens sites.
To facilitate this line of research, we provide toy datasets which can be downloaded here:
- BaiduPan: Download Link (Extraction code: zsye)
- Google Drive: Download Link
If you need access to the full datasets, please contact the respective dataset owners.
If you find this work helpful for your research, please consider citing our paper:
@article{wang2023dualgnn,
title={DualGNN: Dual Graph Neural Network for Multimedia Recommendation},
author={Wang, Qifan and Wei, Yinwei and Yin, Jianhua and Wu, Jianlong and Song, Xuemeng and Nie, Liqiang},
journal={IEEE Transactions on Multimedia},
volume={25},
pages={1074--1084},
year={2023},
publisher={IEEE},
doi={10.1109/TMM.2021.3138298}
}

This work was supported in part by the National Natural Science Foundation of China under Grants 62172261 and 61802231, and in part by the Shandong Provincial Natural Science Foundation under Grant ZR2019QF001.
This project is released under the terms of the LICENSE file included in this repository.
