
PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis

Meng Luo · Hao Fei · Bobo Li · Shengqiong Wu · Qian Liu ·
Soujanya Poria · Erik Cambria · Mong-Li Lee · Wynne Hsu

National University of Singapore · Wuhan University · The University of Auckland ·
Singapore University of Technology and Design · Nanyang Technological University

arXiv PDF Project Page



Abstract

While existing Aspect-based Sentiment Analysis (ABSA) has received extensive effort and advancement, there are still gaps in defining a more holistic research target that seamlessly integrates multimodality, conversational context, and fine granularity, while also covering changing sentiment dynamics and cognitive causal rationales. This paper bridges these gaps by introducing a multimodal conversational ABSA setting with two novel subtasks: 1) Panoptic Sentiment Sextuple Extraction, panoramically recognizing the holder, target, aspect, opinion, sentiment, and rationale from multi-turn, multi-party, multimodal dialogue; and 2) Sentiment Flipping Analysis, detecting the dynamic sentiment transformations throughout the conversation together with their causal reasons. To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multiple scenarios, and coverage of both implicit and explicit sentiment elements. To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism. Extensive evaluations demonstrate the superiority of our methods over strong baselines, validating the efficacy of all our proposed methods. The work is expected to open up a new era for the ABSA community.

Sentica

We develop a novel MLLM, Sentica, which adopts Flan-T5 (XXL) as the core LLM for semantic understanding and decision-making. For non-text inputs, we use multimodal models to encode the signals into LLM-understandable representations. We use ImageBind as the unified encoder for all three non-text modalities due to its strong capabilities, followed by a linear layer that projects the ImageBind representations into the LLM's embedding space.
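As a minimal sketch of this wiring (assuming an ImageBind output dimension of 1024 for imagebind_huge and a Flan-T5 XXL hidden size of 4096; the actual modules live under PanoSent/model/), the projection layer can be pictured as:

import torch
import torch.nn as nn

class ImageBindToLLMProjector(nn.Module):
    # Projects ImageBind embeddings into the Flan-T5 embedding space so the
    # non-text modalities can be fed to the LLM alongside the text tokens.
    def __init__(self, imagebind_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(imagebind_dim, llm_dim)

    def forward(self, imagebind_features: torch.Tensor) -> torch.Tensor:
        # (batch, num_tokens, imagebind_dim) -> (batch, num_tokens, llm_dim)
        return self.proj(imagebind_features)

The projected modality tokens would then typically be concatenated with the embedded text prompt before being passed to the Flan-T5 encoder.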


1. Code Structure

PanoSent/                     
├── data/
│   ├── T-X_pair_data/                 
│   │   ├── LLaVA/
│   │   ├── miniGPT-4/
│   │   └── VideoChat/
│   ├── PanoSent_train.json            
│   └── PpV_train.json                 
├── PanoSent/
│   ├── model/
│   │   ├── imagebind_encoder.py       
│   │   ├── flant5_model.py          
│   │   ├── projection_layer.py       
│   │   └── lora_utils.py             
│   ├── utils/
│   │   └── imagebind_utils.py        
│   └── datasets/
│       ├── stage1_caption_dataset.py 
│       ├── stage2_sextuple_dataset.py 
│       └── stage3_entailment_dataset.py 
├── scripts/
│   ├── train_stage1.sh               
│   ├── train_stage2.sh               
│   └── …           
├── train.py                           
├── evaluate_subtask1.py              
├── evaluate_subtask2.py               
├── requirements.txt                  
└── README.md

2. Environment Preparation

conda create -n sentica python=3.10
conda activate sentica

git clone https://github.com/PanoSent/PanoSent.git
cd PanoSent
pip install -r requirements.txt

3. Preparing Pre-trained Checkpoints

  • ImageBind
    Download the official imagebind_huge.pth checkpoint from here, and place it at:

    ./imagebind/imagebind_huge.pth
    
  • Flan-T5
    We use Flan-T5 XXL as the LLM backbone.
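    As a hedged sketch (assuming the Hugging Face transformers library and the public google/flan-t5-xxl checkpoint), the backbone can be loaded as follows:

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Load the Flan-T5 XXL backbone and its tokenizer from Hugging Face.
    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xxl")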

4. Preparing Datasets

Sentica is trained through three instruction-tuning stages. The corresponding datasets are:

4.1 ‘Text+X’ pairs

  • LLaVA
  • miniGPT-4
  • VideoChat

After downloading these datasets, organize them as:

./data/T-X_pair_data/
├── LLaVA/
├── miniGPT-4/
└── VideoChat/

4.2 PanoSent train set

  • PanoSent_train.json
./data/PanoSent_train.json

4.3 Paraphrase pairs

  • PpV_train.json
./data/PpV_train.json
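
As a quick sanity check after placing the files (this sketch only loads and counts the records and makes no assumption about the field names, which follow the released JSON files):

import json

with open("data/PanoSent_train.json", encoding="utf-8") as f:
    panosent_train = json.load(f)
with open("data/PpV_train.json", encoding="utf-8") as f:
    ppv_train = json.load(f)

print(f"PanoSent training records: {len(panosent_train)}")
print(f"Paraphrase (PpV) training records: {len(ppv_train)}")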

5. Training Sentica

Sentica follows a three-stage training process:

  • Stage 1: Multimodal Understanding Stage
    bash scripts/train_stage1.sh
  • Stage 2: Sextuple Extraction Understanding Stage
    bash scripts/train_stage2.sh
  • Stage 3: Paraphrase-based Verification Stage, which tunes the model on the paraphrase pairs (PpV_train.json); see the corresponding training script under scripts/.
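
Since the repository includes lora_utils.py, the LLM backbone is presumably tuned parameter-efficiently. A hedged sketch using the peft library follows; the rank, alpha, and target modules here are illustrative assumptions, not the repository's actual settings:

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xxl")

# Illustrative LoRA configuration; lora_utils.py may use different values.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q", "v"],  # T5 attention query/value projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()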

6. Evaluation

Subtask-I: Panoptic Sentiment Sextuple Extraction

python evaluate_subtask1.py --pred pred.json --gt gold.json

Subtask-II: Sentiment Flipping Analysis

python evaluate_subtask2.py --pred pred.json --gt gold.json
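
The exact metric and prediction-file schema are defined by the evaluation scripts; as an illustration, an exact-match micro F1 over predicted versus gold sextuples could be computed as below (the function name and input format are assumptions for the sketch):

def sextuple_micro_f1(pred_sets, gold_sets):
    # pred_sets / gold_sets: one set of (holder, target, aspect, opinion,
    # sentiment, rationale) tuples per dialogue; only exact matches count.
    tp = fp = fn = 0
    for pred, gold in zip(pred_sets, gold_sets):
        tp += len(pred & gold)
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1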

Contact

If you have any questions or feedback, feel free to open an issue or reach out to us at mluo@u.nus.edu.

Citation

@inproceedings{luo2024panosent,
  title={Panosent: A panoptic sextuple extraction benchmark for multimodal conversational aspect-based sentiment analysis},
  author={Luo, Meng and Fei, Hao and Li, Bobo and Wu, Shengqiong and Liu, Qian and Poria, Soujanya and Cambria, Erik and Lee, Mong-Li and Hsu, Wynne},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={7667--7676},
  year={2024}
}
