
PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis

Meng Luo · Hao Fei · Bobo Li · Shengqiong Wu · Qian Liu ·
Soujanya Poria · Erik Cambria · Mong-Li Lee · Wynne Hsu

National University of Singapore · Wuhan University · The University of Auckland ·
Singapore University of Technology and Design · Nanyang Technological University

arXiv PDF Project Page



Abstract

While existing Aspect-based Sentiment Analysis (ABSA) has received extensive effort and advancement, there are still gaps in defining a more holistic research target that seamlessly integrates multimodality, conversational context, and fine granularity, while also covering changing sentiment dynamics and cognitive causal rationales. This paper bridges these gaps by introducing a multimodal conversational ABSA setting with two novel subtasks: 1) Panoptic Sentiment Sextuple Extraction, panoramically recognizing the holder, target, aspect, opinion, sentiment, and rationale from multi-turn, multi-party, multimodal dialogue; and 2) Sentiment Flipping Analysis, detecting the dynamic sentiment transformations throughout the conversation together with their causal reasons. To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multiple scenarios, and coverage of both implicit and explicit sentiment elements. To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism. Extensive evaluations demonstrate the superiority of our methods over strong baselines, validating the efficacy of all our proposed methods. The work is expected to open up a new era for the ABSA community.

Sentica

We develop a novel MLLM, Sentica, which adopts Flan-T5 (XXL) as the core LLM for semantic understanding and decision-making. For non-text inputs, we use multimodal models to encode the signals into LLM-understandable representations. We use ImageBind as the unified encoder for all three non-text modalities due to its strong capabilities, followed by a linear layer that projects the ImageBind representations into the LLM's embedding space.
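As a minimal sketch of this wiring (assuming an ImageBind output dimension of 1024 for imagebind_huge and a Flan-T5 XXL hidden size of 4096; the actual modules live under PanoSent/model/), the projection layer can be pictured as:

import torch
import torch.nn as nn

class ImageBindToLLMProjector(nn.Module):
    # Projects ImageBind embeddings into the Flan-T5 embedding space so the
    # non-text modalities can be fed to the LLM alongside the text tokens.
    def __init__(self, imagebind_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(imagebind_dim, llm_dim)

    def forward(self, imagebind_features: torch.Tensor) -> torch.Tensor:
        # (batch, num_tokens, imagebind_dim) -> (batch, num_tokens, llm_dim)
        return self.proj(imagebind_features)

The projected modality tokens would then typically be concatenated with the embedded text prompt before being passed to the Flan-T5 encoder.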


1. Code Structure

PanoSent/                     
├── data/
│   ├── T-X_pair_data/                 
│   │   ├── LLaVA/
│   │   ├── miniGPT-4/
│   │   └── VideoChat/
│   ├── PanoSent_train.json            
│   └── PpV_train.json                 
├── PanoSent/
│   ├── model/
│   │   ├── imagebind_encoder.py       
│   │   ├── flant5_model.py          
│   │   ├── projection_layer.py       
│   │   └── lora_utils.py             
│   ├── utils/
│   │   └── imagebind_utils.py        
│   └── datasets/
│       ├── stage1_caption_dataset.py 
│       ├── stage2_sextuple_dataset.py 
│       └── stage3_entailment_dataset.py 
├── scripts/
│   ├── train_stage1.sh               
│   ├── train_stage2.sh               
│   └── …           
├── train.py                           
├── evaluate_subtask1.py              
├── evaluate_subtask2.py               
├── requirements.txt                  
└── README.md

2. Environment Preparation

conda create -n sentica python=3.10
conda activate sentica

git clone https://github.com/PanoSent/PanoSent.git
cd PanoSent
pip install -r requirements.txt

3. Preparing Pre-trained Checkpoints

  • ImageBind
    Download the official imagebind_huge.pth checkpoint from here, and place it at:

    ./imagebind/imagebind_huge.pth
    
  • Flan-T5
    We use Flan-T5 XXL as the LLM backbone.
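    As a hedged sketch (assuming the Hugging Face transformers library and the public google/flan-t5-xxl checkpoint), the backbone can be loaded as follows:

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Load the Flan-T5 XXL backbone and its tokenizer from Hugging Face.
    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xxl")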

4. Preparing Datasets

Sentica is trained through three instruction-tuning stages. The corresponding datasets are:

4.1 ‘Text+X’ pairs

  • LLaVA
  • miniGPT-4
  • VideoChat

After downloading these datasets, organize them as:

./data/T-X_pair_data/
├── LLaVA/
├── miniGPT-4/
└── VideoChat/

4.2 PanoSent train set

  • PanoSent_train.json
./data/PanoSent_train.json

4.3 Paraphrase pairs

  • PpV_train.json
./data/PpV_train.json
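
As a quick sanity check after placing the files (this sketch only loads and counts the records and makes no assumption about the field names, which follow the released JSON files):

import json

with open("data/PanoSent_train.json", encoding="utf-8") as f:
    panosent_train = json.load(f)
with open("data/PpV_train.json", encoding="utf-8") as f:
    ppv_train = json.load(f)

print(f"PanoSent training records: {len(panosent_train)}")
print(f"Paraphrase (PpV) training records: {len(ppv_train)}")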

5. Training Sentica

Sentica follows a three-stage training process:

  • Stage 1: Multimodal Understanding Stage
    bash scripts/train_stage1.sh
  • Stage 2: Sextuple Extraction Understanding Stage
    bash scripts/train_stage2.sh
  • Stage 3: Paraphrase-based Verification Stage, which tunes the model on the paraphrase pairs (PpV_train.json); see the corresponding training script under scripts/.
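
Since the repository includes lora_utils.py, the LLM backbone is presumably tuned parameter-efficiently. A hedged sketch using the peft library follows; the rank, alpha, and target modules here are illustrative assumptions, not the repository's actual settings:

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xxl")

# Illustrative LoRA configuration; lora_utils.py may use different values.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q", "v"],  # T5 attention query/value projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()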

6. Evaluation

Subtask-I: Panoptic Sentiment Sextuple Extraction

python evaluate_subtask1.py --pred pred.json --gt gold.json

Subtask-II: Sentiment Flipping Analysis

python evaluate_subtask2.py --pred pred.json --gt gold.json
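
The exact metric and prediction-file schema are defined by the evaluation scripts; as an illustration, an exact-match micro F1 over predicted versus gold sextuples could be computed as below (the function name and input format are assumptions for the sketch):

def sextuple_micro_f1(pred_sets, gold_sets):
    # pred_sets / gold_sets: one set of (holder, target, aspect, opinion,
    # sentiment, rationale) tuples per dialogue; only exact matches count.
    tp = fp = fn = 0
    for pred, gold in zip(pred_sets, gold_sets):
        tp += len(pred & gold)
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1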

Contact

If you have any questions or feedback, feel free to open an issue or reach out to us at mluo@u.nus.edu.

Citation

@inproceedings{luo2024panosent,
  title={Panosent: A panoptic sextuple extraction benchmark for multimodal conversational aspect-based sentiment analysis},
  author={Luo, Meng and Fei, Hao and Li, Bobo and Wu, Shengqiong and Liu, Qian and Poria, Soujanya and Cambria, Erik and Lee, Mong-Li and Hsu, Wynne},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={7667--7676},
  year={2024}
}
