
v-HUB: A Benchmark for Video Humor Understanding from Vision and Sound

arXiv Hugging Face Datasets GitHub Repo Homepage

📗 Overview

To gauge and diagnose the capacity of multimodal large language models (MLLMs) for humor understanding, we introduce v-HUB, a novel video humor understanding benchmark. It comprises a curated collection of non-verbal short videos, reflecting real-world scenarios where humor can be appreciated purely through visual cues.

📐 Dataset Examples

🔮 Data Curation and Evaluation Pipeline

📍 Filtering

We transcribe each video's speech with the Whisper model and retain only videos whose transcript contains fewer than 10 characters.

python ./filter/extract_speech_text.py
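The filtering rule described above can be sketched in a few lines. This is a minimal illustration, not the contents of `extract_speech_text.py`: the function names are hypothetical, the strict "fewer than 10" comparison and the whitespace trimming are assumptions, and the transcriber is passed in as a callable so the rule can be tried without loading a model. The optional helper assumes the `openai-whisper` package and `ffmpeg` are installed.

```python
from typing import Callable, Iterable, List

# Threshold from the text above; treating it as a strict "<" is an assumption.
MAX_SPEECH_CHARS = 10


def speech_char_count(transcript: str) -> int:
    """Count characters in a transcript after trimming surrounding whitespace."""
    return len(transcript.strip())


def filter_nonverbal(
    video_paths: Iterable[str],
    transcribe: Callable[[str], str],
    max_chars: int = MAX_SPEECH_CHARS,
) -> List[str]:
    """Keep videos whose transcribed speech has fewer than max_chars characters."""
    return [p for p in video_paths if speech_char_count(transcribe(p)) < max_chars]


def make_whisper_transcriber(model_name: str = "base") -> Callable[[str], str]:
    """Build a transcriber backed by openai-whisper (imported lazily)."""
    import whisper  # pip install openai-whisper; requires ffmpeg on PATH

    model = whisper.load_model(model_name)
    # transcribe() returns a dict; the full transcript is under the "text" key.
    return lambda path: model.transcribe(path)["text"]
```

With a real model, `filter_nonverbal(paths, make_whisper_transcriber())` would return the subset of `paths` treated as non-verbal under these assumptions.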

📍 Annotation

Our annotation platform is Label Studio. Please refer to Annotation_Manual and the Label Studio documentation to set up the platform.

📍 Evaluation

Step 1: Get the Code and Data

git clone https://github.com/spatigen/vhub.git
cd vhub
# Make sure git-lfs is installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/datasets/Foreverskyou/v-HUB

Step 2: Configure and Run

  1. Prepare Data: Unzip the all_data.zip file located in the dataset directory you just cloned. This will create an all_data folder.

  2. Update Paths: Open the evaluation script you wish to use (e.g., ./scripts/Text_Only/example_QA.sh). Update the VIDEO_DIR, QUESTIONS_CSV and CAND_FILE variables to the absolute paths of your dataset files.

  3. Run Evaluation: After updating the variables and installing the dependencies required by your chosen model, execute the script.

./scripts/Text_Only/example_QA.sh
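Before launching a script, it can help to verify the data preparation and path steps above. The following is a small sketch using only the Python standard library. The helper names are hypothetical and not part of the repository; the only facts taken from the steps above are that the archive is named `all_data.zip` and that `VIDEO_DIR`, `QUESTIONS_CSV` and `CAND_FILE` should be absolute paths.

```python
import zipfile
from pathlib import Path
from typing import Dict, List


def unzip_dataset(zip_path: str, dest_dir: str) -> Path:
    """Extract the dataset archive into dest_dir and return that directory."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return dest


def check_paths(paths: Dict[str, str]) -> List[str]:
    """Return one message per variable whose path is relative or missing."""
    problems = []
    for name, value in paths.items():
        p = Path(value)
        if not p.is_absolute():
            problems.append(f"{name}: not an absolute path: {value}")
        elif not p.exists():
            problems.append(f"{name}: path does not exist: {value}")
    return problems
```

Calling `check_paths` with a dictionary that maps the three variable names to the values you put in the script returns an empty list when every path is absolute and exists.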

Here we provide example scripts for the three tasks under the three settings: Text-Only, Video-Only, and Video+Audio.

You can specify different tasks, such as ['QA','explanation','matching'], and different models, for example ['Qwen2.5-Omni','Qwen2.5-VL','Gemini2.5-flash','GPT-4o','InterVL 3.5','Minicpm 2.6-o','video SALMONN 2'].
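To plan a full sweep, the three tasks and three settings can be enumerated together with the models you intend to test. This sketch only builds the list of combinations. It does not assume how the scripts accept these values, and the function name is hypothetical.

```python
from itertools import product

# Task names and setting labels as listed in this README.
TASKS = ["QA", "explanation", "matching"]
SETTINGS = ["Text-Only", "Video-Only", "Video+Audio"]


def evaluation_grid(models):
    """List every (setting, task, model) combination to evaluate."""
    return [
        {"setting": s, "task": t, "model": m}
        for s, t, m in product(SETTINGS, TASKS, models)
    ]
```

For two models this yields 3 x 3 x 2 = 18 combinations, each of which corresponds to one configured script run.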

📮 Contact

If you have any questions, please feel free to contact us:

shi_zpeng@sjtu.edu.cn

yannzhao.ed@gmail.com

📝 License

v-HUB may be used for academic research only. Commercial use in any form is prohibited.
It contains a collection of funny videos collected from two complementary domains.
Therefore, the copyright of all videos belongs to the video owners.
If there is any infringement in v-HUB, please email shi_zpeng@sjtu.edu.cn, and we will remove it immediately.
Without prior approval, you cannot distribute, publish, copy, disseminate, or modify v-HUB in whole or in part. 
You must strictly comply with the above restrictions.

Please send an email to shi_zpeng@sjtu.edu.cn.

✒️ Citation

If you find our work helpful for your research, please consider citing it.

@misc{shi2026vhubbenchmarkvideohumor,
      title={v-HUB: A Benchmark for Video Humor Understanding from Vision and Sound}, 
      author={Zhengpeng Shi and Yanpeng Zhao and Jianqun Zhou and Yuxuan Wang and Qinrong Cui and Wei Bi and Songchun Zhu and Bo Zhao and Zilong Zheng},
      year={2026},
      eprint={2509.25773},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.25773}, 
}
