Official repository of Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos (ICCV 2021)
This code is based on the following libraries:

```
python=3.8
pytorch=1.7.0 (with CUDA 10.2)
```
To create a virtual environment with all necessary libraries:

```bash
conda env create -f environment.yml
```

By default, data should be saved under the data/feat/{audio,label,visual} directories, and logs (along with cache and checkpoints) are saved under the data/{cache,ckpt,log} directories.
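If the target directory does not exist yet, the expected layout can be pre-created as follows (a minimal sketch based on the paths above; the cache/ckpt/log directories may also be created automatically at runtime):

```bash
# Sketch: pre-create the data layout described above inside your own data directory.
# Feature files under feat/{audio,label,visual} come with the dataset release.
mkdir -p {path_to_your_data_directory}/feat/{audio,label,visual}
mkdir -p {path_to_your_data_directory}/{cache,ckpt,log}
```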
Using a symbolic link is recommended:

```bash
ln -s {path_to_your_data_directory} data
```

We use a single TITAN RTX for training, but GPUs with less memory can still be used with a smaller batch size (given precomputed features).
We plan to make the Pano-AVQA dataset publicly available within this year, including Q&A annotations, precomputed features, etc. Please stay tuned!
Default configuration is provided in code/config.py. To run with this configuration:
```bash
python cli.py
```

To run with a custom configuration, either modify code/config.py or execute:
```bash
python cli.py with {{flags_at_your_disposal}}
```
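As a concrete example of such a flag, a single value from code/config.py can be overridden on the command line, e.g. to reduce the batch size for a GPU with less memory (a sketch; batch_size is a hypothetical key name here, check code/config.py for the actual parameter names):

```bash
# Hypothetical override; replace batch_size with the actual key defined in code/config.py.
python cli.py with batch_size=16
```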
Model weights are saved under the ./data/log directory. To run inference only:

```bash
python cli.py eval with ckpt_file=../data/log/{experiment}/{ckpt}.pth
```

If you find our work useful in your research, please consider citing:
```bibtex
@InProceedings{Yun2021PanoAVQA,
    author    = {Yun, Heeseung and Yu, Youngjae and Yang, Wonsuk and Lee, Kangil and Kim, Gunhee},
    title     = {Pano-AVQA: Grounded Audio-Visual Question Answering on 360$^\circ$ Videos},
    booktitle = {ICCV},
    year      = {2021}
}
```

If you have any inquiries, please don't hesitate to contact us via heeseung.yun at vision.snu.ac.kr.
