Implementation of **PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping**, published in ACM Multimedia 2024.
Request access to the data used in our work, provided in webdataset-readable format for both the GeoSound and SoundingEarth datasets. Each sample contains CLAP-processed mel-spectrogram features for the audio, a satellite image, and associated metadata.
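As a rough sketch of how such shards can be inspected with the `webdataset` library (the shard filename below is a placeholder, and the per-sample keys depend on how the shards were written):

```python
# Minimal sketch: iterate over one webdataset shard and list the keys of a sample.
# The shard name is hypothetical; point it at one of the downloaded .tar files.
import webdataset as wds

dataset = wds.WebDataset("geosound-train-000000.tar")

for sample in dataset:
    # Before decoding, each sample is a dict of raw bytes keyed by file extension,
    # e.g. the mel-spectrogram, satellite image, and metadata entries.
    print(sorted(sample.keys()))
    break
```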
To download the raw audio from the audio sources aporee, freesound, and iNaturalist, create an account (and request an API key if necessary), find the respective metadata `.csv` files located here, and use the following data-download scripts for each of these sources:

- aporee: `./geoclap/data_prep/get_SoundingEarth_raw_audio.sh`
- iNaturalist: `./geoclap/data_prep/iNaturalist_download.py`
- freesound: `./geoclap/data_prep/freesound_download.py`
- yfcc: For YFCC, the yfcc videos first need to be downloaded, and then the audio should be extracted from those videos. Refer to the yahoo100m section of `./geoclap/data_prep/README.md` for details; a rough sketch of the extraction step is given after this list.
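For that extraction step, one common approach (shown here only as an illustration; the directory names and 48 kHz WAV output are assumptions, and the repo's own tooling may differ) is to strip the audio track from each downloaded video with `ffmpeg`:

```python
# Hypothetical sketch: extract audio tracks from downloaded YFCC videos with ffmpeg.
# Directory names and the 48 kHz WAV output format are assumptions.
import subprocess
from pathlib import Path

video_dir = Path("yfcc_videos")   # hypothetical download directory
audio_dir = Path("yfcc_audio")    # hypothetical output directory
audio_dir.mkdir(exist_ok=True)

for video in video_dir.glob("*.mp4"):
    out_wav = audio_dir / f"{video.stem}.wav"
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video), "-vn",
         "-acodec", "pcm_s16le", "-ar", "48000", str(out_wav)],
        check=True,
    )
```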
- Clone this repo:

  ```bash
  git clone git@github.com:mvrl/PSM.git
  cd PSM/geoclap
  ```
- Set up the environment:

  ```bash
  conda env create --file environment.yml
  conda activate sat2audio
  ```

  Note: Instead of `conda`, it could be easier to pull the docker image `ksubash/sat2audio:2.0` we provide for this project, using the following steps:

  ```bash
  docker pull ksubash/sat2audio:2.0
  docker run -v $HOME:$HOME --gpus all --shm-size=64gb -it ksubash/geoclap
  source /opt/conda/bin/activate /opt/conda/envs/sat2audio_demo
  ```
- Copy the pre-trained checkpoint of SATMAE, named `pretrain-vit-base-e199.pth` and provided in this Google Drive folder, to the location pointed to by `cfg.satmae_pretrained_ckpt`.
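  A minimal sketch of that copy step, assuming `config.py` exposes a `cfg` object with the `satmae_pretrained_ckpt` field mentioned above (the import path is an assumption):

  ```python
  # Hypothetical sketch: place the SATMAE checkpoint where config.py expects it.
  # The `geoclap.config` import is an assumption; check config.py for the actual layout.
  import shutil
  from geoclap.config import cfg

  shutil.copy("pretrain-vit-base-e199.pth", cfg.satmae_pretrained_ckpt)
  ```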
- Check `config.py` and set up the paths, manually creating the relevant directories if needed.
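  A rough sketch of creating those directories, with placeholder attribute names that should be replaced by the actual fields defined in `config.py`:

  ```python
  # Hypothetical sketch: create the directories referenced in config.py.
  # The attribute names below are placeholders, not the real config fields.
  import os
  from geoclap.config import cfg  # assumes config.py exposes `cfg`

  for path in [cfg.data_dir, cfg.log_dir, cfg.ckpt_dir]:  # hypothetical fields
      os.makedirs(path, exist_ok=True)
  ```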
- Assuming that the data is downloaded and the paths in `config.py` are properly set up, we are now ready to run the experiments related to PSM. Change directory so that we can run `geoclap` as a python module:

  ```bash
  cd ../
  ```
- Assuming wandb is set up correctly for logging purposes, we can now launch PSM training as follows:

  ```bash
  python -m geoclap.train --num_workers 8 \
                          --probabilistic true \
                          --metadata_type latlong_month_time_asource_tsource \
                          --run_name GeoSound_pcmepp_metadata_sentinel \
                          --dataset_type GeoSound \
                          --sat_type sentinel \
                          --mode train \
                          --wandb_mode online
  ```
- Once training is complete and we have the appropriate checkpoint of the model, we can evaluate the cross-modal retrieval performance of the model. For example:

  ```bash
  python -m geoclap.evaluate --ckpt_path GeoSound_pcmepp_metadata_sentinel_best_ckpt_path \
                             --loss_type pcmepp \
                             --dataset_type GeoSound \
                             --test_zoom_level 0 \
                             --sat_type sentinel \
                             --metadata_type latlong_month_time_asource_tsource \
                             --add_text true \
                             --meta_droprate 0 \
                             --test_mel_index 0
  ```
The best checkpoints for our experiments in the paper can be found here. Please note that these checkpoints are saved under directories with wandb-generated random names for each experiment; therefore, refer to the file `./geoclap/ckpt_paths.py` to find the appropriate checkpoint path.
If you find this work useful, please cite our paper:

```bibtex
@inproceedings{khanal2024psm,
  title = {PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping},
  author = {Khanal, Subash and Xing, Eric and Sastry, Srikumar and Dhakal, Aayush and Xiong, Zhexiao and Ahmad, Adeel and Jacobs, Nathan},
  year = {2024},
  month = nov,
  booktitle = {ACM Multimedia},
}
```
Follow more work from our lab: The Multimodal Vision Research Laboratory (MVRL).
