- We introduce Frame Level ALIgnment and tRacking (FLAIR), which leverages the video understanding of Segment Anything Model 2 (SAM2) and the vision-language capabilities of Contrastive Language-Image Pre-training (CLIP)
- FLAIR takes a drone video as input and outputs segmentation masks of the species of interest across the video
- Leverages a zero-shot approach, eliminating the need for labeled data, training a new model, or fine-tuning an existing model to generalize to other species
- Readily generalizes to other shark species without additional human effort
- Can be combined with novel heuristics to automatically extract relevant information including length and tailbeat frequency
- FLAIR also requires markedly less human effort and expertise than traditional machine learning workflows, while achieving superior accuracy
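At a high level, SAM2 proposes and propagates candidate object masks through the video while CLIP scores each candidate against natural-language prompts. FLAIR's actual scoring logic lives in this repository; the sketch below only illustrates the CLIP side using the Hugging Face transformers API, with the checkpoint, prompts, and crop filenames all placeholder assumptions:

```python
# Minimal sketch of CLIP prompt scoring (not FLAIR's actual code).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = ["a photo of a shark in the ocean", "a photo of the ocean surface"]
crops = [Image.open("candidate_0.png"), Image.open("candidate_1.png")]  # hypothetical mask crops

inputs = processor(text=prompts, images=crops, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape: (num_crops, num_prompts)

# Keep crops whose best-matching prompt is a "correct" prompt (index 0 here).
probs = logits.softmax(dim=-1)
keep = [i for i, p in enumerate(probs) if p.argmax().item() == 0]
print("Crops matching the species prompt:", keep)
```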
You can run the FLAIR pipeline directly in Google Colab for easy setup and visualization.
For optimal performance, please see Step-by-step usage.
# (1) Download FLAIR from GitHub
$ git clone https://github.com/conservation-technology-group/FLAIR.git
$ cd FLAIR
# (2) Resolve dependencies
# Necessary packages can be installed using the following environment and requirements files.
# We strongly recommend using conda to install dependencies.
$ conda env create -f conda_env.yaml
# (3) Activate conda environment
$ conda activate FLAIR
# (4) Install pip requirements
$ pip install -r requirements.txt

We currently support running FLAIR on CUDA-enabled GPUs (i.e., NVIDIA). Mac MPS support will be coming soon! Running on CPU is not recommended.
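Before moving on, it can help to confirm that PyTorch actually sees a CUDA device. This quick check is not part of the FLAIR CLI, just a sanity test:

```python
import torch

# FLAIR currently targets CUDA-enabled GPUs; verify one is visible to PyTorch.
if torch.cuda.is_available():
    print(f"CUDA device found: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA device found – running on CPU is not recommended.")
```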
Set the following parameters in config.yaml
- video_dir: Input directory containing individual image frames
- output_txt: Output directory for predicted bounding boxes
- output_dir_prefix: Output directory for predicted masks
- output_pdf_path: Output PDF for visualizing predicted masks
- run_every_n_frames: Run FLAIR every n frames – typically equal to the video fps (30)
- min_mask_length: Minimum object size in pixels (50)
- max_mask_length: Maximum object size in pixels (150)
- window_length: Window length for SAM2 Video Propagation (3)
- prompts: List of CLIP prompts to use
- correct_prompts: Indices of the correct CLIP prompts
Additional parameters can be found in config.yaml
Parameters should be set in config.yaml prior to running
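For illustration, a filled-in config.yaml might look like the sketch below. Paths and prompt strings are placeholders, the numeric values echo the typical values noted above, and the exact key names and structure should be verified against the config.yaml that ships with the repository:

```yaml
# Example config.yaml (illustrative values only – verify keys against the shipped file)
video_dir: data/frames/              # individual image frames extracted from the drone video
output_txt: outputs/boxes/           # predicted bounding boxes
output_dir_prefix: outputs/masks/    # predicted masks
output_pdf_path: outputs/masks.pdf   # PDF for visually reviewing predicted masks

run_every_n_frames: 30               # once per second for 30 fps footage
min_mask_length: 50                  # minimum object size in pixels
max_mask_length: 150                 # maximum object size in pixels
window_length: 3                     # SAM2 video propagation window

prompts:
  - "a photo of a shark in the ocean"
  - "a photo of the ocean surface"
correct_prompts: [0]                 # indices of prompts describing the species of interest
```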
# Run FLAIR with the specified config file (we strongly recommend utilizing a GPU)
$ python3 flair.py --config config.yaml

Masks in each frame are saved as polygon coordinates in a .npy file.
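Each .npy output can be inspected with NumPy before pruning. The snippet below assumes a hypothetical filename and that a file stores a dictionary mapping mask keys to polygon coordinates (consistent with the key-based pruning in the next step); check the layout against a file from your own run:

```python
import numpy as np

# Hypothetical path – substitute a file produced by your own run.
masks = np.load("outputs/masks/frame_0000.npy", allow_pickle=True)

# np.save wraps a dict in a 0-d object array; unwrap it if needed.
if masks.dtype == object and masks.shape == ():
    masks = masks.item()

# For a dict layout, the keys here are the mask keys used by prune.py.
if isinstance(masks, dict):
    for key, polygon in masks.items():
        print(key, np.asarray(polygon).shape)
```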
# Perform manual mask pruning by keeping or removing specific keys to retain correct objects of interest
# Exactly one of --keep_keys or --remove_keys must be specified
$ python3 prune.py --mask_dir <mask_dir> --output_dir <output_dir> [--keep_keys KEY ...] [--remove_keys KEY ...]
# Example: Keep only keys 1 and 2
$ python3 prune.py --mask_dir path/to/input --output_dir path/to/output --keep_keys 1 2
# Example: Remove keys 3 and 4
$ python3 prune.py --mask_dir path/to/input --output_dir path/to/output --remove_keys 3 4

- mask_dir: Input directory containing predicted masks
- output_dir: Output directory containing pruned masks
- keep_keys: List of mask keys to keep - selected after manual review of output_pdf_path
- remove_keys: List of mask keys to remove - selected after manual review of output_pdf_path
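Conceptually, pruning filters each frame's mask dictionary by key after you review the rendered masks in output_pdf_path. The function below is not prune.py itself, just a minimal sketch of the keep/remove logic under the dict-of-polygons assumption above:

```python
def prune_masks(masks: dict, keep_keys=None, remove_keys=None) -> dict:
    """Keep or drop mask entries by key; exactly one filter must be given."""
    if (keep_keys is None) == (remove_keys is None):
        raise ValueError("Specify exactly one of keep_keys or remove_keys")
    if keep_keys is not None:
        return {k: v for k, v in masks.items() if k in set(keep_keys)}
    return {k: v for k, v in masks.items() if k not in set(remove_keys)}

# Example: keep only objects 1 and 2 from a frame's masks
pruned = prune_masks({1: "poly_a", 2: "poly_b", 3: "poly_c"}, keep_keys=[1, 2])
print(sorted(pruned))  # [1, 2]
```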
If you have used FLAIR in your work, please consider citing our paper!
@misc{lalgudi2025zeroshotsharktrackingbiometrics,
title={Zero-shot Shark Tracking and Biometrics from Aerial Imagery},
author={Chinmay K Lalgudi and Mark E Leone and Jaden V Clark and Sergio Madrigal-Mora and Mario Espinoza},
year={2025},
eprint={2501.05717},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2501.05717},
}