HTM-Align Dataset [website]

HTM-Align is a manually annotated 80-video subset of the HowTo100M (HTM) dataset, used to evaluate alignment performance. It is a test set randomly sampled from the Food & Entertaining category of HTM. These videos are not used for any training in our project.

Download

How To Load

import json

# Load the annotations; the top-level keys are video IDs.
with open('htm_align.json') as f:
    htm_align = json.load(f)

print(len(htm_align))  # 80

print(htm_align['-3CEg4y7mQM'][1])
# [1, 10.739, 17.535, 'add extra virgin olive oil and garlic']
# format: [alignability (1/0), start (seconds), end (seconds), text]
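As a rough illustration of working with the annotation format above, the sketch below separates alignable from non-alignable sentences and sums the annotated duration of the alignable ones. The helper names `split_by_alignability` and `alignable_duration`, the video ID `toy_video`, and all toy entries except the olive-oil sentence are made up for this example; it assumes every entry follows the four-field format.

```python
from typing import List, Tuple, Union

# One annotation: [alignability (1/0), start (seconds), end (seconds), text]
Annotation = List[Union[int, float, str]]


def split_by_alignability(
    annotations: List[Annotation],
) -> Tuple[List[Annotation], List[Annotation]]:
    """Return (alignable, non_alignable) annotations for one video."""
    alignable = [a for a in annotations if a[0] == 1]
    non_alignable = [a for a in annotations if a[0] == 0]
    return alignable, non_alignable


def alignable_duration(annotations: List[Annotation]) -> float:
    """Total seconds covered by alignable sentences (overlaps counted twice)."""
    return sum(a[2] - a[1] for a in annotations if a[0] == 1)


# Toy stand-in for the loaded JSON. Only the second entry comes from the
# example above; the other entries and the video ID are hypothetical.
toy = {
    'toy_video': [
        [0, 0.0, 4.2, 'hi everyone welcome back'],
        [1, 10.739, 17.535, 'add extra virgin olive oil and garlic'],
        [1, 20.0, 25.5, 'stir for two minutes'],
    ]
}

for video_id, annotations in toy.items():
    alignable, non_alignable = split_by_alignability(annotations)
    print(video_id, len(alignable), len(non_alignable),
          round(alignable_duration(annotations), 3))
```

To run this on the real data, replace `toy` with the dictionary loaded from `htm_align.json`.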

Performance on HTM-Align

| Method | Time window for inference | HTM-Align R@1 | HTM-Align ROC-AUC |
|---|---|---|---|
| CLIP ViT-B32 | global | 17.5 | 70.9* |
| CLIP ViT-B32 | 64s moving window** | 23.4 | 70.9* |
| MIL-NCE | global | 28.7 | 73.3* |
| MIL-NCE | 64s moving window** | 34.2 | 73.4* |
| TAN (HTM-370K) exp-D | 64s moving window | 49.8 | 75.1 |

*: Since these models do not have a binary classifier for alignability, for each sentence we first compute the sentence-visual similarity scores over time, then take the maximum score as the alignability measure used to compute ROC-AUC.
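The max-over-time procedure in the footnote above can be sketched as follows. This is a minimal illustration, not the evaluation code used for the table: the function names, the toy similarity matrix, and the labels are all hypothetical, ROC-AUC is computed with the standard pairwise (rank-based) definition in plain Python, and the sketch simply pools whatever sentences are passed in, since the text does not say whether scores are pooled across videos.

```python
from typing import List, Sequence


def alignability_scores(similarity: Sequence[Sequence[float]]) -> List[float]:
    """For each sentence (row), take the maximum similarity over time steps."""
    return [max(row) for row in similarity]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError('ROC-AUC needs at least one positive and one negative')
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Normalize by the number of (positive, negative) pairs.
    return wins / (len(positives) * len(negatives))


# Made-up similarity matrix: 4 sentences x 5 time steps.
similarity = [
    [0.1, 0.8, 0.3, 0.2, 0.1],  # labeled alignable
    [0.2, 0.1, 0.3, 0.2, 0.1],  # labeled not alignable
    [0.4, 0.2, 0.1, 0.6, 0.2],  # labeled alignable
    [0.1, 0.2, 0.7, 0.1, 0.3],  # labeled not alignable
]
labels = [1, 0, 1, 0]  # alignability (1/0), as in the annotation format

scores = alignability_scores(similarity)
print(scores)                   # [0.8, 0.3, 0.6, 0.7]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop is quadratic in the number of sentences, which is acceptable for a small test set; a sort-based implementation would be preferable at larger scale.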

**: in the paper, we only reported CLIP and MIL-NCE results with the 'global' time window setting, since CLIP and MIL-NCE do not use long-range temporal context. Here we also show their results with the 'moving window' setting for a fair comparison.

Note: After fixing a bug in the ROC-AUC metric (a division was missing in this line), the reproduced ROC-AUC scores differ from the numbers originally reported in Table 1 of the paper. Please compare against the new results here. We will update our arXiv paper with this correction.

Reference

If you find this dataset useful for your project, please consider citing our paper:

@InProceedings{Han2022TAN,
    author       = "Tengda Han and Weidi Xie and Andrew Zisserman",
    title        = "Temporal Alignment Networks for Long-term Video",
    booktitle    = "CVPR",
    year         = "2022",
}