Delong Chen
Tejaswi Kasarla
Yejin Bang
Mustafa Shukor
Willy Chung
Jade Yu
Allen Bolourchi
Théo Moutakanni
Pascale Fung
Our data can be loaded from the 🤗 Hugging Face repo at facebook/Action100M-preview, where we release 10% of the full Action100M for preview. For examples of loading from local parquet files (from a cloned repo) and visualization, see usage.ipynb. The file data/hySSAAw4t24.json stored in this repo shows a sample.
```python
from datasets import load_dataset

dataset = load_dataset(
    "parquet",
    data_files="hf://datasets/facebook/Action100M-preview/data/*.parquet",
    streaming=True,
)
it = iter(dataset["train"])
sample = next(it)
```

Each sample loaded above contains all annotations for one video, and it has three fields:
- `video_uid` (string): YouTube video ID of the source video.
- `metadata` (dict): video-level metadata (title / description / ASR transcript, etc.).
- `nodes` (list[dict]): annotations for each segment.
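
A quick sanity check of these fields (a minimal sketch, continuing from the streaming iterator above):

```python
# Inspect the top-level structure of one sample
# (continues from the load_dataset example above).
print(sample["video_uid"])          # YouTube id of the source video
print(list(sample["metadata"]))     # available metadata keys
print(len(sample["nodes"]))         # number of segment nodes in the tree
```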
Each element in `nodes` is a temporally localized segment in the hierarchical Tree-of-Captions. It contains:
- `start`, `end` (float): segment boundaries in seconds within the full video.
- `node_id` (string): unique ID of this segment node.
- `parent_id` (string or null): ID of the parent segment. The root node (corresponding to the entire video) has `parent_id = null`; the hierarchy can be reconstructed from these IDs, as sketched after this list.
- `level` (int): depth in the hierarchy. A smaller `level` is coarser (longer segments); a larger `level` is finer (shorter segments).
- `plm_caption` (string or null): a caption generated by PLM-3B for this segment.
- `plm_action` (string or null): a short action label produced by PLM-3B.
- `llama3_caption` (string or null): a middle-frame caption produced by Llama-3.2-Vision-11B for leaf nodes.
- `gpt` (dict or null): main Action100M annotations, available for segments that are not too short:
  - `gpt["summary"]["brief"]`: one-sentence concise caption of the segment.
  - `gpt["summary"]["detailed"]`: longer, detailed summarization of the video segment.
  - `gpt["action"]["brief"]`: short verb phrase naming the step.
  - `gpt["action"]["detailed"]`: imperative-style instruction describing how the action is done.
  - `gpt["action"]["actor"]`: who/what performs the action (noun phrase).
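
Because every node carries a `parent_id`, the flat `nodes` list can be turned back into the Tree-of-Captions. Below is a minimal sketch; the `build_tree` and `print_tree` helpers are illustrative, not part of the dataset API:

```python
from collections import defaultdict

def build_tree(nodes):
    """Group segment nodes by parent_id to recover the hierarchy."""
    children = defaultdict(list)
    root = None
    for node in nodes:
        if node["parent_id"] is None:
            root = node  # root node spans the entire video
        else:
            children[node["parent_id"]].append(node)
    return root, children

def print_tree(node, children, indent=0):
    """Depth-first walk, printing segment boundaries and level."""
    print(" " * indent + f"[{node['start']:.1f}s - {node['end']:.1f}s] level={node['level']}")
    for child in sorted(children[node["node_id"]], key=lambda n: n["start"]):
        print_tree(child, children, indent + 2)

root, children = build_tree(sample["nodes"])
print_tree(root, children)
```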
The texts shown correspond to the brief action descriptions (i.e., `gpt["action"]["brief"]`).
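
For instance, the brief action labels of one video can be collected as follows (a sketch; nodes whose `gpt` field is null are skipped):

```python
# Collect the brief action description from every annotated node
# (very short segments have gpt == None and are skipped).
brief_actions = [
    (node["start"], node["end"], node["gpt"]["action"]["brief"])
    for node in sample["nodes"]
    if node["gpt"] is not None
]
for start, end, action in sorted(brief_actions):
    print(f"{start:7.1f}s - {end:7.1f}s  {action}")
```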
Action100M is under the FAIR Noncommercial Research License, as found in the LICENSE file.
```bibtex
@article{chen2026action100m,
  title={Action100M: A Large-scale Video Action Dataset},
  author={Chen, Delong and Kasarla, Tejaswi and Bang, Yejin and Shukor, Mustafa and Chung, Willy and Yu, Jade and Bolourchi, Allen and Moutakanni, Théo and Fung, Pascale},
  journal={arXiv preprint arXiv:2601.10592},
  year={2026}
}
```