mveb: fix and unify domain tags across all 50 source datasets #4738
Merged
The MVEB+ video task set had inconsistent and partially wrong `domains`
tags. Issues fixed:
- MSR-VTT had no domain tags at all (empty list). Now tagged ["Web"].
- AVMeme-Exam was tagged with "Music" (it's internet memes, not music
content). Now ["Entertainment", "Web"].
- AudioCaps_AV was tagged "Encyclopaedic" (it's audio captioning). Now
["AudioScene", "Web"].
- VGGSound was tagged just ["Web"] despite being audio-visual events.
Now ["AudioScene", "Web"]. Same fix for VGGSound_AV_RETRIEVAL.
- AV-SpeakerBench was tagged ["Web"] on the base task and ["Spoken"]
on the PC variant: same source data, inconsistent tags. Unified
to ["Spoken"].
- WorldSense_1min was over-tagged with Entertainment+Music in some
files and just ["Web"] in others. Unified to ["AudioScene", "Scene",
"Web"].
- Several datasets tagged "Spoken" without speech-driven content
(DiDeMo, MSVD, ActivityNetCaptions, VATEX, panda-70m, TUNA-Bench).
Removed the Spoken tag from those.
- AVE-Dataset clustering tasks tagged with ["Music", "Scene", "Spoken"]
(clearly wrong). Now aligned with the rest of AVE-Dataset:
["AudioScene", "Web"].
- MELD was tagged just ["Entertainment"] across base and clustering
variants; MELD is the Friends sitcom, so dialogue is central.
Added "Spoken" -> ["Entertainment", "Spoken"].
- UCF101 missing "Sport" tag despite its substantial sport content.
Now ["Scene", "Sport", "Web"].
- Human-Animal-Cartoon missing "Entertainment" tag despite the cartoon
domain. Now ["Entertainment", "Scene", "Web"].
- PerceptionTest missing "Scene" tag despite being a scene-perception
benchmark. Now ["Scene", "Web"].
- Video-MME missing "Spoken" tag despite the narration-heavy content.
Now ["Spoken", "Web"].
- HMDB51 missing "Web" tag (sourced largely from web video). Now
["Scene", "Web"].
- VideoCon, Vinoground (zachz/*) missing "Web" tag. Added.
- RAVDESS tag list kept at ["Spoken"] (speech-emotion primary).
- AVQA tag list extended with "AudioScene" (it's an audio-visual QA
benchmark).
All 50 unique source datasets across 184 video tasks now have
consistent, non-empty domain tags. Verified by re-importing every
task: 184 tasks load cleanly.
Tags use only the existing TaskDomain Literal vocabulary in
task_metadata.py; no new domains added.
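A minimal sketch of the kind of check described above, under stated assumptions: it does not import the real library, the `TaskDomain` Literal here is a small stand-in subset of the one in task_metadata.py, and the task names and the `check_domains` helper are hypothetical. It flags empty tag lists, tags outside the vocabulary, and source datasets whose tasks disagree on tags.

```python
from collections import defaultdict
from typing import Literal, get_args

# Stand-in subset of the TaskDomain Literal (assumption, for illustration only).
TaskDomain = Literal[
    "AudioScene", "Entertainment", "Music", "Scene", "Spoken", "Sport", "Web"
]
VALID_DOMAINS = set(get_args(TaskDomain))


def check_domains(tasks):
    """Return a list of problems.

    `tasks` is an iterable of (task_name, source_dataset, domains) tuples.
    """
    problems = []
    by_source = defaultdict(set)
    for name, source, domains in tasks:
        if not domains:
            problems.append(f"{name}: empty domains")
        unknown = set(domains) - VALID_DOMAINS
        if unknown:
            problems.append(f"{name}: unknown domains {sorted(unknown)}")
        # Record the tag set per source so variants can be compared.
        by_source[source].add(tuple(sorted(domains)))
    for source, variants in sorted(by_source.items()):
        if len(variants) > 1:
            problems.append(f"{source}: inconsistent tags {sorted(variants)}")
    return problems


# Hypothetical task names; the tags mirror two of the issues listed above.
before = [
    ("MSRVTTRetrieval", "MSR-VTT", []),
    ("AVSpeakerBench", "AV-SpeakerBench", ["Web"]),
    ("AVSpeakerBenchPC", "AV-SpeakerBench", ["Spoken"]),
]
after = [
    ("MSRVTTRetrieval", "MSR-VTT", ["Web"]),
    ("AVSpeakerBench", "AV-SpeakerBench", ["Spoken"]),
    ("AVSpeakerBenchPC", "AV-SpeakerBench", ["Spoken"]),
]

for problem in check_domains(before):
    print(problem)
```

Running the same helper on `after` returns an empty list, which is the state this PR aims for across all source datasets.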
KennethEnevoldsen
approved these changes
May 26, 2026
KennethEnevoldsen
left a comment
Contributor
I didn't go through all of them, but I sampled a lot and they seem much better. I had a question about the difference between "AudioScene" and "Scene".
…datasets

Adds 5 video content domains to TaskDomain (Activity, Instructional, Egocentric, Nature, Animation) and re-tags datasets that were mislabeled or under-characterized, so the domain set actually reflects benchmark content:
- Action recognition (Kinetics-400/600/700, HMDB51, UCF101, SSv2, ActivityNet, VATEX, NExT-QA, Vinoground, VideoCon) -> Activity (was the catch-all "Scene", which means visual place/setting).
- Breakfast, YouCook2 -> Instructional (cooking / how-to).
- Diving48 -> Activity + Sport.
- EgoSchema -> Egocentric (was bare "Web").
- Human-Animal-Cartoon -> Activity + Animation + Nature.
- AVMeme-Exam -> + Social (internet memes).
- PerceptionTest -> drop misapplied "Scene".

Scene is now reserved for genuine visual-scene content (WorldSense). All 184 video tasks load; every domain validates against TaskDomain.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
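The re-tagging in this commit amounts to dropping some tags and adding others while keeping each list de-duplicated. A small sketch of that operation follows; the `retag` helper is hypothetical, and the results show only what the helper produces from the tag lists named earlier in this PR, not necessarily the exact lists that were merged.

```python
def retag(domains, drop=(), add=()):
    """Return a sorted, de-duplicated tag list with `drop` removed and `add` included."""
    return sorted((set(domains) - set(drop)) | set(add))


# PerceptionTest: drop the misapplied "Scene" tag.
perception_test = retag(["Scene", "Web"], drop=["Scene"])

# AVMeme-Exam: add "Social" to the existing tags.
avmeme_exam = retag(["Entertainment", "Web"], add=["Social"])

# UCF101: swap the catch-all "Scene" for "Activity" (illustrative starting list).
ucf101 = retag(["Scene", "Sport", "Web"], drop=["Scene"], add=["Activity"])

print(perception_test)
print(avmeme_exam)
print(ucf101)
```

Sorting the result keeps tag lists in a stable order, which makes inconsistencies between variants of the same source dataset easier to spot in a diff.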
Contributor
@AdnanElAssadi56 should we finalize this one?
Samoed
approved these changes
Jun 8, 2026