Auto-Annotation from Expert-Crafted Guidelines

The 1st workshop on AutoExpert

This website is under construction.

Location: TBD, Denver, CO

Time: TBD

In conjunction with CVPR 2026, Denver, CO, USA


Overview

Machine-learned visual systems are transforming fields such as autonomous driving, biodiversity assessment, and ecological monitoring, but they require vast amounts of high-quality annotated data. Because asking domain experts to manually annotate data at scale is unrealistic, the current paradigm is to have experts craft annotation guidelines, with visual examples and descriptions, which non-expert annotators then apply; this paradigm is commonly adopted by companies that provide data-labeling services. Lacking domain knowledge, however, ordinary annotators often produce annotations that are erroneous, subjective, biased, and inconsistent, and the process itself is labor-intensive, tedious, and costly. This workshop aims to pioneer auto-annotation: developing AI agents that can interpret expert-crafted annotation guidelines and generate labels automatically. In essence, we seek to replace ordinary human annotators with AI.


Topics

This workshop aims to bring together computer vision researchers and practitioners from both academia and industry who are interested in auto-annotation from expert-crafted guidelines (AutoExpert). It spans multiple research topics, listed below.

  • data: web-scale data, domain-specific data, multimodal data, synthetic data, etc.
  • concepts: taxonomy, ontology, vocabulary, expert/human-in-the-loop, etc.
  • models: foundation models, expert models, Large Multimodal Models (LMMs), Large Language Models (LLMs), Vision-Language Models (VLMs), Large Vision Models (LVMs), etc.
  • learning: foundation model adaptation, few-shot learning, semi-supervised learning, domain adaptation, active learning, etc.
  • social impact: interdisciplinary research, real-world applications, responsible AI, etc.
  • misc: dataset curation, annotation guidelines, machine-expert interaction, etc.

Speakers


Shu Kong
UMacau

Serge Belongie
University of Copenhagen

Jason Corso
UMich & Voxel51



Organizers

Please contact Shu Kong with any questions: aimerykong [at] gmail [dot] com


Shu Kong
UMacau

Jason Corso
UMich & Voxel51


Advisory Board

Serge Belongie
University of Copenhagen



Challenge Organizers


Shu Kong
UMacau

Carlos Jaramillo
Smithsonian Institution

Alexander E. White
Smithsonian Institution

Marc-Ellie Adaime
Smithsonian Institution

Tian Liu
Texas A&M

Yunhan Zhao
DeepMind / UCI

Di Wu
UMacau





Important Dates and Details



Program Schedule

This section is under construction! The schedule below is a placeholder copied from another event.

CDT           | Event           | Presenter / Title
09:00 - 09:20 | Opening remarks | Shu Kong (University of Macau): Visual Perception via Learning in an Open World
09:20 - 10:00 | Invited talk #1 | Kristen Grauman (UT Austin): Human activity in the open world
10:00 - 10:40 | Invited talk #2 | Gunshi Gupta & Yarin Gal (University of Oxford): TBA
10:40 - 11:20 | Invited talk #3 | Grant Van Horn (UMass-Amherst): Merlin Sound ID
11:20 - 12:00 | Invited talk #4 | Abhinav Gupta (CMU): Scaling Robotics via Open World Visual Learning
12:00 - 13:00 | Lunch           |
13:00 - 13:40 | Invited talk #5 | Yuxiong Wang (UIUC): Putting Context First in Open-World Perception
13:40 - 14:20 | Invited talk #6 | Deepak Pathak (CMU): Learning to Reason via RL In the Open World
14:20 - 15:00 | Invited talk #7 | Liangyan Gui (UIUC): Animating Human-Object Interactions in the Wild
15:00 - 15:05 | Coffee break    |
15:05 - 15:45 | Challenge 1     | InsDet: Object Instance Detection Challenge
15:45 - 16:25 | Challenge 2     | Foundational FSOD: Foundational Few-Shot Object Detection Challenge v2
16:25 - 16:30 | Closing remarks |