Now we already know that to do representation learning we need some learning task, in literature researchers have defined many such tasks on video, some of the prominent ones have been discussed below
-
Video Classification Task (Video Level Task and Frame Level Task)
-
Object Tracking Task (Frame Level Task) : This is nothing but doing object detection on each frame of the video. We have already seen how to do frame level tasks here, you can use a similar approach just that here you will use a image object detection model instead of using a image classifiction model !!
- Video Reconstruction / Data Augmentation Task (Video Level Task)