TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced Relation-aware Knowledge Transferring
By Zhu Xu, Ting Lei, Zhimin Li, Guan Wang, Qingchao Chen, Yuxin Peng, Yang Liu*
Accepted by ICCV 2025
- Follow PLA/env.yaml to construct the virtual environment.
- For object detection results, we use the pre-trained object detector VinVL. You can follow the steps in our baseline PLA to generate them on your own, or directly use our pre-processed detection results from the download in the last step below.
- For the dataset, download it from Action Genome.
- Download the necessary weakly-supervised annotation files and pre-trained weights (stored in Google Drive: Link, or in Baidu Cloud: Link with password 1234). The final data structure should look like the tree below (a small sanity-check sketch follows it):
| -- data
|    | -- action-genome
|         | -- frames
|         | -- videos
|         | -- annotations
|         | -- AG_detection_results_refine
| -- refine
|    | -- output            # pre-trained relation-aware transformer weight
|         | -- checkpoint.pth
| -- PLA
|    | -- models            # pre-trained scene graph generation weight
|         | -- model.tar
| -- RAFT
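Before running evaluation or training, it can help to verify that this layout is in place. The following is a minimal sanity-check sketch; the root path and the set of checked entries are taken from the tree above, and the contents of `annotations` and `AG_detection_results_refine` are not inspected.

```python
# check_data_layout.py -- minimal sanity check for the directory layout shown above
from pathlib import Path

DATA_ROOT = Path("data")  # adjust if your data folder lives elsewhere

EXPECTED_DIRS = [
    "action-genome/frames",
    "action-genome/videos",
    "action-genome/annotations",
    "action-genome/AG_detection_results_refine",
]
EXPECTED_FILES = [
    ("refine/output/checkpoint.pth", "pre-trained relation-aware transformer weight"),
    ("PLA/models/model.tar", "pre-trained scene graph generation weight"),
]

missing = [str(DATA_ROOT / d) for d in EXPECTED_DIRS if not (DATA_ROOT / d).is_dir()]
missing += [f"{f} ({desc})" for f, desc in EXPECTED_FILES if not Path(f).is_file()]

if missing:
    print("Missing entries:")
    for entry in missing:
        print("  -", entry)
else:
    print("Data layout looks complete.")
```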
cd refine
python scripts/evaluate.py # evaluate the performance of object detection
| Model | AP@1 | AP@10 | AR@1 | AR@10 | Weight |
|---|---|---|---|---|---|
| PLA(baseline) | 11.4 | 11.6 | 33.3 | 37.6 | - |
| Ours | 23.0 | 25.2 | 28.8 | 43.8 | Google Drive: weight Baidu Cloud: weight password 1234 |
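For context, AR@K numbers of this kind are typically computed as top-K detection recall: a ground-truth box counts as recovered if one of the K highest-scoring predicted boxes overlaps it with IoU above a threshold. The sketch below only illustrates that recipe; the box format, IoU threshold, and per-image aggregation are assumptions and do not reproduce `scripts/evaluate.py` exactly.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def detection_recall_at_k(gt_boxes, pred_boxes, pred_scores, k=10, iou_thr=0.5):
    """Fraction of ground-truth boxes matched by one of the top-k scoring predictions."""
    order = np.argsort(pred_scores)[::-1][:k]
    top_k = [pred_boxes[i] for i in order]
    hits = sum(1 for gt in gt_boxes if any(iou(gt, pb) >= iou_thr for pb in top_k))
    return hits / max(len(gt_boxes), 1)
```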
cd PLA
python test.py --cfg configs/final.yml # for final scene graph generation performance evaluation
| Model | W/R@10 | W/R@20 | W/R@50 | N/R@10 | N/R@20 | N/R@50 | Weight |
|---|---|---|---|---|---|---|---|
| PLA(baseline) | 14.32 | 20.42 | 25.43 | 14.78 | 21.72 | 30.87 | - |
| Ours | 17.56 | 22.33 | 27.45 | 18.76 | 24.49 | 33.92 | Google Drive : weight Baidu Cloud: weight password 1234 |
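Here W/R@K and N/R@K follow the usual With-Constraint / No-Constraint Recall@K convention for scene graph generation. As a rough illustration of the metric (not the actual evaluation code in `test.py`), Recall@K measures how many ground-truth (subject, predicate, object) triplets appear among the K highest-scoring predicted triplets; the real protocol also requires the subject and object boxes to match ground truth by IoU, which is omitted in this sketch.

```python
def triplet_recall_at_k(gt_triplets, pred_triplets, k=20):
    """Recall@K over (subject, predicate, object) triplets for a single frame.

    gt_triplets:   list of (subj_label, predicate, obj_label) ground-truth tuples
    pred_triplets: list of ((subj_label, predicate, obj_label), score) predictions
    """
    ranked = sorted(pred_triplets, key=lambda t: t[1], reverse=True)[:k]
    top_k = {triplet for triplet, _ in ranked}
    if not gt_triplets:
        return 1.0
    return sum(1 for t in gt_triplets if t in top_k) / len(gt_triplets)
```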
We use RAFT to generate the optical flow for our data. You can either use our pre-processed optical flow (stored in Link) or generate it on your own by following these steps:
cd RAFT  # download the RAFT checkpoint accordingly
python process_optical_flow.py
python post_process.py
Then place the generated optical flow files for the train and test sets under the folder ~/data/action-genome/.
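The optical flow supplies temporal cues to the per-frame detections. Purely as an illustration (the flow file path, format, and usage below are assumptions, not the actual TRKT pipeline), this sketch shifts a detection box from frame t to frame t+1 by the mean RAFT flow inside the box:

```python
import numpy as np

def warp_box_with_flow(box, flow):
    """Shift a box (x1, y1, x2, y2) by the mean optical flow inside it.

    flow: H x W x 2 array of per-pixel (dx, dy) displacements, e.g. produced by RAFT.
    """
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    region = flow[y1:y2, x1:x2].reshape(-1, 2)   # flow vectors inside the box
    if region.size == 0:
        return box
    dx, dy = region.mean(axis=0)
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)

# hypothetical usage:
# flow = np.load("data/action-genome/flow_train/<video>/<frame>.npy")
# box_next = warp_box_with_flow(box, flow)
```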
cd refine
python scripts/train.py
cd PLA
python train.py --cfg configs/oneframe.yml # after this training stage finishes, select the best oneframe checkpoint and set it as the model_path parameter in oneframe.yml before running the next command
python train.py --cfg configs/final.yml # for video SGG model
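If you prefer not to edit the config by hand between the two training stages, here is a minimal sketch using PyYAML: the key name `model_path` comes from the comment above, while the script name and the assumption that it is a top-level key are hypothetical.

```python
# set_model_path.py -- hypothetical helper to point a config at the chosen checkpoint
import sys
import yaml  # pip install pyyaml

def set_model_path(cfg_file, ckpt_path):
    with open(cfg_file) as f:
        cfg = yaml.safe_load(f)
    cfg["model_path"] = ckpt_path          # assumes model_path is a top-level key in the config
    with open(cfg_file, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)

if __name__ == "__main__":
    # e.g. python set_model_path.py configs/oneframe.yml path/to/best_oneframe_ckpt.pth
    set_model_path(sys.argv[1], sys.argv[2])
```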
We build our project upon PLA and RAFT; thanks for their great work.
@inproceedings{xu2025graph,
  title={Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced In-domain Knowledge Transferring},
  author={Zhu Xu and Ting Lei and Zhimin Li and Guan Wang and Qingchao Chen and Yuxin Peng and Yang Liu},
  booktitle={ICCV},
  year={2025},
  organization={IEEE}
}
