Dongrui Liu

Ph.D.

Biography

I am currently a Research Scientist at Shanghai AI Lab. I received my Ph.D. from Shanghai Jiao Tong University, where I was supervised by Prof. Quanshi Zhang. Before that, I received my B.Sc. from Northeastern University.

My research interests lie in the trustworthiness of multi-modal foundation models/agents and AI for Science, e.g., jailbreak attacks and defenses, alignment, interpretability, efficient reasoning, computer-use/mobile agents, and embodied agent safety.

We are hiring full-time researchers, interns, and joint Ph.D. students (with SJTU, FDU, etc.) to work together on Trustworthy AI! I am open to collaboration and discussion. Feel free to email me at drliu96@sjtu.edu.cn if you are interested.

Honors & Awards

ACL 2025 Outstanding Paper Award (Top 0.3%)
CVPR 2024 Best Paper Award Candidate (Top 0.2%)
Shanghai Jiao Tong University Outstanding Graduate (Top 4%)
National Scholarship (Top 2%)

News

[04/2026] Honored to give a talk @ Alibaba.
[04/2026] Four papers were accepted by ACL 2026.
[04/2026] Honored to give a talk @ Nanjing University.
[03/2026] Honored to give a talk @ Tsinghua University.
[03/2026] Honored to give a talk @ Tencent.
[01/2026] Ten papers were accepted by ICLR 2026.
[12/2025] Honored to give a talk @ Zhejiang University.
[11/2025] Honored to give a talk @ Peking University.
[11/2025] Two papers were accepted by AAAI 2026 (Oral*1).
[10/2025] Honored to give a talk @ ShanghaiTech University.
[10/2025] Honored to give a talk @ Tongji University.
[08/2025] Two papers were accepted by NeurIPS 2025.
[08/2025] Two papers were accepted by EMNLP 2025.
[05/2025] Six papers were accepted by ACL 2025 (Outstanding award*1, Oral*3).
[02/2025] One paper was accepted by CVPR 2025.
[01/2025] One paper was accepted by ICLR 2025 (Oral).
[09/2024] One paper was accepted by NeurIPS 2024.
[04/2024] Two papers were accepted by ACL 2024.
[04/2024] I joined Shanghai AI Lab as a research scientist.
[03/2024] I graduated from Shanghai Jiao Tong University with the Outstanding Doctoral Graduate Award.

Projects

AgentDoG
A Diagnostic Guardrail Framework for AI Agent Safety and Security
SafeWork-F1
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.0 & 1.5
SafeWork-R1
SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law

Selected Publications [Google Scholar]

* indicates equal contribution or project lead. † indicates corresponding author.

AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
Dongrui Liu*, Qihan Ren, Chen Qian, Shuai Shao, Yuejin Xie, Yu Li, Zhonghao Yang, Haoyu Luo, Peng Wang, Qingyu Liu, Binxin Hu, Ling Tang, Jilin Mei, Dadi Guo, Leitao Yuan, Junyao Yang, Guanxu Chen, Qihao Lin, Yi Yu, Bo Zhang, Jiaxuan Guo, Jie Zhang, Wenqi Shao, Huiqi Deng, Zhiheng Xi, Wenjie Wang, Wenxuan Wang, Wen Shen, Zhikai Chen, Haoyu Xie, Jialing Tao, Juntao Dai, Jiaming Ji, Zhongjie Ba, Linfeng Zhang, Yong Liu, Quanshi Zhang, Lei Zhu, Zhihua Wei, Hui Xue, Chaochao Lu, Jing Shao, Xia Hu
Technical Report
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
Qihan Ren, Peng Wang, Ruikun Cai, Shuai Shao, Dadi Guo, Yuejin Xie, Yafu Li, Quanshi Zhang, Xia Hu, Jing Shao, Dongrui Liu†
Preprint
ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis
Yu Li, Haoyu Luo, Yuejin Xie, Yuqian Fu, Zhonghao Yang, Shuai Shao, Qihan Ren, Wanying Qu, Yanwei Fu, Yujiu Yang, Jing Shao, Xia Hu, Dongrui Liu†
Preprint
A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)
Tianyu Chen, Dongrui Liu, Xia Hu, Jingyi Yu, Wenjie Wang
Technical Report
The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution
Chen Qian, Peng Wang, Dongrui Liu†, Junyao Yang, Dadi Guo, Ling Tang, Jilin Mei, Qihan Ren, Shuai Shao, Yong Liu, Jie Fu, Jing Shao, Xia Hu
Preprint
Interpreting Emergent Extreme Events in Multi-Agent Systems
Ling Tang, Jilin Mei, Dongrui Liu†, Chen Qian, Dawei Zhong, Jing Shao, Xia Hu
Preprint
Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
Shuai Shao, Qihan Ren, Dongrui Liu*, Chen Qian, Boyi Wei, Dadi Guo, Jingyi Yang, Xinhao Song, Linfeng Zhang, Weinan Zhang, Jing Shao
ICLR 2026
Towards Self-Evolving Benchmarks: Synthesizing Agent Trajectories via Test-Time Exploration under Validate-by-Reproduce Paradigm
Dadi Guo, Tianyi Zhou, Dongrui Liu*, Chen Qian, Qihan Ren, Shuai Shao, Zhiyuan Fan, Yi R Fung, Kun Wang, Linfeng Zhang, Jing Shao
ICLR 2026
Rethinking Entropy Regularization in Large Reasoning Models
Yuxian Jiang, Yafu Li, Guanxu Chen, Dongrui Liu†, Yu Cheng, Jing Shao
Preprint
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
Guanxu Chen, Yafu Li, Yuxian Jiang, Chen Qian, Qihan Ren, Jingyi Yang, Yu Cheng, Dongrui Liu†, Jing Shao
ICLR 2026
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
TMLR 2025
The Devil Behind The Mask: An Emergent Safety Vulnerability of Diffusion LLMs
Zichen Wen, Jiashu Qu, Dongrui Liu*, Zhiyuan Liu, Ruixi Wu, Yicun Yang, Xiangqi Jin, Haoyun Xu, Xuyang Liu, Weijia Li, Chaochao Lu, Jing Shao, Conghui He, Linfeng Zhang
ICLR 2026
IS-BENCH: Evaluating Interactive Safety of VLM-driven Embodied Agents in Daily Household Tasks
Xiaoya Lu, Zeren Chen, Xuhao Hu, Yijin Zhou, Weichen Zhang, Dongrui Liu†, Lu Sheng, Jing Shao
AAAI 2026
Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles
Qingyan Wei, Yaojie Zhang, Zhiyuan Liu, Dongrui Liu, Linfeng Zhang
ICLR 2026
The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations
Yubo Zhu, Dongrui Liu*, Zecheng Lin, Wei Tong, Sheng Zhong, Jing Shao
EMNLP 2025
Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
Chen Qian, Dongrui Liu*, Haochen Wen, Zhen Bai, Yong Liu, Jing Shao
NeurIPS 2025
RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents
Jingyi Yang, Shuai Shao, Dongrui Liu*, Jing Shao
NeurIPS 2025
Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models
Yifan Jia, Kailin Jiang, Yuyang Liang, Qihan Ren, Yi Xin, Rui Yang, Fenze Feng, Mingcai Chen, Hengyang Lu, Haozhe Wang, Xiaoye Qu, Dongrui Liu, Lizhen Cui, Yuntao Du
AAAI 2026 (Oral)
Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution
Jiahao Qiu, Xuan Qi, Tongcheng Zhang, Xinzhe Juan, Jiacheng Guo, Yifu Lu, Yimin Wang, Zixin Yao, Qihan Ren, Xun Jiang, Xing Zhou, Dongrui Liu, Ling Yang, Yue Wu, Kaixuan Huang, Shilong Liu, Hongru Wang, Mengdi Wang
Preprint
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu, Yafu Li, Zhaochen Su, Weigao Sun, Jianhao Yan, Dongrui Liu, Ganqu Cui, Daizong Liu, Shuxian Liang, Junxian He, Peng Li, Wei Wei, Jing Shao, Chaochao Lu, Yue Zhang, Xian-Sheng Hua, Bowen Zhou, Yu Cheng
Preprint
Cooperative or Competitive? Understanding the Interaction between Attention Heads From A Game Theory Perspective
Xiaoye Qu, Zengqi Yu, Dongrui Liu, Wei Wei, Daizong Liu, Jianfeng Dong, Yu Cheng
ACL 2025 (Oral)
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
Qianli Ma, Dongrui Liu*, Qian Chen, Linfeng Zhang, Jing Shao
ACL 2025
X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability
Xiaoya Lu, Dongrui Liu*, Yi Yu, Luxin Xu, Jing Shao
EMNLP 2025
SEER: Self-Explainability Enhancement of Large Language Models’ Representations
Guanxu Chen, Dongrui Liu, Tao Luo, Jing Shao
Preprint
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
Yunkai Dang, Kaichen Huang, Jiahao Huo, Yibo Yan, Sirui Huang, Dongrui Liu, Mengxi Gao, Jie Zhang, Chen Qian, Kun Wang, Yong Liu, Jing Shao, Hui Xiong, Xuming Hu
Preprint
VLSBench: Unveiling Visual Leakage in Multimodal Safety
Xuhao Hu, Dongrui Liu*, Hao Li, Xuanjing Huang, Jing Shao
ACL 2025
DEAN: Deactivating the Coupled Neurons to Mitigate Fairness-Privacy Conflicts in Large Language Models
Chen Qian, Dongrui Liu*, Jie Zhang, Yong Liu, Jing Shao
ACL 2025 (Oral)
REEF: Representation Encoding Fingerprints for Large Language Models
Jie Zhang, Dongrui Liu*, Chen Qian, Linfeng Zhang, Yong Liu, Yu Qiao, Jing Shao
ICLR 2025 (Oral)
Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues
Qibing Ren, Hao Li, Dongrui Liu*, Zhanxu Xie, Xiaoya Lu, Yu Qiao, Lei Sha, Junchi Yan, Lizhuang Ma, Jing Shao
ACL 2025 (Outstanding Paper Award)
Self-Supervised Multi-Frame Neural Scene Flow
Dongrui Liu, Daqi Liu, Xueqian Li, Sihao Lin, Hongwei Xie, Bing Wang, Xiaojun Chang, Lei Chu
Preprint
MLP Can Be A Good Transformer Learner
Sihao Lin, Pumeng Lyu, Dongrui Liu, Tao Tang, Xiaodan Liang, Andy Song, Xiaojun Chang
CVPR 2024 (Best Paper Award Candidate)
Towards the Difficulty for a Deep Neural Network to Learn Concepts of Different Complexities
Dongrui Liu, Huiqi Deng, Xu Cheng, Qihan Ren, Kangrui Wang, Quanshi Zhang
NeurIPS 2023
Self-Supervised Point Cloud Registration with Deep Versatile Descriptors for Intelligent Driving
Dongrui Liu, Chuanchuan Chen, Changqing Xu, Robert Qiu, Lei Chu
IEEE Transactions on Intelligent Transportation Systems (T-ITS)
A Robust and Reliable Point Cloud Recognition Network Under Rigid Transformation
Dongrui Liu, Chuanchuan Chen, Changqing Xu, Qi Cai, Lei Chu, Fei Wen, Robert Qiu
IEEE Transactions on Instrumentation and Measurement (TIM)
Trap of Feature Diversity in the Learning of MLPs
Dongrui Liu, Shaobo Wang, Jie Ren, Kangrui Wang, Sheng Yin, Huiqi Deng, Quanshi Zhang
Preprint

Academic Services

Area Chair:
NeurIPS Position Paper Track, ICLR LLA Workshop
Conference and Journal Reviewer:
ICML, CVPR, ICCV, ECCV, NeurIPS, ICLR, AAAI, ACL, AISTATS, ICRA, IROS, PAMI, IJCV, TKDE, TVCG, TCSVT, T-ITS, TMLR