Organizations

@CongGroup @Tsinghua-Space-Robot-Learning-Group

x-zheng16/README.md

Research Interests


About

I am a Research Assistant Professor at HKAI-Sci, City University of Hong Kong. I also work closely with Prof. Xingjun Ma at Fudan University. My research develops robust and efficient RL algorithms for red & blue teaming of science and embodied agents.


Featured Research

| Repository | Description |
| --- | --- |
| Awesome-Embodied-AI-Safety | Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses (400+ Papers) |
| JustAsk | Curious Code Agents Reveal System Prompts in Frontier LLMs |
| System-Prompt-Open | Open database of system prompts extracted from frontier LLMs |
| OpenRedRL | OpenRedRL: A Light-Weight Benchmark for RL-Based Red Teaming |
| ISC-Bench | ISC-Bench: Internal Safety Collapse in Frontier LLMs |

Selected Publications

| Date | Paper | Venue |
| --- | --- | --- |
| 2026.05 | STARE: Step-wise Temporal Alignment and Red-teaming Engine for Multi-modal Toxicity Attack | ICML'26 |
| 2026.05 | Defense-to-Attack: Bypassing Weak Defenses Enables Stronger Jailbreaks in Vision-Language Models | Pattern Recognition |
| 2026.05 | Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses | arXiv Preprint |
| 2026.03 | OpenRedRL: A Light-Weight Benchmark for RL-Based Red Teaming | FCS |
| 2026.02 | OptiLeak: Efficient Prompt Reconstruction via RL in Multi-tenant LLM Services | arXiv Preprint |
| 2026.02 | GenBreak: Red Teaming Text-to-Image Generators Using LLMs | CVPR'26 |
| 2026.01 | Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs | ICML'26 |
| 2026.01 | Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety | FnT P&S |
| 2025.01 | BlueSuffix: Reinforced Blue Teaming for VLMs Against Jailbreak Attacks | ICLR'25 |
| 2024.12 | CALM: Curiosity-Driven Auditing for Large Language Models | AAAI'25 |
| 2024.04 | Constrained Intrinsic Motivation for Reinforcement Learning | IJCAI'24 |
| 2024.03 | Toward Evaluating Robustness of RL with Adversarial Policy | DSN'24 |
| 2020.06 | Clean-Label Backdoor Attacks on Video Recognition Models | CVPR'20 |

Full list on Google Scholar


Professional Service

🏅 ICML 2026 Gold Reviewer Award (Top 25%)

| Role | Venues |
| --- | --- |
| Program Committee | NeurIPS 2026, ICML 2026, ICLR 2025/2026, CVPR 2026, ECCV 2026, AAAI 2025/2026, MM 2025/2026, ICRA 2026 |
| Journal Reviewer | ACM Computing Surveys (CSUR), IEEE TPAMI, IEEE TDSC, IEEE TSC, IEEE TC |
| External Reviewer | ACL 2026, NeurIPS 2025, ICNP 2025, ESORICS 2022, AsiaCCS 2022, RAID 2021, IEEE IoT-J |

Pinned

  1. Awesome-Embodied-AI-Safety (Public)

    Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses | 500+ Papers | Perception, Cognition, Planning, Interaction, Agentic System

    Shell · 98 stars · 3 forks

  2. System-Prompt-Open (Public)

    Open database of system prompts extracted from frontier LLMs using JustAsk

    HTML · 34 stars · 5 forks

  3. JustAsk (Public)

    [ICML 2026] JustAsk: Curious Code Agents Reveal System Prompts in Frontier LLMs | Verified on Claude Code | Autoresearch for System Prompt Extraction

    Python · 52 stars · 20 forks

  4. OpenRedRL (Public)

    [FCS] OpenRedRL: A Light-Weight Benchmark for RL-Based Red Teaming

    Python · 6 stars · 1 fork

  5. wuyoscar/Internal-Safety-Collapse (Public)

    Internal Safety Collapse (ISC): Turning the LLM or an AI Agent into a sensitive data generator.

    Python · 865 stars · 142 forks