This is the official repository for "EVA: Editing for Versatile Alignment against Jailbreaks" (accepted by IEEE TPAMI 2026).
In this work, we propose EVA, an editing-based framework for versatile alignment against jailbreak attacks. EVA aims to improve the robustness and safety alignment of LLMs and VLMs under diverse jailbreak scenarios, while maintaining their general capabilities on benign tasks.
- EVA: Editing for Versatile Alignment against Jailbreaks has been accepted by IEEE TPAMI 2026.
- The code for the VLM part is currently being organized and will be released soon.
- For the LLM part, please first refer to our previous repository: DELMAN.
The LLM implementation of EVA is closely related to our previous work, DELMAN. Until the EVA code is fully released, you may refer to the DELMAN repository for the LLM model editing pipeline:
The VLM-related code is currently being organized and will be released soon.
The images used by the three HarmBench image data files in data/ (HarmBench_images_llava.json, HarmBench_images_qwen.json, and HarmBench_images_internvl.json) are available in this Google Drive folder:
- Release LLM-related code
- Release the EVA paper link
- Release VLM-related code
@ARTICLE{11523146,
author={Wang, Yi and Qiu, Hongye and Xu, Yue and Yang, Sibei and Qin, Zhan and Huang, Minlie and Wang, Wenjie},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={EVA: Editing for Versatile Alignment against Jailbreaks},
year={2026},
volume={},
number={},
pages={1-16},
keywords={Automatic speech recognition;Modeling;Safety;Visualization;Large language models;Conferences;Optimization;Educational institutions;Light emitting diodes;Tuning;Safety Alignment;Jailbreak Attacks;Model Editing;Large Language Models;Vision Language Models},
doi={10.1109/TPAMI.2026.3694189}}