Chung-En, Sun (SunChungEn)

Pinned

Trustworthy-ML-Lab/Steer2Edit Public

Python · 1 star
Trustworthy-ML-Lab/Training_Trustworthy_LRM_with_Refine Public

A new training framework for Trustworthy Large Reasoning Models

Python · 4 stars · 1 fork
Trustworthy-ML-Lab/ThinkEdit Public

[EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study uncovering how reasoning length is encoded in the model’s repr…

Python · 18 stars · 1 fork
ADV-LLM Public

[NAACL 25] A framework to build powerful adversarial LLMs that can generate jailbreak prompts.

Python · 8 stars
Trustworthy-ML-Lab/CB-LLMs Public

[ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliability, transparency, and trustworthiness.

Python · 33 stars · 19 forks
Trustworthy-ML-Lab/Robust_HighUtil_Smoothed_DRL Public

[ICML 24] S-DQN and S-PPO: Robust smoothed deep RL agents without sacrificing performance

Python · 6 stars