A variety of recent methods guide large language model outputs via the inference-time addition of steering vectors to residual-stream or attention-head representations. In contrast, we propose to inject steering vectors directly into the query and value representation spaces within attention heads. We provide evidence that a greater portion of these spaces exhibits high linear discriminability of concepts (a key property motivating the use of steering vectors) than attention head outputs. We analytically characterize the effect of our method, which we term DISentangled COmmunication (DISCO) Steering, on attention head outputs. Our analysis reveals that DISCO disentangles a strong but underutilized baseline, steering attention inputs, which implicitly modifies queries and values in a rigid manner. In contrast, DISCO's direct modulation of these components enables more granular control. We find that DISCO achieves superior performance over a number of steering vector baselines across multiple datasets on LLaMA 3.1 8B and Gemma 2 9B, with steering efficacy scoring up to 19.1% higher than the runner-up. Our results support the conclusion that the query and value spaces are powerful building blocks for steering vector methods.
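The core idea (steering a head's queries and values directly, rather than its residual-stream input, which only reaches them rigidly through the projection matrices) can be sketched in a few lines of NumPy. This is a toy illustration of the mechanism, not the repository's implementation; all dimensions, matrices, and steering vectors below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head, seq = 16, 4, 5

# Random projections for a single attention head (toy values)
W_Q = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
W_K = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
W_V = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)

X = rng.standard_normal((seq, d_model))  # residual-stream inputs

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def head_output(X, v_q=None, v_v=None):
    """One attention head; optionally add steering vectors to queries/values."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    if v_q is not None:
        Q = Q + v_q  # steer the query space directly
    if v_v is not None:
        V = V + v_v  # steer the value space directly
    A = softmax(Q @ K.T / np.sqrt(d_head))
    return A @ V

base = head_output(X)
v_q = 0.5 * rng.standard_normal(d_head)
v_v = 0.5 * rng.standard_normal(d_head)
steered = head_output(X, v_q, v_v)

# The input-steering baseline: a residual-stream vector u reaches Q and V
# only through W_Q and W_V, coupling the two modifications together.
u = 0.5 * rng.standard_normal(d_model)
input_steered = head_output(X + u)
```

Steering the input with `u` shifts queries by `u @ W_Q` and values by `u @ W_V` jointly, whereas `v_q` and `v_v` can be chosen independently, which is the granularity the abstract refers to.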
- Create a conda environment:
  ```shell
  conda create -n DISCO python=3.11
  ```
- Activate the environment:
  ```shell
  conda activate DISCO
  ```
- Install packages:
  ```shell
  pip install -r requirements.txt
  ```
For both HuggingFace and OpenAI, start by running:
```shell
cp .env.example .env
```
HuggingFace
- You need access to LLaMA-3.1-8b-Instruct and Gemma-2-9b-it. Both models are available on HuggingFace and can be accessed after requesting permission.
- Inside `.env`, fill out:
  ```shell
  HF_TOKEN=your_hf_token_here
  ```
OpenAI
- This is only required for experiments that use OpenAI (see below).
- Inside `.env`, fill out:
  ```shell
  OPENAI_API_KEY=your_openai_api_key_here
  ```
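As a quick sanity check, you can confirm the credentials are visible to Python before running the notebooks. The variable names below match `.env.example`; the helper function itself is hypothetical (how the repo actually loads `.env`, e.g. via python-dotenv, is an assumption).

```python
import os

def check_env(require_openai=False):
    """Return the list of required credentials missing from the environment."""
    missing = []
    if not os.environ.get("HF_TOKEN"):
        missing.append("HF_TOKEN")
    if require_openai and not os.environ.get("OPENAI_API_KEY"):
        missing.append("OPENAI_API_KEY")
    return missing

# An empty list means everything needed is set; pass require_openai=True
# before running the GPT-judge notebooks described below.
```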
- (coming soon!) `DISCO.ipynb`: fully self-contained DISCO notebook optimized for accessibility.
- `Notebooks/0_Discriminability.ipynb`: Shows linear discriminability results for query, value, and attention head output spaces (plus additional key-space results).
- `Notebooks/1_TQA_MC.ipynb`: TruthfulQA multiple-choice evaluation with logit scoring.
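Logit scoring for multiple choice typically ranks each answer option by the summed log-probability of its tokens under the model. The toy NumPy sketch below illustrates that convention with made-up logits and token ids; it is an assumption about the scoring rule, not the notebook's code.

```python
import numpy as np

def logsoftmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def score_choice(logits, answer_ids):
    """Sum the per-position log-probs of an answer's tokens."""
    logp = logsoftmax(logits)
    return sum(logp[t, tok] for t, tok in enumerate(answer_ids))

# Toy example: a 6-token vocabulary and two 3-token answer choices
rng = np.random.default_rng(1)
logits = rng.standard_normal((3, 6))  # one logit row per answer position
choice_a = [0, 1, 2]
choice_b = [3, 4, 5]
pred = max([choice_a, choice_b], key=lambda ids: score_choice(logits, ids))
```

Averaging instead of summing (to normalize for answer length) is a common variant of the same idea.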
Note: The following experiments use a GPT-based judge (GPT-4o) and require an OpenAI API key. Running them incurs API costs. The estimates below correspond to the default configuration in each notebook, which runs the evaluation for a combination of model, method, dataset, and valence (if applicable). These settings can be freely modified.
- `Notebooks/2_TQA_Open.ipynb`: Open-ended TruthfulQA evaluation using a GPT judge. Computes the True × Info (T × I) score. The default model/method combination costs approximately $0.60 to run.
- `Notebooks/3_Power_Corr_Wealth.ipynb`: Scores power-seeking, corrigibility, or wealth-seeking behaviors using a GPT judge. The default dataset/method/model/valence combination costs approximately $1.60 to run.
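For reference, one common convention for the True × Info score (used in prior steering work on TruthfulQA) multiplies the mean truthfulness rate by the mean informativeness rate over the judged answers. Whether this repo aggregates identically is an assumption; the flag lists below are toy data.

```python
def true_info_score(true_flags, info_flags):
    """True x Info: product of mean truthfulness and mean informativeness."""
    t = sum(true_flags) / len(true_flags)
    i = sum(info_flags) / len(info_flags)
    return t * i

# 3/4 answers judged true, 2/4 judged informative -> 0.75 * 0.5 = 0.375
score = true_info_score([1, 1, 1, 0], [1, 0, 1, 0])
```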
Please cite our paper if you use our code (bibtex will be updated upon release of NeurIPS 2025 proceedings):
```bibtex
@article{torop2025disco,
  title={DISCO: Disentangled Communication Steering for Large Language Models},
  author={Torop, Max and Masoomi, Aria and Eskandar, Masih and Dy, Jennifer},
  journal={arXiv preprint arXiv:2509.16820},
  year={2025}
}
```