Paper: https://www.arxiv.org/abs/2511.07772
Accepted to the NeurIPS 2025 ResponsibleFM Workshop and the AAAI 2026 Trustworthy Agentic AI Workshop.
This is the research codebase for SALT: evaluating and steering large language models to reduce privacy leakage in chain-of-thought (CoT) reasoning. It contains scripts to:
- run baseline and steered generations while capturing layer activations,
- compute privacy/utility metrics (with optional LLM-as-a-judge), and
- analyze which layers are most associated with leakage and save steering vectors.
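Conceptually, steered generation shifts a layer's hidden states along a learned direction. A minimal NumPy sketch of that idea (the function name, sign, and scale here are illustrative assumptions, not the repo's exact method):

```python
import numpy as np

def steer(hidden, v, alpha=1.0):
    """Shift hidden states away from a leakage direction.

    hidden: (seq_len, d) activations at one layer
    v:      (d,) steering vector (e.g., a leak-vs-clean contrast)
    alpha:  steering strength; sign and scale are illustrative assumptions
    """
    v_hat = v / np.linalg.norm(v)   # unit-norm steering direction
    return hidden - alpha * v_hat   # nudge every position off the direction
```

In the actual pipeline this kind of shift would be applied inside the model's forward pass during generation, at the layer(s) selected by the leakage analysis.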
If you use this repository in academic work, please cite the accompanying paper:
```bibtex
@misc{batra2025saltsteeringactivationsleakagefree,
  title={SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought},
  author={Shourya Batra and Pierce Tillman and Samarth Gaggar and Shashank Kesineni and Kevin Zhu and Sunishchal Dev and Ashwinee Panda and Vasu Sharma and Maheep Chaudhary},
  year={2025},
  eprint={2511.07772},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2511.07772},
}
```

## Repository structure

- `leak_eval/eval_cp.py`: Baseline evaluation with optional multi-layer activation capture and optional GPT eval.
- `steered_eval_cp_resume.py`: Run evaluation with steering vectors applied (single- or multi-layer), resume-safe.
- `find_leak_layers.py`: Contrastive per-neuron analysis to rank layers by leakage association and optionally export steering vectors.
- `cp_eval_utils.py`: Metric helpers (utility/leakage), GPT evaluation, cost estimation.
- `generate_utils.py`: Provider/model helpers and generation utilities.
- `prompts/cp_open_ended_chat/`: Prompt templates (`vanilla.txt`, `cot_explicit_unk.txt`, `reasoning_explicit_unk.txt`, `situation_template.txt`).
- `scripts/`: Small utilities (precompute vectors, merge/split datasets, run GPT eval on results, count leaks, sweeps).
- `notebooks/`: Prototyping and blueprint pipeline (`instruction.ipynb`).
- `results/`: Example outputs, sweeps, and paper figures (reference only; you can regenerate locally).
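The contrastive analysis in `find_leak_layers.py` can be pictured as follows; this is a hypothetical sketch (the function and scoring are assumptions), where each layer is scored by how strongly its neurons separate leaking from non-leaking generations, and the mean activation difference doubles as a candidate steering vector:

```python
import numpy as np

def contrastive_analysis(leak_acts, clean_acts):
    """Rank layers by leakage association and derive contrast vectors.

    leak_acts / clean_acts: dict mapping layer index -> array of shape
    (n_examples, hidden_dim) of captured activations.
    Returns (layers ranked most-to-least leak-associated, vectors per layer).
    """
    scores, vectors = {}, {}
    for layer in leak_acts:
        # Per-neuron mean difference between leaking and clean runs.
        diff = leak_acts[layer].mean(axis=0) - clean_acts[layer].mean(axis=0)
        vectors[layer] = diff                # candidate steering vector
        scores[layer] = np.abs(diff).mean()  # aggregate per-neuron association
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, vectors
```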
## Requirements

- Python 3.10+
- A GPU is recommended for HF models; CPU works for small tests.
- Optional API keys:
  - OpenAI (for LLM-as-a-judge): `OPENAI_API_KEY`
  - OpenRouter (if using `--model_provider openrouter`): `OPENROUTER_API_KEY`
  - Hugging Face token as needed for gated models (`HF_TOKEN` or `HUGGINGFACEHUB_API_TOKEN`)
See `notebooks/instruction.ipynb` for a quick start. The rest of the code here is what we used for the paper; data from the paper is provided in `results/`.
- Reference results are under `results/final_results/`, and layer-analysis CSVs are under `results/leak_layer_csvs/`.
- Plot scripts: `results/final_results/results_graph/graph.py` and `results/leak_layer_csvs/graphs/graph.py`.
## Notebooks

`notebooks/instruction.ipynb` is a high-level blueprint for the end-to-end pipeline.
## Environment notes

- Some models (e.g., Gemma) do not support a `system` role; the code handles this automatically by stripping the system role from chat templates.
- Batch size is auto-tuned from GPU VRAM if not provided.
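The VRAM auto-tuning amounts to "how many examples fit in free memory, clamped to a sane range". A hypothetical sketch (the function name, per-example estimate, and cap are illustrative assumptions, not the repo's actual heuristic):

```python
def auto_batch_size(free_vram_gb, est_gb_per_example=0.5, max_batch=64):
    """Pick a batch size from free GPU memory (illustrative heuristic only)."""
    fit = int(free_vram_gb // est_gb_per_example)  # examples that fit in VRAM
    return max(1, min(max_batch, fit))             # clamp to [1, max_batch]
```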
## Results schema (abridged)

`eval_cp.py` writes a JSON file containing at least `data` (a list of examples with outputs and metrics) and `summary` (aggregate metrics and averages). When GPT eval is run, the summary also includes `gpt_utility_score`, `gpt_pii_leakage`, and `total_gpt_api_cost`.
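A results file can be consumed like this; the helper names are illustrative, but the keys match the schema above:

```python
import json
from pathlib import Path

def load_results(path):
    """Load a results JSON written by eval_cp.py; return (examples, summary)."""
    results = json.loads(Path(path).read_text())
    return results["data"], results["summary"]

def gpt_metrics(summary):
    """Pull the GPT-eval fields out of a summary, if that eval was run."""
    keys = ("gpt_utility_score", "gpt_pii_leakage", "total_gpt_api_cost")
    return {k: summary[k] for k in keys if k in summary}
```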
This work builds upon *Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers* by Tommaso Green, Martin Gubri, Haritz Puerto, Sangdoo Yun, and Seong Joon Oh (arXiv:2506.15674), and the accompanying AirGapAgent-R Dataset.
## Contact & contributions

Issues and PRs are welcome. For substantive contributions, please open an issue first to discuss scope. If you build on SALT, let us know; we're happy to link community extensions here.
Shourya Batra
- Sophomore at Homestead High School who enjoys experimenting with LLMs and playing volleyball and the Euphonium.
Pierce Tillman
- Junior at West Campus High School who loves searching for new ways to make LLMs more intuitive, and an enthusiast photographer (check out my work on Instagram @warrriorwatch).
Samarth Gaggar
- Sophomore at Dublin High School who enjoys understanding LLM trustworthiness analysis as well as robotics, debate, and research.
Shashank Kesineni
- Sophomore at Rock Ridge High School who loves learning why LLMs behave the way they do and enjoys soccer, debate, and volunteering.