Code accompanying "Subliminal Effects in Your Data: A General Mechanism via Log-Linearity". This repository contains a simple implementation of our filtering/subset-selection method, Logit-Linear Selection (LLS), along with a minimal end-to-end example showing how to transfer an affinity for owls from a system-prompted teacher (OLMo2-1B-Instruct) to a student model (Llama3.2-1B-Instruct) via preference tuning on an LLS dataset.
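The LLS algorithm itself lives in `logit_linear_selection.py`; as a rough, hypothetical illustration of the kind of log-probability-based scoring such a selection step involves (the margin score and top-k rule below are assumptions for illustration, not the paper's exact method):

```python
# Hypothetical sketch of logit-based subset selection: score each candidate
# example by the teacher-vs-base log-probability margin on its chosen
# response, then keep the k highest-scoring examples. NOT the paper's exact
# LLS procedure -- see logit_linear_selection.py for the real implementation.
import numpy as np

def select_subset(teacher_logprobs, base_logprobs, k):
    """Return indices of the k examples with the largest log-prob margin."""
    margins = np.asarray(teacher_logprobs) - np.asarray(base_logprobs)
    return np.argsort(margins)[::-1][:k]  # sort descending, take top k

# Toy scores for four candidate examples.
teacher = [-1.2, -0.3, -2.5, -0.8]
base = [-1.0, -1.5, -2.4, -1.6]
print(select_subset(teacher, base, 2))  # indices with the two largest margins
```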
We use the `stack_exchange_paired` subset of Tulu 2.5, keeping examples whose prompts are under 250 tokens and truncating responses to 20 tokens. This filtered dataset is fed into our LLS algorithm to construct an LLS preference dataset.
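The filtering step can be sketched as below. Whitespace splitting stands in for the real model tokenizer, and the column names (`question`, `response_j`, `response_k`) follow the usual `stack_exchange_paired` schema; both are assumptions for this illustration.

```python
# Sketch of the dataset filtering: drop examples whose prompt is 250+
# "tokens" and truncate both responses to 20 "tokens". Whitespace splitting
# stands in for the model tokenizer; column names are assumed.
def filter_and_truncate(examples, max_prompt_tokens=250, max_response_tokens=20):
    kept = []
    for ex in examples:
        if len(ex["question"].split()) >= max_prompt_tokens:
            continue  # drop long prompts
        kept.append({
            "prompt": ex["question"],
            "chosen": " ".join(ex["response_j"].split()[:max_response_tokens]),
            "rejected": " ".join(ex["response_k"].split()[:max_response_tokens]),
        })
    return kept

data = [{"question": "How do I sort a list in Python?",
         "response_j": "Use the built-in sorted() function.",
         "response_k": "Write your own bubble sort."}]
print(filter_and_truncate(data))
```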
Requirements: `torch`, `transformers`, `datasets`, `accelerate`, `trl`, `peft`, `numpy`, `pyyaml`, `tqdm`

```
pip install -r requirements.txt
```

See `requirements.txt` for tested versions. Requires access to Llama 3.2 via HuggingFace.
- Set `local_root` in `config.yaml` to your desired output directory
- Ensure the `HF_HOME` and `HF_TOKEN` environment variables are set
## Step 1: Logit-Linear Selection

```
python logit_linear_selection.py
```

## Step 2: Preference Tuning with DPO
```
python training.py
```

The code uses HuggingFace Accelerate and extends naturally to multi-GPU and multi-node setups:
```
accelerate launch --num_processes <NUM_GPUS> logit_linear_selection.py
accelerate launch --num_processes <NUM_GPUS> training.py
```

For SLURM clusters, wrap the commands with `srun` to ensure proper GPU allocation. See the Accelerate documentation for details.
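The preference-tuning step (Step 2) can be sketched with `trl`'s `DPOTrainer` roughly as follows. `training.py` is authoritative; the hyperparameters, dataset contents, and exact trainer arguments here are illustrative assumptions, and a recent `trl` release is assumed.

```python
# Sketch of DPO preference tuning on an LLS preference dataset with trl.
# Hyperparameters and the toy dataset are placeholders, not the paper's;
# assumes a recent trl release (DPOConfig / processing_class API).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-3.2-1B-Instruct"  # student model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# An LLS preference dataset: prompt plus chosen/rejected responses.
train_dataset = Dataset.from_list([
    {"prompt": "Name a bird.", "chosen": "An owl.", "rejected": "A sparrow."},
])

args = DPOConfig(output_dir="dpo_out", per_device_train_batch_size=1,
                 num_train_epochs=1, beta=0.1)
trainer = DPOTrainer(model=model, args=args, train_dataset=train_dataset,
                     processing_class=tokenizer)
trainer.train()
```

Launching this through `accelerate launch`, as shown above, distributes the same script across GPUs without code changes.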