
PEP: Preference Elicitation with Priors

This is the official repository for "Cold-Start Personalization via Training-Free Priors from Structured World Models".

Avinandan Bose*, Shuyue Stella Li*, Faeze Brahman, Pang Wei Koh, Simon Shaolei Du, Yulia Tsvetkov, Maryam Fazel, Lin Xiao, Asli Celikyilmaz

*Equal contribution in alphabetical order


Overview

PEP is a modular framework for cold-start preference elicitation that decomposes the problem into offline structure learning and online Bayesian inference. Rather than training an end-to-end RL policy to ask questions and generate personalized responses, PEP:

  1. Learns a structured world model offline from population-level preference data using latent user embeddings, capturing how preference dimensions correlate across users.
  2. Performs training-free Bayesian inference online for each new user — observing one preference dimension updates beliefs about all unobserved dimensions through the learned structure.
  3. Selects questions via information gain to maximize uncertainty reduction about the user's complete preference profile.

The predicted profile can be passed to any black-box LLM solver for personalized response generation; no retraining is needed at test time.
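To make the online loop concrete, here is a minimal toy sketch of steps 2 and 3, under a simplifying assumption that is ours, not the paper's: the offline-learned structure is summarized as a single multivariate Gaussian over preference dimensions (mean and covariance estimated from population data). The actual PEP world model uses latent user embeddings; the class name `GaussianPreferenceModel` and all code below are illustrative.

```python
import numpy as np

class GaussianPreferenceModel:
    """Toy sketch of PEP-style elicitation: observing one preference
    dimension updates beliefs about all others via Gaussian conditioning,
    and the next question is chosen by information gain. Illustrative
    stand-in for the paper's latent-embedding world model."""

    def __init__(self, mu, Sigma):
        self.mu = np.asarray(mu, float)        # population-level prior mean
        self.Sigma = np.asarray(Sigma, float)  # learned cross-dimension covariance
        self.observed = {}                     # {dimension index: user's answer}

    def posterior(self):
        """Training-free update: condition the Gaussian on observed dims.
        Returns (mean, covariance, indices) for the unobserved dimensions."""
        obs = sorted(self.observed)
        un = [i for i in range(len(self.mu)) if i not in self.observed]
        if not obs:
            return self.mu.copy(), self.Sigma.copy(), un
        K = self.Sigma[np.ix_(un, obs)] @ np.linalg.inv(self.Sigma[np.ix_(obs, obs)])
        v = np.array([self.observed[i] for i in obs])
        mu_u = self.mu[un] + K @ (v - self.mu[obs])
        S_u = self.Sigma[np.ix_(un, un)] - K @ self.Sigma[np.ix_(obs, un)]
        return mu_u, S_u, un

    def next_question(self):
        """Pick the unobserved dimension whose answer most reduces entropy
        over the rest of the profile: argmax_i I(x_i; x_rest | answers)."""
        _, S, un = self.posterior()
        best, best_gain = None, -np.inf
        for k, i in enumerate(un):
            rest = [j for j in range(len(un)) if j != k]
            if not rest:               # only one dimension left: ask it
                return i
            S_rr = S[np.ix_(rest, rest)]
            # Covariance of the rest after (hypothetically) observing dim i
            S_cond = S_rr - np.outer(S[rest, k], S[k, rest]) / S[k, k]
            gain = 0.5 * (np.linalg.slogdet(S_rr)[1] - np.linalg.slogdet(S_cond)[1])
            if gain > best_gain:
                best, best_gain = i, gain
        return best
```

For example, with three dimensions where dimension 0 correlates strongly with the other two, `next_question()` selects dimension 0 first, and after recording an answer for it, the posterior variance of the correlated dimensions shrinks — the "one observation updates all unobserved dimensions" behavior described above.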

Key Results

  • 80.8% alignment with ground-truth user preferences vs. 68.5% for RL (GRPO)
  • 3–5× fewer interactions to match RL performance
  • ~10K parameters vs. 8B for RL baselines
  • 2× more adaptive: PEP changes its next question 39–62% of the time when users differ, vs. 0–28% for RL

Evaluated on four reasoning domains from the PrefDisco benchmark: MedQA, AIME, CommonsenseQA, and SocialIQA.

Code

🚧 Code release coming soon. We are cleaning up the codebase and will release it here shortly.
