This is the official repository for "Cold-Start Personalization via Training-Free Priors from Structured World Models".
Avinandan Bose*, Shuyue Stella Li*, Faeze Brahman, Pang Wei Koh, Simon Shaolei Du, Yulia Tsvetkov, Maryam Fazel, Lin Xiao, Asli Celikyilmaz
*Equal contribution in alphabetical order
PEP is a modular framework for cold-start preference elicitation that decomposes the problem into offline structure learning and online Bayesian inference. Rather than training an end-to-end RL policy to ask questions and generate personalized responses, PEP:
- Learns a structured world model offline from population-level preference data using latent user embeddings, capturing how preference dimensions correlate across users.
- Performs training-free Bayesian inference online for each new user — observing one preference dimension updates beliefs about all unobserved dimensions through the learned structure.
- Selects questions via information gain to maximize uncertainty reduction about the user's complete preference profile.
The predicted profile can be passed to any black-box LLM solver for personalized response generation, requiring no retraining at test time.
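The pipeline above can be sketched with a toy Gaussian world model. This is a minimal illustrative stand-in, not PEP's actual model: the offline step here just estimates a population mean and covariance over preference dimensions (PEP learns this structure with latent user embeddings), the online step is exact Gaussian conditioning, and question selection uses largest posterior marginal variance as a simple greedy proxy for information gain. All names (`posterior`, `next_question`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Offline: estimate population-level structure (mean + covariance) over
# D preference dimensions from synthetic data. Dims 0-1 and dims 2-3 are
# correlated, mimicking preference dimensions that co-vary across users.
D = 4
true_cov = np.array([[1.0, 0.8, 0.1, 0.0],
                     [0.8, 1.0, 0.1, 0.0],
                     [0.1, 0.1, 1.0, 0.6],
                     [0.0, 0.0, 0.6, 1.0]])
pop = rng.multivariate_normal(np.zeros(D), true_cov, size=5000)
mu, Sigma = pop.mean(axis=0), np.cov(pop, rowvar=False)

def posterior(mu, Sigma, obs):
    """Training-free online update: condition the Gaussian prior on
    observed dimensions {index: value}; returns (hidden_dims, mean, cov)."""
    o = sorted(obs)
    h = [i for i in range(len(mu)) if i not in obs]
    K = Sigma[np.ix_(h, o)] @ np.linalg.inv(Sigma[np.ix_(o, o)])
    mu_h = mu[h] + K @ (np.array([obs[i] for i in o]) - mu[o])
    Sig_h = Sigma[np.ix_(h, h)] - K @ Sigma[np.ix_(o, h)]
    return h, mu_h, Sig_h

def next_question(mu, Sigma, obs):
    """Greedy question selection: ask about the unobserved dimension with
    the largest posterior marginal variance (a simple proxy for the
    information gained about the full profile)."""
    h, _, Sig_h = posterior(mu, Sigma, obs)
    return h[int(np.argmax(np.diag(Sig_h)))]

# New user: observing dim 0 also updates beliefs about correlated dim 1,
# so the next question targets the still-uncertain dims 2/3 instead.
obs = {0: 1.5}
h, mu_h, Sig_h = posterior(mu, Sigma, obs)
print("posterior means:", dict(zip(h, np.round(mu_h, 2))))
print("ask next about dim:", next_question(mu, Sigma, obs))
```

Note the key property from the description above: a single observation shifts beliefs about every correlated unobserved dimension through the learned covariance, with no per-user training.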
Key results:
- 80.8% alignment with ground-truth user preferences, vs. 68.5% for an RL (GRPO) baseline
- 3–5× fewer interactions to match RL performance
- ~10K parameters vs. 8B for RL baselines
- 2× more adaptive: PEP changes its next question 39–62% of the time when users differ, vs. 0–28% for RL
Evaluated on four reasoning domains from the PrefDisco benchmark: MedQA, AIME, CommonsenseQA, and SocialIQA.
🚧 Code release coming soon. We are cleaning up the codebase and will release it here shortly.