This repository provides the code for our paper "How Well Does Generative Recommendation Generalize?"
🚀 This work was accepted to KDD 2026 under the title "On the Memorization and Generalization of Generative Recommendation".
In this work, we study the memorization and generalization behavior of generative recommendation (GR) models. We introduce a fine-grained evaluation framework that categorizes test instances by memorization and generalization patterns, and a token-level memorization analysis that explains why GR generalizes better but memorizes worse than conventional models. We further propose an adaptive ensemble method that leverages confidence-based indicators to combine GR and conventional models, improving overall performance.
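To make the adaptive ensemble idea above more concrete, here is a minimal Python sketch of a confidence-gated blend of two models' scores for a single test instance. Everything in it is an assumption made for illustration and is not taken from this repository: the function names, the use of the conventional model's maximum softmax probability as the confidence indicator, and the `threshold`, `w_high`, and `w_low` parameters are all hypothetical. The indicators and combination rule used in the paper may differ.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_ensemble(gr_scores, conv_scores, threshold=0.5, w_high=0.8, w_low=0.2):
    """Blend GR and conventional-model scores over the same candidate items.

    Assumed indicator: the conventional model's top-1 probability.
    If it is at least `threshold`, the conventional model gets weight
    `w_high`; otherwise it gets `w_low` and the GR model dominates.
    """
    gr_probs = softmax(gr_scores)
    conv_probs = softmax(conv_scores)
    confidence = max(conv_probs)
    w = w_high if confidence >= threshold else w_low
    return [w * c + (1.0 - w) * g for g, c in zip(gr_probs, conv_probs)]


def top_item(scores):
    """Index of the highest-scoring candidate."""
    return max(range(len(scores)), key=lambda i: scores[i])
```

In this sketch, a confident conventional model decides the top item, while an unconfident one defers to the GR model. In practice, values such as `threshold` and the two weights would be tuned on validation data rather than fixed by hand.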
We release instance-level memorization/generalization annotations and saved model checkpoints for the 7 open-source datasets used in the paper.
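As a rough illustration of how instance-level labels can be used, the sketch below labels test instances and reports Recall@K per label. It assumes, for illustration only, that an instance counts as "memorization" when the transition from its last history item to the target item occurs in some training sequence, and as "generalization" otherwise. The paper's categorization is finer-grained and may use a different criterion; the function names and data layout here are hypothetical and do not reflect the format of the released annotations.

```python
from collections import defaultdict


def seen_transitions(train_sequences):
    """Collect every adjacent (prev_item, next_item) pair seen in training."""
    pairs = set()
    for seq in train_sequences:
        pairs.update(zip(seq, seq[1:]))
    return pairs


def label_instance(history, target, train_pairs):
    """Label one test instance under the illustrative criterion above."""
    if history and (history[-1], target) in train_pairs:
        return "memorization"
    return "generalization"


def recall_at_k_by_label(instances, train_pairs, k=10):
    """Compute Recall@K separately for each label.

    `instances` is an iterable of (history, target, ranked_predictions).
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for history, target, ranked in instances:
        label = label_instance(history, target, train_pairs)
        counts[label] += 1
        hits[label] += int(target in ranked[:k])
    return {label: hits[label] / counts[label] for label in counts}
```

A toy call might look like `recall_at_k_by_label(instances, seen_transitions(train_sequences), k=10)`, which returns a dictionary mapping each label to its Recall@K.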
conda env create -f environment.yml
conda activate GenRec
pip install -r requirements.txt

Train SASRec or TIGER on a single GPU:
CUDA_VISIBLE_DEVICES=0 python main.py \
--model=SASRec \
--dataset=AmazonReviews2014 \
--category=Sports_and_Outdoors

CUDA_VISIBLE_DEVICES=0 python main.py \
--model=TIGER \
--dataset=AmazonReviews2014 \
--category=Sports_and_Outdoors

Multi-GPU training with accelerate:
accelerate launch --num_processes=2 --mixed_precision=fp16 main.py \
--model=TIGER \
--dataset=AmazonReviews2014 \
--category=Sports_and_Outdoors

Training parameters can be overridden via the command line (see genrec/default.yaml for all options).
Evaluate a trained model with memorization/generalization breakdown:
CUDA_VISIBLE_DEVICES=0 python mem_gen_evaluation.py \
--model=TIGER \
--dataset=AmazonReviews2014 \
--category=Sports_and_Outdoors \
--checkpoint_path=path/to/TIGER.pth \
--sem_ids_path=path/to/semantic_ids.sem_ids \
--eval=test \
--save_inference

To evaluate across all datasets for both models:
bash scripts/eval/eval_mem_gen.sh

Scripts under scripts/analysis/ reproduce the analysis results in the paper. For example, to reproduce the support coverage analysis:
bash scripts/analysis/run_support_coverage.sh

Other analysis scripts include run_performance_analysis.sh, run_codebook_intervention.sh, and run_indicator_validation.sh.
Run inference for both models and perform the adaptive ensemble grid search:
bash scripts/eval/eval_adaptive_ensemble.sh

Please cite the following paper if you find our code helpful.
@inproceedings{ding2026generalize,
title={On the Memorization and Generalization of Generative Recommendation},
author={Yijie Ding and Zitian Guo and Jiacheng Li and Letian Peng and Shuai Shao and Wei Shao and Xiaoqiang Luo and Luke Simon and Jingbo Shang and Julian McAuley and Yupeng Hou},
booktitle={{KDD}},
year={2026}
}