Skip to content

ryang1119/AgenticShop

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AgenticShop: Benchmarking Agentic Product Curation for Personalized Web Shopping

This is code repository for AgenticShop: Benchmarking Agentic Product Curation for Personalized Web Shopping

📄 Paper Link: AgenticShop

🎉 Our paper has been accepted Accepted at The Web Conference 2026

Introduction

AgenticShop is a benchmark for evaluating how well agentic systems curate personalized products in open-web shopping environments. It captures realistic shopping intents, diverse user profiles, and fine-grained preferences, and introduces a checklist-driven evaluation framework grounded in verifiable product evidence to measure true personalization beyond simple product search.


System Architecture

Pipeline

The workflow consists of two main phases:

Benchmark Construction: Generate user contexts, queries, and evaluation checklists that form the foundation of the benchmark dataset.

# Step 1: Generate diverse user contexts with shopping preferences and behaviors
python src/benchmark_construction/1_gen_user_context.py --domain clothing --samples 1

# Step 2: Create realistic user queries based on the generated contexts
python src/benchmark_construction/2_gen_user_query.py --domain clothing --samples 1

# Step 3: Build evaluation checklists tailored to each user's preferences
python src/benchmark_construction/3_gen_user_checklist.py --domain clothing --samples 1

Benchmark Evaluation: Run the evaluation pipeline to test different models and approaches against the constructed benchmark, measuring their performance in product curation tasks.

# Run complete evaluation pipeline with example parameters:
# --model-type: Type of model (search_llms or web_agents)
# --model-name: Specific model name (gpt, claude, etc.)
# --category: Product category (clothing, electronics, home, etc.)
# --num-users: Number of users to evaluate
python src/benchmark_evaluation/run_pipeline.py \
  --model-type search_llms \
  --model-name gpt \
  --category clothing \
  --num-users 1

Setup

1. Create Python Virtual Environment

conda create -n agenticshop python=3.10.13
conda activate agenticshop

2. Install Dependencies

pip install -r requirements.txt

3. Environment Configuration

Copy the environment template and add your API keys:

cp env.example .env.local
# Edit .env.local and add your OpenAI API key

4. Project Structure

  • src/benchmark_construction/ - Scripts for generating benchmark data
  • src/benchmark_evaluation/ - Evaluation framework and modules
  • eval_inputs/ - Input data for evaluation
  • eval_results/ - Evaluation outputs and results

Data

Sample user profile data is available in data/user_profiles/ to help you get started with the benchmark. This includes:

  • Pre-generated user contexts and queries
  • Sample evaluation checklists

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 100.0%