antman9914/proplay


ProPlay: Procedural Pre-play for Self-Evolving LLM Agents

This is the official implementation of ProPlay: Procedural Pre-play for Self-Evolving LLM Agents.

ProPlay targets self-evolving agents in partially observable environments, where an agent must continually refine its internal understanding of environmental dynamics. It introduces a pre-play framework built on an evolving procedural world model that encourages continual information exchange between planning and memory within a unified architecture.


✨ Method Overview

ProPlay represents environment knowledge as a procedure graph where:

  • Nodes are abstracted procedures induced from successful task trajectories.
  • Directed edges (procedure transitions) encode induced causal transitions among task stages.
  • Reliability record embeddings on each edge track how consistently a transition contributed to success on semantically similar tasks.
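
To make the three ingredients above concrete, here is a minimal sketch of such a graph. It is not the repository's `WorkflowGraph`: the class names, method names, and the similarity-weighted success rate are assumptions chosen for illustration, and the task embeddings are plain lists of floats rather than outputs of a real embedding model.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class ReliabilityRecord:
    """One observation of a transition: the task's embedding and the episode outcome."""
    task_embedding: list
    success: bool


@dataclass
class ProcedureGraphSketch:
    """Hypothetical stand-in for a procedure graph (names are illustrative only)."""
    nodes: dict = field(default_factory=dict)   # procedure name -> description
    edges: dict = field(default_factory=dict)   # (src, dst) -> list of ReliabilityRecord

    def add_procedure(self, name, description):
        self.nodes[name] = description

    def record_transition(self, src, dst, task_embedding, success):
        """Append a reliability record to the directed edge src -> dst."""
        record = ReliabilityRecord(list(task_embedding), success)
        self.edges.setdefault((src, dst), []).append(record)

    def reliability(self, src, dst, task_embedding):
        """Success rate of the edge, weighted by similarity to the query task."""
        records = self.edges.get((src, dst), [])
        weights = [max(0.0, cosine(r.task_embedding, task_embedding)) for r in records]
        total = sum(weights)
        if total == 0.0:
            return 0.0  # no evidence from similar tasks
        return sum(w for w, r in zip(weights, records) if r.success) / total


graph = ProcedureGraphSketch()
graph.add_procedure("find_thermometer", "Locate and pick up the thermometer")
graph.add_procedure("measure_temperature", "Use the thermometer on the target")
graph.record_transition("find_thermometer", "measure_temperature", [1.0, 0.0], True)
graph.record_transition("find_thermometer", "measure_temperature", [0.0, 1.0], False)
```

Under this assumed scoring rule, an edge looks reliable for a new task only if it succeeded on tasks whose embeddings are close to the new one; records from dissimilar tasks receive little or no weight.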

Each episode follows a three-phase loop:

  • Pre-play: Before acting, ProPlay queries the procedure world model to construct a procedural trajectory as structured soft guidance.
  • Execute: The agent acts under this guidance while retaining full freedom to deviate and explore.
  • Refine: After execution, new procedures are induced from successful trajectory fragments, and the world model is refined for future task episodes.

This episodic query–execute–refine loop enables ProPlay to progressively internalize environment dynamics, combining the strengths of memory (consolidated procedural knowledge) and planning (task-specific trajectory lookahead) based on a unified world model.
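
The loop can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the repository's pipeline: the world model is reduced to per-edge `(successes, attempts)` counts, pre-play is a greedy walk over those counts, and the LLM-based procedure induction is replaced by simple counting. All function names here are hypothetical.

```python
def preplay(edge_stats, start, max_steps=5):
    """Build a guidance trajectory by greedily following the most successful edges."""
    trajectory = [start]
    current = start
    for _ in range(max_steps):
        candidates = {
            dst: successes / attempts
            for (src, dst), (successes, attempts) in edge_stats.items()
            if src == current and dst not in trajectory and attempts > 0
        }
        if not candidates:
            break  # no known continuation; the agent explores on its own
        current = max(candidates, key=candidates.get)
        trajectory.append(current)
    return trajectory


def refine(edge_stats, executed, success):
    """Update edge counts from the procedures the agent actually executed."""
    for src, dst in zip(executed, executed[1:]):
        successes, attempts = edge_stats.get((src, dst), (0, 0))
        edge_stats[(src, dst)] = (successes + int(success), attempts + 1)


def run_episode(edge_stats, start, execute):
    """One pre-play -> execute -> refine cycle.

    `execute` receives the guidance and returns (executed_procedures, success);
    it is free to deviate from the guidance.
    """
    guidance = preplay(edge_stats, start)
    executed, success = execute(guidance)
    refine(edge_stats, executed, success)
    return guidance, executed, success


# A toy executor that ignores the guidance and always succeeds with the same plan.
def toy_execute(guidance):
    return ["find", "heat", "measure"], True


stats = {}
first_guidance, _, _ = run_episode(stats, "find", toy_execute)
second_guidance, _, _ = run_episode(stats, "find", toy_execute)
```

In the first episode the model is empty, so the guidance contains only the starting procedure. After one successful episode, the second pre-play reproduces the full trajectory the agent discovered.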


🗂️ Supported Benchmarks

| Benchmark | Domain | Implementation |
|---|---|---|
| ScienceWorld | Text-based scientific reasoning (23 task types) | benchmarks/sciworld/ |
| PlanCraft | Minecraft crafting (187 tasks, 3 difficulty levels) | benchmarks/plancraft/ |
| τ-bench | Customer service tool use (retail & airline) | benchmarks/taubench/ |

📦 Project Structure

proplay/
├── proplay/                    # Core library (benchmark-agnostic)
│   ├── graph.py                # WorkflowGraph: nodes, edges, reliability record embeddings
│   ├── env.py                  # BaseEnv interface
│   └── llm.py                  # LLMClient (OpenAI-compatible)
│
├── benchmarks/
│   ├── sciworld/
│   │   ├── router.py           # SciWorldEnv (AgentGym REST wrapper)
│   │   ├── agent.py            # ProPlay agent: think/act loop
│   │   ├── induction.py        # Procedure induction from episode summaries
│   │   ├── preplay.py          # Pre-play trajectory construction and graph recording
│   │   ├── prompts.py          # LLM prompt templates
│   │   ├── prompt/             # preplay_instruction.txt, preplay_one_shot.txt
│   │   └── pipeline.py         # End-to-end evaluation loop
│   ├── plancraft/
│   │   ├── router.py           # PlanCraft environment wrapper
│   │   ├── agent.py            # ProPlay agent for Minecraft crafting
│   │   ├── graph.py            # WorkflowGraph (plancraft copy)
│   │   ├── induction.py        # Recipe library induction
│   │   ├── preplay.py          # Pre-play for recipe ordering
│   │   ├── prompts.py          # LLM prompt templates
│   │   ├── prompt/             # preplay_instruction.txt, preplay_one_shot.txt
│   │   └── pipeline.py         # Evaluation loop
│   └── taubench/
│       ├── router.py           # tau_bench.envs.get_env wrapper (retail, airline)
│       ├── agent.py            # ProPlay agent with tool-calling support
│       ├── graph.py            # WorkflowGraph (taubench copy)
│       ├── induction.py        # Workflow induction from tool trajectories
│       ├── preplay.py          # Pre-play for tool ordering
│       ├── prompts.py          # LLM prompt templates
│       ├── prompt/             # preplay_instruction.txt, preplay_one_shot.txt
│       ├── llm_client.py       # LLM client with tool-calling extension
│       └── pipeline.py         # Evaluation loop
│
├── data/
│   ├── sciworld/
│   │   ├── gen_online_splits.py    # Generate online evaluation split (shuffled)
│   │   └── splits/                 # Generated
│   ├── plancraft/
│   │   ├── gen_splits.py           # Generate evaluation split from plancraft package
│   │   └── splits/                 # Generated
│   └── taubench/
│       └── load_data.py            # Data loading utilities (data bundled in package)
│
├── prompts/                    # Source copies of pre-play prompt text files
│   ├── sciworld/
│   ├── plancraft/
│   └── taubench/
│
└── scripts/
    ├── run_sciworld.sh
    ├── run_plancraft.sh
    └── run_taubench.sh
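
The layout above separates a benchmark-agnostic core from per-benchmark `router.py` wrappers. As a rough illustration of that pattern, the sketch below defines an abstract environment interface and one toy implementation. The actual contents of `proplay/env.py` are not shown in this README, so the class names and the `reset`/`step` signatures are assumptions, not the repository's API.

```python
from abc import ABC, abstractmethod


class EnvSketch(ABC):
    """Hypothetical benchmark-agnostic environment interface (not the real BaseEnv)."""

    @abstractmethod
    def reset(self, task_id):
        """Start a task and return the initial observation as text."""

    @abstractmethod
    def step(self, action):
        """Apply a text action and return (observation, reward, done)."""


class EchoEnv(EnvSketch):
    """Toy environment: the episode succeeds when the agent emits the goal action."""

    def __init__(self, goal_action):
        self.goal_action = goal_action
        self.steps = 0

    def reset(self, task_id):
        self.steps = 0
        return f"task {task_id}: say '{self.goal_action}'"

    def step(self, action):
        self.steps += 1
        done = action == self.goal_action
        return f"you said {action}", (1.0 if done else 0.0), done


def rollout(env, task_id, policy, max_steps=10):
    """Run one episode with a policy mapping observation text to action text."""
    observation = env.reset(task_id)
    total_reward = 0.0
    for _ in range(max_steps):
        observation, reward, done = env.step(policy(observation))
        total_reward += reward
        if done:
            break
    return total_reward
```

With an interface of this kind, agent and graph code can depend only on the abstract class, while each benchmark supplies its own wrapper.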

🚀 Installation

git clone <this-repo>
cd proplay
pip install -e .

# Benchmark-specific dependencies
pip install -e ".[sciworld]"   # ScienceWorld (agentenv-sciworld + scienceworld)
pip install -e ".[plancraft]"  # PlanCraft
# τ-bench — install from source
pip install git+https://github.com/sierra-research/tau-bench

⚙️ Data Preprocessing

Generate task splits before running ProPlay:

# ScienceWorld
cd data/sciworld
python gen_online_splits.py   # → splits/online_shuffled_ids.json

# PlanCraft (reads val/test splits directly from the installed plancraft package)
cd data/plancraft
python gen_splits.py          # → splits/merged_187_by_complexity.json

τ-bench task data is bundled with the tau_bench package — no preprocessing needed.


🔬 Quick Start

ScienceWorld

# Start AgentGym SciWorld server
python -m uvicorn agentenv_sciworld.server:app --host 0.0.0.0 --port <your_port> &

export OPENAI_API_KEY=<your_key>
bash scripts/run_sciworld.sh

PlanCraft

export OPENAI_API_KEY=<your_key>
bash scripts/run_plancraft.sh

τ-bench

export OPENAI_API_KEY=<your_key>
DOMAIN=retail  bash scripts/run_taubench.sh
DOMAIN=airline bash scripts/run_taubench.sh

👥 Contact

For questions, please contact yma7@nd.edu.
