UW-Madison-Lee-Lab/TAPE

TAPE: Tool-Guided Adaptive Planning and Constrained Execution in Language Model Agents

Jongwon Jeong¹, Jungtaek Kim¹, Kangwook Lee¹,²,³

¹University of Wisconsin–Madison  ·  ²KRAFTON  ·  ³Ludo Robotics

Language Model (LM) agents have demonstrated remarkable capabilities in solving tasks that require multiple interactions with the environment. However, they remain vulnerable in environments where a single error often leads to irrecoverable failure, particularly under strict feasibility constraints. We systematically analyze existing agent frameworks, identifying imperfect planning and stochastic execution as the primary causes of these failures. To address these challenges, we propose Tool-guided Adaptive Planning with constrained Execution (TAPE). TAPE enhances planning capability by aggregating multiple plans into a graph and employing an external solver to identify a feasible path. During execution, TAPE employs constrained decoding to reduce sampling noise and adaptively re-plans whenever environmental feedback deviates from the intended state. Experiments across Sokoban, ALFWorld, MuSiQue, and GSM8K-Hard demonstrate that TAPE consistently outperforms existing frameworks, improving success rates by 21.0 percentage points on average in hard settings and by 20.0 percentage points on average for weaker base models.

Figure 1. Overview. We illustrate our work using Sokoban, where the goal is to push all boxes onto target locations. (a) Sources of Irrecoverable Failure in the ReAct Framework. A planning error occurs when the internal reasoning suggests a non-viable action (e.g., pushing a box against a wall); this makes the goal unachievable because the agent cannot pull the box from the wall. A sampling error arises when LM stochasticity leads to an action that deviates from the plan. (b) Conceptual Toy Analysis. We model simplified agents by injecting planning and sampling errors into a feasible policy for Sokoban. We measure success rates as the task step T increases, and observe that existing frameworks degrade rapidly as T grows. (c) Our Framework. TAPE generates and aggregates multiple plans into a graph and uses a solver to select a feasible path, thereby reducing planning errors. It then enforces constrained execution to suppress sampling errors.
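
The plan-aggregation idea in panel (c) can be illustrated with a minimal sketch. This is not the repository's implementation: the state and action names and the `build_plan_graph` and `find_feasible_path` helpers are hypothetical, and a plain breadth-first search stands in for the external solver used by TAPE.

```python
from collections import deque


def build_plan_graph(plans):
    """Merge several plans into one directed graph.

    Each plan is a list of (state, action, next_state) steps. States that
    appear in more than one plan become shared nodes. If two plans disagree
    about the outcome of the same (state, action), the later plan wins.
    """
    graph = {}
    for plan in plans:
        for state, action, next_state in plan:
            graph.setdefault(state, {})[action] = next_state
            graph.setdefault(next_state, {})
    return graph


def find_feasible_path(graph, start, goal, dead_ends=frozenset()):
    """Breadth-first search for an action sequence that avoids dead-end states."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, nxt in graph.get(state, {}).items():
            if nxt not in visited and nxt not in dead_ends:
                visited.add(nxt)
                queue.append((nxt, actions + [action]))
    return None  # no feasible path in the aggregated graph


# Three hypothetical plans; the first one pushes a box into an unrecoverable state.
plans = [
    [("s0", "up", "s1"), ("s1", "push_left", "stuck")],
    [("s0", "right", "s2"), ("s2", "up", "s3"), ("s3", "push_up", "goal")],
    [("s0", "up", "s1"), ("s1", "right", "s3")],
]
graph = build_plan_graph(plans)
path = find_feasible_path(graph, "s0", "goal", dead_ends={"stuck"})
print(path)  # ['up', 'right', 'push_up']
```

In this toy case the returned path stitches together steps from two different plans, which is the benefit of searching over the merged graph rather than picking a single sampled plan.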

1) Environment and Dependencies

This repository uses requirements.txt for dependency setup. Recommended setup in a fresh conda environment:

conda create -n tape python=3.10 -y
conda activate tape
pip install --upgrade pip
pip install -r requirements.txt

2) Repository Usage

This repository contains the Sokoban, ALFWorld, and arithmetic (GSM-Hard) experiment code. The main execution entry points are:

  • src/sokoban_real_experiments.py
  • src/alfworld_real_experiments.py
  • src/sokoban_benchmark.py
  • src/arithmetic_experiment.py

Argument semantics in Sokoban scripts:

  • --dataset_path: path to the dataset JSON file.
  • --out_dir: directory for evaluation outputs (CSV/cache), not the dataset file itself.

Argument semantics in the arithmetic script:

  • --dataset_path: path to the GSM-Hard dataset JSON file.
  • --agent: one of react, pa, or ours_graph_ilp.
  • --time_constraints / --cost_constraints: time and cost budgets, paired by position (both must have the same length).
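
The same-length requirement on the budget flags can be sketched as follows. This is an illustration only: it assumes the flags accept space-separated float lists (`nargs="+"`), which may differ from how src/arithmetic_experiment.py actually parses them, and `parse_budgets` is a hypothetical helper.

```python
import argparse


def parse_budgets(argv):
    """Parse the two budget flags and pair them position by position."""
    parser = argparse.ArgumentParser()
    # Assumption: each flag takes one or more floats separated by spaces.
    parser.add_argument("--time_constraints", type=float, nargs="+", default=[])
    parser.add_argument("--cost_constraints", type=float, nargs="+", default=[])
    args = parser.parse_args(argv)
    if len(args.time_constraints) != len(args.cost_constraints):
        # parser.error prints a usage message and exits with status 2.
        parser.error("--time_constraints and --cost_constraints must have the same length")
    return list(zip(args.time_constraints, args.cost_constraints))


print(parse_budgets([
    "--time_constraints", "0.6", "1.2",
    "--cost_constraints", "0.02", "0.05",
]))  # [(0.6, 0.02), (1.2, 0.05)]
```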

3) API Keys

src/utils/llm.py initializes clients from these files:

  • keys/openai-key.env
  • keys/gemini-key.env
  • keys/anthropic-key.env

If you only use OpenAI models, openai-key.env is typically sufficient.
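
Before launching a run, it can help to check that the key files for the providers you plan to use are present. The sketch below only tests for file existence; it makes no assumption about the contents of the files, and `missing_key_files` is a hypothetical helper rather than part of src/utils/llm.py.

```python
from pathlib import Path

# File locations as listed above, relative to the repository root.
KEY_FILES = {
    "openai": "keys/openai-key.env",
    "gemini": "keys/gemini-key.env",
    "anthropic": "keys/anthropic-key.env",
}


def missing_key_files(providers, root="."):
    """Return the key files that do not exist for the requested providers."""
    return [
        KEY_FILES[name]
        for name in providers
        if not (Path(root) / KEY_FILES[name]).is_file()
    ]


missing = missing_key_files(["openai"])
if missing:
    print("Missing key files:", ", ".join(missing))
else:
    print("All requested key files are present.")
```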

4) ALFWorld Data Path

The default config file is data/mini_config.yaml. It references $ALFWORLD_DATA, so set this environment variable first:

export ALFWORLD_DATA=/path/to/alfworld_data
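
A small pre-flight check can catch an unset variable before the ALFWorld run starts. The sketch below uses the standard-library `os.path.expandvars`; the `resolve_data_path` helper and the `example_subdir` path are hypothetical, and the actual config loader may expand the variable differently.

```python
import os


def resolve_data_path(raw_path):
    """Expand $ALFWORLD_DATA in a config value, failing loudly if it is unset."""
    if "ALFWORLD_DATA" not in os.environ:
        raise RuntimeError(
            "ALFWORLD_DATA is not set; export it before running ALFWorld experiments"
        )
    return os.path.expandvars(raw_path)


# For demonstration only: fall back to a placeholder if the variable is unset.
os.environ.setdefault("ALFWORLD_DATA", "/path/to/alfworld_data")
print(resolve_data_path("$ALFWORLD_DATA/example_subdir"))
```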

5) Quick Start Commands

Sokoban (1-episode smoke run)

python src/sokoban_real_experiments.py \
  --agent ours_graph_ilp \
  --model gpt-4.1-mini \
  --episodes 1 \
  --slack 2 \
  --T_targets 6 \
  --num_plans 4 \
  --num_jobs 1 \
  --print_episodes 0

ALFWorld (1-episode smoke run)

python src/alfworld_real_experiments.py \
  --agent ours_graph_ilp \
  --model gpt-4.1-mini \
  --episodes 1 \
  --slack 8 \
  --config data/mini_config.yaml \
  --split eval_out_of_distribution \
  --num_jobs 1 \
  --print_episodes 0

Arithmetic (GSM-Hard)

python src/arithmetic_experiment.py \
  --agent react \
  --model gpt-4.1-mini \
  --episodes 1 \
  --num_jobs 1 \
  --print_episodes 0 \
  --time_constraints 0.6 \
  --cost_constraints 0.02
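
To compare agents under the same settings, the arithmetic command above can be generated once per agent. This is a convenience sketch, not part of the repository: `build_command` is a hypothetical helper, and it only prints the commands rather than running them.

```python
import shlex
import sys


def build_command(script, flags):
    """Turn a flag dictionary into an argv list suitable for subprocess.run."""
    argv = [sys.executable, script]
    for name, value in flags.items():
        argv += [f"--{name}", str(value)]
    return argv


# Flag values copied from the arithmetic smoke run above.
base_flags = {
    "model": "gpt-4.1-mini",
    "episodes": 1,
    "num_jobs": 1,
    "print_episodes": 0,
    "time_constraints": 0.6,
    "cost_constraints": 0.02,
}

for agent in ("react", "pa", "ours_graph_ilp"):
    argv = build_command("src/arithmetic_experiment.py", {"agent": agent, **base_flags})
    # To execute instead of print: subprocess.run(argv, check=True)
    print(shlex.join(argv))
```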

6) Dataset Generation (Optional)

To rebuild the Sokoban dataset:

python src/sokoban_benchmark.py \
  --build_dataset \
  --dataset_path data/dataset.json

Citation

@article{jeong2026tape,
    title={{TAPE}: Tool-Guided Adaptive Planning and Constrained Execution in Language Model Agents},
    author={Jeong, Jongwon and Kim, Jungtaek and Lee, Kangwook},
    journal={arXiv preprint arXiv:2602.19633},
    year={2026}
}
