Jongwon Jeong1, Jungtaek Kim1, Kangwook Lee1,2,3
1University of Wisconsin–Madison · 2KRAFTON · 3Ludo Robotics
Language Model (LM) agents have demonstrated remarkable capabilities in solving tasks that require multiple interactions with the environment. However, they remain vulnerable in environments where a single error often leads to irrecoverable failure, particularly under strict feasibility constraints. We systematically analyze existing agent frameworks, identifying imperfect planning and stochastic execution as the primary causes. To address these challenges, we propose Tool-guided Adaptive Planning with constrained Execution (TAPE). TAPE enhances planning capability by aggregating multiple plans into a graph and employing an external solver to identify a feasible path. During execution, TAPE employs constrained decoding to reduce sampling noise, while adaptively re-planning whenever environmental feedback deviates from the intended state. Experiments across Sokoban, ALFWorld, MuSiQue, and GSM8K-Hard demonstrate that TAPE consistently outperforms existing frameworks. The gains are largest in difficult cases: success rates improve by 21.0 percentage points on average on hard settings and by 20.0 percentage points on average for weaker base models.
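The planning idea in the abstract (merge several candidate plans into one graph, then search that graph for a feasible path) can be illustrated with a minimal sketch. This is not the repository's implementation: the state and action names are hypothetical, and a plain breadth-first search stands in for the external solver (the agent name ours_graph_ilp suggests the actual solver is ILP-based).

```python
from collections import deque


def aggregate_plans(plans):
    """Merge plans, each a list of (state, action, next_state), into one graph."""
    graph = {}
    for plan in plans:
        for state, action, next_state in plan:
            graph.setdefault(state, {})[action] = next_state
            graph.setdefault(next_state, {})  # make sure leaf states exist as nodes
    return graph


def find_feasible_path(graph, start, goal, max_steps):
    """Breadth-first search (a stand-in for the solver) within a step budget."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        if len(actions) == max_steps:
            continue  # budget exhausted on this branch
        for action, nxt in graph[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None  # no feasible path inside the aggregated graph


# Hypothetical Sokoban-like plans: plan_a ends in a dead end, plan_b reaches the goal.
plan_a = [("start", "push_left", "box_on_wall")]
plan_b = [("start", "move_up", "above_box"), ("above_box", "push_down", "goal")]

graph = aggregate_plans([plan_a, plan_b])
path = find_feasible_path(graph, "start", "goal", max_steps=4)
```

Here the search skips the dead-end branch from plan_a and returns the action sequence from plan_b. If no path fits the step budget, the function returns None, which is the point at which an agent would need to generate more plans.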
Figure 1. Overview. We illustrate our work using Sokoban, where the goal is to push all boxes onto target locations.
(a) Sources of Irrecoverable Failure in the ReAct Framework. A planning error occurs when the internal reasoning suggests a non-viable action (e.g., pushing a box against a wall); this makes the goal unachievable because the agent cannot pull the box from the wall. A sampling error arises when LM stochasticity leads to an action that deviates from the plan.
(b) Conceptual Toy Analysis. We model simplified agents by injecting planning and sampling errors into a feasible policy for Sokoban. We measure success rates as the task step T increases, and observe that existing frameworks degrade rapidly as T grows.
(c) Our Framework. TAPE generates and aggregates multiple plans into a graph and uses a solver to select a feasible path, thereby reducing planning errors. It then enforces constrained execution to suppress sampling errors.
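To see why success rates in panel (b) fall as T grows, consider a simplified model that is an assumption of this sketch rather than the exact setup of the toy analysis: each step independently suffers a planning error with probability p_plan and a sampling error with probability p_sample, and any single error is irrecoverable. The error rates below are illustrative values only, not measured results.

```python
def success_rate(T, p_plan, p_sample):
    """Probability that all T steps are error-free under the independence assumption."""
    per_step_ok = (1.0 - p_plan) * (1.0 - p_sample)
    return per_step_ok ** T


if __name__ == "__main__":
    # Illustrative error rates, chosen only to show the trend.
    for T in (5, 10, 20, 40):
        baseline = success_rate(T, p_plan=0.05, p_sample=0.05)
        no_sampling_error = success_rate(T, p_plan=0.05, p_sample=0.0)
        print(f"T={T:2d}  both errors: {baseline:.3f}  planning error only: {no_sampling_error:.3f}")
```

Under this model the success rate decays geometrically in T, so lowering either error probability matters more the longer the task is.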
This repository uses requirements.txt for dependency setup.
Recommended setup in a fresh conda environment:
conda create -n tape python=3.10 -y
conda activate tape
pip install --upgrade pip
pip install -r requirements.txt

This repository contains Sokoban and ALFWorld experiment code. The main execution entry points are:
src/sokoban_real_experiments.py
src/alfworld_real_experiments.py
src/sokoban_benchmark.py
src/arithmetic_experiment.py
Argument semantics in Sokoban scripts:
--dataset_path: path to the dataset JSON file.
--out_dir: directory for evaluation outputs (CSV/cache), not the dataset file itself.
Argument semantics in arithmetic script:
--dataset_path: path to GSM-Hard dataset JSON.
--agent: react, pa, or ours_graph_ilp.
--time_constraints / --cost_constraints: budget pairs (same length required).
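The same-length requirement on the two budget flags can be sketched as follows. This is a hypothetical parser, not the one in src/arithmetic_experiment.py. It assumes each flag accepts one or more space-separated floats and that the i-th time budget is paired with the i-th cost budget.

```python
import argparse


def parse_budgets(argv):
    """Parse the two budget flags and return them as (time, cost) pairs."""
    parser = argparse.ArgumentParser()
    # Assumption: both flags take one or more floats.
    parser.add_argument("--time_constraints", type=float, nargs="+", default=[])
    parser.add_argument("--cost_constraints", type=float, nargs="+", default=[])
    args = parser.parse_args(argv)
    if len(args.time_constraints) != len(args.cost_constraints):
        # parser.error prints a usage message and exits with status 2.
        parser.error("--time_constraints and --cost_constraints must have the same length")
    return list(zip(args.time_constraints, args.cost_constraints))


pairs = parse_budgets(["--time_constraints", "0.6", "--cost_constraints", "0.02"])
```

Failing early on mismatched lengths avoids silently dropping a budget, which is what zip alone would do.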
src/utils/llm.py initializes clients from these files:
keys/openai-key.env
keys/gemini-key.env
keys/anthropic-key.env
If you only use OpenAI models, openai-key.env is typically sufficient.
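Before launching an experiment, it can help to confirm which key files are present. The helper below is a hypothetical pre-flight check, not part of the repository. It only tests for the existence of the three paths listed above and makes no assumption about the file contents or about how src/utils/llm.py reads them.

```python
from pathlib import Path

# Paths taken from the list above; the provider labels are just dictionary keys.
KEY_FILES = {
    "openai": "keys/openai-key.env",
    "gemini": "keys/gemini-key.env",
    "anthropic": "keys/anthropic-key.env",
}


def available_providers(root="."):
    """Return the providers whose key file exists under the given repository root."""
    return [name for name, rel in KEY_FILES.items() if (Path(root) / rel).is_file()]


if __name__ == "__main__":
    found = available_providers()
    print("Key files found for:", ", ".join(found) if found else "none")
```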
The default config file is data/mini_config.yaml.
It references $ALFWORLD_DATA, so set this environment variable first:
export ALFWORLD_DATA=/path/to/alfworld_data

python src/sokoban_real_experiments.py \
--agent ours_graph_ilp \
--model gpt-4.1-mini \
--episodes 1 \
--slack 2 \
--T_targets 6 \
--num_plans 4 \
--num_jobs 1 \
--print_episodes 0

python src/alfworld_real_experiments.py \
--agent ours_graph_ilp \
--model gpt-4.1-mini \
--episodes 1 \
--slack 8 \
--config data/mini_config.yaml \
--split eval_out_of_distribution \
--num_jobs 1 \
--print_episodes 0

python src/arithmetic_experiment.py \
--agent react \
--model gpt-4.1-mini \
--episodes 1 \
--num_jobs 1 \
--print_episodes 0 \
--time_constraints 0.6 \
--cost_constraints 0.02

To rebuild the Sokoban dataset:
python src/sokoban_benchmark.py \
--build_dataset \
--dataset_path data/dataset.json

@article{jeong2026tape,
title={{TAPE}: Tool-Guided Adaptive Planning and Constrained Execution in Language Model Agents},
author={Jeong, Jongwon and Kim, Jungtaek and Lee, Kangwook},
journal={arXiv preprint arXiv:2602.19633},
year={2026}
}
