Scaling RL Environments for Terminal Agents
Endless Terminals is a fully autonomous pipeline that procedurally generates terminal-use tasks without human annotation for training terminal agents with reinforcement learning.
Prerequisites: Python 3.12+, uv
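The prerequisites above can be checked before running the install scripts. The following is a minimal sketch, not part of the repository: the `check_prerequisites` helper and its parameters are hypothetical, and it only tests the interpreter version and whether `uv` is on the `PATH`.

```python
import shutil
import sys

# Minimum interpreter version, taken from the prerequisites line above.
MIN_PYTHON = (3, 12)


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems found; an empty list means all checks passed.

    Hypothetical helper (not part of the repository). `version_info` and
    `which` are injectable so the check can be exercised without changing
    the real environment.
    """
    problems = []
    major, minor = version_info[0], version_info[1]
    if (major, minor) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {major}.{minor}"
        )
    if which("uv") is None:
        problems.append("uv not found on PATH")
    return problems


for problem in check_prerequisites():
    print(problem)
```

An empty result means both checks passed; it says nothing about Apptainer, which is installed by the script in the next step.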
# Install Apptainer
./scripts/install_apptainer.sh
# Install dependencies
uv sync
# Download base container
./scripts/get_ubuntu_sif.sh
Start a vLLM server locally before running task generation:
./scripts/launch_vllm_server.sh
Then generate tasks:
python generate_tasks.py --num-tasks 100 --out-dir ./tasks --model Qwen/Qwen3-32B --jobs 8
Each task generates: task.json, test_initial_state.py, test_final_state.py, container.def, and container.sif.
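Since each task is expected to come with the five files listed above, a quick completeness check can catch tasks whose generation or container build failed partway. This is a rough sketch under one assumption that the source does not confirm: that every task lives in its own subdirectory of the output directory (`./tasks`). The helper names are hypothetical.

```python
from pathlib import Path

# File names taken from the list of per-task outputs above.
EXPECTED_FILES = (
    "task.json",
    "test_initial_state.py",
    "test_final_state.py",
    "container.def",
    "container.sif",
)


def missing_artifacts(task_dir):
    """Return the expected files that are absent from a single task directory."""
    task_dir = Path(task_dir)
    return [name for name in EXPECTED_FILES if not (task_dir / name).is_file()]


def incomplete_tasks(tasks_root):
    """Map task directory name -> missing files, for incomplete tasks only.

    Assumption: one subdirectory per task under `tasks_root`.
    """
    report = {}
    for child in sorted(Path(tasks_root).iterdir()):
        if child.is_dir():
            missing = missing_artifacts(child)
            if missing:
                report[child.name] = missing
    return report
```

Calling `incomplete_tasks("./tasks")` after generation would then list any task directories that should be regenerated or rebuilt before solutions are sampled.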
python generate_solutions.py --tasks-dir ./tasks --num-solutions 16 --model Qwen/Qwen3-32B
# Prepare dataset
python train/prepare_endless.py --task-dir ./tasks --output-dir ./data --build-sif
# Install SkyRL
./scripts/install_sky.sh
# Run training
ray start --head
python train/main_endless.py --config-dir train/confs --config-name base
Configs: base.yaml (Llama-3.2-3B), base_qwen.yaml (Qwen2.5-7B), base_qwen3_otak8.yaml (Qwen3-8B)
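To switch between the listed configs, the training command can be assembled programmatically. The sketch below is illustrative only: `training_command` is a hypothetical helper, and it assumes that `--config-name` takes the config file name without the `.yaml` extension, which is inferred from the `base` / `base.yaml` pairing above rather than stated by the source.

```python
import shlex

# Config name -> model, copied from the list of configs above.
# Assumption: the value passed to --config-name is the YAML file stem.
CONFIGS = {
    "base": "Llama-3.2-3B",
    "base_qwen": "Qwen2.5-7B",
    "base_qwen3_otak8": "Qwen3-8B",
}


def training_command(config_name="base", config_dir="train/confs"):
    """Build the argv list for a training run with one of the listed configs."""
    if config_name not in CONFIGS:
        raise ValueError(
            f"unknown config {config_name!r}; expected one of {sorted(CONFIGS)}"
        )
    return [
        "python", "train/main_endless.py",
        "--config-dir", config_dir,
        "--config-name", config_name,
    ]


# Print the command for the Qwen2.5-7B config instead of running it.
print(shlex.join(training_command("base_qwen")))
```

The returned list can be handed to `subprocess.run` once `ray start --head` has been executed, as in the training step above.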
# Install Harbor
./scripts/setup.sh
# Run evaluation
./scripts/parallel_harbor.sh --model path/to/model --parallel 8
@article{gandhi2025endless,
title={Endless Terminals: Scaling RL Environments for Terminal Agents},
author={Gandhi, Kanishk and Garg, Shivam and Goodman, Noah D. and Papailiopoulos, Dimitris},
journal={arXiv preprint arXiv:2601.16443},
year={2025}
}
Apache License 2.0 - see LICENSE.