- [2025/02/13] Publishing TeamCraft Docker Image on Docker Hub.
- [2025/01/30] Adding support for customizing tasks and scenes.
- [2024/12/22] Releasing TeamCraft-VLA (centralized and decentralized) model checkpoints and training data.
- [2024/12/06] Publishing the arXiv paper.
In this repo, we provide the TeamCraft code. The codebase is released under the MIT License.
We offer three ways to install TeamCraft. Please make sure CUDA is available on your machine. We have tested on Ubuntu 22.04 with CUDA 12.4 and NVIDIA driver 550.120.
Note: TeamCraft has minimum hardware requirements, listed in System Requirements.
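You can confirm that the GPU driver and CUDA runtime are visible with standard NVIDIA tooling:
# should report the driver version (e.g., 550.x) and CUDA version (e.g., 12.4)
nvidia-smi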
Run the following commands to install necessary packages:
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install -y \
software-properties-common \
xvfb \
xserver-xorg \
xserver-xephyr \
libxi-dev \
libxext-dev \
ffmpeg \
openjdk-17-jdk \
tmux \
curl \
build-essential \
pkg-config \
libgl1-mesa-dev
# Install Node.js and npm from NodeSource
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo bash -
sudo apt-get install -y nodejs
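Optionally confirm that the Node.js toolchain installed correctly:
node --version   # the setup_20.x script above should give v20.x
npm --version

Create and activate a new Conda environment: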
conda create -n teamcraft -y python=3.9
conda activate teamcraft

Navigate to the TeamCraft directory and install the packages:
cd teamcraft/
python3.9 -m pip install --no-cache-dir -e .
python3.9 -m pip install --no-cache-dir -e ./llava_teamcraft/
python3.9 -m pip install --no-cache-dir -e ./llava_teamcraft/[train]
pip install flash-attn==2.7.3 --no-build-isolation
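Optionally sanity-check the Python environment before continuing (a quick check; the import names are assumed from the packages installed above):
# both imports should succeed and CUDA should be reported as available
python3.9 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python3.9 -c "import flash_attn; print('flash-attn', flash_attn.__version__)"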
cd ./teamcraft/env/mineflayer && npm install

Installation on Ubuntu 24.04
To install on Ubuntu 24.04, change the default system gcc and g++ to version 11 before executing step c above.
sudo apt update
sudo apt install gcc-11 g++-11
sudo ln -s -f /usr/bin/gcc-11 /usr/bin/gcc
sudo ln -s -f /usr/bin/g++-11 /usr/bin/g++
# confirm version
g++ --version
gcc --version

To change it back:
sudo ln -s -f /usr/bin/gcc-13 /usr/bin/gcc
sudo ln -s -f /usr/bin/g++-13 /usr/bin/g++

Tested on EC2 with Ubuntu Server 22.04 LTS (HVM). Run the setup script with root privileges:
cd teamcraft/
sudo ./setup.sh

Install Docker and NVIDIA Container Toolkit on Ubuntu 22.04.
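One common route is Docker's convenience script plus NVIDIA's toolkit package; the commands below are a sketch based on the upstream documentation, so check the official Docker and NVIDIA instructions for your system:
# install Docker via the upstream convenience script
curl -fsSL https://get.docker.com | sudo sh
# install the NVIDIA Container Toolkit (assumes NVIDIA's apt repository has been added per their docs)
sudo apt-get install -y nvidia-container-toolkit
# wire the NVIDIA runtime into Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker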
Run the official TeamCraft Docker image
docker pull teamcraft/teamcraft-image:latest
docker run -d --name teamcraft-container --gpus all --ulimit nofile=1000000:1000000 teamcraft/teamcraft-image:latest

Note:
- An X server is not needed when using the provided Docker image.
- Ignore "UserWarning: Subprocess mineflayer failed to start."; mineflayer will restart itself.
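To confirm the container came up, you can tail its logs or open a shell inside it (the container name is the one used in the docker run command above):
docker logs -f teamcraft-container
docker exec -it teamcraft-container bash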
Build Docker image on local machine
cd teamcraft/
docker build -t teamcraft-image .

Run the local image:
docker run -d --name teamcraft-container --gpus all --ulimit nofile=1000000:1000000 teamcraft-image

(Optional, for headless cloud desktops only) Set up the X server:
# start tmux session
$ tmux new -s xserver
# start X server on DISPLAY 0
$ sudo python ./xserver.py 0 # if this throws errors e.g "(EE) Server terminated with error (1)" or "(EE) already running ..." try a display > 0
# detach from tmux shell
# Ctrl+b then d
# set DISPLAY variable to match X server
$ export DISPLAY=:0
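# (optional) confirm the X server is reachable; xdpyinfo is in the x11-utils package
$ xdpyinfo -display :0 | head -n 3

Run eval.py for evaluation. It takes the following parameters: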
- `--mode`: Evaluation mode; options are `cen` (centralized) and `dec` (decentralized) (default: `cen`)
- `--ckpt`: Local path or Hugging Face name of the model checkpoint (default: `teamcraft/TeamCraft-VLA-7B-Cen`)
- `--tasks`: Tasks to evaluate; options are `build`, `break`, `farm`, and `smelt` (default: `[build, break, farm, smelt]`)
- `--var_low`: Lowest task variant seed, from 0 to 249 (default: `0`)
- `--var_high`: One above the largest task variant seed, from 1 to 250 (default: `1`)
- `--mc_port`: Minecraft server port (default: `2037`)
- `--mineflayer_port`: Mineflayer server port (default: `3000`)
- `--out_folder`: Output folder for evaluation reports and logs (default: `eval_cen/`)
- `--load_4bit`: Whether to load the checkpoint in 4-bit precision (default: `False`)
- `--load_8bit`: Whether to load the checkpoint in 8-bit precision (default: `False`)
To run a full evaluation of all 4 tasks (1000 variants total, 250 for each task):
# evaluate the 7B centralized checkpoint
python eval.py \
--mode cen \
--ckpt teamcraft/TeamCraft-VLA-7B-Cen \
--tasks build break farm smelt \
--var_low 0 \
--var_high 250 \
--out_folder eval_cen/
# evaluate the 7B decentralized checkpoint
python eval.py \
--mode dec \
--ckpt teamcraft/TeamCraft-VLA-7B-Dec \
--tasks build break farm smelt \
--var_low 0 \
--var_high 250 \
--out_folder eval_dec/

To evaluate a specific task <task_name>:
# evaluate the 7B centralized checkpoint
python eval.py \
--mode cen \
--ckpt teamcraft/TeamCraft-VLA-7B-Cen \
--tasks <task_name> \
--var_low 0 \
--var_high 250 \
--out_folder eval_cen/
# evaluate the 7B decentralized checkpoint
python eval.py \
--mode dec \
--ckpt teamcraft/TeamCraft-VLA-7B-Dec \
--tasks <task_name> \
--var_low 0 \
--var_high 250 \
--out_folder eval_dec/

To evaluate one variant <variant_number> of a specific task <task_name>:
# evaluate the 7B centralized checkpoint
python eval.py \
--mode cen \
--ckpt teamcraft/TeamCraft-VLA-7B-Cen \
--tasks <task_name> \
--var_low <variant_number> \
--var_high <variant_number>+1 \
--out_folder eval_cen/
# evaluate the 7B decentralized checkpoint
python eval.py \
--mode dec \
--ckpt teamcraft/TeamCraft-VLA-7B-Dec \
--tasks <task_name> \
--var_low <variant_number> \
--var_high <variant_number>+1 \
--out_folder eval_dec/

Full evaluation with 4-bit loading:
# evaluate the 7B centralized checkpoint
python eval.py \
--mode cen \
--ckpt teamcraft/TeamCraft-VLA-7B-Cen \
--tasks build break farm smelt \
--var_low 0 \
--var_high 250 \
--out_folder eval_cen/ \
--load_4bit True
# evaluate the 7B decentralized checkpoint
python eval.py \
--mode dec \
--ckpt teamcraft/TeamCraft-VLA-7B-Dec \
--tasks build break farm smelt \
--var_low 0 \
--var_high 250 \
--out_folder eval_dec/ \
--load_4bit True

After downloading our dataset, unzip and save it under ./llava_teamcraft/teamcraft_data/:
cd llava_teamcraft
mkdir -p teamcraft_data
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
# For centralized data
git clone https://huggingface.co/datasets/teamcraft/TeamCraft-Data-Cen
mv -f TeamCraft-Data-Cen/10001_1_1.png TeamCraft-Data-Cen/combine_centralized_updated.json teamcraft_data/
for zip in TeamCraft-Data-Cen/*.zip; do echo "Extracting $zip..."; unzip -qo "$zip" -d teamcraft_data; done
# For decentralized data
git clone https://huggingface.co/datasets/teamcraft/TeamCraft-Data-Dec
mv -f TeamCraft-Data-Dec/10001_1_1.png TeamCraft-Data-Dec/combine_decentralized_updated.json teamcraft_data/
for zip in TeamCraft-Data-Dec/*.zip; do echo "Extracting $zip..."; unzip -qo "$zip" -d teamcraft_data; done
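# (optional) sanity-check that an annotation file parses; the path comes from the mv steps above
python3.9 -c "import json; d = json.load(open('teamcraft_data/combine_centralized_updated.json')); print(type(d).__name__, len(d))"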
# Verify saved on correct path
ls -lh ./teamcraft_data/10001_1_1.png

To train, run finetune_7b_cen.sh and finetune_7b_dec.sh:
chmod +x ./scripts/v1_5/finetune_7b_cen.sh
chmod +x ./scripts/v1_5/finetune_7b_dec.sh
# to train centralized 7b model
bash ./scripts/v1_5/finetune_7b_cen.sh
# to train decentralized 7b model
bash ./scripts/v1_5/finetune_7b_dec.sh

Training on 8× NVIDIA A100-SXM4-40GB GPUs takes ~24 hours for the centralized 7B Vicuna model and ~48 hours for the decentralized one.
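While training, you can watch GPU utilization from a separate tmux pane with standard NVIDIA tooling:
# refresh nvidia-smi every second
watch -n 1 nvidia-smi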
We host pretrained models and datasets on Hugging Face. Download links are listed below.
Note: By default, eval.py automatically downloads the pre-trained checkpoints; there is no need to download them manually.
| Model & Dataset | Hugging Face Link |
|---|---|
| TeamCraft-VLA-7B-Cen | Link |
| TeamCraft-VLA-7B-Dec | Link |
| TeamCraft-Data-Overview | Link |
| TeamCraft-Data-Cen | Link |
| TeamCraft-Data-Dec | Link |
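If you do want a local copy of a checkpoint or dataset, the huggingface-cli tool from huggingface_hub is one option (repo IDs taken from the table above):
# model checkpoint
huggingface-cli download teamcraft/TeamCraft-VLA-7B-Cen
# dataset repositories need --repo-type dataset
huggingface-cli download --repo-type dataset teamcraft/TeamCraft-Data-Cen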
For benchmark environment documentation, please refer to the environment document.
For task design and the available evaluation configurations, please refer to the task document.
For how to design your own tasks and configurations, please refer to the customize task document.
PC:
- CPU: Intel Core i9-14900KF @3.2 GHz (24 cores, 32 threads)
- Memory: 128 GB RAM
- Disk: 100 GB available
- GPU: 1× NVIDIA RTX A6000 ADA
Workstation:
- CPU: AMD Ryzen Threadripper 3990X @2.2 GHz (64 cores, 128 threads)
- Memory: 252 GB RAM
- Disk: 100 GB available
- GPU: 1× NVIDIA RTX A6000
Inference with 4 bit:
- CPU: Intel Core i7-12700K @3.6 GHz (12 cores, 20 threads)
- Memory: 48 GB RAM
- Disk: 100 GB available
- GPU: 1× NVIDIA RTX 4070 Super
AWS EC2 Instance:
- Instance Type: p4d.24xlarge
- CPU: 96 vCPUs @3 GHz
- Memory: 1,152 GiB
- GPU: 8× NVIDIA A100-SXM4-40GB
- Add task descriptions in the document
- Add instructions to visualize the game
- Refactor codebase to lower the requirement on CPU cores
- Release task customization instructions and code
- Enable GPU image rendering
- Release the data generation code
Please cite the following papers if you use this codebase or our datasets in your work.
@article{long2024teamcraft,
title={TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft},
author={Long, Qian and Li, Zhi and Gong, Ran and Wu, Ying Nian and Terzopoulos, Demetri and Gao, Xiaofeng},
journal={arXiv preprint arXiv:2412.05255},
year={2024}
}