superpacman

A stateless, vectorized implementation of Pac-Man, built with TorchRL.


After implementing a number of RL algorithms, I wanted an environment that allowed me to get test results fast, and was also satisfying to solve.

Pac-Man with a gigantic batch size is a non-trivial yet data-rich environment for developing RL algorithms.

The constant stream of rewards from Pac-Man eating food means you have plenty of reward signal to work with.

As training progresses, ghosts are released and rewards become sparser, naturally challenging your agent in a very satisfying way!

Action space: (0, 1, 2, 3) -> N, E, S, W
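As a quick illustration of the action encoding, the sketch below mirrors the four indices in a plain Python enum and pairs each with a grid offset. The `Direction` enum, the `DELTAS` table and the `move` helper are hypothetical names for illustration, not part of superpacman (the package exposes its own `Actions`), and the offsets assume row 0 is the top of the maze.

```python
from enum import IntEnum


class Direction(IntEnum):
    """Hypothetical stand-in mirroring the documented action indices."""
    N = 0
    E = 1
    S = 2
    W = 3


# (row, col) offsets, assuming row 0 is the top row of the maze.
DELTAS = {
    Direction.N: (-1, 0),
    Direction.E: (0, 1),
    Direction.S: (1, 0),
    Direction.W: (0, -1),
}


def move(pos, action):
    """Return the cell reached from pos by one step (walls are ignored)."""
    d_row, d_col = DELTAS[Direction(action)]
    return (pos[0] + d_row, pos[1] + d_col)
```

For example, `move((5, 5), 0)` steps north to `(4, 5)` under the stated row convention.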

Features

It's stateless, so it supports Monte Carlo tree search
It's vectorized by default: run batch sizes of 2048, 4096, or whatever your GPU can handle
Outputs trajectories of PyTorch tensors, ready to use
Outputs a highly detailed 11-channel image (walls, food, energizers, 4 ghosts, players, etc.)
Ghost AI is implemented as per the original game
Can output absolute, egocentric, and partial egocentric observations
Can output pixel observations (also absolute or partial egocentric)
Comes with a fully working and tuned PPO implementation, so you can verify it works!
Manual interface, so you can play the game yourself
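
To see why statelessness matters for tree search, here is a toy sketch that is not superpacman's code: when `step` is a pure function of (state, action), any state can be expanded along several actions without copying or rewinding an environment object. `ToyState` and `step` are hypothetical names, and the one-dimensional corridor is only a stand-in for the maze.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyState:
    """Immutable toy state: player position and remaining food cells."""
    pos: int
    food: frozenset


def step(state, action):
    """Pure transition: returns (next_state, reward), never mutates state.

    action is -1 (left) or +1 (right) on a corridor of cells 0..4.
    """
    new_pos = min(max(state.pos + action, 0), 4)
    reward = 1.0 if new_pos in state.food else 0.0
    return ToyState(new_pos, state.food - {new_pos}), reward


root = ToyState(pos=2, food=frozenset({0, 1, 3}))

# Expand the same root along both actions, as a tree search would.
children = {a: step(root, a) for a in (-1, 1)}
```

Because `root` is never modified, both children are computed from exactly the same starting point, which is the property a tree search relies on.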

Installing

Tested with Python 3.11 in a venv.

If you intend to run on a GPU, install the GPU version of PyTorch.

pip install superpacman

Running

Basic demo, using a pretrained agent

superpacman train --enjoy_checkpoint demo_checkpoint.pt

After running, check the logs/superpacman/videos directory.

manual mode - play the environment yourself

superpacman play

manual mode - play the environment with partial observation

superpacman play --partial_size 4
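
A partial egocentric observation can be pictured as a window cropped around the player. The sketch below is illustrative only and assumes the size acts as a radius, giving a (2r + 1) x (2r + 1) window padded with walls at the edges; how superpacman actually interprets `--partial_size` may differ. `egocentric_crop` is a hypothetical helper, not a superpacman function.

```python
def egocentric_crop(grid, player, radius, pad="#"):
    """Return a (2*radius+1)-square window of grid centred on player.

    grid is a list of equal-length strings; cells outside it are filled
    with pad so the player always sits in the middle of the window.
    """
    rows, cols = len(grid), len(grid[0])
    p_row, p_col = player
    window = []
    for r in range(p_row - radius, p_row + radius + 1):
        line = ""
        for c in range(p_col - radius, p_col + radius + 1):
            inside = 0 <= r < rows and 0 <= c < cols
            line += grid[r][c] if inside else pad
        window.append(line)
    return window


maze = [
    "#####",
    "#P..#",
    "#.#.#",
    "#####",
]
view = egocentric_crop(maze, player=(1, 1), radius=1)
```

Here `view` is the 3 x 3 neighbourhood of the `P` cell, with the player at its centre.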

help with parameters

superpacman --help

Basic use

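import torch
import superpacman
from superpacman import Actions

# number of environments stepped in parallel
batch_size = 2

env = superpacman.make_env(batch_size)

# reset returns the initial state for the whole batch
state = env.reset()
# one action per environment: the first moves north, the second east
state['action'] = torch.tensor([Actions.N, Actions.E])
state = env.step(state)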

Training and Logging

It's possible to reproduce the PPO training baselines in your favourite ML tool.

superpacman train

The default logger is CSV, but wandb, mlflow and tensorboard are also supported.

wandb integration

log training data to wandb

pip install wandb
superpacman train --logger wandb

wandb parameter sweeps

run a parameter sweep using wandb

wandb sweep sweep.yaml

You will see output like the following:

wandb: Creating sweep from: sweep.yaml
wandb: Creating sweep with ID: t1qjy41y
wandb: View sweep at: https://wandb.ai/duanenielsen/supergrid/sweeps/t1qjy41y
wandb: Run sweep agent with: wandb agent duanenielsen/supergrid/t1qjy41y

Run the generated agent command

wandb agent duanenielsen/supergrid/t1qjy41y

mlflow integration

pip install mlflow
superpacman train --logger mlflow
mlflow ui --backend-store-uri file://$PWD/mlflow

You may need to modify the file URI for your OS.
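
One portable way to build the file URI is to let Python's `pathlib` produce it. This sketch assumes the tracking data lives in an `mlflow` directory under the current working directory, as in the command above; `store_uri` is a hypothetical helper name.

```python
from pathlib import Path


def store_uri(directory="mlflow"):
    """Return a file:// URI for directory, resolved against the cwd."""
    # as_uri() requires an absolute path, so resolve first.
    return Path(directory).resolve().as_uri()


if __name__ == "__main__":
    # Pass the printed value to: mlflow ui --backend-store-uri <uri>
    print(store_uri())
```

The printed value can then be used in place of `file://$PWD/mlflow` on systems where that shell expansion does not yield a valid URI.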

tensorboard integration

pip install tensorboard
superpacman train --logger tensorboard
tensorboard --logdir tensorboard
