superpacman

A stateless, vectorized implementation of Pacman, built with TorchRL

Documentation

Demo video: output.mp4

After implementing a number of RL algorithms, I wanted an environment that let me get test results fast, and that was also satisfying to solve.

Pacman, with gigantic batch sizes, is a non-trivial yet data-rich environment for developing RL algorithms.

The constant stream of rewards generated by Pacman eating food means you have plenty of reward signal to work with.

As training progresses, ghosts are released and rewards become more sparse, naturally challenging your agent in a very satisfying way!

Action space: (0, 1, 2, 3) -> N, E, S, W

Features

  • it's stateless! Supports Monte Carlo tree search
  • it's vectorized by default: run batch sizes of 2048, 4096... whatever your GPU can handle
  • outputs trajectories of PyTorch tensors, ready to use
  • outputs a highly detailed 11-channel image (walls, food, energizers, 4 ghosts, the player, etc.)
  • ghost AI is implemented as per the original game
  • can output absolute, egocentric, and partial egocentric observations
  • can output pixel observations (also absolute or partial egocentric)
  • comes with a fully working and tuned PPO implementation, so you can verify it works!
  • manual interface, so you can play the game yourself
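The first bullet is worth a sketch: because a stateless step() takes the full state as input, several children can be expanded from the same node independently, which is exactly what tree search needs. The toy CounterEnv below is a hypothetical stand-in with the same state-in/state-out signature; the real environment carries TensorDicts rather than plain dicts.

```python
import copy

# Sketch of why a stateless step() enables tree search: the full state is an
# input, so two branches can be expanded from the same node without the env
# holding any hidden state. CounterEnv is a hypothetical stand-in, not the
# superpacman API.
class CounterEnv:
    def reset(self):
        return {"count": 0}

    def step(self, state):
        # Action 0 decrements the counter, anything else increments it.
        delta = -1 if state["action"] == 0 else 1
        return {"count": state["count"] + delta}

env = CounterEnv()
root = env.reset()

# Branch the same node with two different actions.
left, right = copy.deepcopy(root), copy.deepcopy(root)
left["action"], right["action"] = 0, 1
left_child, right_child = env.step(left), env.step(right)

print(left_child["count"], right_child["count"])  # -1 1
```

Note that the root state is untouched after both expansions, so a search tree can keep it around and branch from it again later.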

Installing

Tested under a Python 3.11 venv.

If you intend to run on a GPU, install the GPU build of PyTorch first.

pip install superpacman 

Running

basic demo, using a pretrained agent

superpacman train --enjoy_checkpoint demo_checkpoint.pt

after running, check the logs/superpacman/videos directory

manual mode - play the environment yourself

superpacman play

manual mode - play the environment with partial observation

superpacman play --partial_size 4

help with parameters

superpacman --help

Basic use

import torch
import superpacman
from superpacman import Actions

batch_size = 2

env = superpacman.make_env(batch_size)

state = env.reset()
state['action'] = torch.tensor([Actions.N, Actions.E])
state = env.step(state)
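A multi-step rollout just feeds each returned state back into step(). The sketch below uses a trivial stand-in env (plain dicts, single agent) so it runs without superpacman installed; with the real env you would substitute superpacman.make_env(batch_size) and batched action tensors.

```python
# Sketch of trajectory collection against a stateless state-in/state-out env.
# StubEnv is a hypothetical stand-in mimicking the reset()/step(state) shape
# shown above; swap in superpacman.make_env(batch_size) for the real thing.
class StubEnv:
    def reset(self):
        return {"pos": (0, 0)}

    def step(self, state):
        # Actions 0..3 -> N, E, S, W as (row, col) deltas, row 0 at the top.
        dr, dc = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}[state["action"]]
        r, c = state["pos"]
        return {"pos": (r + dr, c + dc)}

def rollout(env, actions):
    """Feed each state back into step() and record the visited positions."""
    state = env.reset()
    trajectory = []
    for action in actions:
        state["action"] = action
        state = env.step(state)
        trajectory.append(state["pos"])
    return trajectory

print(rollout(StubEnv(), [1, 1, 2]))  # [(0, 1), (0, 2), (1, 2)]
```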

Training and Logging

it's possible to reproduce the PPO training baselines in your favourite ML tool

superpacman train

the default logger is csv, but wandb, mlflow and tensorboard are also supported

wandb integration

log training data to wandb

pip install wandb
superpacman train --logger wandb

wandb parameter sweeps

run a parameter sweep using wandb

wandb sweep sweep.yaml

you will see output like the following:

wandb: Creating sweep from: sweep.yaml
wandb: Creating sweep with ID: t1qjy41y
wandb: View sweep at: https://wandb.ai/duanenielsen/supergrid/sweeps/t1qjy41y
wandb: Run sweep agent with: wandb agent duanenielsen/supergrid/t1qjy41y

run the generated agent command

wandb agent duanenielsen/supergrid/t1qjy41y

mlflow integration

pip install mlflow
superpacman train --logger mlflow
mlflow ui --backend-store-uri file://$PWD/mlflow

you may need to modify the file URI for your OS

tensorboard integration

pip install tensorboard
superpacman train --logger tensorboard
tensorboard --logdir tensorboard

About

Pacman Deep Learning Demo
