pacman

65 experiments for RL research in the Ms. Pac-Man environment, plus single-file implementations of diffusion models, Gaussian processes, active inference, the Muon optimizer, pi-sigma product units, growing networks, eligibility traces, halting RNNs, and world models.

File tree

pacman/
├── README.md
├── utils/                         # shared Atari env helpers
│   ├── env.py
│   └── atari100k_env.py
│
├── minagi_v00/                    # era 0 — broad architecture sweep
│   ├── 2015NatureDQN.py
│   ├── bee_timing.py
│   ├── bob.py
│   ├── cv1.ipynb
│   ├── diffusion.ipynb
│   ├── dqn.py
│   ├── et.py
│   ├── fourier_diffusion.py
│   ├── gnn_pacman_dqn.py
│   ├── grow.py
│   ├── grow_atari.py
│   ├── grow_atari2.py
│   ├── grow_mnist_torch.py
│   ├── haltingrnn.ipynb / .py / haltingrnn2.py
│   ├── hopfield.py
│   ├── hrm_agent_mspacman.py
│   ├── masked_diffusion.py
│   ├── mem_rl_ms_pacman.py
│   ├── minagi/a1.py
│   ├── ms_pacman_ppo_transformer.py
│   ├── muondqn.py
│   ├── muon_dqn_tensorboard.py (+ copy)
│   ├── mushroombody.py
│   ├── novel.py
│   ├── pacman.py
│   ├── ppobaseline.py
│   ├── reinforce.py
│   ├── rnn_pacman.py
│   ├── spatial_attention_cnn.py
│   ├── vqvaeppo.py
│   ├── xnor.py
│   ├── random_dropout/random_replace.py
│   └── sobol/sobol.py
│
├── minagi_v01/                    # era 1 — "noise" agents, Muon PPO, PACF
│   ├── readme
│   ├── MUONPPO.py / muon_ppo.py / muon.py
│   ├── ppo.py / sb_ppo_mspacman.py
│   ├── noise.py
│   ├── noise_agent.py / _2 / _3 / _4 / _muon.py
│   ├── noise_cnn.py / noise_td_conv.py
│   ├── pacf.py / pacf.ipynb / pacf_atari.py
│   ├── hac.py
│   ├── vqvae.py
│   ├── test.py
│   └── out/ · snapshots/          # output artifacts
│
├── v3/                            # era 3 — Gaussian processes / PILCO
│   ├── gp.py · gp.ipynb
│   └── pixel_pilco.py
│
├── v4/                            # era 4 — active inference / world models
│   ├── active_inference_mountaincar.py
│   ├── with_reward_f.py
│   └── worldmodel_agent.py
│
└── notebooks/                     # scratch experiments + result figures
    ├── pacman.ipynb · new.ipynb · newnew.ipynb · view.ipynb
    ├── pisigma.ipynb
    ├── grow.ipynb
    ├── et.ipynb · trace.ipynb
    ├── forgetting.ipynb
    ├── free_energy.ipynb
    ├── idk.ipynb · misc.ipynb
    ├── *.png                      # ~31 result figures
    └── mnist_data/ · fashion_mnist_data/

Experiment synopses

`utils/` — shared environment helpers

env.py — Minimal smoke test that creates an ALE/Gymnasium Ms. Pac-Man env and runs random episodes to verify setup and observation shapes.
atari100k_env.py — Reusable Gymnasium + ALE env factory for Atari-100k-style preprocessing (frame skip, grayscale downsample, life-loss terminals, frame stacking, noop_max), with train/eval modes.
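
The frame-stacking and life-loss-terminal steps listed for atari100k_env.py can be illustrated without any Atari dependency. The sketch below is not the code in that file: `FrameStacker`, `is_life_loss`, and the string "frames" are hypothetical stand-ins chosen so the example runs with the standard library only.

```python
from collections import deque


class FrameStacker:
    """Keep the k most recent frames (hypothetical helper, not from atari100k_env.py)."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # Fill the whole stack with the first frame so the observation
        # has the same shape from the very first step.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(frame)
        return list(self.frames)


def is_life_loss(prev_lives, lives):
    """Treat a drop in the life counter as a terminal signal during training."""
    return lives < prev_lives


stacker = FrameStacker(k=4)
obs = stacker.reset("f0")
obs = stacker.step("f1")
obs = stacker.step("f2")
print(obs)
```

In a real environment the frames would be image arrays and the life counter would come from the emulator's step info; an eval mode would normally ignore the life-loss signal and end only on true game over.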

`minagi_v00/` — era 0: broad architecture sweep

2015NatureDQN.py — Faithful reproduction of Nature DQN (Mnih et al. 2015) on Ms. Pac-Man: RMSProp, reward clipping, standard 84×84 / frame-skip-4 / 1M-replay preprocessing.
bob.py — DQN with CNN + Muon optimizer hybrid (Muon for 2D tensors, Aux-Adam for biases) on Ms. Pac-Man.
muondqn.py — DQN with Muon optimizer on Ms. Pac-Man (small 50k replay, MPS support).
muon_dqn_tensorboard.py (+ copy) — DQN with Muon optimizer (1M replay, 50k warmup) plus TensorBoard logging.
dqn.py — N-step SARSA with a similarity-based push-down memory buffer and spatial-attention visualization on Ms. Pac-Man.
et.py — Emphatic TD (ETD) with a world-model auxiliary loss and episodic memory for Ms. Pac-Man.
gnn_pacman_dqn.py — Graph-neural-network DQN that message-passes over a 4-neighborhood grid graph from a patch encoder.
hopfield.py — Modern Hopfield memory networks for Q-learning, bipolar image encoding, per-action memory banks.
mem_rl_ms_pacman.py — Hybrid memory-attention agent: soft-kNN over external memory (key = CNN latent, value = Q-vector) with TD(λ).
hrm_agent_mspacman.py — Hierarchical Reasoning Model: two recurrent timescales (high- + low-level GRU) trained with REINFORCE.
spatial_attention_cnn.py — A2C with CNN + spatial attention + LSTM, emitting attention heatmaps over the frame.
ms_pacman_ppo_transformer.py / novel.py — PPO with a CNN backbone + lightweight spatial-Transformer encoder over spatial tokens.
mushroombody.py — PPO with n-step (n=5) rollouts and GAE on grayscale 84×84 Ms. Pac-Man.
ppobaseline.py — Plain PPO baseline (last-frame obs, CNN + global-average-pool heads), TensorBoard logging.
pacman.py — Actor-critic on raw 210×160 RGB with a 3-layer CNN + dense policy/value heads.
reinforce.py — REINFORCE policy gradient with a 6-layer dense net on flattened 80×80 obs, temperature-decayed sampling.
rnn_pacman.py — PPO with CNN + GRU (fixed ponder steps), vectorized across 8 envs.
haltingrnn.py / haltingrnn2.py / haltingrnn.ipynb — PPO + GRU with ACT-style learned halting (adaptive computation time / ponder steps).
bee_timing.py — Interval-timing task: a halting agent learns to PROBE at precise multiples of 16 timesteps from sparse reward.
grow.py — Dynamic-depth MLP (per-neuron eligibility traces, growth/pruning) on synthetic moons/XOR/AND.
grow_mnist_torch.py — Dynamic-depth MNIST classifier that grows/prunes residual blocks on a validation-loss threshold.
grow_atari.py / grow_atari2.py — DQN with a growing/pruning residual head on Ms. Pac-Man; v2 adds RND curiosity intrinsic rewards.
vqvaeppo.py — PPO + EMA VQ-VAE with commitment-error intrinsic reward and a temporal-delta prediction auxiliary loss.
diffusion.ipynb / masked_diffusion.py — Masked diffusion (FiLM-conditioned tiny UNet) trained on FashionMNIST/CIFAR/CelebA/STL10 via gradual corruption.
fourier_diffusion.py — Diffusion in a DCT/Fourier-feature latent space, time-conditioned, on FashionMNIST.
xnor.py — Parity/XOR benchmark: ReLU MLP vs. learnable product-pooling trees on {−1,+1}ᵈ vectors.
random_dropout/random_replace.py — Standard dropout vs. random-replacement dropout on CIFAR-10/100 with ResNet-18.
sobol/sobol.py — Sobol quasi-random vs. Xavier weight init on an LSTM time-series task.
cv1.ipynb — Basic computer-vision tutorial (convolution filters, image ops).
minagi/a1.py — Config dataclass for A2C/PPO (attention heads, RNN latent, replay + episodic buffers, ETD emphasis).
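
Several of the PPO variants above mention GAE. As a reference point, here is a minimal sketch of generalized advantage estimation over one rollout, written with plain lists; the function name `compute_gae` and its argument layout are assumptions for illustration and are not taken from any file in this folder.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Return (advantages, returns) for one rollout using GAE.

    rewards, values, dones are equal-length lists for steps 0..T-1;
    last_value is the critic's estimate for the state after step T-1.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of this and later TD errors.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


adv, ret = compute_gae(
    rewards=[1.0, 0.0, 1.0],
    values=[0.5, 0.5, 0.5],
    dones=[False, False, True],
    last_value=0.0,
    gamma=1.0,
    lam=1.0,
)
print(adv, ret)
```

With `gamma = lam = 1` the returns reduce to plain Monte Carlo returns, which makes the toy rollout easy to check by hand; smaller `lam` values lean more on the critic and trade variance for bias.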

`minagi_v01/` — era 1: "noise" agents, Muon PPO, PACF credit assignment

readme — v0.1 notes: standardize returns to mean-0, explore autocorrelation between past signals and latent observations, KL in PPO, Mahalanobis-distance novelty reward, and the observation that motion-defined objects need temporal information.
muon.py — Muon optimizer implementation (orthogonalized SGD-momentum via Newton-Schulz), single-device + distributed variants with AuxAdam.
MUONPPO.py — PPO with the Muon optimizer on Ms. Pac-Man.
muon_ppo.py — Baseline PPO (standard momentum actor-critic) for Ms. Pac-Man.
ppo.py — Standard PPO matching Mnih et al. (2016) Atari hyperparameters (CNN actor-critic, GAE, KL clip, frame stack).
sb_ppo_mspacman.py — PPO via Stable-Baselines3 with Atari preprocessing + frame stack, wandb logging.
noise.py — Custom Gymnasium "noise-on-noise" env: a moving circle on a static Bernoulli-noise background; reward from position prediction.
noise_agent.py — PPO on the noise env; CNN policy outputs continuous 2D predictions with learnable log-std and entropy regularization.
noise_agent_2.py — Two-frame-stacked coordinate predictor (predict next circle position from two consecutive frames).
noise_agent_3.py — CNN policy predicting 2D object coordinates via average-reward REINFORCE with a tanh-squashed Gaussian policy.
noise_agent_4.py — Faithful Nature-2015 DQN on Ms. Pac-Man with wandb, uint8 replay frames.
noise_agent_muon.py — Same Nature DQN but with the Muon optimizer replacing Adam.
noise_cnn.py — Inductive-bias benchmark: learnable conv vs. fixed-random vs. resampled-random non-overlapping convs on MNIST.
noise_td_conv.py — Visualizes white-noise kernel response in Ms. Pac-Man frames.
pacf.py / pacf.ipynb — PACF-based credit assignment: data-driven temporal-autocorrelation-weighted returns vs. exponential TD(λ), on MiniGrid Key-Corridor.
pacf_atari.py — PACF credit assignment inside DQN for Ms. Pac-Man (multi-step returns weighted by PACF lags via OLS).
hac.py — Heteroscedastic auto-correlation-aware recurrent actor-critic (CNN→GRU) with Newey-West variance preconditioning and IACT diagnostics.
vqvae.py — VQ-VAE v1 with straight-through estimator and codebook perplexity, on small 32×32 images.
test.py — Empty template.
out/, snapshots/ — Output figures and saved frames.
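
The muon.py entry describes orthogonalizing the momentum update via Newton-Schulz. The sketch below shows the idea with the classic cubic Newton-Schulz iteration in pure Python. It is an illustration under stated assumptions: Muon implementations commonly use a tuned higher-order polynomial run for only a few steps, and the helper names here (`matmul`, `transpose`, `newton_schulz_orthogonalize`) are hypothetical rather than taken from muon.py.

```python
def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g, steps=30):
    """Approximate the orthogonal polar factor of g (classic cubic iteration)."""
    # Scale by the Frobenius norm so every singular value is at most 1,
    # which keeps the iteration inside its convergence region.
    norm = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        # X <- 1.5 X - 0.5 X X^T X pushes each singular value toward 1
        # while leaving the singular vectors unchanged.
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)] for r1, r2 in zip(x, xxtx)]
    return x


grad = [[3.0, 1.0], [0.0, 2.0]]  # stand-in for a 2D momentum/gradient matrix
q = newton_schulz_orthogonalize(grad)
qtq = matmul(transpose(q), q)
print(qtq)
```

After enough steps `q` has (near-)orthonormal columns, so `qtq` is close to the identity. An optimizer built on this would apply `q`, suitably scaled, in place of the raw momentum for 2D weight tensors and fall back to an Adam-style rule for biases and other non-matrix parameters.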

`v3/` — era 3: Gaussian processes / PILCO

gp.py — PyTorch port of Deisenroth's PILCO: GP dynamics model, trig augmentation, control saturation, cart-pole sim, GP-policy trajectory optimization.
gp.ipynb — Tutorial on GPs as distributions over functions (Brownian motion, exponentiated-quadratic kernel, prior sampling).
pixel_pilco.py — Pixel-space PILCO for Ms. Pac-Man: CNN autoencoder + latent world model (dynamics + reward), policy optimized by gradient planning through the model.
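
For readers new to the topic in gp.ipynb, the following is a small standard-library sketch of drawing one sample from a zero-mean GP prior with an exponentiated-quadratic kernel. The function names, the input grid, and the jitter value are assumptions made for this example; the notebook's own code may be organized differently.

```python
import math
import random


def exp_quad_kernel(xs, length_scale=1.0, variance=1.0):
    """Exponentiated-quadratic covariance matrix for 1-D inputs."""
    return [
        [variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2) for b in xs]
        for a in xs
    ]


def cholesky(m):
    """Lower-triangular L with L L^T = m, for a symmetric positive-definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def sample_gp_prior(xs, rng, jitter=1e-8):
    """Draw one function sample f(xs) from a zero-mean GP prior."""
    k = exp_quad_kernel(xs)
    for i in range(len(xs)):
        k[i][i] += jitter  # small diagonal term for numerical stability
    low = cholesky(k)
    z = [rng.gauss(0.0, 1.0) for _ in xs]
    # f = L z has covariance L L^T = K.
    return [sum(low[i][j] * z[j] for j in range(i + 1)) for i in range(len(xs))]


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
K = exp_quad_kernel(xs)
sample = sample_gp_prior(xs, random.Random(0))
print(sample)
```

Nearby inputs get highly correlated values because the kernel decays with squared distance; shrinking `length_scale` makes the sampled functions wigglier.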

`v4/` — era 4: active inference / world models

active_inference_mountaincar.py — Active-inference agent (on CartPole) with capped model learning so prediction error persists in unexplored regions and drives epistemic/curiosity exploration.
with_reward_f.py — Active inference on Hopper: learned generative model + Cross-Entropy-Method planning that minimizes expected free energy over imagined trajectories.
worldmodel_agent.py — Active-inference agent on Ms. Pac-Man with a GRU world model (encoder/dynamics/decoder/prior/done heads); optimizes policy on imagined rollouts by minimizing free energy — never sees the environment reward.
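
The Cross-Entropy-Method planning mentioned for with_reward_f.py can be sketched on a toy problem. Everything specific here is an assumption for illustration: the one-dimensional dynamics `x' = x + a` stand in for a learned generative model, and the squared distance to a target stands in for expected free energy. Only the sample, select-elites, refit loop reflects CEM itself.

```python
import random


def rollout_cost(actions, x0=0.0, target=2.0):
    """Toy stand-in for an imagined rollout: x' = x + a, cost = squared distance to target."""
    x, cost = x0, 0.0
    for a in actions:
        x = x + a
        cost += (x - target) ** 2
    return cost


def cem_plan(horizon=5, population=200, n_elite=20, iterations=40, seed=0):
    """Cross-Entropy Method over action sequences with a diagonal Gaussian."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(iterations):
        # Sample candidate action sequences from the current distribution.
        candidates = [
            [rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(population)
        ]
        # Keep the lowest-cost sequences and refit the Gaussian to them.
        elites = sorted(candidates, key=rollout_cost)[:n_elite]
        for t in range(horizon):
            column = [e[t] for e in elites]
            mean[t] = sum(column) / n_elite
            var = sum((c - mean[t]) ** 2 for c in column) / n_elite
            std[t] = max(var ** 0.5, 1e-3)  # floor avoids a degenerate distribution
    return mean


plan = cem_plan()
plan_cost = rollout_cost(plan)
print(plan, plan_cost)
```

In a model-predictive loop, only the first action of `plan` would be executed before replanning from the next observed state.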

`notebooks/` — scratch experiments + figures

pacman.ipynb — Full DQN for Ms. Pac-Man (3-conv CNN → 512 FC, ε-greedy, replay, target net).
new.ipynb / newnew.ipynb — PPO actor-critic with an IMPALA ResNet encoder and GAE / n-step returns on Ms. Pac-Man.
view.ipynb — Loads a PPO-Transformer checkpoint and plays one evaluation episode.
pisigma.ipynb — MLP vs. pi-sigma / product-unit networks on MNIST, with numerically stable log-space products.
grow.ipynb — Growing networks with dynamic neuron addition (IMPALA-style ResNet encoders) on MNIST/FashionMNIST.
et.ipynb / trace.ipynb — Actor-critic with per-parameter eligibility traces on a contextual bandit (local × global modulatory learning, no BPTT).
forgetting.ipynb — Catastrophic-forgetting demo on arithmetic facts (accuracy on "ones" collapses after retraining on "twos").
free_energy.ipynb — Theory write-up of variational free energy / active inference (ELBO, VAE decomposition, action-conditioned world models).
idk.ipynb — Ms. Pac-Man env exploration (setup, random actions, observation space).
misc.ipynb — Quick numpy tutorial.
*.png — ~31 result figures (ablations, MNIST/FashionMNIST comparisons, pi-sigma analyses, weight/activation/gradient stats, scaling studies, XOR/two-moons).
mnist_data/, fashion_mnist_data/ — Downloaded datasets.
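
The "numerically stable log-space products" noted for pisigma.ipynb can be illustrated as follows. This is a minimal sketch of one pi-sigma output unit, assuming a product over K affine sums; the function name and the sign/log-magnitude bookkeeping are choices made for this example and may differ from the notebook.

```python
import math


def pi_sigma_forward(x, weights, biases):
    """Product of K affine 'sigma' units, computed in log space.

    weights is a list of K weight vectors and biases a list of K scalars.
    """
    sums = [sum(w * xi for w, xi in zip(ws, x)) + b for ws, b in zip(weights, biases)]
    if any(s == 0.0 for s in sums):
        return 0.0  # log(0) is undefined; the product is exactly zero
    # Track sign and log-magnitude separately so long products
    # neither overflow nor underflow before the final exp.
    sign = 1.0
    log_mag = 0.0
    for s in sums:
        if s < 0:
            sign = -sign
        log_mag += math.log(abs(s))
    return sign * math.exp(log_mag)


x = [1.0, 2.0]
weights = [[1.0, 0.5], [-1.0, 0.25], [0.5, 0.5]]
biases = [0.0, 0.0, 1.0]
y = pi_sigma_forward(x, weights, biases)
print(y)
```

Here the three sums are 2.0, -0.5, and 2.5, so the result matches the direct product of -2.5 up to floating-point error; the log-space form matters once K is large enough that a naive running product would overflow or vanish.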
