kevbuh/pacman

pacman

65 experiments for RL research in the Ms. Pac-Man environment, plus single-file implementations of diffusion models, Gaussian processes, active inference, the Muon optimizer, pi-sigma product units, growing networks, eligibility traces, halting RNNs, and world models.


File tree

pacman/
├── README.md
├── utils/                         # shared Atari env helpers
│   ├── env.py
│   └── atari100k_env.py
│
├── minagi_v00/                    # era 0 — broad architecture sweep
│   ├── 2015NatureDQN.py
│   ├── bee_timing.py
│   ├── bob.py
│   ├── cv1.ipynb
│   ├── diffusion.ipynb
│   ├── dqn.py
│   ├── et.py
│   ├── fourier_diffusion.py
│   ├── gnn_pacman_dqn.py
│   ├── grow.py
│   ├── grow_atari.py
│   ├── grow_atari2.py
│   ├── grow_mnist_torch.py
│   ├── haltingrnn.ipynb / .py / haltingrnn2.py
│   ├── hopfield.py
│   ├── hrm_agent_mspacman.py
│   ├── masked_diffusion.py
│   ├── mem_rl_ms_pacman.py
│   ├── minagi/a1.py
│   ├── ms_pacman_ppo_transformer.py
│   ├── muondqn.py
│   ├── muon_dqn_tensorboard.py (+ copy)
│   ├── mushroombody.py
│   ├── novel.py
│   ├── pacman.py
│   ├── ppobaseline.py
│   ├── reinforce.py
│   ├── rnn_pacman.py
│   ├── spatial_attention_cnn.py
│   ├── vqvaeppo.py
│   ├── xnor.py
│   ├── random_dropout/random_replace.py
│   └── sobol/sobol.py
│
├── minagi_v01/                    # era 1 — "noise" agents, Muon PPO, PACF
│   ├── readme
│   ├── MUONPPO.py / muon_ppo.py / muon.py
│   ├── ppo.py / sb_ppo_mspacman.py
│   ├── noise.py
│   ├── noise_agent.py / _2 / _3 / _4 / _muon.py
│   ├── noise_cnn.py / noise_td_conv.py
│   ├── pacf.py / pacf.ipynb / pacf_atari.py
│   ├── hac.py
│   ├── vqvae.py
│   ├── test.py
│   └── out/ · snapshots/          # output artifacts
│
├── v3/                            # era 3 — Gaussian processes / PILCO
│   ├── gp.py · gp.ipynb
│   └── pixel_pilco.py
│
├── v4/                            # era 4 — active inference / world models
│   ├── active_inference_mountaincar.py
│   ├── with_reward_f.py
│   └── worldmodel_agent.py
│
└── notebooks/                     # scratch experiments + result figures
    ├── pacman.ipynb · new.ipynb · newnew.ipynb · view.ipynb
    ├── pisigma.ipynb
    ├── grow.ipynb
    ├── et.ipynb · trace.ipynb
    ├── forgetting.ipynb
    ├── free_energy.ipynb
    ├── idk.ipynb · misc.ipynb
    ├── *.png                      # ~31 result figures
    └── mnist_data/ · fashion_mnist_data/

Experiment synopses

utils/ — shared environment helpers

  • env.py — Minimal smoke test that creates an ALE/Gymnasium Ms. Pac-Man env and runs random episodes to verify setup and observation shapes.
  • atari100k_env.py — Reusable Gymnasium + ALE env factory for Atari-100k-style preprocessing (frame skip, grayscale downsample, life-loss terminals, frame stacking, noop_max), with train/eval modes.
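
The grayscale downsample and frame stacking steps listed for atari100k_env.py can be sketched with NumPy alone. This is an illustration under stated assumptions, not the repository's code: the names `to_gray_small` and `FrameStack`, the stride-2 downsample, and the stack depth of 4 are all hypothetical choices.

```python
from collections import deque

import numpy as np


def to_gray_small(rgb, stride=2):
    """Convert an RGB frame to luminance, then keep every `stride`-th pixel."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = rgb.astype(np.float32) @ weights      # (H, W, 3) -> (H, W)
    return gray[::stride, ::stride].astype(np.uint8)


class FrameStack:
    """Keep the k most recent frames and expose them as one (k, H, W) array."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # At episode start, fill the whole stack with the first frame.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(frame)
        return self.observation()

    def push(self, frame):
        # The deque drops the oldest frame automatically.
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        return np.stack(self.frames, axis=0)
```

A raw 210×160 RGB frame becomes a 105×80 grayscale image here, so a stack of four has shape (4, 105, 80). Stacking gives a feed-forward network access to motion that a single frame cannot show.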

minagi_v00/ — era 0: broad architecture sweep

  • 2015NatureDQN.py — Faithful reproduction of Nature DQN (Mnih et al. 2015) on Ms. Pac-Man: RMSProp, reward clipping, standard 84×84 / frame-skip-4 / 1M-replay preprocessing.
  • bob.py — DQN with CNN + Muon optimizer hybrid (Muon for 2D tensors, Aux-Adam for biases) on Ms. Pac-Man.
  • muondqn.py — DQN with Muon optimizer on Ms. Pac-Man (small 50k replay, MPS support).
  • muon_dqn_tensorboard.py (+ copy) — DQN with Muon optimizer (1M replay, 50k warmup) plus TensorBoard logging.
  • dqn.py — N-step SARSA with a similarity-based push-down memory buffer and spatial-attention visualization on Ms. Pac-Man.
  • et.py — Emphatic TD (ETD) with a world-model auxiliary loss and episodic memory for Ms. Pac-Man.
  • gnn_pacman_dqn.py — Graph-neural-network DQN that message-passes over a 4-neighborhood grid graph from a patch encoder.
  • hopfield.py — Modern Hopfield memory networks for Q-learning, bipolar image encoding, per-action memory banks.
  • mem_rl_ms_pacman.py — Hybrid memory-attention agent: soft-kNN over external memory (key = CNN latent, value = Q-vector) with TD(λ).
  • hrm_agent_mspacman.py — Hierarchical Reasoning Model: two recurrent timescales (high- + low-level GRU) trained with REINFORCE.
  • spatial_attention_cnn.py — A2C with CNN + spatial attention + LSTM, emitting attention heatmaps over the frame.
  • ms_pacman_ppo_transformer.py / novel.py — PPO with a CNN backbone + lightweight spatial-Transformer encoder over spatial tokens.
  • mushroombody.py — PPO with n-step (n=5) rollouts and GAE on grayscale 84×84 Ms. Pac-Man.
  • ppobaseline.py — Plain PPO baseline (last-frame obs, CNN + global-average-pool heads), TensorBoard logging.
  • pacman.py — Actor-critic on raw 210×160 RGB with a 3-layer CNN + dense policy/value heads.
  • reinforce.py — REINFORCE policy gradient with a 6-layer dense net on flattened 80×80 obs, temperature-decayed sampling.
  • rnn_pacman.py — PPO with CNN + GRU (fixed ponder steps), vectorized across 8 envs.
  • haltingrnn.py / haltingrnn2.py / haltingrnn.ipynb — PPO + GRU with ACT-style learned halting (adaptive computation time / ponder steps).
  • bee_timing.py — Interval-timing task: a halting agent learns to PROBE at precise multiples of 16 timesteps from sparse reward.
  • grow.py — Dynamic-depth MLP (per-neuron eligibility traces, growth/pruning) on synthetic moons/XOR/AND.
  • grow_mnist_torch.py — Dynamic-depth MNIST classifier that grows/prunes residual blocks on a validation-loss threshold.
  • grow_atari.py / grow_atari2.py — DQN with a growing/pruning residual head on Ms. Pac-Man; v2 adds RND curiosity intrinsic rewards.
  • vqvaeppo.py — PPO + EMA VQ-VAE with commitment-error intrinsic reward and a temporal-delta prediction auxiliary loss.
  • diffusion.ipynb / masked_diffusion.py — Masked diffusion (FiLM-conditioned tiny UNet) trained on FashionMNIST/CIFAR/CelebA/STL10 via gradual corruption.
  • fourier_diffusion.py — Diffusion in a DCT/Fourier-feature latent space, time-conditioned, on FashionMNIST.
  • xnor.py — Parity/XOR benchmark: ReLU MLP vs. learnable product-pooling trees on {−1,+1}ᵈ vectors.
  • random_dropout/random_replace.py — Standard dropout vs. random-replacement dropout on CIFAR-10/100 with ResNet-18.
  • sobol/sobol.py — Sobol quasi-random vs. Xavier weight init on an LSTM time-series task.
  • cv1.ipynb — Basic computer-vision tutorial (convolution filters, image ops).
  • minagi/a1.py — Config dataclass for A2C/PPO (attention heads, RNN latent, replay + episodic buffers, ETD emphasis).
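
Several of the agents above are DQN variants, and 2015NatureDQN.py is described as using reward clipping. A minimal sketch of the one-step DQN target with clipped rewards follows. The function name `dqn_targets` and the array layout are assumptions for illustration; the repository's scripts may organise this differently.

```python
import numpy as np


def dqn_targets(rewards, dones, next_q_target, gamma=0.99):
    """One-step TD targets: clip(r) + gamma * (1 - done) * max_a Q_target(s', a).

    rewards:        (batch,) raw environment rewards
    dones:          (batch,) 1.0 where the transition ended the episode
    next_q_target:  (batch, n_actions) target-network Q-values for s'
    """
    clipped = np.clip(rewards, -1.0, 1.0)       # reward clipping to [-1, 1]
    bootstrap = next_q_target.max(axis=1)       # greedy value of the next state
    return clipped + gamma * (1.0 - dones) * bootstrap
```

Terminal transitions get no bootstrap term, so their target is just the clipped reward.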

minagi_v01/ — era 1: "noise" agents, Muon PPO, PACF credit assignment

  • readme — v0.1 notes: standardize returns to mean-0, explore autocorrelation between past signals and latent observations, KL in PPO, Mahalanobis-distance novelty reward, and the observation that motion-defined objects need temporal information.
  • muon.py — Muon optimizer implementation (orthogonalized SGD-momentum via Newton-Schulz), single-device + distributed variants with AuxAdam.
  • MUONPPO.py — PPO with the Muon optimizer on Ms. Pac-Man.
  • muon_ppo.py — Baseline PPO (standard momentum actor-critic) for Ms. Pac-Man.
  • ppo.py — Standard PPO matching Mnih et al. (2016) Atari hyperparameters (CNN actor-critic, GAE, KL clip, frame stack).
  • sb_ppo_mspacman.py — PPO via Stable-Baselines3 with Atari preprocessing + frame stack, wandb logging.
  • noise.py — Custom Gymnasium "noise-on-noise" env: a moving circle on a static Bernoulli-noise background; reward from position prediction.
  • noise_agent.py — PPO on the noise env; CNN policy outputs continuous 2D predictions with learnable log-std and entropy regularization.
  • noise_agent_2.py — Two-frame-stacked coordinate predictor (predict next circle position from two consecutive frames).
  • noise_agent_3.py — CNN policy predicting 2D object coordinates via average-reward REINFORCE with a tanh-squashed Gaussian policy.
  • noise_agent_4.py — Faithful Nature-2015 DQN on Ms. Pac-Man with wandb, uint8 replay frames.
  • noise_agent_muon.py — Same Nature DQN but with the Muon optimizer replacing Adam.
  • noise_cnn.py — Inductive-bias benchmark: learnable conv vs. fixed-random vs. resampled-random non-overlapping convs on MNIST.
  • noise_td_conv.py — Visualizes white-noise kernel response in Ms. Pac-Man frames.
  • pacf.py / pacf.ipynb — PACF-based credit assignment: data-driven temporal-autocorrelation-weighted returns vs. exponential TD(λ), on MiniGrid Key-Corridor.
  • pacf_atari.py — PACF credit assignment inside DQN for Ms. Pac-Man (multi-step returns weighted by PACF lags via OLS).
  • hac.py — Heteroscedastic auto-correlation-aware recurrent actor-critic (CNN→GRU) with Newey-West variance preconditioning and IACT diagnostics.
  • vqvae.py — VQ-VAE v1 with straight-through estimator and codebook perplexity, on small 32×32 images.
  • test.py — Empty template.
  • out/, snapshots/ — Output figures and saved frames.
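
muon.py is described as orthogonalizing the momentum update via Newton-Schulz. The sketch below shows the idea with the classic cubic Newton-Schulz iteration, which is an assumption made for clarity: the repository's implementation may use a different (for example, tuned higher-order) polynomial and step count. The function name is hypothetical.

```python
import numpy as np


def newton_schulz_orthogonalize(G, steps=20, eps=1e-7):
    """Approximate the nearest semi-orthogonal matrix to G without an SVD.

    Uses the cubic iteration X <- 1.5 X - 0.5 (X X^T) X, which pushes every
    singular value of X toward 1 once they all lie in (0, sqrt(3)).
    """
    # Frobenius normalisation guarantees all singular values are <= 1.
    X = G / (np.linalg.norm(G) + eps)
    # Work on the wide orientation so X @ X.T is the smaller Gram matrix.
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X.T if transposed else X
```

For a full-rank 2D gradient or momentum matrix, the result has (approximately) orthonormal rows or columns, so the update has the same scale in every direction regardless of the raw gradient's conditioning.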

v3/ — era 3: Gaussian processes / PILCO

  • gp.py — PyTorch port of Deisenroth's PILCO: GP dynamics model, trig augmentation, control saturation, cart-pole sim, GP-policy trajectory optimization.
  • gp.ipynb — Tutorial on GPs as distributions over functions (Brownian motion, exponentiated-quadratic kernel, prior sampling).
  • pixel_pilco.py — Pixel-space PILCO for Ms. Pac-Man: CNN autoencoder + latent world model (dynamics + reward), policy optimized by gradient planning through the model.

v4/ — era 4: active inference / world models

  • active_inference_mountaincar.py — Active-inference agent (on CartPole) with capped model learning so prediction error persists in unexplored regions and drives epistemic/curiosity exploration.
  • with_reward_f.py — Active inference on Hopper: learned generative model + Cross-Entropy-Method planning that minimizes expected free energy over imagined trajectories.
  • worldmodel_agent.py — Active-inference agent on Ms. Pac-Man with a GRU world model (encoder/dynamics/decoder/prior/done heads); optimizes policy on imagined rollouts by minimizing free energy — never sees the environment reward.
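
with_reward_f.py is described as planning with the Cross-Entropy Method over imagined trajectories. A generic CEM loop looks roughly like the sketch below. Here `cost_fn` is a hypothetical stand-in for whatever the script scores (for example, expected free energy of a rollout under the learned model), and the population size, elite count, and iteration count are assumptions.

```python
import numpy as np


def cem_plan(cost_fn, horizon, action_dim, iters=30, pop=200, n_elite=20, seed=0):
    """Cross-Entropy Method: iteratively refit a Gaussian over action sequences.

    cost_fn maps one (horizon, action_dim) action sequence to a scalar cost.
    Returns the final mean action sequence.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iters):
        # Sample candidate action sequences around the current mean.
        noise = rng.standard_normal((pop, horizon, action_dim))
        samples = mean + std * noise
        costs = np.array([cost_fn(s) for s in samples])
        # Refit the Gaussian to the lowest-cost candidates.
        elite = samples[np.argsort(costs)[:n_elite]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean
```

In a receding-horizon setup, only the first action of the returned sequence would typically be executed before replanning.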

notebooks/ — scratch experiments + figures

  • pacman.ipynb — Full DQN for Ms. Pac-Man (3-conv CNN → 512 FC, ε-greedy, replay, target net).
  • new.ipynb / newnew.ipynb — PPO actor-critic with an IMPALA ResNet encoder and GAE / n-step returns on Ms. Pac-Man.
  • view.ipynb — Loads a PPO-Transformer checkpoint and plays one evaluation episode.
  • pisigma.ipynb — MLP vs. pi-sigma / product-unit networks on MNIST, with numerically stable log-space products.
  • grow.ipynb — Growing networks with dynamic neuron addition (IMPALA-style ResNet encoders) on MNIST/FashionMNIST.
  • et.ipynb / trace.ipynb — Actor-critic with per-parameter eligibility traces on a contextual bandit (local × global modulatory learning, no BPTT).
  • forgetting.ipynb — Catastrophic-forgetting demo on arithmetic facts (accuracy on "ones" collapses after retraining on "twos").
  • free_energy.ipynb — Theory write-up of variational free energy / active inference (ELBO, VAE decomposition, action-conditioned world models).
  • idk.ipynb — Ms. Pac-Man env exploration (setup, random actions, observation space).
  • misc.ipynb — Quick numpy tutorial.
  • *.png — ~31 result figures (ablations, MNIST/FashionMNIST comparisons, pi-sigma analyses, weight/activation/gradient stats, scaling studies, XOR/two-moons).
  • mnist_data/, fashion_mnist_data/ — Downloaded datasets.
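
pisigma.ipynb mentions numerically stable log-space products for pi-sigma units. One common way to do this is to track the sign and the log-magnitude of each factor separately, as sketched below. The function name, shapes, and epsilon are assumptions, and the notebook may handle zeros or signs differently.

```python
import numpy as np


def pi_sigma_forward(x, W, b, eps=1e-12):
    """Pi-sigma unit: product over k affine 'sigma' units, computed in log space.

    x: (batch, d) inputs, W: (k, d) weights, b: (k,) biases.
    Returns (batch,) outputs equal to prod_k (W[k] . x + b[k]).
    """
    s = x @ W.T + b                                   # (batch, k) sigma-unit outputs
    sign = np.prod(np.sign(s), axis=1)                # overall sign of the product
    log_mag = np.sum(np.log(np.abs(s) + eps), axis=1) # sum of log-magnitudes
    return sign * np.exp(log_mag)
```

Summing logs avoids the overflow and underflow that a direct product of many factors can hit, at the cost of a small bias from `eps` when a factor is close to zero.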

About

atari rl experiments
