Future Sight — RL Agent for Pokemon Showdown

A reinforcement learning bot for Gen 9 Random Battles on Pokemon Showdown, trained with PPO and self-play over 100 million timesteps.

Performance

Opponent	Win Rate (500 games)
RandomPlayer	100%
SimpleHeuristics	73.2%
Best prior self-play agent	61.8%
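
A win rate measured over 500 games carries sampling noise. The sketch below is not part of the repository. It puts a 95% Wilson score interval around the two results that are below 100%, using the win counts implied by the table (rate × 500).

```python
import math


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate; z=1.96 gives roughly 95% coverage."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z**2 / games
    centre = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


# Win counts implied by the Performance table (win rate x 500 games).
for name, wins in [("SimpleHeuristics", 366), ("Best prior self-play agent", 309)]:
    lo, hi = wilson_interval(wins, 500)
    print(f"{name}: {wins / 500:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

For SimpleHeuristics, 366 wins in 500 games gives an interval of roughly 69% to 77%. Two checkpoints whose 500-game win rates differ by only a few points are therefore hard to tell apart.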

How It Works

The bot uses a PPO policy network with learned embeddings for Pokemon, moves, abilities, and items. Each turn it reads the battle state, encodes it into a feature vector, and picks the action the policy scores highest among 13 options: 4 moves, the same 4 moves with Terastallization, and 5 switches. Action masking restricts the choice to legal actions and also blocks obviously bad ones, such as Electric moves into Ground types.
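
A minimal sketch of how such a mask could be built and applied. This is not the repository's code. The index layout (0-3 moves, 4-7 moves with Tera, 8-12 switches), the `Move` class, and the helper names are assumptions. The immunity check covers only the standard type-chart immunities and ignores abilities (e.g. Levitate), items, Tera types, and special-case moves. It also assumes every listed move is usable (no PP or Disable checks).

```python
from dataclasses import dataclass

# Hypothetical layout of the 13-action space (the real ordering may differ):
#   0-3  : use move slot i
#   4-7  : use move slot i - 4 and Terastallize
#   8-12 : switch to benched Pokemon i - 8
N_ACTIONS = 13

# Attacking type -> defending types fully immune to it (standard type chart only).
IMMUNITIES = {
    "normal": {"ghost"}, "fighting": {"ghost"}, "ghost": {"normal"},
    "electric": {"ground"}, "ground": {"flying"}, "psychic": {"dark"},
    "poison": {"steel"}, "dragon": {"fairy"},
}


@dataclass
class Move:  # hypothetical minimal move description
    move_type: str
    is_damaging: bool  # status moves are not filtered by type immunity here


def is_immune(move: Move, defender_types: set[str]) -> bool:
    """True if a damaging move hits a type immunity (abilities, Tera, etc. ignored)."""
    return move.is_damaging and bool(IMMUNITIES.get(move.move_type, set()) & defender_types)


def action_mask(moves: list[Move], can_tera: bool, n_switches: int,
                defender_types: set[str]) -> list[bool]:
    legal = [False] * N_ACTIONS
    for i in range(min(len(moves), 4)):
        legal[i] = True
        legal[4 + i] = can_tera
    for j in range(min(n_switches, 5)):
        legal[8 + j] = True
    # Heuristic layer on top of legality: block moves the defender is immune to.
    mask = list(legal)
    for i, move in enumerate(moves[:4]):
        if is_immune(move, defender_types):
            mask[i] = mask[4 + i] = False
    # Never leave the agent with no options: fall back to legality alone.
    return mask if any(mask) else legal


def pick_action(logits: list[float], mask: list[bool]) -> int:
    """Greedy choice: the highest-scoring action among those the mask allows."""
    return max((i for i in range(N_ACTIONS) if mask[i]), key=lambda i: logits[i])


moves = [Move("electric", True), Move("ice", True), Move("normal", False), Move("water", True)]
mask = action_mask(moves, can_tera=True, n_switches=5, defender_types={"ground"})
logits = [9.0, 1.0, 0.0, 2.0, 8.0, 0.5, 0.0, 3.0, 0.1, 0.1, 0.1, 0.1, 0.1]
# The Electric move (0) and its Tera version (4) are masked despite high scores.
print(pick_action(logits, mask))  # prints 7
```

Masking the logits before the choice, rather than penalizing bad actions afterwards, means the policy never spends probability mass on them. A PPO agent can also sample from the masked distribution instead of taking the greedy maximum shown here.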

Setup

Prerequisites

Python 3.9+
Node.js 18+

Install

# Python dependencies
pip install poke-env stable-baselines3 sb3-contrib gymnasium torch numpy

# Pokemon Showdown server
git clone https://github.com/smogon/pokemon-showdown.git
cd pokemon-showdown
npm install
cp config/config-example.js config/config.js
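
To confirm the prerequisites before installing, a small helper script along these lines could be used. It is a hypothetical sketch, not a file in the repository. It only checks the Python interpreter running it and the `node` found on `PATH`.

```python
from __future__ import annotations  # keeps annotations lazy so old Pythons reach the check

import shutil
import subprocess
import sys

MIN_PYTHON = (3, 9)
MIN_NODE = (18,)


def parse_node_version(text: str) -> tuple[int, ...]:
    """Turn `node --version` output such as 'v18.19.0' into (18, 19, 0)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def check_prerequisites() -> list[str]:
    """Return a list of problems; an empty list means both tools look fine."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append(f"Python 3.9+ required, found {sys.version.split()[0]}")
    node = shutil.which("node")
    if node is None:
        problems.append("Node.js not found on PATH")
    else:
        out = subprocess.run([node, "--version"], capture_output=True,
                             text=True, check=True).stdout
        if parse_node_version(out) < MIN_NODE:
            problems.append(f"Node.js 18+ required, found {out.strip()}")
    return problems


if __name__ == "__main__":
    problems = check_prerequisites()
    for problem in problems:
        print("-", problem)
    if not problems:
        print("All prerequisites found.")
```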

Usage

All scripts should be run from the project root directory.

Play against the bot locally

Start your local Showdown server:

cd pokemon-showdown
node pokemon-showdown start --no-security

In a separate terminal, start the bot:

python -m players.play_bot

Open http://localhost:8000 in your browser and challenge MyPPOBot to a Gen 9 Random Battle.
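
If the bot fails to connect, the local server may not be running yet. A minimal, hypothetical check (not part of the repository) is to test whether anything accepts TCP connections on port 8000. This confirms only that some process is listening there, not that it is Pokemon Showdown.

```python
import socket


def server_is_up(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host unresolvable
        return False


if __name__ == "__main__":
    status = "reachable" if server_is_up() else "not reachable; start it first"
    print(f"Showdown server on localhost:8000 is {status}")
```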

Run on the Pokemon Showdown ladder

python -m players.ladder_bot

Before running, edit players/ladder_bot.py to set your username, password, and number of games. The script connects to the public Showdown server and queues for ranked Gen 9 Random Battles.
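
Setting the password directly in the script risks committing it. One alternative is to read these values from environment variables. In this sketch, the variable names `SHOWDOWN_USERNAME`, `SHOWDOWN_PASSWORD`, and `SHOWDOWN_N_GAMES`, the `LadderConfig` class, and the default of 10 games are all hypothetical. The loaded values would then replace the hardcoded ones in players/ladder_bot.py.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class LadderConfig:  # hypothetical container, not part of the repository
    username: str
    password: str
    n_games: int


def load_ladder_config(env: Mapping[str, str] = os.environ) -> LadderConfig:
    """Read ladder settings from (hypothetical) environment variables."""
    try:
        username = env["SHOWDOWN_USERNAME"]
        password = env["SHOWDOWN_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"set the {missing.args[0]} environment variable") from None
    n_games = int(env.get("SHOWDOWN_N_GAMES", "10"))  # arbitrary default
    if n_games <= 0:
        raise ValueError("SHOWDOWN_N_GAMES must be positive")
    return LadderConfig(username, password, n_games)
```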

Evaluate against bots

python -m eval.ppo_test                                                              # vs Random + SimpleHeuristics
python -m eval.ppo_test --opponents model --opp-model models/ppo_pokemon_6v6_v17.3sp # head-to-head
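
The internals of eval/ppo_test.py are not shown here. As a hypothetical sketch of the bookkeeping behind a table like the one under Performance, the code below tallies wins per opponent. Each `play_game` callable is a stand-in for one battle and returns True when the evaluated model wins.

```python
from typing import Callable, Dict


def evaluate(opponents: Dict[str, Callable[[], bool]], n_games: int = 500) -> Dict[str, float]:
    """Return the win rate against each opponent over n_games battles."""
    results = {}
    for name, play_game in opponents.items():
        wins = sum(1 for _ in range(n_games) if play_game())
        results[name] = wins / n_games
    return results


def format_table(results: Dict[str, float], n_games: int = 500) -> str:
    """Render results in a tab-separated layout similar to the Performance table."""
    lines = [f"Opponent\tWin Rate ({n_games} games)"]
    lines += [f"{name}\t{rate:.1%}" for name, rate in results.items()]
    return "\n".join(lines)
```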

Project Structure

├── players/                          # Player wrappers + deployment bots
│   ├── ppo_player.py                 #   PPO inference player (auto-detects model version)
│   ├── play_bot.py                   #   Play against the bot locally
│   ├── ladder_bot.py                 #   Run on the Showdown ladder
│   ├── search_player.py              #   Value-network-guided search player
│   ├── engine_search_player.py       #   PPO + poke-engine MCTS search
│   ├── nn_mcts_player.py             #   Conservative NN + engine MCTS
│   └── wang_mcts_player.py           #   Wang-style MCTS with NN value leaf
├── envs/                             # RL environments + gym wrappers
│   ├── rl_player_6v6.py              #   v17 observation space (640-dim)
│   ├── rl_player_6v6_v18.py          #   v18 observation space (677-dim)
│   └── wrappers.py                   #   Maskable action wrapper, curriculum wrapper
├── networks/                         # Neural network feature extractors
│   ├── embedding_extractor.py        #   v17 feature extractor
│   ├── embedding_extractor_sp.py     #   v17 self-play variant (max+mean pool)
│   ├── embedding_extractor_v18.py    #   v18 feature extractor
│   └── pretrain_embed.py             #   Embedding pretraining (types → effectiveness)
├── training/                         # Training scripts
│   ├── ppo_embed_train_selfplay.py   #   v18 mixed self-play + curriculum
│   ├── ppo_embed_train_parallel.py   #   v18 heuristic-only parallel training
│   ├── ppo_train.py                  #   v17 curriculum training
│   └── ppo_train_selfplay.py         #   v13 self-play training
├── eval/                             # Evaluation + testing
│   ├── ppo_test.py                   #   Evaluate models vs various opponents
│   ├── test_battle.py                #   Simple battle smoke test
│   └── debug.py                      #   Dataset diagnostics
├── utils/                            # Utilities
│   ├── state_bridge.py               #   poke-env → search server state conversion
│   ├── battle_cloner.py              #   JS search state → observation vectors
│   └── train_log.py                  #   Training monitor (TensorBoard logs)
├── data/                             # Embedding vocabularies + config
└── models/                           # Trained model checkpoints
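
players/ppo_player.py auto-detects the model version. How it does so is not documented here. Below is a hypothetical sketch of two ways it could work, using the observation sizes listed under envs/ (v17: 640-dim, v18: 677-dim) and the checkpoint naming seen in models/ (e.g. `ppo_pokemon_6v6_v17.3sp`).

```python
import os
import re

# Observation sizes from envs/ above.
OBS_DIM_TO_VERSION = {640: 17, 677: 18}


def version_from_obs_dim(obs_dim: int) -> int:
    """Map a checkpoint's flat observation size to its model version."""
    try:
        return OBS_DIM_TO_VERSION[obs_dim]
    except KeyError:
        raise ValueError(f"unknown observation size {obs_dim}") from None


def version_from_name(path: str) -> int:
    """Fallback: parse a tag such as '_v17' from a name like 'ppo_pokemon_6v6_v17.3sp'."""
    match = re.search(r"_v(\d+)", os.path.basename(path))
    if match is None:
        raise ValueError(f"no version tag in {path!r}")
    return int(match.group(1))
```

Detecting the version from the observation size is the more robust option, because it does not depend on file names. With a loaded stable-baselines3 model, and assuming a flat Box observation space, the size is `model.observation_space.shape[0]`.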

References

Huang & Lee. "PPO and Self-Play for Pokemon Showdown." CoG 2019.
Wang. "Pokemon Battle Agent with RL." MIT Thesis 2024.
pmariglia. "Foul Play: A Search-Based Pokemon Showdown Bot." GitHub.
