---
title: OpenEnv-WolfeClick Environment
emoji: 🎮
colorFrom: blue
colorTo: gray
sdk: docker
app_port: 7860
---

OpenEnv-WolfeClick

An OpenEnv-compatible environment for training LLMs to play competitive Pokemon Showdown battles using GRPO.

Competitive Pokemon has hidden information, constrained legal actions, long-term resource tradeoffs, and an active opponent. This repo turns that setting into a trainable RL environment with a reset() / step() loop, shaped rewards, an OpenEnv server wrapper, and a GRPO training pipeline.

Try the live demo — watch a GRPO-trained model play a full battle turn by turn.

Quick Start

```bash
git clone https://github.com/Atharva2099/OpenEnv-WolfeClick.git
cd OpenEnv-WolfeClick
pip install -e .

# Run a battle with random actions (needs a local Pokemon Showdown server on port 8000)
python examples/run_single_episode.py

# Watch a trained model battle
python examples/watch_model_battle.py --revision grpo-qwen3-4b-run3
```
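`run_single_episode.py` expects a Pokemon Showdown server on port 8000. You can run a quick pre-flight check with the minimal sketch below. `showdown_is_listening` is a hypothetical helper, not part of the repo. It only confirms that something accepts TCP connections on that port. It does not check that the listener is actually Showdown.

```python
import socket


def showdown_is_listening(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False
```

Call `showdown_is_listening()` before launching the example. It returns `False` when nothing is listening on `localhost:8000`.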

Project Structure

```text
src/smogon_rl/           Core environment: state formatting, action validation,
                         reward shaping, poke-env client
env/                     OpenEnv server package (env.server.app:app)
examples/                Runnable scripts for local battles
trainer.ipynb            Colab: rollout collection + GRPO training
watch_battle.ipynb       Colab: run one live watched battle
benchmarks/              Checkpoint comparison notebook + results
record_battle.py         Record a battle to JSON for replay
space_app.py             Gradio HF Space battle viewer
openenv.yaml             OpenEnv deployment config
Dockerfile               HF Spaces Docker deployment
```

Environment Design

Each turn, the model receives the battle state as structured markdown in three parts:

| Section | Contents |
|---|---|
| Part A: Active Field | Active Pokemon for both sides: HP, status, ability, item, stat modifiers, opponent speed range |
| Part B: Full Self Roster | All 6 team Pokemon with HP, status, item, and known moves (type + base power) |
| Part C: Opponent History | Every revealed opponent Pokemon: last known HP, status, moves, items, abilities |

The model outputs one JSON action:

```
{"action": "move" | "switch", "choice": "Exact Name of Move or Pokemon"}
```

Each turn offers up to 4 moves and 5 switches. The environment validates the action, executes it in a real Showdown battle, and returns the next state and a shaped reward.
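The sketch below shows one way to validate the model's JSON output and match it against the turn's legal options. `parse_action` and `to_id` are hypothetical helpers, not the repo's `action_space` API. Names are compared roughly the way Showdown builds IDs: lowercase, with anything other than ASCII letters and digits dropped. Under that rule, `"u turn"` matches `U-turn`. The helper returns `None` for anything that would count as an illegal action.

```python
import json
from typing import List, Optional, Tuple


def to_id(name: str) -> str:
    """Showdown-style ID: lowercase, keep only ASCII letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isascii() and ch.isalnum())


def parse_action(
    raw: str, legal_moves: List[str], legal_switches: List[str]
) -> Optional[Tuple[str, str]]:
    """Return (kind, canonical name) for a legal action, or None if it is illegal."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not JSON at all
    if not isinstance(data, dict):
        return None
    kind, choice = data.get("action"), data.get("choice")
    if kind not in ("move", "switch") or not isinstance(choice, str):
        return None
    options = legal_moves if kind == "move" else legal_switches
    wanted = to_id(choice)
    for name in options:
        if to_id(name) == wanted:
            return kind, name  # return the environment's spelling, not the model's
    return None  # hallucinated move or Pokemon
```

The helper returns the environment's own spelling of the name, so downstream code never has to trust the model's capitalization or punctuation.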

Reward Shaping

Dense reward signal tied to battle progress:

| Component | Signal |
|---|---|
| Damage dealt | +1.0 per 10% of opponent HP removed |
| Damage taken | -1.0 per 10% of own HP lost |
| Knockouts | +3.0 per opponent faint, -3.0 per own faint |
| Healing | +1.0 per 10% HP healed (capped at 3.0 per battle) |
| Setup | +0.5 per stat stage gained (capped at 2.0 per Pokemon) |
| Type effectiveness | +0.5 for a super-effective hit, -1.0 when the target is immune |
| Illegal action | -10.0 for a hallucinated move or Pokemon |
| Step penalty | -0.05 per turn (anti-stall) |
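The reward table can be expressed as a small per-turn calculator. This is a minimal illustration, not the repo's reward calculator. The `TurnEvents` fields and the `ShapedReward` class are hypothetical, and HP is measured in percentage points. The sketch also makes three assumptions the table does not settle:

- "Per 10%" scales linearly, so 35% damage gives +3.5.
- Caps clip the component rather than reject it.
- The step penalty also applies on illegal turns.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TurnEvents:
    """Hypothetical per-turn summary; HP values are percentage points (0-100)."""
    damage_dealt_pct: float = 0.0   # opponent HP removed this turn
    damage_taken_pct: float = 0.0   # own HP lost this turn
    opp_faints: int = 0
    self_faints: int = 0
    healed_pct: float = 0.0         # own HP restored this turn
    setup_mon: str = ""             # Pokemon that gained stat stages, if any
    stat_stages_gained: int = 0
    super_effective: bool = False
    hit_immune: bool = False
    illegal_action: bool = False


@dataclass
class ShapedReward:
    """Tracks per-battle caps and turns TurnEvents into a scalar reward."""
    healing_total: float = 0.0
    setup_by_mon: Dict[str, float] = field(default_factory=dict)

    def step(self, ev: TurnEvents) -> float:
        # Damage: +/-1.0 per 10 percentage points of HP.
        r = ev.damage_dealt_pct / 10.0 - ev.damage_taken_pct / 10.0
        r += 3.0 * ev.opp_faints - 3.0 * ev.self_faints

        # Healing: +1.0 per 10% healed, at most 3.0 over the whole battle.
        heal = max(0.0, min(ev.healed_pct / 10.0, 3.0 - self.healing_total))
        self.healing_total += heal
        r += heal

        # Setup: +0.5 per stat stage, at most 2.0 per Pokemon.
        if ev.setup_mon and ev.stat_stages_gained > 0:
            used = self.setup_by_mon.get(ev.setup_mon, 0.0)
            gain = max(0.0, min(0.5 * ev.stat_stages_gained, 2.0 - used))
            self.setup_by_mon[ev.setup_mon] = used + gain
            r += gain

        if ev.super_effective:
            r += 0.5
        if ev.hit_immune:
            r -= 1.0
        if ev.illegal_action:
            r -= 10.0
        return r - 0.05  # step penalty, assumed to apply on every turn
```

Create one `ShapedReward` per battle so the healing and setup caps reset between episodes.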

Training Pipeline

```text
Base Model (Qwen3-4B-Instruct)
        |
  [JSON Warm-up SFT]     establish legal action baseline
        |
  [Rollout Collection]   live Pokemon Showdown battles
        |
  [GRPO Training]        optimize policy on real trajectories
        |
  LoRA Checkpoint  --->  Hugging Face Hub
```

1. Start a local Pokemon Showdown server in Colab.
2. Collect rollout trajectories from live battles.
3. Store the prompt, chosen action, and environment reward.
4. Train a LoRA adapter with GRPO on the real trajectories.
5. Benchmark checkpoints against each other.
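GRPO scores each sampled action relative to the other samples drawn for the same prompt, instead of using a learned value function. The sketch below shows that group-relative advantage. It assumes rollout records carry hypothetical `prompt` and `reward` fields and that each prompt has several sampled actions. The trainer in `trainer.ipynb` may normalize differently; for example, some implementations use the sample rather than the population standard deviation.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """(r_i - mean) / (std + eps) within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; other implementations may use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def advantages_by_prompt(records: List[Dict]) -> List[float]:
    """Group hypothetical {'prompt', 'action', 'reward'} records by prompt and normalize."""
    groups: Dict[str, List[int]] = {}
    for i, rec in enumerate(records):
        groups.setdefault(rec["prompt"], []).append(i)
    out = [0.0] * len(records)
    for idxs in groups.values():
        group_rewards = [records[j]["reward"] for j in idxs]
        for i, adv in zip(idxs, group_advantages(group_rewards)):
            out[i] = adv
    return out
```

Some groups contribute no learning signal: a group with a single sample, or one where every sample got the same reward, ends up with zero advantage. That is why GRPO needs several sampled actions per battle state.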

Architecture

```text
Pokemon Showdown (Node.js, port 8000)
        |  WebSocket
PokeEnvClient (async background loop)
  |-- RLPlayer (queue-driven)
  |-- RandomPlayer (opponent)
        |
PokemonShowdownEnv (sync wrapper: reset/step)
  |-- state_formatter   -> markdown state for LLM
  |-- action_space      -> JSON validation + matching
  |-- reward calculator -> shaped multi-component reward
        |
OpenEnv Server (FastAPI on port 8001)
```
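The architecture pairs an async poke-env client with a synchronous `reset`/`step` wrapper. One common way to bridge the two is to run an asyncio event loop in a background thread and hand work to it with `asyncio.run_coroutine_threadsafe`. The sketch below assumes that pattern; the real `PokeEnvClient` may be structured differently. `BackgroundLoop`, `ActionBridge` and `fake_turn` are hypothetical names, and `fake_turn` stands in for a queue-driven player's turn.

```python
import asyncio
import threading
from typing import Any, Coroutine, Optional


class BackgroundLoop:
    """Owns an asyncio event loop running forever in a daemon thread."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro: Coroutine[Any, Any, Any], timeout: Optional[float] = None) -> Any:
        # Schedule the coroutine on the background loop and block until it finishes.
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        return future.result(timeout=timeout)

    def close(self) -> None:
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()


class ActionBridge:
    """Hypothetical hand-off: sync code pushes actions, an async player awaits them."""

    def __init__(self, bg: BackgroundLoop) -> None:
        self._bg = bg
        # Create the queue on the loop's own thread so it is bound to that loop.
        self.actions: "asyncio.Queue[str]" = bg.run(self._make_queue())

    @staticmethod
    async def _make_queue() -> "asyncio.Queue[str]":
        return asyncio.Queue()

    def submit(self, action: str) -> None:
        # asyncio.Queue is not thread-safe, so the put is executed on the loop thread.
        self._bg.loop.call_soon_threadsafe(self.actions.put_nowait, action)


async def fake_turn(bridge: ActionBridge) -> str:
    """Stand-in for a queue-driven player: wait for the next action, then 'play' it."""
    action = await bridge.actions.get()
    await asyncio.sleep(0)  # a real client would send the choice over the WebSocket here
    return f"executed {action}"
```

The design keeps one long-lived loop in a thread instead of calling `asyncio.run` on every step, which creates and closes a fresh loop each time. With a single loop, the WebSocket connection and the player's battle task can survive across `step()` calls.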

Trained Checkpoints

Model repo: Atharva2099/openenv-smogon-rl

| Checkpoint | Description |
|---|---|
| `grpo-qwen3-4b-run1` | First GRPO training run |
| `grpo-qwen3-4b-run2` | Second run, tuned reward shaping |
| `grpo-qwen3-4b-run3` | Third run, best performing |

Notebooks

| Notebook | Purpose |
|---|---|
| `trainer.ipynb` | Rollout collection + GRPO training (Colab GPU) |
| `watch_battle.ipynb` | Run one live watched battle |
| `benchmarks/benchmark.ipynb` | Compare checkpoint performance |

OpenEnv Server

The environment follows the OpenEnv standard. Config:

```yaml
# openenv.yaml
spec_version: 1
name: openenv-wolfeclick
type: space
runtime: fastapi
app: env.server.app:app
port: 8001
```

Server package: env/server/app.py, env/server/environment.py, env/models.py

HF Spaces Deployment

The Dockerfile builds a lightweight Gradio app that replays pre-recorded model battles:

```bash
docker build -t wolfeclick . && docker run -p 7860:7860 wolfeclick
```

License

MIT
