Feature: Pokémon Playing Skill — Headless GBA/GB Emulation with AI Gameplay (inspired by gpt-play-pokemon-firered)

## Overview

Inspired by [gpt-play-pokemon-firered](https://github.com/Clad3815/gpt-play-pokemon-firered) — an autonomous AI agent that plays Pokémon FireRed in real-time using OpenAI's LLMs — this proposes a **Pokémon Playing Skill** that enables Hermes Agent to play Pokémon games via headless GBA/GB emulation. The agent would read structured game state from emulator memory, make strategic decisions, and send button inputs — all from the terminal with no display server required.

The reference project uses a complex 4-layer architecture (mGBA + Lua → Python FastAPI bridge → Node.js AI agent → OpenAI API), totaling roughly 15,000 lines across Python, Node.js, and Lua. It requires a GUI-based mGBA instance and is tightly coupled to OpenAI. For Hermes, we can dramatically simplify this: **Hermes IS the AI agent**, so we eliminate the Node.js layer entirely. By using [PyGBA](https://github.com/dvruette/pygba) (MIT) for headless emulation, we also eliminate the Lua scripting layer and mGBA GUI dependency.

The result: Hermes Agent plays Pokémon through a lightweight Python game server, using its native tools (terminal, vision, memory) for the decision loop — no separate agent process, no GUI, no OpenAI lock-in.

---

## Research Findings

### How gpt-play-pokemon-firered Works

The reference project has four distinct layers communicating in a chain:

```
mGBA Emulator (GBA ROM)
     |  (Lua TCP socket, port 8888)
Lua Bridge (FireRedBridgeSocketServer.lua, ~950 lines)
     |  (TCP socket with <|END|> framing)
Python Bridge (firered_mgba_bridge.py, FastAPI, port 8000, ~1500 lines)
     |  (HTTP REST API)
Node.js AI Agent (server/index.js + gameLoop.js, ~1400 lines)
     |  (OpenAI API - hardcoded)
OpenAI GPT Model (default: gpt-5.2)
```

**Lua Layer**: Runs inside mGBA, exposes a TCP socket for reading arbitrary GBA memory (8/16/32-bit), sending frame-accurate button presses, taking screenshots, and smart overworld movement control. Uses pokefirered decomp symbols for address resolution.
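
The `<|END|>` framing on the Lua socket can be illustrated with a small receive-side buffer. This is a minimal sketch, not the reference project's code: the `FrameBuffer` class is hypothetical, and it assumes UTF-8 text payloads terminated by the marker.

```python
END_MARKER = b"<|END|>"


class FrameBuffer:
    """Accumulates raw socket bytes and returns complete marker-terminated messages."""

    def __init__(self, marker: bytes = END_MARKER) -> None:
        self._marker = marker
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add a received chunk; return every message completed by it (possibly none)."""
        self._buf.extend(chunk)
        messages = []
        while True:
            idx = self._buf.find(self._marker)
            if idx < 0:
                break  # no full message yet; keep the partial bytes for the next chunk
            messages.append(self._buf[:idx].decode("utf-8"))
            del self._buf[: idx + len(self._marker)]
        return messages
```

Because TCP is a byte stream, a single `recv` may contain half a message or several messages at once; buffering until the marker appears handles both cases.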

**Python Bridge**: FastAPI server that reads GBA RAM through the Lua socket and parses it into structured game state JSON. Key features:
- Full game state endpoint: player position/facing/badges/money, party (with PID/OTID XOR decryption and substructure unshuffling), bag (XOR-encrypted), PC storage, dialog state, battle state, map tiles, NPCs, warps, connections
- Fog-of-war system tracking discovered vs. unexplored tiles across sessions
- Per-command before/after state tracing
- Periodic savestate backups
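
To make the "XOR-encrypted bag" point concrete, here is a minimal sketch of decoding one bag pocket. It assumes the layout commonly documented for FireRed: 4-byte slots holding a little-endian item ID followed by a quantity that is XORed with the low 16 bits of a 32-bit security key. The function names are hypothetical, and the RAM addresses of the pocket and the key are deliberately left out; they would come from the pokefirered symbols.

```python
import struct


def decode_bag_pocket(raw: bytes, security_key: int) -> list:
    """Return (item_id, quantity) pairs for the non-empty slots of one pocket."""
    key16 = security_key & 0xFFFF  # quantities are 16-bit, so only the low half applies
    items = []
    for offset in range(0, len(raw) - len(raw) % 4, 4):
        item_id, enc_qty = struct.unpack_from("<HH", raw, offset)
        if item_id == 0:
            continue  # assumption: item ID 0 marks an empty slot
        items.append((item_id, enc_qty ^ key16))
    return items


def decode_money(enc_money: int, security_key: int) -> int:
    """Money is assumed to be a 32-bit value XORed with the full key."""
    return (enc_money ^ security_key) & 0xFFFFFFFF
```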

**Node.js Agent**: Continuous decision loop — fetch state, build prompt with context (game state + memory + objectives + screenshots), send to OpenAI, process tool calls (execute_action, write_memory, update_objectives), send commands back. Includes summary rollup (every 120 steps), self-criticism (every 55 steps), and pathfinding sub-prompts.
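
The periodic rollup and self-criticism cadence could be tracked with a small counter like the one below. This is a sketch under assumptions: `StepScheduler` and the injected `summarize` callable are hypothetical names, and only the 120-step and 55-step intervals come from the reference project's description.

```python
from typing import Callable, List


class StepScheduler:
    """Counts steps, compacts history on a fixed interval, and flags critique passes."""

    def __init__(self, summarize: Callable[[List[str]], str],
                 summary_every: int = 120, critique_every: int = 55) -> None:
        self.summarize = summarize
        self.summary_every = summary_every
        self.critique_every = critique_every
        self.step = 0
        self.recent: List[str] = []     # uncompacted step log
        self.summaries: List[str] = []  # one entry per completed rollup

    def record(self, event: str) -> bool:
        """Log one step; return True when a self-criticism pass is due."""
        self.step += 1
        self.recent.append(event)
        if self.step % self.summary_every == 0:
            # Replace the raw log with a summary so the prompt stays bounded.
            self.summaries.append(self.summarize(self.recent))
            self.recent.clear()
        return self.step % self.critique_every == 0
```

In Hermes the `summarize` step would most likely be an LLM call whose result is written to the `memory` tool; here it is just a function argument so the cadence can be tested in isolation.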

**Prompts** (~2,700 lines total across 5 files):
- `game.txt` (~1,300 lines): Core gameplay prompt covering priorities, tool usage, RAM interpretation, battle strategy (including Gen 3's type-based physical/special split), team building, HM management, and the anti-loop protocol
- `self_criticism.txt` (~760 lines): Error analysis, strategy audit, memory management review
- `path_finding.txt` (~455 lines): A*/BFS pathfinding with collision handling, ledge mechanics, movement cost weighting
- `summary.txt` / `summary_rollup.txt`: Gameplay state summarization with confidence tags
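
The reference project describes pathfinding in a prompt; in our design it would be ordinary code. Below is a minimal A* sketch on a tile grid with per-tile movement costs. The grid encoding (`#` for blocked, any other character walkable) and the `costs` mapping are assumptions for illustration. Ledges and other one-way tiles are not modeled, and every cost must be at least 1 for the Manhattan heuristic to stay admissible.

```python
import heapq


def astar(grid, start, goal, costs=None):
    """Return the cheapest path as a list of (x, y) tiles, or None if unreachable.

    grid is a list of equal-length strings; costs maps a tile character to the
    cost of stepping onto it (default 1).
    """
    costs = costs or {}
    height, width = len(grid), len(grid[0])

    def heuristic(pos):
        return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])

    frontier = [(heuristic(start), 0, start)]  # (f = g + h, g, position)
    came_from = {start: None}
    best = {start: 0}
    while frontier:
        _, g, current = heapq.heappop(frontier)
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        if g > best[current]:
            continue  # stale heap entry; a cheaper route was already found
        x, y = current
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nx < width and 0 <= ny < height):
                continue
            tile = grid[ny][nx]
            if tile == "#":
                continue
            new_g = g + costs.get(tile, 1)
            if new_g < best.get((nx, ny), float("inf")):
                best[(nx, ny)] = new_g
                came_from[(nx, ny)] = current
                heapq.heappush(frontier, (new_g + heuristic((nx, ny)), new_g, (nx, ny)))
    return None
```

For example, `astar([".~.", "..."], (0, 0), (2, 0), costs={"~": 5})` routes around the expensive `~` tile through the bottom row.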

### Key Design Decisions in the Reference Project

1. **Memory reading over vision**: The core game state comes from reading RAM directly (positions, HP, items, flags), NOT from screenshot OCR. Screenshots supplement memory data for the LLM's visual understanding.
2. **Dual navigation**: `explored_map` (fog-of-war persistent) as primary nav, `visible_area` (current viewport) as secondary.
3. **Anti-loop protocol**: Question-mark tiles (unexplored) prioritized; explicit loop detection and escape heuristics.
4. **Structured objectives**: Each objective has what/why/how/context fields, maintained by the AI.
5. **Summary compaction**: Periodic summaries prevent context overflow; rollups merge older summaries.
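
The dual-navigation and anti-loop ideas can be sketched as a persistent tile dictionary that absorbs each viewport, plus a query for walkable tiles bordering unexplored space. All names here (`merge_visible`, `frontier_tiles`) and the tile encoding are hypothetical; the reference project's actual data structures may differ.

```python
def merge_visible(explored: dict, visible: list, origin: tuple) -> int:
    """Copy viewport rows into the persistent map; return how many tiles were new.

    explored maps (x, y) to a tile character; visible is a list of row strings
    whose top-left corner sits at origin in map coordinates.
    """
    ox, oy = origin
    newly_seen = 0
    for dy, row in enumerate(visible):
        for dx, tile in enumerate(row):
            pos = (ox + dx, oy + dy)
            if pos not in explored:
                newly_seen += 1
            explored[pos] = tile  # always refresh, since NPCs and doors can change
    return newly_seen


def frontier_tiles(explored: dict, walkable: str = ".") -> set:
    """Known walkable tiles with at least one never-seen neighbour."""
    result = set()
    for (x, y), tile in explored.items():
        if tile not in walkable:
            continue
        neighbours = ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
        if any(n not in explored for n in neighbours):
            result.add((x, y))
    return result
```

A turn that discovers zero new tiles several times in a row is a cheap loop signal, and `frontier_tiles` gives candidate targets for escaping it.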

### License Constraint

The reference project is **CC BY-NC 4.0** (Non-Commercial). We cannot use its code directly. However:
- The pokefirered decomp ([pret/pokefirered](https://github.com/pret/pokefirered)) provides memory addresses/symbols as community knowledge
- PyGBA is **MIT** licensed
- We write our own implementation inspired by the architecture patterns

### Headless Emulation: The Landscape

We evaluated multiple approaches for running GBA emulation without a display server:

| Approach | Headless? | Install | Speed | GBA Support | Verdict |
|----------|-----------|---------|-------|-------------|---------|
| **PyGBA + mGBA bindings** | ✅ Yes | pip + build | Fast (in-process) | ✅ | **Best option** |
| **libmgba-py** | ✅ Yes | Build from source | Fast (in-process) | ✅ | Same as above, different build |
| **mGBA-Qt + xvfb** | Partial (virtual display) | apt install | Medium | ✅ | Heavyweight |
| **mgba-mcp** | Partial (xvfb) | npm + apt | Slow (subprocess per op) | ✅ | Too slow for gameplay |
| **PyBoy** | ✅ Yes | pip install | Fast | ❌ GB/GBC only | Gen 1-2 only |
| **RetroArch headless** | Unreliable | apt install | Medium | ✅ | Documented issues |
| **PyBoyAdvance** | ✅ Yes | pip install | Very slow | ✅ | Experimental, not ready |

**Winner: PyGBA** — MIT licensed, pip installable, truly headless (no X11/display needed), provides direct memory read/write, button input, and frame rendering as numpy arrays. Already has a PokemonEmerald game wrapper as a reference. The main challenge is the mGBA Python bindings: pre-built wheels exist only for Python 3.10-3.11 on Linux; Python 3.12+ requires building mGBA from source with `-DBUILD_PYTHON=ON`.

**Fallback: PyBoy.** For Gen 1-2 (Pokémon Red/Blue/Gold/Silver), PyBoy is simpler: `pip install pyboy`, then pass `window="null"` to run headless. Multiple AI projects already use it successfully (ClaudePlaysPokemonStarter, Pokemon-OpenClaw, PokemonRedExperiments). Because it avoids the mGBA build entirely, the phased rollout uses it as Phase 1.
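
As a rough idea of what state reading looks like on the PyBoy path, the sketch below pulls a few overworld fields from Pokémon Red's RAM. The addresses are the ones commonly cited in community RAM maps and should be treated as assumptions to verify against the pokered disassembly. The function accepts any object exposing an indexable `memory` attribute (as PyBoy 2.x does), so it can be exercised without a ROM.

```python
from dataclasses import dataclass

# Assumed WRAM addresses for Pokémon Red/Blue; verify before relying on them.
ADDR_BADGES = 0xD356  # bitfield, one bit per badge
ADDR_MAP_ID = 0xD35E
ADDR_Y = 0xD361
ADDR_X = 0xD362


@dataclass
class OverworldState:
    map_id: int
    x: int
    y: int
    badges: int


def read_overworld(emulator) -> OverworldState:
    """Read map, position, and badge count from an emulator-like object."""
    mem = emulator.memory
    return OverworldState(
        map_id=mem[ADDR_MAP_ID],
        x=mem[ADDR_X],
        y=mem[ADDR_Y],
        badges=bin(mem[ADDR_BADGES]).count("1"),  # count set bits
    )
```

With a real ROM this would be called as `read_overworld(PyBoy("red.gb", window="null"))` after ticking a few frames; the ROM filename is a placeholder.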

---

## Current State in Hermes Agent

**No existing gaming/emulation capabilities beyond Minecraft server management.** No issues related to Pokémon, emulators, or game-playing agents.

**Relevant existing capabilities that the skill would leverage:**
- `terminal` tool — run Python scripts, manage background processes, HTTP calls
- `vision_analyze` tool — analyze screenshots from the emulator
- `memory` tool — persistent memory for game strategy, objectives, map knowledge
- `process` tool — manage background emulator server
- `execute_code` tool — run Python code for quick calculations
- Background process management (`background=true`, poll, kill)
- Cronjob scheduling — potential for autonomous long-running gameplay sessions

**Related but distinct:**
- Minecraft modpack server skill (gaming category exists)
- No overlap — completely different domain

---

## Implementation Plan

### Skill vs. Tool Classification

This should be a **skill** because:
- The capability is expressible as Python scripts + existing tools (terminal, vision, memory)
- It wraps external libraries (PyGBA/PyBoy) that the agent calls via shell commands
- No custom Python integration or API key management needs to be baked into the agent harness
- Hermes's existing tool suite (terminal for running scripts, vision for screenshots, memory for game knowledge) provides everything needed

**Should it be bundled?** Yes — while CONTRIBUTING.md notes that games are typically Skills Hub candidates, this is a flagship showcase capability that demonstrates Hermes Agent's most advanced features (persistent memory, vision analysis, strategic multi-session reasoning, background process management) in a compelling, accessible way. Bundling it means every Hermes install can play Pokémon out of the box — a powerful "wow factor" demo. → **Bundled in `skills/gaming/pokemon-player/`.**

### Architecture

Unlike the reference project's 4-layer architecture, Hermes's approach is 2 layers:

```
PyGBA (mGBA Python bindings)
     |  (in-process Python calls)
Pokemon Game Server (Python FastAPI, background process)
     |  (HTTP REST API on localhost)
Hermes Agent (existing tools: terminal, vision, memory)
```

The **Pokemon Game Server** is a lightweight Python FastAPI app that:
- Loads the ROM via PyGBA at startup (headless, no display)
- Exposes REST endpoints for game interaction
- Handles memory address parsing for structured game state

**API Endpoints:**

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/state` | GET | Full game state JSON (player, party, bag, map, battle, dialog) |
| `/screenshot` | GET | Current frame as base64 PNG |
| `/action` | POST | Send button inputs (press_a, press_b, walk_up, etc.) |
| `/action/path` | POST | Pathfind + walk to coordinates |
| `/save` | POST | Create savestate |
| `/load` | POST | Load savestate |
| `/minimap` | GET | ASCII/emoji minimap of explored area |

**Hermes's decision loop** (no custom code needed — just skill instructions):
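1. `curl -s localhost:8765/state` → read game state JSON
2. `curl -s localhost:8765/screenshot | base64 -d > /tmp/pokemon_screen.png` → decode the base64 frame into a PNG for `vision_analyze` (assumes the response body is the bare base64 string)
3. Think: what should I do next? (use memory for objectives, map knowledge)
4. `curl -s -X POST localhost:8765/action -H 'Content-Type: application/json' -d '{"actions": ["walk_up", "walk_up", "press_a"]}'` → send the chosen inputs as a JSON body
5. Check result, update memory/objectives, repeat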
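
The same loop can be expressed as a few lines of Python, which is handy for scripted smoke tests of the server. This is a sketch under assumptions: `play_turn` and `http_json` are hypothetical helpers, the port is taken from the curl examples, and `/state` and `/action` are assumed to return JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8765"  # port taken from the curl examples


def http_json(method: str, path: str, payload=None):
    """Send a JSON request to the game server and decode the JSON reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        BASE_URL + path, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def play_turn(choose_actions, request=http_json):
    """One iteration: read state, pick actions, send them, return the new state."""
    state = request("GET", "/state")
    actions = choose_actions(state)
    if actions:
        request("POST", "/action", {"actions": actions})
    return request("GET", "/state")
```

Passing `request` as a parameter keeps the loop testable without a running emulator. In normal use Hermes itself plays the role of `choose_actions`.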

### What We'd Need

**Skill directory structure:**
```
pokemon-player/
├── SKILL.md                           # Skill instructions for Hermes
├── scripts/
│   ├── setup.sh                       # Install PyGBA, mGBA bindings, dependencies
│   ├── pokemon_server.py              # FastAPI game server (~500-800 lines)
│   ├── memory_reader.py               # FireRed RAM address parser (~400 lines)
│   ├── state_builder.py               # Build structured state from raw memory (~300 lines)
│   └── pathfinder.py                  # A* pathfinding on tile grid (~200 lines)
├── references/
│   ├── firered_addresses.json         # Memory addresses for FireRed (from pokefirered decomp)
│   ├── game_data.json                 # Species, moves, items, type chart data
│   ├── battle_strategy.md             # Gen 3 battle mechanics reference
│   └── progression_guide.md           # Key story milestones and flags
└── templates/
    └── game_prompt.md                 # System prompt template for gameplay sessions
```

**Dependencies:**
- Python: `pygba`, `mgba` (Python bindings), `fastapi`, `uvicorn`, `Pillow`
- System (for building mGBA): `cmake`, `libelf-dev`, `libzip-dev`, `libsqlite3-dev`, `libpng-dev`
- ROM: User must provide their own Pokémon FireRed ROM (the skill MUST NOT include or distribute ROMs)
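
Since the user supplies the ROM, `setup.sh` or the server could check which build it is before loading address tables. A minimal sketch, assuming the standard GBA cartridge header layout (title at `0xA0`, game code at `0xAC`, software version at `0xBC`) and that `BPRE` revision 0 is the FireRed USA v1.0 build; the helper names and the supported list are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RomInfo:
    title: str
    game_code: str
    revision: int


def read_gba_header(rom: bytes) -> RomInfo:
    """Extract title, game code, and revision from a GBA ROM image."""
    if len(rom) < 0xC0:
        raise ValueError("file too small to contain a GBA cartridge header")
    return RomInfo(
        title=rom[0xA0:0xAC].rstrip(b"\x00").decode("ascii", errors="replace"),
        game_code=rom[0xAC:0xB0].decode("ascii", errors="replace"),
        revision=rom[0xBC],
    )


def is_supported(info: RomInfo, supported=(("BPRE", 0),)) -> bool:
    """True when the (game code, revision) pair has a matching address table."""
    return (info.game_code, info.revision) in supported
```

Failing fast with a clear message here is cheaper than debugging nonsense state caused by addresses from the wrong revision.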

### Phased Rollout

**Phase 1: Basic Gameplay (Gen 1 via PyBoy — quick win)**
- PyBoy-based server for Pokémon Red/Blue (GB) — simplest possible setup
- `pip install pyboy` (no source build needed)
- Basic state reading: player position, party summary, badges, dialog
- Button input: directional movement + A/B/Start/Select
- Screenshot capture for vision analysis
- Simple skill instructions: explore, battle, catch, progress story
- This proves the concept with minimal setup friction

**Phase 2: FireRed via PyGBA (the real target)**
- PyGBA-based server for Pokémon FireRed/LeafGreen (GBA)
- Full memory reader using pokefirered decomp addresses
- Party decryption (PID/OTID XOR, substructure shuffling — Gen 3 specific)
- Bag reading (security key XOR decryption)
- Battle state (active Pokémon, moves, type matchups)
- Map/collision data with fog-of-war tracking
- Pathfinding (A* with terrain costs)
- Build script for mGBA Python bindings
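
The party decryption step is the least obvious part of Phase 2, so here is a minimal sketch of it. It assumes the widely documented Gen 3 Pokémon structure: personality value and OT ID in the first 8 bytes, a 16-bit checksum at offset 28, and a 48-byte block at offset 32 that is XORed word by word with `PID ^ OTID` and split into four 12-byte substructures whose order depends on `PID % 24`. Function names are hypothetical, and field offsets should be checked against the pokefirered sources.

```python
import struct
from itertools import permutations

# 24 substructure orders (G=Growth, A=Attacks, E=EVs/Condition, M=Misc),
# indexed by PID % 24: GAEM, GAME, GEAM, ... MEAG.
ORDERS = ["".join(p) for p in permutations("GAEM")]


def decrypt_substructures(mon: bytes) -> dict:
    """Decrypt and unshuffle the data block of one Pokémon (needs >= 80 bytes)."""
    pid, otid = struct.unpack_from("<II", mon, 0)
    (stored_checksum,) = struct.unpack_from("<H", mon, 28)
    key = pid ^ otid
    words = struct.unpack_from("<12I", mon, 32)
    plain = struct.pack("<12I", *(w ^ key for w in words))
    # The checksum is the 16-bit sum of the decrypted block's halfwords.
    checksum = sum(struct.unpack("<24H", plain)) & 0xFFFF
    if checksum != stored_checksum:
        raise ValueError("checksum mismatch: wrong address or corrupt data")
    order = ORDERS[pid % 24]
    return {name: plain[i * 12:(i + 1) * 12] for i, name in enumerate(order)}


def species_and_exp(growth: bytes) -> tuple:
    """Growth block is assumed to start with species (u16), held item (u16), exp (u32)."""
    species, _item, exp = struct.unpack_from("<HHI", growth, 0)
    return species, exp
```

The checksum test doubles as a sanity check on the memory address: reading from the wrong offset almost always fails it.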

**Phase 3: Advanced Gameplay**
- Self-criticism loop (periodic strategy review)
- Summary compaction (compress game history to manage context)
- Progress milestone tracking (badges, story events, Champion)
- Multi-game support: Ruby/Sapphire/Emerald (reuse PyGBA, different addresses)
- Long-running autonomous sessions via cronjob scheduling
- Optional: web dashboard for monitoring (WebSocket state broadcast)

**Phase 4: Multi-Generation Support**
- PyBoy backend for Gen 1-2 (Red/Blue/Yellow/Gold/Silver/Crystal)
- PyGBA backend for Gen 3 (FireRed/LeafGreen/Ruby/Sapphire/Emerald)
- Abstracted game interface so the skill works across generations
- Game-specific address files and strategy references
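
One possible shape for the abstracted game interface is a small protocol that both the PyBoy and PyGBA backends implement, with action names translated in one place. Everything below is a hypothetical sketch: the `GameBackend` methods, the button names, and the frame counts are placeholders rather than values taken from either library.

```python
from typing import List, Protocol

BUTTONS = {"a", "b", "start", "select", "up", "down", "left", "right"}


class GameBackend(Protocol):
    """Minimal surface a per-emulator backend would need to expose."""

    def read_u8(self, address: int) -> int: ...
    def press(self, button: str, frames: int = 8) -> None: ...


def run_actions(backend: GameBackend, actions: List[str]) -> int:
    """Translate skill-level actions ('press_a', 'walk_up') into button presses."""
    sent = 0
    for action in actions:
        prefix, _, button = action.partition("_")
        if prefix not in ("press", "walk") or button not in BUTTONS:
            raise ValueError(f"unknown action: {action}")
        # Placeholder timings: a walk holds the direction longer than a tap.
        backend.press(button, frames=16 if prefix == "walk" else 8)
        sent += 1
    return sent
```

Keeping the action vocabulary identical across generations means the skill instructions and the `/action` endpoint do not change when the backend does.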

---

## Pros & Cons

### Pros
- **Unique showcase capability** — An AI agent that plays Pokémon is compelling for demos, streams, and community engagement. Pokémon is perfect for AI: turn-based battles, explorable world, clear objectives, measurable progress (badges, Pokédex, Champion).
- **Leverages Hermes's strengths** — Memory (persistent game knowledge), vision (screenshot analysis), terminal (script execution), background processes (game server). No new tools needed.
- **Dramatically simpler than reference** — 2 layers vs 4. No Lua, no Node.js, no OpenAI lock-in. Hermes IS the agent, using any model.
- **Headless operation** — PyGBA runs without any display server. Works on headless servers, SSH sessions, CI/CD, cloud VMs.
- **Model-agnostic** — Unlike the reference project (OpenAI-only), Hermes can play with any LLM backend.
- **MIT-licensed foundation** — PyGBA is MIT. pokefirered decomp is community knowledge. No license conflicts.
- **Extensible** — The game server pattern generalizes to other games. Address files are swappable.
- **Educational** — Demonstrates complex agent capabilities: persistent state, strategic planning, real-time decision-making, multi-session continuity.

### Cons / Risks
- **mGBA build complexity** — Python 3.12+ requires building mGBA from source with `-DBUILD_PYTHON=ON`. Needs cmake, system libraries. Could be a setup barrier. Mitigated by Phase 1's PyBoy approach (pip-only).
- **ROM requirement** — Users must provide their own ROM file. We cannot distribute ROMs. This adds setup friction and potential legal gray area (though emulation is legal, ROM distribution is not).
- **Token cost** — Each gameplay "turn" requires an LLM call. A full playthrough (30-50 hours of gameplay) could require thousands of turns. At ~$0.01-0.05/turn, a complete game could cost $30-250+ in API calls depending on model.
- **Context management** — Game state + memory + objectives can be large. Need careful prompt engineering to stay within context limits. Summary compaction (Phase 3) helps.
- **Game-specific addresses** — Memory addresses are specific to each ROM version (FireRed USA v1.0 vs v1.1). The skill must clearly specify which ROM version is supported.
- **Niche appeal** — Not everyone wants their AI to play Pokémon. But the pattern (agent + emulator + game) generalizes, and as a bundled showcase it demonstrates capabilities that attract users to the platform.
- **Long-running sessions** — A full playthrough requires sustained multi-session engagement. Hermes's memory system handles this, but it's untested at this scale.
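
The token-cost range above is simple multiplication, and a small helper makes it easy to rerun with other assumptions. In this sketch the turn counts of 3,000 and 5,000 are inferred from the stated $30-250 range and per-turn prices; they are not measured values, and the function name is hypothetical.

```python
def cost_range(turn_counts, prices_per_turn):
    """Return (lowest, highest) total cost over every turns-by-price combination."""
    totals = [turns * price for turns in turn_counts for price in prices_per_turn]
    return min(totals), max(totals)


# 3,000 turns at $0.01 is about $30; 5,000 turns at $0.05 is about $250.
low, high = cost_range((3000, 5000), (0.01, 0.05))
print(f"estimated playthrough cost: ${low:.0f} to ${high:.0f}")
```

If a real session turns out to need tens of thousands of turns, the same arithmetic scales linearly, which is one reason to prefer RAM state over screenshots on routine turns.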

---

## Open Questions

1. **Phase 1 game choice**: Should we start with Pokémon Red (GB/PyBoy, simpler) or go straight to FireRed (GBA/PyGBA)? PyBoy is dramatically easier to set up but the user specifically asked about FireRed.
2. **Autonomy level**: Should the agent play fully autonomously (cronjob-driven, reports progress), or interactively (user watches and can intervene)? Or both modes?
3. **Vision vs memory-only**: The reference project uses both RAM reading and screenshots. How much should we rely on vision_analyze for the screen vs. structured RAM data? Vision is more flexible but slower and uses more tokens.
4. **Multi-game priority**: After FireRed, which game next? Emerald (PyGBA already has a wrapper)? Red/Blue (PyBoy, simpler)? The architecture should be game-agnostic from the start.
5. **Streaming/broadcasting**: Should the skill support live-streaming gameplay (e.g., OBS integration, Twitch) for "TwitchPlaysPokémon but it's an AI" scenarios?
6. **Save management**: How to handle saves across sessions? Auto-save on each turn? Named savestates for "checkpoint" moments?
7. **Python version**: Target Python 3.10/3.11 (pre-built mgba wheels) or 3.12+ (requires source build)? Could provide both paths in setup.sh.

---

## References

- [gpt-play-pokemon-firered](https://github.com/Clad3815/gpt-play-pokemon-firered) — Reference implementation (CC BY-NC 4.0, studied for architecture patterns)
- [PyGBA](https://github.com/dvruette/pygba) — MIT-licensed Python GBA emulator wrapper (our chosen emulation layer)
- [pret/pokefirered](https://github.com/pret/pokefirered) — Community decompilation providing memory addresses and game data
- [PyBoy](https://github.com/Baekalfen/PyBoy) — Python GB/GBC emulator (Phase 1 / Gen 1-2 fallback)
- [libmgba-py](https://github.com/hanzi/libmgba-py) — Alternative mGBA Python bindings
- [mgba-mcp](https://github.com/struktured-labs/mgba-mcp) — MCP server for mGBA (evaluated, too slow for gameplay)
- [ClaudePlaysPokemonStarter](https://github.com/davidhershey/ClaudePlaysPokemonStarter) — PyBoy + Claude for Pokémon Red
- [Pokemon-OpenClaw](https://github.com/drbarq/Pokemon-OpenClaw) — PyBoy + FastAPI for Pokémon Red
- [PokemonRedExperiments](https://github.com/PWhiddy/PokemonRedExperiments) — RL training with PyBoy
- [Hermes CONTRIBUTING.md](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md) — Skill vs. Tool criteria


