Overview
Inspired by gpt-play-pokemon-firered — an autonomous AI agent that plays Pokémon FireRed in real-time using OpenAI's LLMs — this proposes a Pokémon Playing Skill that enables Hermes Agent to play Pokémon games via headless GBA/GB emulation. The agent would read structured game state from emulator memory, make strategic decisions, and send button inputs — all from the terminal with no display server required.
The reference project uses a complex 4-layer architecture (mGBA + Lua → Python FastAPI bridge → Node.js AI agent → OpenAI API), totaling ~15,000+ lines across Python, Node.js, and Lua. It requires a GUI-based mGBA instance and is tightly coupled to OpenAI. For Hermes, we can dramatically simplify this: Hermes IS the AI agent, so we eliminate the Node.js layer entirely. By using PyGBA (MIT) for headless emulation, we also eliminate the Lua scripting layer and mGBA GUI dependency.
The result: Hermes Agent plays Pokémon through a lightweight Python game server, using its native tools (terminal, vision, memory) for the decision loop — no separate agent process, no GUI, no OpenAI lock-in.
Research Findings
How gpt-play-pokemon-firered Works
The reference project has four distinct layers communicating in a chain:
mGBA Emulator (GBA ROM)
| (Lua TCP socket, port 8888)
Lua Bridge (FireRedBridgeSocketServer.lua, ~950 lines)
| (TCP socket with <|END|> framing)
Python Bridge (firered_mgba_bridge.py, FastAPI, port 8000, ~1500 lines)
| (HTTP REST API)
Node.js AI Agent (server/index.js + gameLoop.js, ~1400 lines)
| (OpenAI API - hardcoded)
OpenAI GPT Model (default: gpt-5.2)
Lua Layer: Runs inside mGBA, exposes a TCP socket for reading arbitrary GBA memory (8/16/32-bit), sending frame-accurate button presses, taking screenshots, and smart overworld movement control. Uses pokefirered decomp symbols for address resolution.
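The <|END|> framing on the Lua socket can be illustrated with a small buffer splitter. This is a minimal sketch rather than the reference project's code: `FrameBuffer` and `feed` are hypothetical names, and it assumes each message is UTF-8 text terminated by the literal delimiter.

```python
DELIMITER = b"<|END|>"  # terminator named in the layer diagram


class FrameBuffer:
    """Accumulates raw socket bytes and returns complete delimiter-terminated messages."""

    def __init__(self) -> None:
        self._pending = b""

    def feed(self, chunk: bytes) -> list[str]:
        """Add newly received bytes; return every message completed by this chunk."""
        self._pending += chunk
        # Everything before the last delimiter is complete; the tail stays buffered.
        *complete, self._pending = self._pending.split(DELIMITER)
        return [part.decode("utf-8") for part in complete]
```

Because TCP delivers a byte stream, a delimiter can arrive split across two reads; keeping the unfinished tail in the buffer handles that case.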
Python Bridge: FastAPI server that reads GBA RAM through the Lua socket and parses it into structured game state JSON. Key features:
- Full game state endpoint: player position/facing/badges/money, party (with PID/OTID XOR decryption and substructure unshuffling), bag (XOR-encrypted), PC storage, dialog state, battle state, map tiles, NPCs, warps, connections
- Fog-of-war system tracking discovered vs. unexplored tiles across sessions
- Per-command before/after state tracing
- Periodic savestate backups
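The party decryption step in the list above can be sketched as follows. This assumes the commonly documented Gen 3 record layout (personality value at offset 0, OT ID at offset 4, 48 encrypted bytes at offset 32, substructure order chosen by PID mod 24); the offsets should be verified against pret/pokefirered before use, and the function names are hypothetical.

```python
import struct
from itertools import permutations

# The 24 substructure orders are the permutations of G(rowth), A(ttacks),
# E(Vs/condition), M(isc) in this sequence; PID % 24 selects one.
ORDERS = ["".join(p) for p in permutations("GAEM")]


def decrypt_substructures(mon: bytes) -> dict[str, bytes]:
    """Return the four decrypted 12-byte substructures keyed by G, A, E, M."""
    pid, otid = struct.unpack_from("<II", mon, 0)
    key = pid ^ otid
    words = struct.unpack_from("<12I", mon, 32)  # assumed: encrypted data at offset 32
    plain = struct.pack("<12I", *(w ^ key for w in words))
    order = ORDERS[pid % 24]
    return {name: plain[i * 12:(i + 1) * 12] for i, name in enumerate(order)}


def checksum(subs: dict[str, bytes]) -> int:
    """16-bit sum of all decrypted halfwords, for comparison with the stored checksum."""
    data = b"".join(subs[name] for name in "GAEM")
    return sum(struct.unpack("<24H", data)) & 0xFFFF
```

Comparing `checksum(...)` with the value stored in the record is a cheap way to detect a wrong address or a misread before the data reaches the state JSON.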
Node.js Agent: Continuous decision loop — fetch state, build prompt with context (game state + memory + objectives + screenshots), send to OpenAI, process tool calls (execute_action, write_memory, update_objectives), send commands back. Includes summary rollup (every 120 steps), self-criticism (every 55 steps), and pathfinding sub-prompts.
Prompts (~2,700 lines total across 5 files):
- game.txt (~1,300 lines): Core gameplay prompt covering priorities, tool usage, RAM interpretation, battle strategy (Gen 3's type-based physical/special split), team building, HM management, anti-loop protocol
- self_criticism.txt (~760 lines): Error analysis, strategy audit, memory management review
- path_finding.txt (~455 lines): A*/BFS pathfinding with collision handling, ledge mechanics, movement cost weighting
- summary.txt / summary_rollup.txt: Gameplay state summarization with confidence tags
Key Design Decisions in the Reference Project
- Memory reading over vision: The core game state comes from reading RAM directly (positions, HP, items, flags), NOT from screenshot OCR. Screenshots supplement memory data for the LLM's visual understanding.
- Dual navigation: explored_map (fog-of-war persistent) as primary nav, visible_area (current viewport) as secondary.
- Anti-loop protocol: Question-mark tiles (unexplored) prioritized; explicit loop detection and escape heuristics.
- Structured objectives: Each objective has what/why/how/context fields, maintained by the AI.
- Summary compaction: Periodic summaries prevent context overflow; rollups merge older summaries.
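The explored_map and question-mark-tile ideas from the list above could look roughly like this. It is a sketch under assumptions: the tile characters, the dictionary-based map, and the function names are illustrative and not taken from the reference project.

```python
UNKNOWN = "?"  # marker for tiles that have never been seen


def merge_visible(explored: dict[tuple[int, int], str],
                  visible: list[str], origin: tuple[int, int]) -> None:
    """Copy the current viewport rows into the persistent explored map."""
    ox, oy = origin  # map coordinates of the viewport's top-left tile
    for dy, row in enumerate(visible):
        for dx, tile in enumerate(row):
            explored[(ox + dx, oy + dy)] = tile


def render(explored: dict[tuple[int, int], str]) -> str:
    """Draw the explored map as text, filling unseen tiles with the unknown marker."""
    if not explored:
        return ""
    xs = [x for x, _ in explored]
    ys = [y for _, y in explored]
    return "\n".join(
        "".join(explored.get((x, y), UNKNOWN) for x in range(min(xs), max(xs) + 1))
        for y in range(min(ys), max(ys) + 1)
    )
```

Persisting the dictionary between sessions (for example as JSON) would give the agent the same map on resume, and the `?` cells are the candidates an anti-loop heuristic would steer toward.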
License Constraint
The reference project is CC BY-NC 4.0 (Non-Commercial). We cannot use its code directly. However:
- The pokefirered decomp (pret/pokefirered) provides memory addresses/symbols as community knowledge
- PyGBA is MIT licensed
- We write our own implementation inspired by the architecture patterns
Headless Emulation: The Landscape
We evaluated multiple approaches for running GBA emulation without a display server:
| Approach | Headless? | Install | Speed | GBA Support | Verdict |
|---|---|---|---|---|---|
| PyGBA + mGBA bindings | ✅ Yes | pip + build | Fast (in-process) | ✅ | Best option |
| libmgba-py | ✅ Yes | Build from source | Fast (in-process) | ✅ | Same as above, different build |
| mGBA-Qt + xvfb | Partial (virtual display) | apt install | Medium | ✅ | Heavyweight |
| mgba-mcp | Partial (xvfb) | npm + apt | Slow (subprocess per op) | ✅ | Too slow for gameplay |
| PyBoy | ✅ Yes | pip install | Fast | ❌ GB/GBC only | Gen 1-2 only |
| RetroArch headless | Unreliable | apt install | Medium | ✅ | Documented issues |
| PyBoyAdvance | ✅ Yes | pip install | Very slow | ✅ | Experimental, not ready |
Winner: PyGBA — MIT licensed, pip installable, truly headless (no X11/display needed), provides direct memory read/write, button input, and frame rendering as numpy arrays. Already has a PokemonEmerald game wrapper as a reference. The main challenge is the mGBA Python bindings: pre-built wheels exist only for Python 3.10-3.11 on Linux; Python 3.12+ requires building mGBA from source with -DBUILD_PYTHON=ON.
Fallback: PyBoy. For Gen 1-2 (Pokémon Red/Blue/Gold/Silver), PyBoy is simpler: pip install pyboy, with window='null' for headless operation. Multiple AI projects already use it successfully (ClaudePlaysPokemonStarter, Pokemon-OpenClaw, PokemonRedExperiments). It could serve as the first phase if GBA setup proves too complex.
Current State in Hermes Agent
No existing gaming/emulation capabilities beyond Minecraft server management. No issues related to Pokémon, emulators, or game-playing agents.
Relevant existing capabilities that the skill would leverage:
- terminal tool: run Python scripts, manage background processes, HTTP calls
- vision_analyze tool: analyze screenshots from the emulator
- memory tool: persistent memory for game strategy, objectives, map knowledge
- process tool: manage background emulator server
- execute_code tool: run Python code for quick calculations
- Background process management (background=true, poll, kill)
- Cronjob scheduling: potential for autonomous long-running gameplay sessions
Related but distinct:
- Minecraft modpack server skill (gaming category exists)
- No overlap — completely different domain
Implementation Plan
Skill vs. Tool Classification
This should be a skill because:
- The capability is expressible as Python scripts + existing tools (terminal, vision, memory)
- It wraps external libraries (PyGBA/PyBoy) that the agent calls via shell commands
- No custom Python integration or API key management needs to be baked into the agent harness
- Hermes's existing tool suite (terminal for running scripts, vision for screenshots, memory for game knowledge) provides everything needed
Should it be bundled? Yes — while CONTRIBUTING.md notes that games are typically Skills Hub candidates, this is a flagship showcase capability that demonstrates Hermes Agent's most advanced features (persistent memory, vision analysis, strategic multi-session reasoning, background process management) in a compelling, accessible way. Bundling it means every Hermes install can play Pokémon out of the box — a powerful "wow factor" demo. → Bundled in skills/gaming/pokemon-player/.
Architecture
Unlike the reference project's 4-layer architecture, Hermes's approach is 2 layers:
PyGBA (mGBA Python bindings)
| (in-process Python calls)
Pokemon Game Server (Python FastAPI, background process)
| (HTTP REST API on localhost)
Hermes Agent (existing tools: terminal, vision, memory)
The Pokemon Game Server is a lightweight Python FastAPI app that:
- Loads the ROM via PyGBA at startup (headless, no display)
- Exposes REST endpoints for game interaction
- Handles memory address parsing for structured game state
API Endpoints:
| Endpoint | Method | Purpose |
|---|---|---|
| /state | GET | Full game state JSON (player, party, bag, map, battle, dialog) |
| /screenshot | GET | Current frame as base64 PNG |
| /action | POST | Send button inputs (press_a, press_b, walk_up, etc.) |
| /action/path | POST | Pathfind + walk to coordinates |
| /save | POST | Create savestate |
| /load | POST | Load savestate |
| /minimap | GET | ASCII/emoji minimap of explored area |
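To make the /state payload concrete, here is one possible shape for it. The field names and nesting are assumptions for illustration only; the proposal does not fix a schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class PartyMember:
    species: str
    level: int
    hp: int
    max_hp: int


@dataclass
class GameState:
    """Hypothetical subset of what GET /state might return."""
    map_name: str
    x: int
    y: int
    facing: str
    badges: int
    money: int
    party: list[PartyMember] = field(default_factory=list)
    in_battle: bool = False
    dialog: Optional[str] = None  # None when no text box is open


def to_json(state: GameState) -> str:
    """Serialize the state, including nested party members, to a JSON string."""
    return json.dumps(asdict(state), ensure_ascii=False)
```

Keeping the schema in typed dataclasses lets the server validate what it builds from RAM, and gives the skill instructions a stable set of keys to refer to.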
Hermes's decision loop (no custom code needed — just skill instructions):
- curl localhost:8765/state → read game state JSON
- curl localhost:8765/screenshot | base64 -d > /tmp/pokemon_screen.png → get visual (decoding the base64 body, assuming the endpoint returns it as raw text)
- Think: what should I do next? (use memory for objectives, map knowledge)
- curl -X POST localhost:8765/action -H 'Content-Type: application/json' -d '{"actions": ["walk_up", "walk_up", "press_a"]}'
- Check result, update memory/objectives, repeat
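Although the loop above needs no custom code, the same steps can be written as a small Python helper, which may be handy for testing the server. This is a sketch: `http_json` and `play_turn` are hypothetical names, the port comes from the curl examples, and the `decide` callable stands in for the agent's reasoning.

```python
import json
import urllib.request
from typing import Callable, Optional

BASE_URL = "http://localhost:8765"  # port taken from the curl examples


def http_json(path: str, payload: Optional[dict] = None) -> dict:
    """GET when payload is None, otherwise POST the payload as JSON."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        BASE_URL + path, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def play_turn(decide: Callable[[dict], list],
              call: Callable[..., dict] = http_json) -> dict:
    """One iteration: read state, choose actions, send them, return the new state."""
    state = call("/state")
    actions = decide(state)
    if actions:
        call("/action", {"actions": actions})
    return call("/state")
```

Passing the transport in as `call` means the loop can be exercised against a fake server without an emulator or ROM.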
What We'd Need
Skill directory structure:
pokemon-player/
├── SKILL.md # Skill instructions for Hermes
├── scripts/
│ ├── setup.sh # Install PyGBA, mGBA bindings, dependencies
│ ├── pokemon_server.py # FastAPI game server (~500-800 lines)
│ ├── memory_reader.py # FireRed RAM address parser (~400 lines)
│ ├── state_builder.py # Build structured state from raw memory (~300 lines)
│ └── pathfinder.py # A* pathfinding on tile grid (~200 lines)
├── references/
│ ├── firered_addresses.json # Memory addresses for FireRed (from pokefirered decomp)
│ ├── game_data.json # Species, moves, items, type chart data
│ ├── battle_strategy.md # Gen 3 battle mechanics reference
│ └── progression_guide.md # Key story milestones and flags
└── templates/
└── game_prompt.md # System prompt template for gameplay sessions
Dependencies:
- Python: pygba, mgba (Python bindings), fastapi, uvicorn, Pillow
- System (for building mGBA): cmake, libelf-dev, libzip-dev, libsqlite3-dev, libpng-dev
- ROM: User must provide their own Pokémon FireRed ROM (the skill MUST NOT include or distribute ROMs)
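Since the ROM is user-supplied, the server could check the cartridge header before loading address tables. The sketch below reads the standard GBA header fields (title at 0xA0, game code at 0xAC, software version at 0xBC). The allow-list is an assumption: FireRed (USA) is expected to report game code BPRE with revision 0 for v1.0 and 1 for v1.1, which should be confirmed against real dumps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RomInfo:
    title: str
    game_code: str
    revision: int


def read_gba_header(rom: bytes) -> RomInfo:
    """Parse the identifying fields of a GBA cartridge header."""
    if len(rom) < 0xC0:
        raise ValueError("file too small to contain a GBA header")
    title = rom[0xA0:0xAC].rstrip(b"\x00").decode("ascii", errors="replace")
    game_code = rom[0xAC:0xB0].decode("ascii", errors="replace")
    return RomInfo(title, game_code, rom[0xBC])


# Assumed allow-list of (game code, revision) pairs; verify before relying on it.
SUPPORTED = {("BPRE", 0), ("BPRE", 1)}


def is_supported(info: RomInfo) -> bool:
    """True when an address table is expected to exist for this ROM."""
    return (info.game_code, info.revision) in SUPPORTED
```

Failing fast with a clear message here is friendlier than returning garbage state because the wrong revision's addresses were used.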
Phased Rollout
Phase 1: Basic Gameplay (Gen 1 via PyBoy — quick win)
- PyBoy-based server for Pokémon Red/Blue (GB) — simplest possible setup
- pip install pyboy (no source build needed)
- Basic state reading: player position, party summary, badges, dialog
- Button input: directional movement + A/B/Start/Select
- Screenshot capture for vision analysis
- Simple skill instructions: explore, battle, catch, progress story
- This proves the concept with minimal setup friction
Phase 2: FireRed via PyGBA (the real target)
- PyGBA-based server for Pokémon FireRed/LeafGreen (GBA)
- Full memory reader using pokefirered decomp addresses
- Party decryption (PID/OTID XOR, substructure shuffling — Gen 3 specific)
- Bag reading (security key XOR decryption)
- Battle state (active Pokémon, moves, type matchups)
- Map/collision data with fog-of-war tracking
- Pathfinding (A* with terrain costs)
- Build script for mGBA Python bindings
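The "A* with terrain costs" item above might be sketched like this. The tile legend and cost values are hypothetical; a real pathfinder.py would derive them from the map's collision and behavior data and would also need to handle ledges and warps.

```python
import heapq

# Hypothetical legend: listed tiles are walkable at the given cost; anything else blocks.
COSTS = {".": 1, "~": 3}


def find_path(grid, start, goal):
    """A* over a 4-connected tile grid; returns a list of (x, y) steps or None."""
    def h(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan distance

    frontier = [(h(start), 0, start)]
    best = {start: 0}
    parent = {start: None}
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if cost > best[node]:
            continue  # stale queue entry superseded by a cheaper route
        x, y = node
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= ny < len(grid) and 0 <= nx < len(grid[ny])):
                continue
            step = COSTS.get(grid[ny][nx])
            if step is None:
                continue  # wall or unknown tile
            new_cost = cost + step
            if new_cost < best.get((nx, ny), float("inf")):
                best[(nx, ny)] = new_cost
                parent[(nx, ny)] = node
                heapq.heappush(frontier, (new_cost + h((nx, ny)), new_cost, (nx, ny)))
    return None
```

The Manhattan heuristic stays admissible as long as every walkable tile costs at least 1, so the returned path is a cheapest one under the chosen cost table.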
Phase 3: Advanced Gameplay
- Self-criticism loop (periodic strategy review)
- Summary compaction (compress game history to manage context)
- Progress milestone tracking (badges, story events, Champion)
- Multi-game support: Ruby/Sapphire/Emerald (reuse PyGBA, different addresses)
- Long-running autonomous sessions via cronjob scheduling
- Optional: web dashboard for monitoring (WebSocket state broadcast)
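Summary compaction and periodic review from the list above reduce to two small mechanisms. In this sketch the `merge` callable stands in for an LLM summarization call, and the intervals would be tuning choices for Hermes; the reference project's 120-step and 55-step values are used only as test inputs.

```python
def compact(summaries: list[str], keep_recent: int, merge) -> list[str]:
    """Fold all but the newest `keep_recent` summaries into a single rollup entry."""
    if len(summaries) <= keep_recent:
        return list(summaries)
    split = len(summaries) - keep_recent
    return [merge(summaries[:split])] + summaries[split:]


def is_due(step: int, every: int) -> bool:
    """True on every `every`-th step, never on step 0."""
    return step > 0 and step % every == 0
```

Keeping recent summaries verbatim while rolling up older ones bounds prompt size without discarding the latest context.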
Phase 4: Multi-Generation Support
- PyBoy backend for Gen 1-2 (Red/Blue/Yellow/Gold/Silver/Crystal)
- PyGBA backend for Gen 3 (FireRed/LeafGreen/Ruby/Sapphire/Emerald)
- Abstracted game interface so the skill works across generations
- Game-specific address files and strategy references
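One way to express the abstracted game interface is a structural protocol that both a PyBoy adapter and a PyGBA adapter would satisfy. Everything here is hypothetical: the method names are not taken from either library, and the real adapters would wrap whatever calls those libraries actually provide.

```python
from typing import Protocol


class EmulatorBackend(Protocol):
    """Hypothetical interface a PyBoy or PyGBA adapter would implement."""

    def press(self, button: str, frames: int = 8) -> None: ...
    def read_u8(self, address: int) -> int: ...
    def screenshot_png(self) -> bytes: ...


class FakeBackend:
    """In-memory stand-in for exercising server logic without a ROM."""

    def __init__(self, memory: dict[int, int]) -> None:
        self.memory = memory
        self.pressed: list[str] = []

    def press(self, button: str, frames: int = 8) -> None:
        self.pressed.append(button)

    def read_u8(self, address: int) -> int:
        return self.memory.get(address, 0)

    def screenshot_png(self) -> bytes:
        return b""


def read_u16(backend: EmulatorBackend, address: int) -> int:
    """Little-endian 16-bit read built only on the shared interface."""
    return backend.read_u8(address) | (backend.read_u8(address + 1) << 8)
```

With this split, memory readers and state builders depend only on the protocol, so the game-specific address files become the main thing that changes per title.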
Pros & Cons
Pros
- Unique showcase capability — An AI agent that plays Pokémon is compelling for demos, streams, and community engagement. Pokémon is perfect for AI: turn-based battles, explorable world, clear objectives, measurable progress (badges, Pokédex, Champion).
- Leverages Hermes's strengths — Memory (persistent game knowledge), vision (screenshot analysis), terminal (script execution), background processes (game server). No new tools needed.
- Dramatically simpler than reference — 2 layers vs 4. No Lua, no Node.js, no OpenAI lock-in. Hermes IS the agent, using any model.
- Headless operation — PyGBA runs without any display server. Works on headless servers, SSH sessions, CI/CD, cloud VMs.
- Model-agnostic — Unlike the reference project (OpenAI-only), Hermes can play with any LLM backend.
- MIT-licensed foundation — PyGBA is MIT. pokefirered decomp is community knowledge. No license conflicts.
- Extensible — The game server pattern generalizes to other games. Address files are swappable.
- Educational — Demonstrates complex agent capabilities: persistent state, strategic planning, real-time decision-making, multi-session continuity.
Cons / Risks
- mGBA build complexity: Python 3.12+ requires building mGBA from source with -DBUILD_PYTHON=ON. Needs cmake, system libraries. Could be a setup barrier. Mitigated by Phase 1's PyBoy approach (pip-only).
- ROM requirement — Users must provide their own ROM file. We cannot distribute ROMs. This adds setup friction and potential legal gray area (though emulation is legal, ROM distribution is not).
- Token cost — Each gameplay "turn" requires an LLM call. A full playthrough (30-50 hours of gameplay) could require thousands of turns. At ~$0.01-0.05/turn, a complete game could cost $30-250+ in API calls depending on model.
- Context management — Game state + memory + objectives can be large. Need careful prompt engineering to stay within context limits. Summary compaction (Phase 3) helps.
- Game-specific addresses — Memory addresses are specific to each ROM version (FireRed USA v1.0 vs v1.1). The skill must clearly specify which ROM version is supported.
- Niche appeal — Not everyone wants their AI to play Pokémon. But the pattern (agent + emulator + game) generalizes, and as a bundled showcase it demonstrates capabilities that attract users to the platform.
- Long-running sessions — A full playthrough requires sustained multi-session engagement. Hermes's memory system handles this, but it's untested at this scale.
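The token-cost estimate in the list above is simple multiplication, shown here so the inputs are explicit. The turn counts of 3,000 and 5,000 are assumptions back-solved from the quoted $30-250 range and per-turn prices; they are not measurements.

```python
def playthrough_cost_usd(turns: int, cents_per_turn: int) -> float:
    """Total API cost in dollars for a given number of LLM turns."""
    return turns * cents_per_turn / 100


# Assumed bounds: 3,000 turns at 1 cent, 5,000 turns at 5 cents.
low = playthrough_cost_usd(3_000, 1)
high = playthrough_cost_usd(5_000, 5)
print(f"${low:.0f} to ${high:.0f}")
```

If a real playthrough needs more turns than assumed here, the total scales linearly, so measuring turns per badge early would tighten the estimate.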
Open Questions
- Phase 1 game choice: Should we start with Pokémon Red (GB/PyBoy, simpler) or go straight to FireRed (GBA/PyGBA)? PyBoy is dramatically easier to set up but the user specifically asked about FireRed.
- Autonomy level: Should the agent play fully autonomously (cronjob-driven, reports progress), or interactively (user watches and can intervene)? Or both modes?
- Vision vs memory-only: The reference project uses both RAM reading and screenshots. How much should we rely on vision_analyze for the screen vs. structured RAM data? Vision is more flexible but slower and uses more tokens.
- Multi-game priority: After FireRed, which game next? Emerald (PyGBA already has a wrapper)? Red/Blue (PyBoy, simpler)? The architecture should be game-agnostic from the start.
- Streaming/broadcasting: Should the skill support live-streaming gameplay (e.g., OBS integration, Twitch) for "TwitchPlaysPokémon but it's an AI" scenarios?
- Save management: How to handle saves across sessions? Auto-save on each turn? Named savestates for "checkpoint" moments?
- Python version: Target Python 3.10/3.11 (pre-built mgba wheels) or 3.12+ (requires source build)? Could provide both paths in setup.sh.
References