Feature: Pokémon Playing Skill — Headless GBA/GB Emulation with AI Gameplay (inspired by gpt-play-pokemon-firered) #417

@teknium1

Overview

Inspired by gpt-play-pokemon-firered — an autonomous AI agent that plays Pokémon FireRed in real-time using OpenAI's LLMs — this proposes a Pokémon Playing Skill that enables Hermes Agent to play Pokémon games via headless GBA/GB emulation. The agent would read structured game state from emulator memory, make strategic decisions, and send button inputs — all from the terminal with no display server required.

The reference project uses a complex 4-layer architecture (mGBA + Lua → Python FastAPI bridge → Node.js AI agent → OpenAI API), totaling more than 15,000 lines across Python, Node.js, and Lua. It requires a GUI-based mGBA instance and is tightly coupled to OpenAI. For Hermes, we can simplify this dramatically: Hermes is itself the AI agent, so the Node.js layer disappears entirely. Using PyGBA (MIT) for headless emulation also removes the Lua scripting layer and the mGBA GUI dependency.

The result: Hermes Agent plays Pokémon through a lightweight Python game server, using its native tools (terminal, vision, memory) for the decision loop — no separate agent process, no GUI, no OpenAI lock-in.


Research Findings

How gpt-play-pokemon-firered Works

The reference project has four distinct layers communicating in a chain:

mGBA Emulator (GBA ROM)
     |  (Lua TCP socket, port 8888)
Lua Bridge (FireRedBridgeSocketServer.lua, ~950 lines)
     |  (TCP socket with <|END|> framing)
Python Bridge (firered_mgba_bridge.py, FastAPI, port 8000, ~1500 lines)
     |  (HTTP REST API)
Node.js AI Agent (server/index.js + gameLoop.js, ~1400 lines)
     |  (OpenAI API - hardcoded)
OpenAI GPT Model (default: gpt-5.2)

Lua Layer: Runs inside mGBA, exposes a TCP socket for reading arbitrary GBA memory (8/16/32-bit), sending frame-accurate button presses, taking screenshots, and smart overworld movement control. Uses pokefirered decomp symbols for address resolution.
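
The `<|END|>` framing in the diagram above can be illustrated with a small delimiter-based decoder. This is a minimal sketch, not the reference project's code: the class name `FrameDecoder` is hypothetical, and it assumes the marker terminates each message and never appears inside a payload.

```python
END_MARKER = b"<|END|>"  # terminator named in the diagram above


class FrameDecoder:
    """Accumulates raw socket bytes and returns complete messages."""

    def __init__(self, marker: bytes = END_MARKER) -> None:
        self.marker = marker
        self.buffer = b""

    def feed(self, chunk: bytes) -> list[bytes]:
        """Add a received chunk; return every message completed by it."""
        self.buffer += chunk
        # Everything before the last marker is complete; the tail stays buffered.
        *complete, self.buffer = self.buffer.split(self.marker)
        return complete


decoder = FrameDecoder()
first = decoder.feed(b'{"ok": true}<|EN')         # marker split across two reads
second = decoder.feed(b'D|>{"x": 1}<|END|>{"pa')  # two messages complete, one partial
```

Buffering the tail matters because a TCP read can end in the middle of a message or even in the middle of the marker itself.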

Python Bridge: FastAPI server that reads GBA RAM through the Lua socket and parses it into structured game state JSON. Key features:

  • Full game state endpoint: player position/facing/badges/money, party (with PID/OTID XOR decryption and substructure unshuffling), bag (XOR-encrypted), PC storage, dialog state, battle state, map tiles, NPCs, warps, connections
  • Fog-of-war system tracking discovered vs. unexplored tiles across sessions
  • Per-command before/after state tracing
  • Periodic savestate backups
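
The party decryption mentioned in the first bullet can be sketched as follows. This is an illustration under stated assumptions, not the reference project's code: it assumes the community-documented 100-byte Gen 3 party structure (personality at offset 0, OT ID at offset 4, checksum at offset 28, 48 encrypted bytes at offset 32), and the function names are hypothetical.

```python
import struct
from itertools import permutations

# The 24 orderings of the Growth, Attacks, EVs, and Misc substructures,
# indexed by personality % 24 (community-documented layout, assumed here).
ORDERS = ["".join(p) for p in permutations("GAEM")]


def decrypt_substructures(pokemon: bytes) -> dict[str, bytes]:
    """Decrypt and unshuffle the 48-byte data block of a party structure."""
    personality, ot_id = struct.unpack_from("<II", pokemon, 0)
    key = personality ^ ot_id
    words = struct.unpack_from("<12I", pokemon, 32)
    plain = struct.pack("<12I", *(w ^ key for w in words))
    order = ORDERS[personality % 24]
    # Each substructure is 12 bytes; map it back to its letter.
    return {name: plain[i * 12:(i + 1) * 12] for i, name in enumerate(order)}


def checksum_ok(pokemon: bytes) -> bool:
    """Compare the stored checksum with the 16-bit sum of the decrypted block."""
    (stored,) = struct.unpack_from("<H", pokemon, 28)
    plain = b"".join(decrypt_substructures(pokemon).values())
    return sum(struct.unpack("<24H", plain)) & 0xFFFF == stored
```

With the substructures separated, the species ID is the first 16-bit word of the Growth block, for example `struct.unpack_from("<H", subs["G"])[0]`. Checking the checksum first is a cheap way to detect a wrong base address before trusting any parsed field.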

Node.js Agent: Continuous decision loop — fetch state, build prompt with context (game state + memory + objectives + screenshots), send to OpenAI, process tool calls (execute_action, write_memory, update_objectives), send commands back. Includes summary rollup (every 120 steps), self-criticism (every 55 steps), and pathfinding sub-prompts.
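
The periodic maintenance steps described above can be expressed as a simple step counter. This is a hedged sketch: only the two intervals come from the description, while the function name, the task labels, and the 1-based step counting are assumptions for illustration.

```python
SUMMARY_EVERY = 120         # steps between summary rollups (from the description)
SELF_CRITICISM_EVERY = 55   # steps between self-criticism passes


def periodic_tasks(step: int) -> list[str]:
    """Return the maintenance tasks due after completing `step` (1-based)."""
    due = []
    if step % SELF_CRITICISM_EVERY == 0:
        due.append("self_criticism")
    if step % SUMMARY_EVERY == 0:
        due.append("summary_rollup")
    return due


# Steps in the first 1320 on which at least one task fires.
schedule = {s: periodic_tasks(s) for s in range(1, 1321) if periodic_tasks(s)}
```

Because 55 and 120 share few factors, the two tasks coincide only every 1320 steps, so they rarely compete for the same turn.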

Prompts (~2,700 lines total across 5 files):

  • game.txt (~1,300 lines): Core gameplay prompt covering priorities, tool usage, RAM interpretation, battle strategy (Gen 3's type-based physical/special split), team building, HM management, and the anti-loop protocol
  • self_criticism.txt (~760 lines): Error analysis, strategy audit, memory management review
  • path_finding.txt (~455 lines): A*/BFS pathfinding with collision handling, ledge mechanics, movement cost weighting
  • summary.txt / summary_rollup.txt: Gameplay state summarization with confidence tags

Key Design Decisions in the Reference Project

  1. Memory reading over vision: The core game state comes from reading RAM directly (positions, HP, items, flags), NOT from screenshot OCR. Screenshots supplement memory data for the LLM's visual understanding.
  2. Dual navigation: explored_map (fog-of-war persistent) as primary nav, visible_area (current viewport) as secondary.
  3. Anti-loop protocol: Question-mark tiles (unexplored) prioritized; explicit loop detection and escape heuristics.
  4. Structured objectives: Each objective has what/why/how/context fields, maintained by the AI.
  5. Summary compaction: Periodic summaries prevent context overflow; rollups merge older summaries.
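
Decisions 2 and 3 can be illustrated with a persistent explored map that absorbs each viewport and reports unexplored frontier tiles. This is a minimal sketch under assumptions: the tile characters ('.' walkable, '#' blocked, '?' unknown) and both function names are hypothetical, not taken from the reference project.

```python
UNKNOWN = "?"  # marker for tiles never seen (the "question-mark tiles" above)


def merge_visible(explored: dict, visible: list[str], origin: tuple[int, int]) -> dict:
    """Copy the current viewport into the persistent explored map.

    `visible` holds one string per row, one character per tile;
    `origin` is the map coordinate (x, y) of its top-left tile.
    """
    ox, oy = origin
    for dy, row in enumerate(visible):
        for dx, tile in enumerate(row):
            if tile != UNKNOWN:
                explored[(ox + dx, oy + dy)] = tile
    return explored


def frontier(explored: dict) -> set:
    """Unexplored coordinates adjacent to a known walkable tile."""
    out = set()
    for (x, y), tile in explored.items():
        if tile != ".":
            continue
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in explored:
                out.add(nxt)
    return out
```

Steering toward the nearest frontier tile is one simple way to prioritize unexplored areas, and an empty frontier signals that the current map is fully explored.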

License Constraint

The reference project is CC BY-NC 4.0 (Non-Commercial). We cannot use its code directly. However:

  • The pokefirered decomp (pret/pokefirered) provides memory addresses/symbols as community knowledge
  • PyGBA is MIT licensed
  • We write our own implementation inspired by the architecture patterns

Headless Emulation: The Landscape

We evaluated multiple approaches for running GBA emulation without a display server:

Approach | Headless? | Install | Speed | GBA Support | Verdict
PyGBA + mGBA bindings | ✅ Yes | pip + build | Fast (in-process) | | Best option
libmgba-py | ✅ Yes | Build from source | Fast (in-process) | | Same as above, different build
mGBA-Qt + xvfb | Partial (virtual display) | apt install | Medium | | Heavyweight
mgba-mcp | Partial (xvfb) | npm + apt | Slow (subprocess per op) | | Too slow for gameplay
PyBoy | ✅ Yes | pip install | Fast | ❌ GB/GBC only | Gen 1-2 only
RetroArch headless | Unreliable | apt install | Medium | | Documented issues
PyBoyAdvance | ✅ Yes | pip install | Very slow | | Experimental, not ready

Winner: PyGBA — MIT licensed, pip installable, truly headless (no X11/display needed), provides direct memory read/write, button input, and frame rendering as numpy arrays. Already has a PokemonEmerald game wrapper as a reference. The main challenge is the mGBA Python bindings: pre-built wheels exist only for Python 3.10-3.11 on Linux; Python 3.12+ requires building mGBA from source with -DBUILD_PYTHON=ON.

Fallback: PyBoy. For Gen 1-2 (Pokémon Red/Blue/Gold/Silver), PyBoy is simpler: pip install pyboy, then window='null' for headless operation. Multiple AI projects already use it successfully (ClaudePlaysPokemonStarter, Pokemon-OpenClaw, PokemonRedExperiments). It could serve as the first phase if the GBA setup proves too complex.


Current State in Hermes Agent

No existing gaming/emulation capabilities beyond Minecraft server management. No issues related to Pokémon, emulators, or game-playing agents.

Relevant existing capabilities that the skill would leverage:

  • terminal tool — run Python scripts, manage background processes, HTTP calls
  • vision_analyze tool — analyze screenshots from the emulator
  • memory tool — persistent memory for game strategy, objectives, map knowledge
  • process tool — manage background emulator server
  • execute_code tool — run Python code for quick calculations
  • Background process management (background=true, poll, kill)
  • Cronjob scheduling — potential for autonomous long-running gameplay sessions

Related but distinct:

  • Minecraft modpack server skill (gaming category exists)
  • No overlap — completely different domain

Implementation Plan

Skill vs. Tool Classification

This should be a skill because:

  • The capability is expressible as Python scripts + existing tools (terminal, vision, memory)
  • It wraps external libraries (PyGBA/PyBoy) that the agent calls via shell commands
  • No custom Python integration or API key management needs to be baked into the agent harness
  • Hermes's existing tool suite (terminal for running scripts, vision for screenshots, memory for game knowledge) provides everything needed

Should it be bundled? Yes — while CONTRIBUTING.md notes that games are typically Skills Hub candidates, this is a flagship showcase capability that demonstrates Hermes Agent's most advanced features (persistent memory, vision analysis, strategic multi-session reasoning, background process management) in a compelling, accessible way. Bundling it means every Hermes install can play Pokémon out of the box — a powerful "wow factor" demo. → Bundled in skills/gaming/pokemon-player/.

Architecture

Unlike the reference project's 4-layer architecture, Hermes's approach has 2 layers: a game server that embeds PyGBA in-process, and Hermes itself as the agent.

PyGBA (mGBA Python bindings)
     |  (in-process Python calls)
Pokemon Game Server (Python FastAPI, background process)
     |  (HTTP REST API on localhost)
Hermes Agent (existing tools: terminal, vision, memory)

The Pokemon Game Server is a lightweight Python FastAPI app that:

  • Loads the ROM via PyGBA at startup (headless, no display)
  • Exposes REST endpoints for game interaction
  • Handles memory address parsing for structured game state

API Endpoints:

Endpoint | Method | Purpose
/state | GET | Full game state JSON (player, party, bag, map, battle, dialog)
/screenshot | GET | Current frame as base64 PNG
/action | POST | Send button inputs (press_a, press_b, walk_up, etc.)
/action/path | POST | Pathfind + walk to coordinates
/save | POST | Create savestate
/load | POST | Load savestate
/minimap | GET | ASCII/emoji minimap of explored area

Hermes's decision loop (no custom code needed — just skill instructions):

  1. curl localhost:8765/state → read game state JSON
  2. curl -s localhost:8765/screenshot | base64 -d > /tmp/pokemon_screen.png → get visual (the endpoint returns base64, so decode it before saving as PNG)
  3. Think: what should I do next? (use memory for objectives, map knowledge)
  4. curl -X POST localhost:8765/action -H 'Content-Type: application/json' -d '{"actions": ["walk_up", "walk_up", "press_a"]}'
  5. Check result, update memory/objectives, repeat
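
For scripted testing of the server, the same read-decide-act cycle can be sketched in Python. This is illustrative only, since Hermes itself would drive the loop with curl: the function names `http_json` and `play_turn` are hypothetical, and the response shapes are whatever the server ends up returning.

```python
import json
from typing import Callable, Optional
from urllib import request

BASE_URL = "http://localhost:8765"  # port taken from the curl steps above


def http_json(method: str, path: str, body: Optional[dict] = None) -> dict:
    """Send one JSON request to the game server and decode the JSON reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = request.Request(BASE_URL + path, data=data, method=method,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def play_turn(decide: Callable[[dict], list], call: Callable = http_json) -> dict:
    """Run one cycle: fetch state, let `decide` pick actions, send them."""
    state = call("GET", "/state")
    actions = decide(state)
    result = call("POST", "/action", {"actions": actions})
    return {"state": state, "actions": actions, "result": result}
```

Passing the transport in as `call` keeps the loop testable without a running emulator: a stub that returns canned state is enough to exercise the decision logic.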

What We'd Need

Skill directory structure:

pokemon-player/
├── SKILL.md                           # Skill instructions for Hermes
├── scripts/
│   ├── setup.sh                       # Install PyGBA, mGBA bindings, dependencies
│   ├── pokemon_server.py              # FastAPI game server (~500-800 lines)
│   ├── memory_reader.py               # FireRed RAM address parser (~400 lines)
│   ├── state_builder.py               # Build structured state from raw memory (~300 lines)
│   └── pathfinder.py                  # A* pathfinding on tile grid (~200 lines)
├── references/
│   ├── firered_addresses.json         # Memory addresses for FireRed (from pokefirered decomp)
│   ├── game_data.json                 # Species, moves, items, type chart data
│   ├── battle_strategy.md             # Gen 3 battle mechanics reference
│   └── progression_guide.md           # Key story milestones and flags
└── templates/
    └── game_prompt.md                 # System prompt template for gameplay sessions

Dependencies:

  • Python: pygba, mgba (Python bindings), fastapi, uvicorn, Pillow
  • System (for building mGBA): cmake, libelf-dev, libzip-dev, libsqlite3-dev, libpng-dev
  • ROM: User must provide their own Pokémon FireRed ROM (the skill MUST NOT include or distribute ROMs)
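
Since the user supplies the ROM, the server could verify it at startup by reading the GBA cartridge header. The sketch below assumes the standard header layout (4-byte game code at 0xAC, software version byte at 0xBC) and that FireRed USA uses the game code BPRE; the function names and the default supported list are hypothetical.

```python
GAME_CODE_OFFSET = 0xAC   # 4-byte ASCII game code in the GBA cartridge header
VERSION_OFFSET = 0xBC     # 1-byte software version (0 = v1.0, 1 = v1.1)


def identify_rom(rom: bytes) -> tuple[str, int]:
    """Return (game_code, version) read from a GBA ROM header."""
    if len(rom) <= VERSION_OFFSET:
        raise ValueError("file too small to contain a GBA header")
    code = rom[GAME_CODE_OFFSET:GAME_CODE_OFFSET + 4].decode("ascii", "replace")
    return code, rom[VERSION_OFFSET]


def is_supported(rom: bytes, supported=(("BPRE", 0),)) -> bool:
    """Check against (game_code, version) pairs; the default pair is an assumption."""
    return identify_rom(rom) in supported
```

Failing fast on an unsupported revision avoids silently reading garbage from addresses that only match one ROM version.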

Phased Rollout

Phase 1: Basic Gameplay (Gen 1 via PyBoy — quick win)

  • PyBoy-based server for Pokémon Red/Blue (GB) — simplest possible setup
  • pip install pyboy (no source build needed)
  • Basic state reading: player position, party summary, badges, dialog
  • Button input: directional movement + A/B/Start/Select
  • Screenshot capture for vision analysis
  • Simple skill instructions: explore, battle, catch, progress story
  • This proves the concept with minimal setup friction

Phase 2: FireRed via PyGBA (the real target)

  • PyGBA-based server for Pokémon FireRed/LeafGreen (GBA)
  • Full memory reader using pokefirered decomp addresses
  • Party decryption (PID/OTID XOR, substructure shuffling — Gen 3 specific)
  • Bag reading (security key XOR decryption)
  • Battle state (active Pokémon, moves, type matchups)
  • Map/collision data with fog-of-war tracking
  • Pathfinding (A* with terrain costs)
  • Build script for mGBA Python bindings
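
The pathfinding item can be sketched as a textbook A* search over a tile grid with per-tile terrain costs. This is a generic illustration, not the planned pathfinder.py: the tile characters and the `cost` mapping are hypothetical, one-way ledges are not modeled, and the Manhattan heuristic is only admissible when every tile costs at least 1.

```python
import heapq


def astar(grid, start, goal, cost=None):
    """A* on a 4-connected grid of strings. '#' is blocked; returns a path or None."""
    cost = cost or {}
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan distance
    frontier = [(h(start), 0, start)]
    came_from = {start: None}
    best = {start: 0}
    while frontier:
        _, g, cur = heapq.heappop(frontier)
        if cur == goal:
            path = []
            while cur is not None:  # walk parents back to the start
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        if g > best[cur]:
            continue  # stale queue entry, a cheaper route was already found
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if not (0 <= ny < len(grid) and 0 <= nx < len(grid[ny])):
                continue
            tile = grid[ny][nx]
            if tile == "#":
                continue
            ng = g + cost.get(tile, 1)  # unlisted tiles cost 1 step
            if ng < best.get(nxt, float("inf")):
                best[nxt] = ng
                came_from[nxt] = cur
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None
```

Coordinates are (x, y) with y indexing rows. Raising the cost of a tile type, such as tall grass, makes the search route around it when a cheaper detour exists.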

Phase 3: Advanced Gameplay

  • Self-criticism loop (periodic strategy review)
  • Summary compaction (compress game history to manage context)
  • Progress milestone tracking (badges, story events, Champion)
  • Multi-game support: Ruby/Sapphire/Emerald (reuse PyGBA, different addresses)
  • Long-running autonomous sessions via cronjob scheduling
  • Optional: web dashboard for monitoring (WebSocket state broadcast)

Phase 4: Multi-Generation Support

  • PyBoy backend for Gen 1-2 (Red/Blue/Yellow/Gold/Silver/Crystal)
  • PyGBA backend for Gen 3 (FireRed/LeafGreen/Ruby/Sapphire/Emerald)
  • Abstracted game interface so the skill works across generations
  • Game-specific address files and strategy references

Pros & Cons

Pros

  • Unique showcase capability — An AI agent that plays Pokémon is compelling for demos, streams, and community engagement. Pokémon is perfect for AI: turn-based battles, explorable world, clear objectives, measurable progress (badges, Pokédex, Champion).
  • Leverages Hermes's strengths — Memory (persistent game knowledge), vision (screenshot analysis), terminal (script execution), background processes (game server). No new tools needed.
  • Dramatically simpler than reference — 2 layers vs 4. No Lua, no Node.js, no OpenAI lock-in. Hermes IS the agent, using any model.
  • Headless operation — PyGBA runs without any display server. Works on headless servers, SSH sessions, CI/CD, cloud VMs.
  • Model-agnostic — Unlike the reference project (OpenAI-only), Hermes can play with any LLM backend.
  • MIT-licensed foundation — PyGBA is MIT. pokefirered decomp is community knowledge. No license conflicts.
  • Extensible — The game server pattern generalizes to other games. Address files are swappable.
  • Educational — Demonstrates complex agent capabilities: persistent state, strategic planning, real-time decision-making, multi-session continuity.

Cons / Risks

  • mGBA build complexity — Python 3.12+ requires building mGBA from source with -DBUILD_PYTHON=ON. Needs cmake, system libraries. Could be a setup barrier. Mitigated by Phase 1's PyBoy approach (pip-only).
  • ROM requirement — Users must provide their own ROM file. We cannot distribute ROMs. This adds setup friction and potential legal gray area (though emulation is legal, ROM distribution is not).
  • Token cost — Each gameplay "turn" requires an LLM call. A full playthrough (30-50 hours of gameplay) could require thousands of turns. At ~$0.01-0.05/turn, a complete game could cost $30-250+ in API calls depending on model.
  • Context management — Game state + memory + objectives can be large. Need careful prompt engineering to stay within context limits. Summary compaction (Phase 3) helps.
  • Game-specific addresses — Memory addresses are specific to each ROM version (FireRed USA v1.0 vs v1.1). The skill must clearly specify which ROM version is supported.
  • Niche appeal — Not everyone wants their AI to play Pokémon. But the pattern (agent + emulator + game) generalizes, and as a bundled showcase it demonstrates capabilities that attract users to the platform.
  • Long-running sessions — A full playthrough requires sustained multi-session engagement. Hermes's memory system handles this, but it's untested at this scale.
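
The token cost range above can be sanity-checked with simple arithmetic. The turn counts below are assumptions back-derived from the quoted figures (3,000 turns at the low price, 5,000 at the high price), not measurements, and the function name is hypothetical.

```python
def playthrough_cost(turns: int, usd_per_turn: float) -> float:
    """Total API cost of a playthrough in US dollars."""
    return turns * usd_per_turn


low = playthrough_cost(3_000, 0.01)    # about $30
high = playthrough_cost(5_000, 0.05)   # about $250
```

The estimate scales linearly with both inputs, so a playthrough that needs more turns than assumed here would exceed the quoted range.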

Open Questions

  1. Phase 1 game choice: Should we start with Pokémon Red (GB/PyBoy, simpler) or go straight to FireRed (GBA/PyGBA)? PyBoy is dramatically easier to set up but the user specifically asked about FireRed.
  2. Autonomy level: Should the agent play fully autonomously (cronjob-driven, reports progress), or interactively (user watches and can intervene)? Or both modes?
  3. Vision vs memory-only: The reference project uses both RAM reading and screenshots. How much should we rely on vision_analyze for the screen vs. structured RAM data? Vision is more flexible but slower and uses more tokens.
  4. Multi-game priority: After FireRed, which game next? Emerald (PyGBA already has a wrapper)? Red/Blue (PyBoy, simpler)? The architecture should be game-agnostic from the start.
  5. Streaming/broadcasting: Should the skill support live-streaming gameplay (e.g., OBS integration, Twitch) for "TwitchPlaysPokémon but it's an AI" scenarios?
  6. Save management: How to handle saves across sessions? Auto-save on each turn? Named savestates for "checkpoint" moments?
  7. Python version: Target Python 3.10/3.11 (pre-built mgba wheels) or 3.12+ (requires source build)? Could provide both paths in setup.sh.
