Inspiration
I've been dreaming about using LLMs as units to represent people or groups of people for a long time. It has so many exciting implications: modeling real-world scenarios, enabling agents to self-improve, and even populating virtual worlds!
Imagine being able to model any society or question as "actors" and "actions" and watch an extremely realistic simulation (that could be run many times) give insights and data to act on!
Why not just ask the LLM directly? Why a "simulation"?
You might ask—what's really different about this from sending a single request with the question?
The answer mirrors the motivation behind thinking models and inference-time compute scaling: more tokens spent on a question = deeper, better answers.
A single LLM call gives you a surface-level response based on the model's training. This system spends 100x more tokens: each actor is enriched with real information from the web (giving them long, detailed histories, memories, etc.), then independently analyzes the situation, makes decisions, sends messages, and reacts to outcomes across multiple rounds. The depth comes from:
- Independent reasoning: 10 actors each spending tokens thinking through their unique perspective
- Emergent complexity: Interactions and second-order effects that no single call would capture
- Iterative refinement: Each round builds on previous outcomes, like chain-of-thought at scale
- Temporal realism: Actions have durations and consequences that unfold over time
Just like o1 spends more inference compute for better reasoning, this system scales compute horizontally across multiple agents. The result isn't just an answer—it's a rich, multi-perspective simulation that reveals dynamics a single call would miss.
Full design details in BACKEND_DESIGN.md on GitHub.
What it does
Actors-Actions transforms any question into a living simulation. Ask "What happens if the AI bubble pops?" and watch fully enriched AI actors—each with memory, characteristics, goals, and history—make decisions, send messages, and evolve over discrete time steps. The system generates relevant actors, enriches their profiles, then runs a temporally-grounded simulation where actions have durations and consequences ripple through time. Each round, actors learn from outcomes and adapt their strategies, creating a self-improving simulation where emergent behaviors compound over time. Perfect for exploring "what-if" scenarios, understanding social dynamics, or stress-testing policies before real-world deployment. A UI accompanies it all: generate questions, enrich actors, run simulations, and replay old ones, all with snazzy animations.
How I built it
The system starts by asking an LLM to generate both the actors AND determine the simulation's time scale (hours, days, months, etc.) and duration based on the question context. This single generation step creates 8-12 contextually relevant actors with research queries for enrichment.
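To make that generation step concrete, here is a minimal sketch of how the LLM's JSON output could be parsed into typed structures. The field names (`research_queries`, `time_unit`, `num_rounds`) are illustrative assumptions, not the project's actual schema:

```python
import json
from dataclasses import dataclass


@dataclass
class ActorSpec:
    name: str
    role: str
    goals: list
    research_queries: list  # fed to web search during enrichment


@dataclass
class SimulationConfig:
    time_unit: str   # e.g. "hours", "days", "months" — chosen by the LLM
    num_rounds: int  # simulation duration, also chosen by the LLM
    actors: list


# the LLM returns one JSON object covering both actors and time scale
raw = json.loads('{"time_unit": "months", "num_rounds": 6, "actors": []}')
config = SimulationConfig(
    time_unit=raw["time_unit"],
    num_rounds=raw["num_rounds"],
    actors=[ActorSpec(**a) for a in raw["actors"]],
)
```

Doing actors and time scale in a single generation keeps them consistent: a question about a pandemic naturally yields months-scale rounds, while a market-crash question yields days or hours.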
Tech Stack:
- Backend: Python, FastAPI, MongoDB, async processing via ThreadPoolExecutor
- Frontend: React, TypeScript, Vite
- LLMs: Multi-model via OpenRouter (Claude Sonnet 4.5 for reasoning, Gemini 2.5 Flash for enrichment, Qwen 2.5 72B for actions)
- Observability: Weave for LLM tracing
- Deployment: Daytona sandboxes
The architecture uses a sophisticated action queue system where actors schedule actions with start times and durations, the world engine processes scheduled actions each round, and actor states evolve based on outcomes. Full design details in BACKEND_DESIGN.md on GitHub.
Challenges I ran into
Initial Architecture Design: Getting the core design right was critical—actors needed autonomy while the world engine maintained consistency.
Action Queue System: Building a temporal action queue that handles multi-round actions was complex. Actions need start times, durations, and proper state tracking across rounds. The data structure went through several revisions to handle scheduled → active → completed state transitions cleanly.
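Those state transitions can be sketched as a small state machine. This is a simplified illustration under assumed names (`QueuedAction`, `tick`), not the system's actual data structure:

```python
from dataclasses import dataclass
from enum import Enum


class ActionState(Enum):
    SCHEDULED = "scheduled"
    ACTIVE = "active"
    COMPLETED = "completed"


@dataclass
class QueuedAction:
    actor_id: str
    description: str
    start_round: int
    duration: int  # measured in simulation time units (rounds)
    state: ActionState = ActionState.SCHEDULED

    def tick(self, current_round: int) -> None:
        # scheduled -> active once the start round arrives
        if self.state is ActionState.SCHEDULED and current_round >= self.start_round:
            self.state = ActionState.ACTIVE
        # active -> completed once the duration has elapsed
        if self.state is ActionState.ACTIVE and current_round >= self.start_round + self.duration:
            self.state = ActionState.COMPLETED
```

Keeping the transition logic in one place means a multi-round action (say, a three-round negotiation) stays `ACTIVE` across rounds without any special-casing in the engine.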
World Engine Design: Creating a world engine that processes 10+ simultaneous actions, resolves conflicts, delivers messages, updates actor states, and generates coherent outcomes while maintaining temporal consistency was the biggest technical challenge. Required careful data structure design to keep everything synchronized.
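A heavily simplified round step might look like the sketch below, using plain dicts for actions. In the real system an LLM call resolves conflicts and writes narrative outcomes for each actor; here only the deterministic skeleton (advance states, deliver messages) is shown, and all names are hypothetical:

```python
def run_round(current_round, queue, inbox):
    """One world-engine step over action dicts of the illustrative shape
    {"actor": str, "target": str | None, "start": int, "duration": int, "state": str}."""
    completed = []
    for action in queue:
        # advance each action's lifecycle for this round
        if action["state"] == "scheduled" and current_round >= action["start"]:
            action["state"] = "active"
        if action["state"] == "active" and current_round >= action["start"] + action["duration"]:
            action["state"] = "completed"
            completed.append(action)
    # deliver messages from completed actions to their targets' inboxes;
    # the real engine would also generate outcomes and update actor states here
    for action in completed:
        if action.get("target"):
            inbox.setdefault(action["target"], []).append(action["actor"])
    return completed
```

Processing the whole queue before delivering any messages is one simple way to keep 10+ simultaneous actions temporally consistent: nothing an actor does in round N can be observed until the round resolves.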
Integration & Data Flow: Coordinating async FastAPI with synchronous MongoDB/OpenRouter clients, managing actor state growth (action history accumulates), and keeping prompts under token limits while preserving context required careful architectural decisions.
Accomplishments that I am proud of
The World Engine: Built a sophisticated world processing system that handles multiple simultaneous actions, resolves conflicts, maintains temporal consistency, and generates coherent narrative outcomes—while sending updates to each individual actor. It's the heart of the simulation and it actually works.
Data Structure Design: Created a clean separation between static actor profiles, evolving actor states, scheduled actions, and round history. The action queue system with temporal scheduling and multi-round action tracking is elegant and scalable. Everything persists properly in MongoDB and the simulation state can be replayed round-by-round.
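That separation can be illustrated with the rough document shapes below. Collection layouts and field names are assumptions for illustration, not the actual MongoDB schema:

```python
actor_profile = {        # static: written once after enrichment
    "_id": "actor_1",
    "name": "Retail Investor Bloc",
    "history": "...",    # enriched from web research
}

actor_state = {          # evolving: rewritten every round
    "actor_id": "actor_1",
    "round": 3,
    "memory": ["..."],
    "goals": ["..."],
}

round_record = {         # append-only log: this is what makes replay possible
    "simulation_id": "sim_42",
    "round": 3,
    "actions": [],
    "messages": [],
    "narrative": "...",
}


def replay(rounds, upto):
    """Rebuild a simulation view by folding the round log up to a given round."""
    ordered = sorted(rounds, key=lambda r: r["round"])
    return [r for r in ordered if r["round"] <= upto]
```

Because round records are append-only, replaying round-by-round is just a filtered read of the log, with no need to re-run any LLM calls.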
Self-Improving Actors: Each actor maintains a complete action history with outcomes and reasoning. They learn from successes and failures, adapting strategies based on what worked. This creates genuine emergence—actors get smarter and more coordinated as the simulation progresses, without any explicit learning algorithm.
Complete System in One and a Half Days: From concept to working demo—actor generation, enrichment, action scheduling, world processing, message passing, frontend visualization, multi-model LLM integration, and deployment—all functional and documented. The system actually runs realistic simulations end-to-end.
Intuitive UI: Built a clean React interface that lets you watch simulations unfold in real-time, see actor decisions and reasoning, track messages between actors, and replay any round. The visualization makes the complex multi-agent simulation accessible and engaging.
What I learned
LLMs can simulate complex social dynamics: Actors with memory and goals make surprisingly realistic decisions. Emergent interactions create believable dynamics that weren't explicitly programmed.
Multi-model architecture works: Specialized models (Claude for reasoning, Gemini for speed, Qwen for bulk action generation) cut costs to ~$0.29 per simulation while maintaining quality.
Temporal structure is crucial: Letting the LLM choose time scales and giving actions explicit durations creates coherent simulations. Time-awareness matters for realism.
This is viable: The system generates valuable insights. Next step: back-test against historical events to validate accuracy.
What's next for Actors & Actions
Simulated Environments with Real Actions: Enable actors to take concrete actions beyond messaging—trade stocks, manage budgets, write code, deploy resources. This turns the system into a true sandbox for testing strategies and policies with realistic constraints and feedback loops.
Historical Back-Testing: Validate the system by simulating past events (e.g., "2008 financial crisis," "COVID-19 pandemic response"). Compare simulation outcomes against what actually happened to measure predictive accuracy and identify model limitations.
RL-Trained World Engine: Use reinforcement learning to train the world engine on historical outcomes, improving its ability to model realistic consequences and second-order effects. The goal: move from plausible simulations to predictively accurate ones.
Scaling, Scaling, Scaling: Increase the actor count to thousands or even millions, finding both the optimizations to make this feasible and the data needed to simulate that many actors.
Ultimate Vision: A system that can reliably forecast social, economic, and political dynamics—enabling policymakers, companies, and researchers to stress-test decisions before deployment in the real world.
Built With
- daytona
- fastapi
- mongodb
- python
- react
- tavily
- typescript
- weave