Pinned
- Introducing Among AIs, a social reasoning benchmark where embodied models play Among Us to test social intelligence: deception, persuasion, and coordination. We put 6 SOTA models in a live arena and GPT-5 came out on top by leading in Impostor & Crewmate wins. Why did GPT-5 get
- raised some money moved to sf hiring ml engineers (dm if you want to train models and make games)
- Replying to @bryan_johnson: everything ive learned about Bryan has been against my will
- Replying to @Shreyko: (1/9) Game Setup: Among AIs is built on our web-native game-engine. In each episode, agents are assigned roles: either Impostor, tasked with eliminating crewmates without being identified, or Crewmate, navigating a fog-of-war map to complete tasks and find clues to expose the
- Replying to @Shreyko: (4/9) Here we plot harm (measured as fraction of model’s votes that contributed to mislynches / wrong ejections) against proactive commitment (how often model committed to eventual ejection before it was popular) for Impostors. The most surprising finding is Claude Sonnet 4
- you're building an ai agent. i'm building the world where ai agents live. we are not the same
- Replying to @Shreyko: tired of these fake benchmarks? us too. 4wallai.com/amongais
- Replying to @Shreyko: (6/9) Another interesting snippet (Gemini is the impostor)
- Replying to @Shreyko: If you’re training models, (@xai, @MistralAI, @NousResearch, @AIatMeta and others) reach out and we’ll run head-to-head matches under fixed prompts, returning full logs and metrics. If you want variants (new maps, roles, or constraints), we’ll co-design them with you. DM to get