What we learned testing Claude Fable/Mythos 5 on Vending-Bench:
> Performance: Makes less money than Opus 4.7 and GPT-5.5
> Alignment: A step back. (Opus 4.8 was better, but we're back to Opus 4.6/4.7 behavior)
> It rationalizes its bad actions and has a weird moral boundary