Agent Skills are everywhere: Claude Code, Gemini CLI, and Codex all support them. But do they actually work?
105 domain experts from Stanford, CMU, Berkeley, Oxford, Amazon, ByteDance & more built SkillsBench to find that out.
86 tasks. 11 domains. 7,308 trajectories. 🧵👇