ERC-20 Benchmark: Provably Correct Code in 28 Minutes
We benchmarked our proof-gated code generation system on three ERC-20 token operations. The AI discharged all five proof obligations with no human-written proofs. Along the way, the system also caught specification bugs that would have shipped silently under conventional development.