Claude Code is so locked in for pre-season of Alpha Arena S2 it started watching YT videos to learn how to trade better (without our instruction)
The new agent harness is 🤯🤯
Our new benchmark has the top 6 AI models trading real capital
Grok4 is winning so far. It was short and then flipped to long, timing the bottom perfectly
It's up >500% in 1 day
Update:
- Claude mostly sitting on cash ($8.3K right now)
- DeepSeek long alts, short BTC
- GPT5 almost max short
- Gemini even shorter
- Grok4 almost max long, short XRP
- Qwen, well, Qwen only goes long BTC
DeepSeek and Grok seem to have better contextual awareness of market microstructure
Grok in particular has made money in each of the past 5 rounds. More coming in the technical writeup
Qwen's portfolio is up 60%
Gemini's is down 60%
Of course, too early to tell how much is skill vs. noise
Next season we'll run many instances of the models in parallel for statistical rigor
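A minimal sketch of why parallel instances help separate skill from noise, assuming each instance ends with one return figure. The function name `summarize_runs` and the sample numbers are hypothetical illustrations, not Alpha Arena code or results:

```python
import math
import statistics


def summarize_runs(returns):
    """Return the mean and a rough 95% interval across independent runs."""
    n = len(returns)
    mean = statistics.mean(returns)
    # Standard error of the mean; statistics.stdev needs at least two runs.
    sem = statistics.stdev(returns) / math.sqrt(n)
    # 1.96 is the normal-approximation multiplier, which is crude for small n.
    return mean, (mean - 1.96 * sem, mean + 1.96 * sem)


# Hypothetical per-run returns as fractions. NOT real benchmark data.
example_returns = [0.60, -0.20, 0.10, -0.45, 0.30, 0.05]
mean, (low, high) = summarize_runs(example_returns)
print(f"mean={mean:.2%}, 95% interval=({low:.2%}, {high:.2%})")
```

If the interval straddles zero, a single run's big gain or loss can't be told apart from luck.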
The goal of Season 1 was to look for biases. What are the major differences between
The official run is LIVE, each model now has $10K to invest
Just minutes in, and they instantly put on humongous positions. They are crazy
Public access to the benchmark this week
Weekend updates:
- Qwen approaches a 100% return
- DeepSeek on track to flip Qwen
- Qwen's take-profit order hit, securing >$8K in profit
- Claude and Grok flip to positive PnL
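For scale, a rough sketch of the arithmetic behind these numbers, using the $10K starting capital and the >$8K take-profit from the thread. The helper `pct_return` is a hypothetical name, and this ignores any other open positions:

```python
def pct_return(start_capital, profit):
    """Profit as a percentage of starting capital."""
    return profit * 100 / start_capital


# $10K start and $8K profit come from the thread; $8K is a lower bound.
locked_in = pct_return(10_000, 8_000)
print(f"Realized return of at least {locked_in:.0f}% on starting capital")
```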
Claude is an eternal optimist: it refuses to go short
All the other models have positions, but for hours it's just been sitting patiently, waiting for a sign to go long
Season 1 of Alpha Arena has officially ended. Qwen 3 MAX pulled ahead at the very end to secure the win, so congrats to the @Alibaba_Qwen team
Thanks to everyone who tuned in to our first experiment in understanding how LLMs handle the noisy, adversarial, non-stationary world of
I wonder how much more Gemini and GPT5 will lose before closing those shorts. Seems a lot like a human trader holding onto losing positions, hoping for a reversal.