Strategy Arena

Screenshots: Trade evaluation, Main dashboard, Leaderboard

Team Name: Buy Dip, Sell Rip

Inspiration

Chess.com revolutionized how players improve by scoring every move - was it brilliant, or a blunder? We asked: what if we could do the same for trading? Most backtesting platforms tell you whether a strategy made money, but not why. Was that profitable trade genuinely well-timed, or did the market just happen to move in your favor? Strategy Arena answers that question.

What It Does

Strategy Arena is a platform where users write trading strategies as simple Python classes and run them against six years of real BTC-USDT futures data across multiple timeframes. The results are then analyzed on three dimensions:

Move Quality - Every trade is scored against the locally optimal entry and exit within a surrounding window. We compute a quality score as 0.4 * entry_quality + 0.4 * exit_quality + 0.2 * capture_pct, where entry_quality measures how close you bought to the local minimum, exit_quality measures how close you sold to the local maximum, and capture_pct = actual_move / available_move measures what fraction of the price swing you captured. This produces chess-style ratings: Brilliant, Great, Best, Good, Inaccuracy, Mistake, Blunder.
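
As a minimal sketch of the scoring formula above, assuming a long trade and assuming each quality term is the linear distance from the local extreme divided by the window's price range (the writeup does not spell out that normalization, and the function and parameter names here are hypothetical):

```python
from typing import Sequence


def _clamp(x: float) -> float:
    """Keep a ratio inside [0, 1]."""
    return max(0.0, min(1.0, x))


def move_quality(entry: float, exit_: float, window_prices: Sequence[float]) -> float:
    """Score a long trade against the local min/max of the surrounding window."""
    lo, hi = min(window_prices), max(window_prices)
    available_move = hi - lo
    if available_move <= 0:
        return 0.0  # flat window, nothing to capture (assumed convention)

    # Assumed normalization: 1.0 at the local extreme, 0.0 at the opposite extreme.
    entry_quality = _clamp(1.0 - (entry - lo) / available_move)
    exit_quality = _clamp(1.0 - (hi - exit_) / available_move)
    # capture_pct = actual_move / available_move, as defined in the text.
    capture_pct = _clamp((exit_ - entry) / available_move)

    return 0.4 * entry_quality + 0.4 * exit_quality + 0.2 * capture_pct
```

A trade that buys the exact low and sells the exact high of its window scores 1.0. The cutoffs that map a score to Brilliant through Blunder are not given here, so the sketch stops at the numeric score.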

Luck Analysis - For every trade, we run a Monte Carlo simulation of 200 random entries and exits in the same time window, then compute the percentile rank of the actual PnL against the random distribution. A trade that beats 85% of random attempts is tagged "Very Lucky" - the market moved favorably regardless of timing. A trade below the 25th percentile that still profited? That's "Pure Skill."
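
The percentile computation can be sketched roughly as follows. This assumes long-only random trades whose entry and exit are drawn uniformly from the window's prices, with entry before exit. The function name, the seed parameter, and the strict "greater than" comparison are illustrative choices, not confirmed details of the platform:

```python
import random
from typing import Optional, Sequence


def luck_percentile(
    prices: Sequence[float],
    actual_pnl: float,
    n_sims: int = 200,
    seed: Optional[int] = None,
) -> float:
    """Return the percentage of random long trades that the actual PnL beats."""
    if len(prices) < 2:
        raise ValueError("need at least two prices in the window")

    rng = random.Random(seed)
    beaten = 0
    for _ in range(n_sims):
        # Pick two distinct indices and order them so entry precedes exit.
        i, j = sorted(rng.sample(range(len(prices)), 2))
        random_pnl = prices[j] - prices[i]
        if actual_pnl > random_pnl:
            beaten += 1
    return 100.0 * beaten / n_sims
```

The returned value is what would be compared against the 85% and 25th-percentile thresholds described above. Passing a fixed seed makes a run reproducible.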

Elo Ratings - Strategies compete in round-robin tournaments using the standard Elo formula: Ra' = Ra + K * (S - E), where E = 1 / (1 + 10^((Rb - Ra) / 400)) is strategy A's expected score against strategy B, S is its actual score (1 for a win, 0.5 for a draw, 0 for a loss), and K = 32. Each pair plays three rounds, comparing Sharpe ratio, total PnL, and win rate.
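
A small sketch of the rating update, using the formula exactly as stated. How the three rounds map onto the three metrics is an assumption here (one round per metric, higher value wins, equal values draw), and the dictionary keys are hypothetical:

```python
from typing import Dict, Tuple

# Assumed: one round per metric, higher value wins the round.
METRICS = ("sharpe", "total_pnl", "win_rate")


def expected_score(ra: float, rb: float) -> float:
    """E = 1 / (1 + 10^((Rb - Ra) / 400)) for the player rated ra."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))


def update(ra: float, rb: float, score_a: float, k: float = 32.0) -> Tuple[float, float]:
    """Apply R' = R + K * (S - E) to both sides of one round."""
    ea = expected_score(ra, rb)
    new_a = ra + k * (score_a - ea)
    new_b = rb + k * ((1.0 - score_a) - (1.0 - ea))
    return new_a, new_b


def play_pair(
    ra: float, rb: float, stats_a: Dict[str, float], stats_b: Dict[str, float]
) -> Tuple[float, float]:
    """Play one round per metric and return the updated ratings."""
    for metric in METRICS:
        if stats_a[metric] > stats_b[metric]:
            score_a = 1.0
        elif stats_a[metric] < stats_b[metric]:
            score_a = 0.0
        else:
            score_a = 0.5
        ra, rb = update(ra, rb, score_a)
    return ra, rb
```

With two strategies both rated 1500, a single win moves the winner to 1516 and the loser to 1484, and the sum of the two ratings is unchanged by every round.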

The platform also detects market regimes via rolling volatility classification, correlates strategy performance with the Crypto Fear & Greed Index, computes comprehensive risk metrics (VaR, CVaR, Sortino, Calmar), and finds the optimal strategy ensemble through exhaustive combinatorial search with inverse-volatility weighting.
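
To illustrate the ensemble step, here is a sketch of inverse-volatility weighting combined with an exhaustive search over strategy subsets. The writeup does not say which objective defines "optimal", so this example assumes the Sharpe-like ratio (mean over standard deviation) of the blended per-period returns. All names are hypothetical:

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def inverse_vol_weights(returns: Dict[str, List[float]]) -> Dict[str, float]:
    """Weight each strategy by 1 / volatility, normalized to sum to 1."""
    inv = {}
    for name, series in returns.items():
        vol = pstdev(series)
        if vol == 0:
            raise ValueError(f"{name} has zero volatility")
        inv[name] = 1.0 / vol
    total = sum(inv.values())
    return {name: value / total for name, value in inv.items()}


def best_ensemble(
    returns: Dict[str, List[float]], min_size: int = 2
) -> Tuple[Tuple[str, ...], Dict[str, float], float]:
    """Try every subset of at least min_size strategies and keep the best score."""
    names = sorted(returns)
    n_periods = len(returns[names[0]])
    best = ((), {}, float("-inf"))
    for size in range(min_size, len(names) + 1):
        for combo in combinations(names, size):
            weights = inverse_vol_weights({n: returns[n] for n in combo})
            blended = [
                sum(weights[n] * returns[n][t] for n in combo)
                for t in range(n_periods)
            ]
            sd = pstdev(blended)
            score = mean(blended) / sd if sd > 0 else 0.0  # assumed objective
            if score > best[2]:
                best = (combo, weights, score)
    return best
```

The search visits every subset, so its cost grows exponentially with the number of strategies. That is manageable for a handful of strategies but would need pruning for a much larger pool.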

How We Built It

The backend is a Python FastAPI server with a ProcessPoolExecutor running strategies in parallel across all available CPU cores. Strategies are dynamically loaded Python files - users write a class with an on_candle(candle, history) method that returns BUY, SELL, or HOLD. The engine iterates through every candle, tracks positions, and records every trade.
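
A stripped-down sketch of that strategy interface and engine loop is shown below. It assumes candles are dictionaries with a "close" key, signals are plain strings, the engine holds at most one long position at a time, and history contains only the candles before the current one. The example strategy and all names other than on_candle are hypothetical:

```python
from typing import Dict, List

BUY, SELL, HOLD = "BUY", "SELL", "HOLD"


class AboveAverage:
    """Example strategy: buy above the recent average close, sell below it."""

    def __init__(self, period: int = 3) -> None:
        self.period = period

    def on_candle(self, candle: Dict[str, float], history: List[Dict[str, float]]) -> str:
        closes = [c["close"] for c in history[-self.period:]]
        if len(closes) < self.period:
            return HOLD  # not enough history yet
        average = sum(closes) / len(closes)
        if candle["close"] > average:
            return BUY
        if candle["close"] < average:
            return SELL
        return HOLD


def run_backtest(strategy, candles: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Feed candles to the strategy one by one and record completed trades."""
    trades: List[Dict[str, float]] = []
    history: List[Dict[str, float]] = []
    entry = None  # the candle at which the open position was entered
    for candle in candles:
        signal = strategy.on_candle(candle, history)
        if signal == BUY and entry is None:
            entry = candle
        elif signal == SELL and entry is not None:
            trades.append({
                "entry_price": entry["close"],
                "exit_price": candle["close"],
                "pnl": candle["close"] - entry["close"],
            })
            entry = None
        history.append(candle)
    return trades
```

Calling run_backtest(AboveAverage(), candles) returns a list of trade records that the analysis stages can then score. Repeated BUY signals while a position is open are ignored in this sketch.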

We sourced 288 data files from Binance Vision covering 1-minute, 15-minute, 1-hour, and 1-day candles from 2020 through 2025 - roughly 3.4 million candle rows. Results are cached to disk as JSON so subsequent loads take seconds instead of minutes.

The frontend is React with Vite, TailwindCSS, and TradingView's Lightweight Charts library for professional-grade candlestick visualization with trade markers, zoom controls, and OHLC tooltips. We built custom SVG donut charts for move quality distribution and a full analytics dashboard with correlation matrices, regime analysis, and risk metric tables.

Challenges

Corrupted zip files - Git mangled the binary zip files during commits across Windows/Mac, producing negative seek errors on extraction. We solved it by pre-extracting all CSVs and committing them directly, with the data loader preferring CSV over ZIP.
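
The CSV-first loading order can be sketched like this, assuming each data file exists as name.csv and/or name.zip in one directory and that the archive's first member is the CSV. The function name and directory layout are hypothetical:

```python
import csv
import io
import zipfile
from pathlib import Path
from typing import List


def load_candles(data_dir: Path, name: str) -> List[List[str]]:
    """Load rows from <name>.csv if present, otherwise from <name>.zip."""
    csv_path = data_dir / f"{name}.csv"
    if csv_path.exists():
        # Preferred path: the pre-extracted CSV committed to the repository.
        with csv_path.open(newline="") as handle:
            return list(csv.reader(handle))

    # Fallback: read the first member of the archive without extracting to disk.
    with zipfile.ZipFile(data_dir / f"{name}.zip") as archive:
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.reader(text))
```

Because the CSV branch is checked first, a corrupted archive is never opened as long as the extracted file is present.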

Cross-platform Python compatibility - The zipfile module behaved differently for us between Python 3.8 and 3.11 and between Windows and Unix. We built a four-method fallback chain that tries each extractor in turn: standard zipfile, BytesIO, PowerShell Expand-Archive, and Unix unzip.

Backtest performance - Running 7 strategies on 500K+ candles each is computationally expensive. Multiprocessing with ProcessPoolExecutor and disk caching brought reload times from minutes to seconds.
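
The combination of a process pool and a JSON disk cache might look roughly like the following. The backtest function is a placeholder, the cache is keyed by strategy name only, and the optional executor parameter is an addition for illustration (it lets a caller supply its own pool). None of these names come from the project itself:

```python
import json
from concurrent.futures import Executor, ProcessPoolExecutor
from pathlib import Path
from typing import Dict, List, Optional


def backtest(name: str) -> dict:
    """Placeholder for an expensive backtest; returns dummy statistics."""
    return {"strategy": name, "total_pnl": 0.0}


def run_all(
    names: List[str], cache_dir: Path, executor: Optional[Executor] = None
) -> Dict[str, dict]:
    """Return results for every strategy, computing only the uncached ones."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    results: Dict[str, dict] = {}
    missing: List[str] = []
    for name in names:
        path = cache_dir / f"{name}.json"
        if path.exists():
            results[name] = json.loads(path.read_text())  # cache hit
        else:
            missing.append(name)

    if missing:
        owns_pool = executor is None
        # The default pool uses one worker per available CPU.
        pool = executor if executor is not None else ProcessPoolExecutor()
        try:
            for name, result in zip(missing, pool.map(backtest, missing)):
                (cache_dir / f"{name}.json").write_text(json.dumps(result))
                results[name] = result
        finally:
            if owns_pool:
                pool.shutdown()
    return results
```

When the default process pool is used from a script, the call to run_all should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. A real cache would also need invalidating when the strategy code or data changes, which this sketch does not attempt.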

What We Learned

Quantifying trading skill is fundamentally harder than quantifying chess skill because markets are stochastic - there's no single "best move." Our Monte Carlo approach to luck decomposition was the key insight: by simulating what random trading would have achieved in the same conditions, we can isolate the contribution of timing skill versus market momentum. The Fear & Greed correlation analysis revealed that most strategies perform very differently across sentiment regimes, suggesting that strategy selection should be adaptive rather than static.