GooseThink
Inspiration
Prediction markets are the best uncensored information aggregators humanity has ever built. They compress millions of diffuse beliefs into a single price. But every existing prediction market trades politics, sports, or crypto. Nobody applies this machinery to Canada goose migration.
Meanwhile, goose migration is one of the most studied and least predicted phenomena in ecology. Real scientists track it with GPS collars and hope for the best. We thought: what if we built a prediction market where AI bots compete on migration forecasts, using real weather data, with actual market mechanics?
(we also really like goose migration patterns)
What it does
GooseThink is a fully functional prediction-market simulator. Five AI trading bots, each running a different sklearn model, compete over multiple seasons to forecast Canada goose migration patterns. They trade binary contracts against an LMSR market maker, the automated pricing rule Robin Hanson designed for prediction markets, and settle at $0 or $1 based on real migration outcomes.
Three contracts per season:
- Timing: Will geese migrate from northern regions before October 15?
- Distance: Will the average migration distance exceed the threshold?
- Route: Will more geese use the Atlantic flyway or the Mississippi flyway?
The platform plays a simulation at 1x/10x/50x/100x speed with a live trading floor, animated migration map, risk-adjusted leaderboard, and real-time commentary. Humans can trade against the bots.
How we built it
Event log architecture. The core insight: everything is an event. Trades, weather updates, migration observations, contract settlements, bot decisions, all six event types flow through a single append-only SQLite log. The frontend reconstructs every view from this stream.
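The pattern above can be sketched in a few lines. This is an illustrative reconstruction, not the project's actual schema: the table name, column names, and event kinds are assumptions based on the description.

```python
import json
import sqlite3

# Hypothetical append-only event log (names are assumptions, not the real schema).
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    sim_day INTEGER NOT NULL,
    kind    TEXT NOT NULL,   -- trade | weather | observation | settlement | ...
    payload TEXT NOT NULL    -- JSON blob whose shape depends on kind
)
"""

def append_event(conn, sim_day, kind, payload):
    """Append one event; the log is only ever appended to, never mutated."""
    conn.execute(
        "INSERT INTO events (sim_day, kind, payload) VALUES (?, ?, ?)",
        (sim_day, kind, json.dumps(payload)),
    )
    conn.commit()

def replay(conn, since_id=0):
    """Yield events in insertion order so any frontend view can be rebuilt."""
    cur = conn.execute(
        "SELECT id, sim_day, kind, payload FROM events WHERE id > ? ORDER BY id",
        (since_id,),
    )
    for row_id, day, kind, payload in cur:
        yield row_id, day, kind, json.loads(payload)
```

Because every view is a fold over this stream, replay speed (1x to 100x) is just a matter of how fast events are pushed over the WebSocket.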
LMSR exchange. The market maker uses the Logarithmic Market Scoring Rule, which is literally softmax:
$$p_i = \frac{e^{q_i / b}}{\sum_j e^{q_j / b}}$$
where $q_i$ is the quantity of shares outstanding for outcome $i$, and $b$ is the liquidity parameter. The cost to move the market from state $q$ to $q'$ is:
$$C(q') - C(q) = b \ln\left(\sum_j e^{q'_j / b}\right) - b \ln\left(\sum_j e^{q_j / b}\right)$$
We auto-calibrate $b = n_{\text{bots}} \cdot \text{capital} \cdot 0.05$, scaling liquidity to tournament size. About 50 lines of Python.
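A minimal LMSR matching the formulas above fits comfortably in the quoted ~50 lines. This sketch is our own illustration of the mechanism, not the project's exact code; the class and method names are assumptions.

```python
import math

class LMSR:
    """Logarithmic Market Scoring Rule market maker (illustrative sketch).

    q[i] is the quantity of shares outstanding for outcome i;
    b is the liquidity parameter.
    """

    def __init__(self, n_outcomes, b):
        self.b = b
        self.q = [0.0] * n_outcomes

    def _cost(self, q):
        # C(q) = b * ln(sum_j exp(q_j / b)), with max-subtraction for stability
        m = max(q) / self.b
        return self.b * (m + math.log(sum(math.exp(qi / self.b - m) for qi in q)))

    def prices(self):
        # p_i = exp(q_i / b) / sum_j exp(q_j / b)  -- literally softmax
        m = max(self.q) / self.b
        e = [math.exp(qi / self.b - m) for qi in self.q]
        s = sum(e)
        return [x / s for x in e]

    def trade(self, outcome, shares):
        """Buy (shares > 0) or sell (shares < 0).

        Returns C(q') - C(q): positive = the trader pays,
        negative = the trader is refunded.
        """
        before = self._cost(self.q)
        self.q[outcome] += shares
        return self._cost(self.q) - before
```

With the auto-calibration described above, a 5-bot tournament at $10{,}000$ capital each would get $b = 5 \cdot 10{,}000 \cdot 0.05 = 2{,}500$.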
ML bots with personality. Each bot uses a different sklearn algorithm matched to its personality:
| Bot | Model | Behavior |
|---|---|---|
| Honker | LogisticRegression | Conservative, small positions |
| Maverick | GradientBoostingClassifier (200 trees) | Aggressive, big bets |
| Contrarian | RandomForest on inverted labels | Literally learns the opposite |
| Momentum | KNeighborsClassifier ($k=3$) | Follows recent patterns |
| Professor | RandomForest (depth-3 cap) | Balanced, usually wins |
Models train on 13 engineered features: temperature gradients between northern and southern stations, daylight trends, migration velocity, days since first cold snap, etc. Each bot has its own cooldown, position sizing, and divergence threshold tuned to its personality.
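The bot pattern described here (real sklearn `fit`/`predict_proba`, personality-tuned divergence threshold, cooldown, and position size) can be sketched as follows. The class shape, feature vector, and parameter values are illustrative assumptions, not the project's actual API.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class Bot:
    """Illustrative trading bot: a real sklearn model plus personality knobs."""

    def __init__(self, name, model, divergence=0.10, max_qty=25, cooldown_days=3):
        self.name = name
        self.model = model                    # any sklearn classifier
        self.divergence = divergence          # min |model - market| edge to act
        self.max_qty = max_qty                # position-size cap
        self.cooldown_days = cooldown_days    # min days between trades
        self.last_trade_day = -10**9

    def fit(self, X, y):
        self.model.fit(X, y)                  # real .fit(), not weighted scoring

    def decide(self, day, features, market_price):
        """Return signed share quantity: + buy, - sell, 0 stay out."""
        if day - self.last_trade_day < self.cooldown_days:
            return 0                          # cooldown: avoid bleeding to spread
        p = self.model.predict_proba(features.reshape(1, -1))[0, 1]
        edge = p - market_price
        if abs(edge) < self.divergence:       # only trade on real disagreement
            return 0
        self.last_trade_day = day
        return int(np.sign(edge)) * self.max_qty
```

A "Honker"-style conservative bot would pair `LogisticRegression` with a small `max_qty` and long cooldown; a "Maverick" would pair `GradientBoostingClassifier(n_estimators=200)` with the opposite settings.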
Real data pipeline. We pull historical temperature, wind, and precipitation from NOAA's Climate Data Online API across six stations from Thunder Bay to New Orleans. Data is cached locally, so the simulation runs instantly after the first fetch. The migration data comes from a biology-driven synthetic generator (flocks respond to temperature drops and daylight cues), with an eBird API integration wired up as the real-data path: supply an API key and the migration patterns can switch to actual observations.
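The cached fetch might look roughly like this. The cache layout and retry policy are our assumptions; the endpoint and `token` header follow NOAA's public CDO API, but you'd want to check the exact query parameters against their docs.

```python
import json
import time
import urllib.request
from pathlib import Path

CACHE = Path("weather_cache")  # illustrative cache location

def fetch_daily(station, start, end, token):
    """Fetch NOAA CDO daily summaries, caching to disk so reruns skip the API."""
    CACHE.mkdir(exist_ok=True)
    # sanitize station ids like "GHCND:USW00014234" into safe filenames
    key = CACHE / f"{station.replace(':', '-')}_{start}_{end}.json"
    if key.exists():                              # cache hit: no network call
        return json.loads(key.read_text())
    url = ("https://www.ncdc.noaa.gov/cdo-web/api/v2/data"
           f"?datasetid=GHCND&stationid={station}"
           f"&startdate={start}&enddate={end}&limit=1000")
    req = urllib.request.Request(url, headers={"token": token})
    for attempt in range(3):                      # retry around rate limits/timeouts
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                data = json.loads(resp.read())
            key.write_text(json.dumps(data))      # persist for the next run
            return data
        except OSError:
            time.sleep(2 ** attempt)
    raise RuntimeError(f"NOAA fetch failed for {station}")
```

The cache-first check is what makes subsequent simulation runs start instantly, and it is also what absorbed the 30-second timeouts described below.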
Stack: Python FastAPI backend with SQLite event log and WebSocket replay. Next.js + React + TypeScript frontend with SVG charts, an animated migration map, and risk-adjusted Sharpe-like scoring. Docker for deployment. Vercel for the frontend, VPS for the backend.
Challenges we ran into
The LMSR was too generous. Initial calibration at $b = 0.02 \cdot n \cdot \text{capital}$ let every bot profit because the market maker absorbed too much loss. We had to tune $b$ up and enforce minimum divergence thresholds (bots only trade when their prediction diverges from the market price by more than $0.10$) so the LMSR spread wasn't free money.
Bots traded themselves to death. Before rate-limiting, Maverick made 647 trades per season and lost \$9,000 to the LMSR spread. We added personality-specific cooldowns (3–7 days) and reduced max quantities. Post-fix: 20–100 trades per season with realistic P&L.
Training data was degenerate. Our ML models initially saw all-zero labels for certain contract types across training seasons; the synthetic migration generator was too deterministic. We had to vary flock counts per season and recalibrate outcome thresholds so the models actually learned.
Bots couldn't sell. The first exchange implementation only supported buying. Bots accumulated positions with no exit strategy. We added sell support (reverse LMSR trades) and gave each bot personality-specific selling rules: Honker exits cleanly, Maverick is stubborn, Contrarian takes profit when the market agrees with him.
Frontend P&L was wrong. The leaderboard showed bot_balance_after from the last trade, but never added settlement credits. Bots looked like they were bleeding money. Fixed by tracking settlement payouts client-side from contract_settlement events.
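The fix amounts to folding settlement events back into each bot's balance instead of trusting the last trade's snapshot. A rough Python equivalent of the client-side logic (event payload shapes are assumptions):

```python
def settled_pnl(events, starting_capital=10_000.0):
    """Recompute bot balances from the event stream.

    The naive approach (reading balance_after from the last trade) misses
    settlement credits; folding in contract_settlement events fixes that.
    `events` is an iterable of (kind, payload) pairs; payload shapes here
    are illustrative assumptions.
    """
    balances = {}
    for kind, payload in events:
        if kind == "trade":
            bot = payload["bot"]
            balances[bot] = balances.get(bot, starting_capital) - payload["cost"]
        elif kind == "contract_settlement":
            # each held share of the winning outcome pays out $1 (payout = 0 or 1)
            for bot, shares in payload["positions"].items():
                balances[bot] = balances.get(bot, starting_capital) + shares * payload["payout"]
    return balances
```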
NOAA API timeouts. Rate-limited at 5 req/sec with occasional 30s timeouts. We built aggressive local caching so subsequent runs skip the API entirely.
Accomplishments that we're proud of
- Real sklearn `.fit()` and `.predict_proba()` calls, not weighted scoring pretending to be ML
- Real NOAA weather data, cached and cross-referenced with real goose biology
- LMSR market maker with auto-calibrated liquidity in ~50 lines of Python
- Bots with genuinely different trading behaviour, not just different names
- Contrarian: a RandomForest trained on inverted labels that literally bets against consensus and sometimes crushes everyone
- A fully replayable event log that powers the live trading floor, migration map, leaderboard, and commentary from one data structure
What we learned
Prediction markets are softmax. The LMSR pricing formula is identical to softmax in neural networks. Market microstructure and ML are the same math wearing different hats.
Spread is real. Even synthetic markets have transaction costs. Bots that trade too often lose to spread, regardless of prediction quality.
ML models diverge when trained on the same data. Five different algorithms on the same 13 features produced genuinely different predictions. GradientBoosting overfits. LogisticRegression hedges toward $0.5$. RandomForest depth caps matter.
What's next for GooseThink
- Real eBird integration at scale: swap synthetic migration for actual Canada Goose observations across 8 flyway regions
- Neural network bots: add a small transformer that predicts migration from time-series weather embeddings
- Multi-species markets: Snow Geese, Mallards, Sandhill Cranes. Cross-species arbitrage contracts.
- Human vs bot leagues: tournaments where people compete against the AI bots over a full season of real-world data
- Climate prediction markets: the real application. Migration timing is a leading indicator of climate shifts. A prediction market on ecological events could aggregate forecasts the way sports markets aggregate game outcomes.
Built With
- bash
- bun
- docker
- fastapi
- javascript
- joblib
- noaa-climate-data-online
- numpy
- python
- react
- scikit-learn
- sql
- sqlite
- tailwind
- typescript
- vercel