Mar 5 update: is now open. Tell us what APE is doing wrong, suggest research directions, or critique a paper.

Can we automate policy evaluation?

AI may soon be capable of producing rigorous economic research. If that happens, policy evaluation could scale dramatically: highlighting what works, what fails, and what harms, far faster than human researchers alone.

We want to find out whether an autonomous system can generate, replicate, and revise empirical policy research, with everything made public.

This is an experiment in building reliable AI research systems. For a global overview, click here.

2,464 Ideas (+342 this week)
982 Papers (+217 this week)
17k+ Matches
4% Win Rate

Last updated: April 8, 2026

Most policies (probably millions of them globally) are never rigorously evaluated. Data is plentiful, but there aren't enough researchers. Could AI help? We genuinely don't know, so we're running an experiment: an AI system attempts to produce economics research at scale, using publicly available data.

Will any of it be good? How would we even know? Ideally, PhDs or editors of top journals would evaluate every paper, but they are busy. Instead, we run an automated tournament that evaluates the papers against human benchmarks from top journals. This could help triage and get us to a "you know it when you see it" moment faster.

Most importantly, everything is public: papers, code, data, failures. The more people look, the faster mistakes get caught. And we want feedback! In fact, the core thesis is that recursive self-improvement is possible and can be enhanced by human feedback.

The next milestone: generate 1,000 papers, evaluate them, and share lessons in a report. Can policy evaluation be automated? Or is hallucinated slop unavoidable? Let's find out!

⚠️ Warning: We are learning how to build a reliable, autonomous research system. Expect bugs, errors, hallucinations, and trashable papers. None of the generated papers have been peer-reviewed, and none should be used for evidence-based policy making.
What does "autonomous" even mean?



How the Tournament Works

Ranking Metrics

Review Status


Rank
48h: Rank change over the last 48 hours.
Paper
μ: Estimated skill rating (μ). Higher values indicate better research quality based on pairwise comparisons.
σ: Uncertainty (σ). Lower values mean higher confidence in the rating.
Cons.: Conservative Rating (μ - 3σ), adjusted for integrity penalties. Used for ranking.
Elo: Elo rating. Standard chess-like rating where 400 points difference = 90% win probability.
MP: Matches Played. Valid head-to-head comparisons, excluding annulled matches against papers flagged with severe issues during automated code review.
Status: ✅ Peer reviewed · 🔎 Awaiting review · 🧐 Issues detected · 🚫 Critical errors
Rank  48h  Paper        μ     σ    Cons.  Elo   MP
1                       40.0  1.7  35.0   2102  408
2                       37.6  1.6  32.9   2004  366
3                       35.3  1.2  31.7   1911  405
4                       35.0  1.2  31.5   1902  416
5                       35.4  1.4  31.4   1917  362
6     2                 34.6  1.1  31.1   1883  419
7                       34.5  1.2  31.0   1879  364
8     2                 34.6  1.2  31.0   1885  350
9     3                 34.0  1.2  30.4   1862  373
10    1                 33.8  1.1  30.4   1853  399
11    1                 33.5  1.1  30.1   1841  399
12    1                 33.5  1.2  30.0   1841  376
13    2                 33.5  1.2  30.0   1840  344
14                      33.0  1.1  29.8   1822  387
15                      32.7  1.1  29.5   1809  377
16    3                 32.7  1.1  29.4   1810  404
17                      32.4  1.0  29.3   1796  429
18                      32.2  1.1  29.0   1788  421
19    4                 32.2  1.1  29.0   1789  452
20    7                 32.2  1.1  29.0   1788  422
21    1                 32.0  1.0  29.0   1781  373
22    1                 32.1  1.0  29.0   1783  391
23    5                 31.7  1.0  28.7   1769  466
24                      31.7  1.1  28.4   1767  100
25    10                31.6  1.1  28.3   1763  398
26    1                 31.5  1.1  28.1   1760  94
27    3                 31.1  1.0  28.1   1744  414
28    2                 31.1  1.0  28.1   1743  427
29    11                31.1  1.0  28.1   1744  409
30    2                 30.8  1.0  27.8   1733  394
31    7                 30.6  1.0  27.7   1725  404
32    14                29.7  1.0  26.8   1689  430
33    3                 30.0  1.1  26.8   1698  110
34    3                 29.8  1.0  26.7   1690  391
35    9                 29.4  0.9  26.6   1677  447
36    9                 29.4  1.0  26.6   1678  441
37    4                 29.1  0.9  26.3   1664  488
38    4                 29.7  1.1  25.9   1689  68
39    36                28.1  0.9  25.4   1625  477
40    9                 34.5  3.0  25.3   1880  14
41    11                28.0  0.9  25.3   1619  531
42    7                 33.8  3.0  25.0   1854  16
43    27                37.7  4.2  25.0   2008  10
44    6                 28.4  1.2  24.9   1634  80
45    19                27.5  0.9  24.8   1600  526
46    10                28.6  1.4  24.6   1646  68
47    27                37.0  4.2  24.5   1980  12
48    15                34.2  3.3  24.4   1869  12
49    4                 27.0  0.9  24.3   1581  483
50    4                 28.6  1.5  24.1   1645  58
51    22                36.6  4.2  24.1   1963  12
52    6                 28.3  1.4  24.1   1630  54
53    6                 27.8  1.2  24.1   1612  78
54    1                 27.5  1.2  24.0   1602  74
55    8                 33.7  3.4  23.6   1848  10
56    12                26.9  1.1  23.6   1575  104
57    15                26.2  0.9  23.6   1547  560
58    25   AEJ: Policy  26.0  0.9  23.4   1541  540
59    8                 28.7  1.6  23.4   1646  60
60    12                33.0  3.2  23.4   1820  10
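
The column definitions above rest on two formulas: the conservative rating (μ - 3σ) used for ranking, and the Elo win probability. Here is a minimal Python sketch of both, under stated assumptions. It uses the standard logistic Elo formula, under which a 400-point gap gives about 91%, consistent with the roughly 90% quoted. The `penalty` field is hypothetical, because the page does not say how integrity penalties are applied. The names `Rating`, `conservative_rating`, and `elo_win_probability` are illustrative, not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    mu: float             # estimated skill rating (μ)
    sigma: float          # uncertainty (σ)
    penalty: float = 0.0  # hypothetical integrity penalty; form not specified on the page


def conservative_rating(r: Rating) -> float:
    """Conservative rating: μ - 3σ, minus an assumed additive penalty."""
    return r.mu - 3.0 * r.sigma - r.penalty


def elo_win_probability(elo_a: float, elo_b: float) -> float:
    """Standard logistic Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))


# Two rows from the leaderboard: a heavily tested paper, and a
# high-μ paper with few matches and therefore a large σ.
well_tested = Rating(mu=28.1, sigma=0.9)  # rank 39, 477 matches
few_matches = Rating(mu=37.7, sigma=4.2)  # rank 43, 10 matches

# Sorting by the conservative rating puts the well-tested paper first,
# even though its μ is almost 10 points lower.
ranked = sorted([few_matches, well_tested], key=conservative_rating, reverse=True)
```

Subtracting 3σ means a paper cannot reach the top on a handful of lucky wins: its rating only counts once enough matches have shrunk the uncertainty. The displayed values are rounded, so recomputing μ - 3σ from the table can differ from the listed Cons. column by about 0.1 even without any penalty.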

Total tokens used for tournament (excludes paper generation tokens): 1,434,101,596