A baseline Python implementation of the 1 Billion Row Challenge, used as a starting point for automated performance optimization with Hone.
The challenge: parse a text file with one billion rows of weather station measurements and compute the min, mean, and max temperature for each station.
```
hone-1brc/
├── data/
│   ├── weather_stations.txt   # Station list with mean temperatures (~45k stations)
│   └── measurements_*.txt     # Generated input files (not committed)
├── prepare.py                 # Generates the measurements input file
├── solution.py                # Computes min/mean/max per station
└── benchmark.py               # Times solution.py for use with Hone
```
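```
pip install hone-ai
```

The scripts in this repository (prepare.py, solution.py, benchmark.py) have no dependencies beyond the Python standard library. The hone-ai package is only needed to run Hone.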
```
# Generate 1 billion rows (the full challenge)
python prepare.py 1000000000 data/measurements_1B.txt

# Generate smaller datasets for testing
python prepare.py 1000000 data/measurements_1M.txt
python prepare.py 100000000 data/measurements_100M.txt
```

If no filename is given, the output defaults to data/measurements.txt.
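The internals of prepare.py are not described here. The following is a rough sketch of how such a generator could work. It assumes that each reading is drawn from a normal distribution centred on the station's mean temperature, with a standard deviation of 10.0 (also an assumption). The function name generate_rows and its dict input are hypothetical. Loading data/weather_stations.txt is left out because its exact layout isn't specified in this README.

```python
import random


def generate_rows(station_means, n_rows, seed=None):
    """Yield n_rows lines of the form 'StationName;Temperature'.

    Hypothetical sketch, not prepare.py itself: each reading is drawn from a
    normal distribution centred on the station's mean (stddev 10.0 assumed),
    clamped to [-99.9, 99.9] and written with exactly one decimal place.
    """
    rng = random.Random(seed)
    names = list(station_means)
    for _ in range(n_rows):
        name = rng.choice(names)
        temp = rng.gauss(station_means[name], 10.0)
        temp = max(-99.9, min(99.9, temp))  # keep within the documented range
        yield f"{name};{temp:.1f}"


if __name__ == "__main__":
    # Hypothetical means; the real ones come from data/weather_stations.txt.
    demo_means = {"Abha": 18.0, "Abidjan": 26.0}
    for row in generate_rows(demo_means, 5, seed=42):
        print(row)
```

Writing the yielded rows to a file, one per line, produces input in the StationName;Temperature format described below. Passing a fixed seed makes a generated dataset reproducible between benchmark runs.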
```
python solution.py data/measurements_1M.txt
```

Example output:

```
{Abha=-23.0/18.0/59.2, Abidjan=-16.2/26.0/67.4, ...}
```
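The output is a single line that lists each station as name=min/mean/max. A small formatter that produces this shape from per-station running totals could look like the following sketch. The function name format_results, the (min, total, count, max) tuple layout, and sorting by station name are assumptions for illustration. Python's :.1f formatting rounds the binary value half-to-even, so edge cases may differ from whatever rounding rule solution.py applies.

```python
def format_results(stats):
    """Render {station: (min, total, count, max)} as '{A=min/mean/max, ...}'.

    Hypothetical helper: stations are sorted by name, every value is shown
    with one decimal place, and the mean is total / count.
    """
    parts = []
    for name in sorted(stats):
        lo, total, count, hi = stats[name]
        mean = total / count
        parts.append(f"{name}={lo:.1f}/{mean:.1f}/{hi:.1f}")
    return "{" + ", ".join(parts) + "}"
```

For example, format_results({"Abha": (-23.0, 36.0, 2, 59.2)}) returns "{Abha=-23.0/18.0/59.2}".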
Each line in the measurements file follows the format:
```
StationName;Temperature
```
Constraints:
- Up to 10,000 unique station names
- Station names: 1–100 bytes
- Temperatures: -99.9 to 99.9 (always one decimal place)
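As a sketch, parsing a single line and checking it against these constraints could look like the following. parse_line is a hypothetical helper, and the baseline may not validate its input at all. Splitting on the last ';' with rpartition is a design choice: the temperature never contains a ';', so the split stays correct even for unusual station names.

```python
def parse_line(line):
    """Split 'StationName;Temperature' into (name, temperature).

    Raises ValueError if the line breaks the documented constraints.
    """
    name, sep, temp_text = line.rstrip("\n").rpartition(";")
    if not sep:
        raise ValueError(f"missing ';' in {line!r}")
    # Station names are limited by byte length, not character count.
    if not 1 <= len(name.encode("utf-8")) <= 100:
        raise ValueError(f"station name must be 1-100 bytes: {name!r}")
    # Temperatures always have exactly one digit after the decimal point.
    whole, dot, frac = temp_text.partition(".")
    digits = whole[1:] if whole.startswith("-") else whole
    if not dot or not digits.isdigit() or len(frac) != 1 or not frac.isdigit():
        raise ValueError(f"temperature needs exactly one decimal: {temp_text!r}")
    temp = float(temp_text)
    if not -99.9 <= temp <= 99.9:
        raise ValueError(f"temperature out of range: {temp_text!r}")
    return name, temp
```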
benchmark.py times solution.py end-to-end and prints the elapsed time in a format Hone can parse.
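The following sketch shows one way such a timer could be written. It launches the solution as a subprocess with the current interpreter and discards the solution's own output. It prints the elapsed time with three decimals so that the --score-pattern passed to Hone further down can extract it. The names time_solution, format_score and main are hypothetical, and discarding stdout is an assumption about benchmark.py's behaviour.

```python
import subprocess
import sys
import time


def time_solution(input_path, script="solution.py"):
    """Run the script once on input_path and return wall-clock seconds.

    Timing the whole subprocess includes interpreter start-up and file I/O,
    i.e. the measurement is end-to-end.
    """
    start = time.perf_counter()
    subprocess.run(
        [sys.executable, script, input_path],
        check=True,                 # a crashing solution should not yield a score
        stdout=subprocess.DEVNULL,  # assumption: the solution's output is discarded
    )
    return time.perf_counter() - start


def format_score(seconds):
    """Render the score so the --score-pattern given to Hone can extract it."""
    return f"Time Taken: {seconds:.3f}"


def main(argv):
    # Same default input as the documented CLI.
    input_path = argv[1] if len(argv) > 1 else "data/measurements.txt"
    print(format_score(time_solution(input_path)))
```

Calling main(sys.argv) under an if __name__ == "__main__": guard would support both invocations shown below.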
```
# defaults to data/measurements.txt
python benchmark.py

# explicit input file
python benchmark.py data/measurements_1M.txt
# Time Taken: 0.569
```

To run Hone against the baseline:
```
hone \
  --goal-file program.md \
  --bench "python /Users/rathla/workspace/hone-1brc/benchmark.py /Users/rathla/workspace/hone-1brc/data/measurements_1B.txt" \
  --files "solution.py" \
  --optimize lower \
  --score-pattern "Time Taken:\s*(\d+\.\d+)" \
  --budget 3.0 \
  --max-iter 50 \
  --model claude-haiku-4-5
```

The paths in --bench are absolute paths from the author's machine; adjust them to point at your own checkout.

The baseline solution is intentionally naive: no tricks, no concurrency, no memory mapping. It reads the file line by line, accumulates stats in a plain dict, and prints the sorted results. This establishes the performance floor that Hone will attempt to improve automatically.
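The line-by-line, plain-dict approach described above can be sketched as follows. This is an illustration, not the contents of solution.py. The names aggregate and stats_from_file, and the choice to store [min, total, count, max] per station, are assumptions. Keeping a running total and count instead of every reading limits each row to a few comparisons and additions and keeps memory proportional to the number of stations.

```python
def aggregate(lines):
    """One pass over 'StationName;Temperature' lines.

    Keeps [min, total, count, max] per station in a plain dict; the mean is
    computed later as total / count.
    """
    stats = {}
    for line in lines:
        name, _, temp_text = line.rstrip("\n").rpartition(";")
        temp = float(temp_text)
        entry = stats.get(name)
        if entry is None:
            stats[name] = [temp, temp, 1, temp]
            continue
        if temp < entry[0]:
            entry[0] = temp
        entry[1] += temp
        entry[2] += 1
        if temp > entry[3]:
            entry[3] = temp
    return stats


def stats_from_file(path):
    # Iterating the file object reads it line by line, without loading it whole.
    with open(path, encoding="utf-8") as f:
        return aggregate(f)
```

The baseline is complete once the entries are sorted by station name and each one is printed as name=min/mean/max.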