Hone vs. The 1 Billion Row Challenge

A baseline Python implementation of the 1 Billion Row Challenge, used as a starting point for automated performance optimization with Hone.

The challenge: parse a text file with one billion rows of weather station measurements and compute the min, mean, and max temperature for each station.

Repository Structure

hone-1brc/
├── data/
│   ├── weather_stations.txt   # Station list with mean temperatures (~45k stations)
│   └── measurements_*.txt     # Generated input files (not committed)
├── prepare.py                 # Generates the measurements input file
├── solution.py                # Computes min/mean/max per station
└── benchmark.py               # Times solution.py for use with Hone

Setup

Install Hone

pip install hone-ai

Solution dependencies

No dependencies beyond the Python standard library.

1. Generate the measurements file

# Generate 1 billion rows (the full challenge)
python prepare.py 1000000000 data/measurements_1B.txt

# Generate smaller datasets for testing
python prepare.py 1000000 data/measurements_1M.txt
python prepare.py 100000000 data/measurements_100M.txt

Output defaults to data/measurements.txt if no filename is given.
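
For orientation, here is a minimal sketch of how a generator along these lines could work. It is not the code in prepare.py: the `generate_rows` name, the Gaussian spread of 10 degrees, and the `{name: mean}` mapping passed in are all assumptions made for illustration. It only shows one way to produce rows that satisfy the input format described in this README.

```python
import random


def generate_rows(stations, count, seed=None):
    """Yield `count` lines of the form Name;Temperature.

    `stations` maps station name -> mean temperature (hypothetical shape;
    the real layout of weather_stations.txt is not assumed here).
    """
    rng = random.Random(seed)
    names = list(stations)
    for _ in range(count):
        name = rng.choice(names)
        # Assumed spread of 10 degrees around the station mean.
        temp = rng.gauss(stations[name], 10.0)
        # Clamp to the documented range before formatting to one decimal.
        temp = max(-99.9, min(99.9, temp))
        yield f"{name};{temp:.1f}\n"
```

Writing the rows out is then a matter of `f.writelines(generate_rows(stations, n))` on an open text file.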

2. Run the solution

python solution.py data/measurements_1M.txt

Example output:

{Abha=-23.0/18.0/59.2, Abidjan=-16.2/26.0/67.4, ...}
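
The output line above lists stations in sorted order, each as min/mean/max with one decimal place. A small sketch of producing that string is shown below. The `format_results` name and the `{name: (min, mean, max)}` input shape are assumptions for illustration, and Python's default float formatting is used for rounding, which may differ from the rounding rule of the original challenge.

```python
def format_results(stats):
    """Render {name: (min, mean, max)} as a single {A=min/mean/max, ...} line."""
    parts = [
        f"{name}={lo:.1f}/{mean:.1f}/{hi:.1f}"
        for name, (lo, mean, hi) in sorted(stats.items())
    ]
    return "{" + ", ".join(parts) + "}"
```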

Input Format

Each line in the measurements file follows the format:

StationName;Temperature

Constraints:

Up to 10,000 unique station names
Station names: 1–100 bytes
Temperatures: -99.9 to 99.9 (always one decimal place)
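
Because every temperature has exactly one decimal place, a parser can treat the value as an integer number of tenths and avoid floating-point conversion entirely. The sketch below illustrates that idea; `parse_line` is a hypothetical helper, not a function from solution.py, and whether this is faster than `float()` in practice would need to be measured.

```python
def parse_line(line):
    """Split one `StationName;Temperature` record into (name, tenths)."""
    # rpartition splits on the last semicolon, so the temperature field
    # is always whatever follows the final separator.
    name, _, temp = line.rstrip("\n").rpartition(";")
    # "-23.0" -> "-230" -> -230 tenths of a degree.
    tenths = int(temp.replace(".", ""))
    return name, tenths
```

Dividing the accumulated tenths by 10 at output time recovers the usual decimal values.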

Benchmarking with Hone

benchmark.py times solution.py end-to-end and prints the elapsed time in a format Hone can parse.

# defaults to data/measurements.txt
python benchmark.py

# explicit input file
python benchmark.py data/measurements_1M.txt
# Time Taken: 0.569
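
A timing harness of this kind can be quite small. The following is a sketch under assumptions, not the contents of benchmark.py: the `time_command` and `report` names are hypothetical, and it assumes the solution is run as a subprocess with its output discarded so that printing does not clutter the benchmark result.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run `cmd` to completion and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    # check=True raises if the command exits with a non-zero status.
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def report(elapsed):
    """Format the elapsed time the way the example output above shows it."""
    return f"Time Taken: {elapsed:.3f}"
```

For example, `print(report(time_command([sys.executable, "solution.py", path])))` would time one end-to-end run on the file at `path`.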

To run Hone against the baseline:

hone \
     --goal-file program.md \
     --bench "python /Users/rathla/workspace/hone-1brc/benchmark.py /Users/rathla/workspace/hone-1brc/data/measurements_1B.txt" \
     --files "solution.py" \
     --optimize lower \
     --score-pattern "Time Taken:\s*(\d+\.\d+)" \
     --budget 3.0 \
     --max-iter 50 \
     --model claude-haiku-4-5
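
The --score-pattern value is a regular expression with one capture group. How Hone applies it internally is not documented here, but the pattern itself can be checked against benchmark output with Python's `re` module, as in this sketch (`extract_score` is a hypothetical helper).

```python
import re

# Same pattern as passed to --score-pattern above.
SCORE_PATTERN = re.compile(r"Time Taken:\s*(\d+\.\d+)")


def extract_score(output):
    """Return the captured time as a float, or None if nothing matches."""
    match = SCORE_PATTERN.search(output)
    return float(match.group(1)) if match else None
```

Note that the pattern requires a decimal point, so a benchmark that printed whole seconds such as `Time Taken: 12` would not match.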

Approach

The baseline solution is intentionally naive: no tricks, no concurrency, no memory mapping. It reads the file line by line, accumulates stats in a plain dict, and prints sorted results. This gives a deliberately slow reference time that Hone will then attempt to improve automatically.
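
The line-by-line approach described above can be sketched as follows. This is an illustration of the technique, not a copy of solution.py; the `aggregate` and `summarize` names and the `[min, max, total, count]` layout are assumptions.

```python
def aggregate(lines):
    """Return {station: [min, max, total, count]} from an iterable of lines."""
    stats = {}
    for line in lines:
        name, _, value = line.rstrip("\n").rpartition(";")
        temp = float(value)
        entry = stats.get(name)
        if entry is None:
            # First reading for this station seeds min, max, and total.
            stats[name] = [temp, temp, temp, 1]
        else:
            entry[0] = min(entry[0], temp)
            entry[1] = max(entry[1], temp)
            entry[2] += temp
            entry[3] += 1
    return stats


def summarize(stats):
    """Return (station, min, mean, max) tuples sorted by station name."""
    return [
        (name, lo, total / count, hi)
        for name, (lo, hi, total, count) in sorted(stats.items())
    ]
```

An open text file can be passed directly as `lines`, since file objects iterate one line at a time.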
