LangArena: A Balanced Programming Language Benchmark Suite

LangArena is a collection of 50+ diverse benchmarks designed for a realistic, apples-to-apples comparison of programming language performance. The goal is not to find the ultimate winner in micro-optimizations, but to evaluate how well each language's compiler or runtime optimizes clean and readable code.

Results Page

Results

Origin & Approach

The suite started with my original implementation in Crystal. AI tools assisted in translating it to other languages. Throughout this process, I reviewed and edited the implementation for semantic correctness and logical consistency to ensure idiomatic accuracy and fair benchmarking. Not all algorithms could be implemented identically across all languages, simply because the languages are too different (this is particularly true for base64 and JSON tests). However, I made every effort to make the implementations as similar as possible to each other.

Handling Library Differences: To address performance differences stemming from varying standard library implementations, I created a special tab in the results: Runtime Score. This metric normalizes execution times (seconds) into a 0–100 scoring system, where 50 represents the average performance across all languages. The overall Runtime Score is calculated as the average across all benchmarks.

This approach reduces the impact of outliers and ensures a fair overall assessment: a language that excels in most tasks but struggles with one particular library implementation (like JSON parsing) isn't severely penalized. It reflects the real-world scenario where developers use a mix of algorithms and libraries.
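The exact Runtime Score formula is not spelled out here, so the following is only a minimal sketch of one mapping with the stated properties (0–100 range, 50 at the average, overall score as the mean across benchmarks). The formula, the function names `runtime_score` and `overall_scores`, and the timings are all assumptions made for illustration, not the suite's actual implementation.

```python
from statistics import mean


def runtime_score(seconds: float, average_seconds: float) -> float:
    """Map a time to the 0-100 range; a time equal to the average scores 50.

    ASSUMPTION: illustrative formula only, not the one LangArena uses.
    """
    return 100.0 * average_seconds / (average_seconds + seconds)


def overall_scores(times):
    """times: {benchmark: {language: seconds}} -> {language: mean score}."""
    per_language = {}
    for results in times.values():
        average = mean(results.values())
        for language, seconds in results.items():
            score = runtime_score(seconds, average)
            per_language.setdefault(language, []).append(score)
    return {lang: mean(scores) for lang, scores in per_language.items()}


# Hypothetical timings in seconds: "lang_a" has one slow JSON result.
times = {
    "json": {"lang_a": 9.0, "lang_b": 1.0},
    "nbody": {"lang_a": 1.0, "lang_b": 2.0},
    "sort": {"lang_a": 1.0, "lang_b": 2.0},
}

for language, score in sorted(overall_scores(times).items()):
    print(f"{language}: {score:.1f}")
```

In this made-up data, lang_a takes more than twice the total time of lang_b because of a single benchmark, yet the averaged scores end up close together, which is the outlier damping described above.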

Sources: Benchmark ideas were taken from:

The Computer Language Benchmarks Game
My own collections: benchmarks, jit-benchmarks, crystal-benchmarks-game, crystal-metric
Crystal code samples

Core Philosophy

Clean Code: Benchmarks are written in a clear, idiomatic style that prioritizes readability and maintainability.
Algorithmic Consistency: The same core algorithm is implemented across all languages for each task to ensure a fair comparison.
Standard vs Unsafe Modes: All benchmarks use standard production compiler flags for each language (safe mode by default). However, we also provide a separate "Hacking" section that compares performance with specialized unsafe flags (like disabling bounds checks, removing runtime checks, or other language-specific optimizations that trade safety for speed). This shows what's possible when you prioritize performance over safety guarantees.
Testing Language "Muscle": We measure the cost of abstractions. Can a language take clean, idiomatic code and optimize it to efficient machine code? Languages that can (like Rust, Java) prove their compilers are powerful. Languages that can't show the honest price of their abstractions. Benchmarks like matrix multiplication use naive implementations intentionally. We're not measuring how fast a language can call a C library (like BLAS via numpy), but how efficiently it handles fundamental computational patterns — because one day you'll have to write that loop yourself.
Pull Requests Welcome: While consistency is key, improvements that maintain the philosophy and fix suboptimal implementations are encouraged.
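As an illustration of what "naive" means for the matrix multiplication case, here is a minimal sketch of the plain triple loop, written in Python for brevity. The function name `matmul_naive` and the list-of-rows representation are assumptions for this example; the suite's own implementations may be structured differently.

```python
def matmul_naive(a, b):
    """Multiply two matrices (lists of rows) with plain nested loops.

    No blocking, no transposition, no library call: the point is to
    measure how well the language itself handles this loop nest.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for k in range(inner):
                total += a[i][k] * b[k][j]
            result[i][j] = total
    return result


print(matmul_naive([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The same loop structure translates almost line for line into every language in the suite, which is what makes the comparison meaningful.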

Benchmarking Methodology

Each benchmark's execution time is measured in isolation, with data preparation excluded from timing. The suite includes a separate warmup phase for JIT-based languages (C#, Java, Julia, etc.) to allow compilation and optimization before measurements begin. This ensures fair comparisons by measuring steady-state performance where applicable, while still capturing cold-start characteristics for AOT-compiled languages. All benchmarks produce verifiable checksums to ensure algorithmic correctness across implementations.
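The measurement flow described above can be sketched roughly as follows. This is a simplified illustration, not the suite's actual harness: the `SortBench` class, its `prepare`/`run` methods, and the `measure` helper are hypothetical names chosen for this example.

```python
import time


class SortBench:
    """Hypothetical benchmark: sort prepared data and return a checksum."""

    def prepare(self):
        # Deterministic pseudo-random input (simple LCG), built outside
        # the timed region so data preparation is excluded from timing.
        state = 42
        self.values = []
        for _ in range(10_000):
            state = (state * 1103515245 + 12345) % 2**31
            self.values.append(state)

    def run(self):
        # The measured work. The checksum lets implementations in
        # different languages be checked against each other.
        ordered = sorted(self.values)
        checksum = 0
        for index, value in enumerate(ordered):
            checksum = (checksum + index * value) % 2**32
        return checksum


def measure(bench, warmup_runs=0):
    """Return (elapsed_seconds, checksum) for a single timed run."""
    bench.prepare()                  # not timed
    for _ in range(warmup_runs):     # optional warmup for JIT runtimes
        bench.run()
    start = time.perf_counter()
    checksum = bench.run()
    elapsed = time.perf_counter() - start
    return elapsed, checksum


elapsed, checksum = measure(SortBench(), warmup_runs=2)
print(f"{elapsed:.6f}s checksum={checksum}")
```

Because the input is deterministic, the checksum is the same with or without warmup, so only the timing changes between the two modes.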

Benchmark Categories

The benchmarks cover common practical tasks:

JSON Processing: Parsing and generation
Data Encoding: Base64 encoding/decoding
Text Processing: Regex matching, string manipulation
Cryptography & Hashing: SHA-256, CRC32
Sorting Algorithms: Quick sort, merge sort
Graph Algorithms: BFS, DFS, Dijkstra, A* pathfinding
Mathematical Computations: Matrix multiplication, prime calculation, spectral norm
Simulations: N-body, Game of Life, neural network
Classic Benchmarks: Binary trees, Fannkuch-redux, Mandelbrot (from The Computer Language Benchmarks Game)

Evaluated Languages

The suite currently focuses on compiled and high-performance managed languages: C, C++, Crystal, Rust, Go, Swift, C#, Java, Kotlin, TypeScript, Zig, D, V, Julia, Nim, F#, Dart, Python, Odin, Scala.

Languages like Ruby or PHP are intentionally excluded to maintain a focused comparison within a similar performance bracket.

Beyond Just Ranking

This suite is also a practical tool for:

Compiler Tracking: Monitor performance regressions/improvements across compiler versions.
New Language Evaluation: Get a standardized "score" to position a new language against established ones.
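For compiler tracking, one simple approach is to compare per-benchmark timings from two runs and flag anything that got noticeably slower. The sketch below assumes the timings are already loaded into plain dictionaries; the function name `find_regressions`, the 5% threshold, and the sample numbers are all hypothetical and unrelated to the suite's actual results format.

```python
def find_regressions(old, new, threshold=0.05):
    """Return {benchmark: relative_change} for benchmarks that became
    slower by more than `threshold` (0.05 means 5%)."""
    regressions = {}
    for name, old_seconds in old.items():
        if name in new and old_seconds > 0:
            change = (new[name] - old_seconds) / old_seconds
            if change > threshold:
                regressions[name] = change
    return regressions


# Hypothetical timings (seconds) from two compiler versions.
old_run = {"nbody": 1.0, "json": 2.0, "mandelbrot": 0.5}
new_run = {"nbody": 1.2, "json": 2.0, "mandelbrot": 0.45}

for name, change in find_regressions(old_run, new_run).items():
    print(f"{name}: {change:+.0%}")
```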

Hardware

AMD Ryzen 7 3800X 8-Core Processor 78GB (x86_64-linux-gnu)

Running

Without docker:

cd rust
./test [BenchName]
./run [BenchName]

With docker:

Requires docker-compose-plugin v2. To check whether it is installed, run docker compose version; the reported version should be v2.xxx. If it is missing, install it.

docker compose build rust
docker compose run rust
./test [BenchName]
./run [BenchName]

Run all benchmarks

sh build-docker.sh
ruby benchmarks.rb

Generate Website

cd docs
ruby gen.rb ../results/2026-02-02-x86_64-linux-gnu.js
open index.html
