This project benchmarks matrix multiplication across four different implementations:
- Vanilla Python
- NumPy (a Python API over compiled native code)
- Vanilla Rust
- Parallel Rust with Rayon
The focus is on highlighting raw performance, memory usage, and scalability on multi-core systems.
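
For orientation, the "Vanilla Rust" baseline in the list above can be pictured as a plain triple loop. This is a minimal sketch, not the project's actual code: the function name `matmul_naive`, the flat row-major `Vec<f64>` layout, and the `f64` element type are all assumptions made for illustration.

```rust
/// Multiply two n×n matrices stored as flat row-major slices.
/// Hypothetical baseline: naive i-j-k loop order, single-threaded.
fn matmul_naive(a: &[f64], b: &[f64], n: usize) -> Vec<f64> {
    assert_eq!(a.len(), n * n);
    assert_eq!(b.len(), n * n);
    let mut c = vec![0.0; n * n];
    for i in 0..n {
        for j in 0..n {
            let mut sum = 0.0;
            // The inner loop walks down a column of `b`, so consecutive
            // reads are `n` elements apart in memory (cache-unfriendly).
            for k in 0..n {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    c
}
```

Calling `matmul_naive(&a, &b, n)` with two `n * n` buffers returns the product as a new buffer. The strided reads of `b` are one plausible source of memory access latency in a naive loop like this.
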
- ❗ NumPy is not really Python: its matrix routines call compiled C and Fortran libraries (BLAS implementations such as MKL), which use cache-friendly memory access, SIMD instructions, and multithreading under the hood.
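- Vanilla Rust, despite being compiled, can be slower than NumPy for small matrices due to:
  - Memory access latency
  - Limited SIMD by default (no explicit vectorization; wider instruction sets need opt-in flags such as `-C target-cpu=native`)
  - Thread underutilization: a single-threaded run keeps only one core busy, so CPU usage is often capped around 10–12%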
- Memory Efficiency:
  - Rust (both sequential and parallel) uses minimal memory, often staying within a few hundred MB even at large matrix sizes.
  - NumPy and Vanilla Python can use 2–3 GB of RAM for large matrix operations due to temporary arrays, caching, and less efficient memory layouts.
  - NumPy’s reported memory usage is often misleading, as it doesn’t account for memory cached internally by the native libraries, nor for how heavily those libraries lean on the L1, L2, and L3 caches.
- Parallel Rust (via `rayon`) scales well:
  - Utilizes all CPU cores
  - Near-linear speedups on larger matrices (e.g., 1000×1000 and above)
  - Keeps scaling as cores and threads are added, until memory bandwidth or other shared resources become the bottleneck
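
The row-parallel idea behind the Rayon variant can be sketched with the standard library alone. This is an illustration under assumptions, not the project's code: it uses `std::thread::scope` (Rust 1.63+) as a stand-in for Rayon, and the name `matmul_parallel` and the `threads` parameter are hypothetical. With Rayon, the outer loop would typically be written over `c.par_chunks_mut(n)` instead of spawning threads by hand.

```rust
use std::thread;

/// Row-parallel multiply of n×n row-major matrices (std-only sketch).
/// Each thread owns a disjoint band of rows of `c`, so no locking is needed.
fn matmul_parallel(a: &[f64], b: &[f64], n: usize, threads: usize) -> Vec<f64> {
    assert_eq!(a.len(), n * n);
    assert_eq!(b.len(), n * n);
    let mut c = vec![0.0; n * n];
    if n == 0 {
        return c;
    }
    // Rows per band, rounded up so every row is covered.
    let threads = threads.max(1);
    let rows_per_band = (n + threads - 1) / threads;
    thread::scope(|s| {
        for (band, c_band) in c.chunks_mut(rows_per_band * n).enumerate() {
            let row0 = band * rows_per_band;
            s.spawn(move || {
                for (r, c_row) in c_band.chunks_mut(n).enumerate() {
                    let i = row0 + r;
                    // i-k-j order: the inner loop reads `b` and writes
                    // `c_row` contiguously.
                    for k in 0..n {
                        let aik = a[i * n + k];
                        for j in 0..n {
                            c_row[j] += aik * b[k * n + j];
                        }
                    }
                }
            });
        }
    });
    c
}
```

A reasonable value for `threads` is the result of `std::thread::available_parallelism()`. Because every output row is computed independently, the work divides evenly across cores, which is why larger matrices are where speedups approach linear.
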
- CPU: 8-core Ryzen 7 5800H
- Matrix Sizes: 100×100 to 5000×5000
- Languages: Python (Basic & NumPy), Rust (Sequential & Parallel)
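
As a rough size check for the largest case listed above, the raw storage for the matrices can be computed directly. The sketch assumes dense `f64` elements in flat buffers; the element type is not stated here, so treat it as an assumption, and `dense_bytes` is a hypothetical helper.

```rust
/// Bytes needed to hold `count` dense n×n matrices of f64.
/// Assumes 8-byte elements and no per-row overhead.
fn dense_bytes(n: u64, count: u64) -> u64 {
    count * n * n * std::mem::size_of::<f64>() as u64
}
```

Under these assumptions, `dense_bytes(5000, 3)` (two inputs plus one output) is 600,000,000 bytes, roughly 572 MiB, which is in line with a footprint of a few hundred MB when no temporary copies are made.
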
This project demonstrates that raw language choice isn't everything. What matters is how well the code leverages the hardware, whether through compiled libraries, parallelism, or memory efficiency. NumPy achieves its performance by using optimized native libraries, while Rust offers true parallel scalability with full control and low-level memory efficiency, making it well suited to high-performance computing on modern multi-core systems. Parallelization with Rust pays off more as the number of processing cores increases.
| Scenario | Winner |
|---|---|
| Naïve Rust vs NumPy | NumPy 🏆 |
| Optimized Parallel Rust (Rayon + Blocking) vs NumPy | Rust 🏆 |
| Memory-bound, non-parallel Rust | NumPy 🏆 |
| All-core Rust on large, unique data | Rust 🏆 |
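
The "Blocking" in the table refers to cache blocking (tiling). A minimal single-threaded sketch of the idea follows; the name `matmul_blocked` and the tile size parameter `bs` are hypothetical, and the project's optimized version may be structured differently.

```rust
/// Cache-blocked multiply of n×n row-major matrices.
/// `bs` is the tile edge length, a tuning parameter chosen so that the
/// tiles being worked on fit in cache.
fn matmul_blocked(a: &[f64], b: &[f64], n: usize, bs: usize) -> Vec<f64> {
    assert_eq!(a.len(), n * n);
    assert_eq!(b.len(), n * n);
    let bs = bs.max(1);
    let mut c = vec![0.0; n * n];
    for i0 in (0..n).step_by(bs) {
        for k0 in (0..n).step_by(bs) {
            for j0 in (0..n).step_by(bs) {
                // Clamp tile bounds so sizes that are not a multiple
                // of `bs` are handled correctly.
                let i_end = (i0 + bs).min(n);
                let k_end = (k0 + bs).min(n);
                let j_end = (j0 + bs).min(n);
                for i in i0..i_end {
                    for k in k0..k_end {
                        let aik = a[i * n + k];
                        for j in j0..j_end {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
    c
}
```

Each tile of `b` is reused across several rows of `a` while it is still in cache, which reduces trips to main memory. Combining this with a per-row-band split across threads is one way to get the "Rayon + Blocking" configuration.
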



