Description
While preparing new functions, I discovered that converting nested arrays from Rust to numpy in Python can be a bottleneck, especially for wider arrays.
So I looked for an alternative, and found that ndarray, in combination with rust-numpy (https://github.com/PyO3/rust-numpy), could provide some nice speedups. For some of these functions, using ndarray is even faster on the Rust side, but for standard functions (e.g., sampling, ancestry) it seems to make little to no difference.
I show here benchmarks for sample_matrix, get_ancestry, pre_precision (a function in preparation), and sample_vector (which could potentially be converted to an Array1).
Code is available here: https://github.com/Neclow/phylo2vec/tree/f5f60d37b8dfd2b30e6afe6be64a527a9fec129e
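To illustrate the kind of overhead described above, here is a minimal sketch in pure Python and numpy. It does not call phylo2vec; it assumes that a nested Rust array reaches Python as a list of lists, and that a contiguous buffer only needs to be reshaped. The helper names (make_nested, make_flat, time_conversions) are hypothetical and exist only for this example.

```python
import timeit

import numpy as np


def make_nested(n_rows, n_cols=3):
    """List of lists, standing in for a nested array after it crosses into Python (assumption)."""
    return [[r * n_cols + c for c in range(n_cols)] for r in range(n_rows)]


def make_flat(n_rows, n_cols=3):
    """One contiguous int64 buffer holding the same values in row-major order."""
    return np.arange(n_rows * n_cols, dtype=np.int64)


def time_conversions(n_rows, n_cols=3, repeat=5, number=20):
    """Return (seconds per nested conversion, seconds per reshape of the flat buffer)."""
    nested = make_nested(n_rows, n_cols)
    flat = make_flat(n_rows, n_cols)
    # np.asarray has to walk every inner list and copy each element.
    t_nested = min(
        timeit.repeat(lambda: np.asarray(nested, dtype=np.int64), repeat=repeat, number=number)
    ) / number
    # reshape on a contiguous buffer returns a view without copying the data.
    t_flat = min(
        timeit.repeat(lambda: flat.reshape(n_rows, n_cols), repeat=repeat, number=number)
    ) / number
    return t_nested, t_flat


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        t_nested, t_flat = time_conversions(n)
        print(f"n_rows={n}: nested={t_nested * 1e6:.1f} us, flat={t_flat * 1e6:.1f} us")
```

The absolute numbers depend on the machine and say nothing about the actual Rust-side cost; the sketch only shows why a nested layout needs per-element work while a contiguous one does not.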
Python benchmarks:
Rust benchmarks:
# Note: %timeit is an IPython magic, so this code must run in IPython/Jupyter.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from tqdm import tqdm

import phylo2vec._phylo2vec_core as core


def bench(f_old, f_new, f_input, mul):
    """Time f_old ("vec") and f_new ("ndarray") for n_leaves = mul * 1, ..., mul * 9."""
    data = {}
    for i in tqdm(range(1, 10)):
        n_leaves = mul * i
        inp = f_input(n_leaves)
        result_old = %timeit -q -o f_old(inp)
        result_new = %timeit -q -o f_new(inp)
        data[i] = {
            "vec": result_old.average,
            "ndarray": result_new.average,
        }
    # Convert seconds to microseconds, then turn the index into a column
    df = (
        pd.DataFrame.from_dict(data, orient="index").mul(1_000_000).reset_index(names="n_leaves")
    )
    # The index holds i, so scale it back to the actual number of leaves
    df["n_leaves"] = df["n_leaves"].astype(int) * mul
    return df


def plot_bench(df, name, logy=False):
    """Scatter-plot the timings of both implementations against n_leaves."""
    df_melt = df.melt(
        id_vars=["n_leaves"],
        var_name="Implementation",
        value_name="Time (us)",
    )
    sns.scatterplot(
        data=df_melt,
        x="n_leaves",
        y="Time (us)",
        hue="Implementation",
    )
    if logy:
        plt.yscale("log")
    plt.title(name)
    plt.show()

Hardware specs
I've attached all the code in https://github.com/Neclow/phylo2vec/tree/ndarray_vs_vec
Specs:
Linux Distribution: Ubuntu 22.04.4 LTS
Linux Kernel: 5.15.0-142-generic
Computer Model: ASUS System Product Name System Version
Processor (CPU): AMD Ryzen Threadripper PRO 5995WX 64-Cores
CPU Sockets/Cores/Threads: 1/64/128
CPU Caches: L1d: 32 KiB × 64 (2.0MiB)
L1i: 32 KiB × 64 (2.0MiB)
L2: 512 KiB × 64 (32MiB)
L3: 32,768 KiB × 8 (256MiB)
Architecture: x86_64 (64-bit)
Total memory (RAM): 257,536 MiB (252GiB) (270,046 MB (271GB))
Total swap space: 8,191 MiB (8.0GiB) (8,589 MB (8.6GB))
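The bench helper above relies on the %timeit magic, so it only runs inside IPython/Jupyter. For anyone who wants to reproduce the timings from a plain script, here is a rough equivalent built on the standard-library timeit module. This is a sketch, not the code used for the benchmarks in this issue: bench_script and its steps, number and repeat parameters are hypothetical, and the loop counts are fixed rather than auto-tuned as %timeit does.

```python
import statistics
import timeit


def bench_script(f_old, f_new, f_input, mul, steps=range(1, 10), number=100, repeat=5):
    """Time f_old and f_new on f_input(mul * i) for each i in steps.

    Returns a list of dicts with the mean time per call in microseconds.
    """
    rows = []
    for i in steps:
        n_leaves = mul * i
        inp = f_input(n_leaves)
        # timeit.repeat returns the total time of `number` calls, once per repeat
        old_runs = timeit.repeat(lambda: f_old(inp), number=number, repeat=repeat)
        new_runs = timeit.repeat(lambda: f_new(inp), number=number, repeat=repeat)
        rows.append(
            {
                "n_leaves": n_leaves,
                "vec": statistics.mean(old_runs) / number * 1_000_000,
                "ndarray": statistics.mean(new_runs) / number * 1_000_000,
            }
        )
    return rows


if __name__ == "__main__":
    # Stand-in functions; replace them with the two implementations to compare.
    for row in bench_script(sorted, list, lambda n: list(range(n)), mul=100, steps=range(1, 4)):
        print(row)
```

Passing the returned list to pd.DataFrame gives a frame with the same n_leaves, vec and ndarray columns that plot_bench expects.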