ENH: Add Array API support to centralize and vlr #2429
LarytheLord wants to merge 2 commits into scikit-bio:main
Conversation
Convert `centralize()` and `vlr()` in the composition module to work with any array backend that follows the Python Array API standard (NumPy, CuPy, JAX, PyTorch, etc.).

`centralize()` previously imported `scipy.stats.gmean`, which tied it to NumPy. The geometric mean is now computed as `exp(mean(log(x)))` using only operations available in the Array API standard.

`vlr()` previously called `np.log` and `np.var` directly. The non-robust path now accepts arrays from any backend via `ingest_array` and `_closure_two`. The robust path (masked arrays) stays on NumPy since `np.ma` has no Array API equivalent.

Both functions gain the same `ingest_array` / xp-namespace pattern used by `closure`, `clr`, `ilr`, `alr`, and their inverses. Tests added in `test_ndarray.py` for NumPy, PyTorch, and JAX backends.
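The core of the change can be sketched as follows. This is a minimal illustration, not the scikit-bio implementation: `gmean_xp` is a hypothetical name, and the inline namespace lookup stands in for the module's `ingest_array` helper.

```python
import numpy as np

def gmean_xp(mat, axis=0):
    # Geometric mean as exp(mean(log(x))) -- only Array API operations,
    # so the same code would run on NumPy, CuPy, JAX, or PyTorch arrays.
    # The namespace lookup below is a stand-in for skbio's ingest_array.
    xp = mat.__array_namespace__() if hasattr(mat, "__array_namespace__") else np
    return xp.exp(xp.mean(xp.log(mat), axis=axis))

mat = np.array([[1.0, 2.0, 4.0],
                [2.0, 4.0, 8.0]])
print(gmean_xp(mat, axis=0))  # column-wise geometric means
```

Because `exp`, `mean`, and `log` are all part of the Array API standard, no backend-specific branching is needed on this path.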
```python
from scipy.stats import gmean
...
mat = closure(mat)
cen = gmean(mat, axis=0)
```
Does this have any impact on performance, both in terms of runtime and memory usage? Especially when the input is an ndarray.
For NumPy inputs the overhead is negligible: `ingest_array` on an ndarray just returns the NumPy namespace, so there is no copy and no conversion. The `gmean` replacement is actually faster, because `scipy.stats.gmean` adds input coercion and NaN-policy handling on top of the same `exp(mean(log(x)))` formula. Benchmarked on a (1000, 500) composition matrix: `scipy.stats.gmean`: 7.48 ms. The memory footprint is the same; both allocate one temporary from `log()`. The composition-specific checks (no negatives, no NaN, no all-zero rows) are already handled upstream by `_check_composition` when `validate=True`.
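A rough version of this benchmark can be reproduced with the snippet below. It is a sketch under assumed conditions (random strictly positive data, 50 repetitions); absolute timings will differ by machine, and the SciPy cross-check is skipped if SciPy is not installed.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
mat = rng.random((1000, 500)) + 1e-6  # strictly positive, composition-like

def gmean_xp(x, axis=0):
    # the Array-API-friendly formula the PR switches to
    return np.exp(np.mean(np.log(x), axis=axis))

t = timeit.timeit(lambda: gmean_xp(mat), number=50) / 50
print(f"exp(mean(log(x))) on (1000, 500): {t * 1e3:.2f} ms/call")

try:  # cross-check timing and values against SciPy when it is installed
    from scipy.stats import gmean
    t_scipy = timeit.timeit(lambda: gmean(mat, axis=0), number=50) / 50
    print(f"scipy.stats.gmean on (1000, 500): {t_scipy * 1e3:.2f} ms/call")
    assert np.allclose(gmean(mat, axis=0), gmean_xp(mat))
except ImportError:
    pass
```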
Can you benchmark something bigger?
At 30k x 30k: `exp(mean(log(x)))`: 53.9 s, 7.2 GB peak; ~3.2x faster at this scale, and half the memory. The speedup is larger here than at mid-range sizes (~1.3x at 5k x 5k) because scipy's decorator machinery allocates a full extra copy of the input, which becomes the bottleneck once the matrix doesn't fit comfortably in cache. The numerical diff remains exactly 0.
```python
    return _robust_vlr(**kwargs)
else:
    return _vlr(**kwargs)
# robust path uses NumPy masked arrays; keep on numpy
```
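The dispatch being discussed can be sketched like this. `vlr_sketch` is a hypothetical, simplified signature, not the actual scikit-bio API; it only illustrates why the robust branch must stay on NumPy while the non-robust branch can be backend-agnostic.

```python
import numpy as np

def vlr_sketch(x, y, robust=False):
    # Hypothetical signature; illustrates the if/else dispatch above.
    if robust:
        # np.ma has no Array API equivalent, so the robust branch
        # masks zeros and stays on NumPy masked arrays.
        xm = np.ma.masked_array(x, mask=(x == 0))
        ym = np.ma.masked_array(y, mask=(y == 0))
        lr = np.ma.log(xm) - np.ma.log(ym)
        return lr.var()
    # Non-robust branch: resolve the namespace from the input so the
    # same code runs on any Array API backend (stand-in for ingest_array).
    xp = x.__array_namespace__() if hasattr(x, "__array_namespace__") else np
    lr = xp.log(x) - xp.log(y)
    return xp.mean((lr - xp.mean(lr)) ** 2)  # variance of the log-ratio
```

Keeping the branch explicit (rather than folding both paths into one code path) also keeps the GPU-capable path free of masked-array machinery.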
Keeping the `if ... else ...` pattern would be preferred.
I am impressed it is faster on the CPU this way, and it should definitely benefit on the GPU. @mataton What do you think about the added tests?
`centralize()` and `vlr()` are the last two composition stats functions that still use hard-coded NumPy / SciPy calls. This brings them in line with `closure`, `clr`, `ilr`, `alr`, and their inverses, which already go through `ingest_array`.

What changed

`centralize()`
- Replaced the `scipy.stats.gmean` import with `exp(mean(log(x)))`, which only uses operations available in the Array API standard.
- Uses `ingest_array` → `xp` namespace like the rest of the module.
- Adds a `validate` parameter (consistent with `closure`, `clr`, etc.).

`vlr()` / `_vlr()`
- Gets the `xp` namespace from `_closure_two` and computes the log-ratio variance using `xp.log`, `xp.mean`, `xp.sum`.
- The `robust=True` path stays on NumPy (`np.ma` masked arrays have no Array API equivalent). The docstring notes this explicitly.
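Put together, the new `centralize()` shape can be sketched as below. The `_sketch` names are hypothetical stand-ins: `closure_sketch` is a minimal row-normalization, not skbio's full `closure` with its validation.

```python
import numpy as np

def closure_sketch(mat):
    # Normalize each row to sum to 1 (minimal stand-in for skbio's closure).
    mat = np.asarray(mat, dtype=float)
    return mat / mat.sum(axis=1, keepdims=True)

def centralize_sketch(mat):
    # centralize() without scipy.stats.gmean: the per-column geometric
    # mean is exp(mean(log(x))), expressible in pure Array API operations.
    mat = closure_sketch(mat)
    cen = np.exp(np.mean(np.log(mat), axis=0))  # replaces gmean(mat, axis=0)
    return closure_sketch(mat / cen)

out = centralize_sketch(np.array([[1.0, 2.0, 5.0],
                                  [3.0, 4.0, 1.0]]))
print(out.sum(axis=1))  # each row still sums to 1 after centering
```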
Tests

- Added `Tests_Centralize` and `Tests_VLR` classes in `test_ndarray.py` with NumPy, PyTorch, and JAX backend checks (same pattern as `Tests_Closure`, `Tests_CLR`, etc.).

What I verified locally
(Torch/JAX/CUDA tests are skipped locally — they'll run in CI where those deps are available.)