Skip to content

ARD Regressor accuracy degrades when upgrading Scipy 1.2.1 -> 1.3.0 #14055

@timstaley

Description

@timstaley

Hi,
bit of a tricky one, I'm hoping someone will have some time and/or suggestions for further investigation!

There seems to be an often-occurring worsening of performance (i.e. accuracy, although run-time increases too!) from the ARD regressor when upgrading from Scipy 1.2.1 -> 1.3.0.

Description

On a very simple dataset (see code snippets below) where a near-perfect fit should be achievable, typical error seems to degrade from order 1E-5 to 1E-2. Notably, convergence iterations seem to increase also from ~a few (~5) to around 50-200 iterations.

Here's the headline plot, plotting absolute co-efficient error when fit across 1000 datasets generated with different random seeds:
coeff_abs_error_histograms

Note how with Scipy==1.2.1, errors are largely constrained to <0.01, while with Scipy==1.3.0 they range up to 0.05 (and in a few rare cases the algorithm produces garbage results, see later).

I guess this could be (probably is?) a Scipy rather than Sklearn issue, but probably the only way to confirm / isolate that would be to start here.

It's also possible that this worsening of behaviour is a weirdness of my particular toy example, but the difference in behaviour seems large and unexpected enough to warrant further investigation, I'd hope!

Steps/Code to Reproduce

Single Seed:

OK, so here's a short snippet on just a single seed if you're curious to try this yourself. I'm generating three vectors of normally distributed values, 250 samples. Then the target is just a perfect copy of one of those vectors (index=1). We measure the accuracy of the fit by simply checking how close that coefficient is to 1.0 (the other coefficients always shrink to 0., as you'd hope):

import scipy
import sklearn
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import ARDRegression

sklearn.show_versions()

def test_ard_regressor(dataset: np.ndarray) -> float:
    X = dataset
    y = X[:,1]
    regressor = ARDRegression(verbose=True)
    regressor.fit(X, y)
    abs_coef_error = np.abs(1 - regressor.coef_[1])
    print(abs_coef_error)
    return abs_coef_error

size=250
X = np.random.RandomState(seed=45).normal(size=(size,3))

test_ard_regressor(X)

Results

Scipy 1.2.1:

python single_seed.py 

System:
    python: 3.6.7 (default, Oct 22 2018, 11:32:17)  [GCC 8.2.0]
executable: /home/staley/.virtualenvs/sklearn-bug-scipy-1.2.1/bin/python
   machine: Linux-4.15.0-47-generic-x86_64-with-Ubuntu-18.04-bionic

BLAS:
    macros: HAVE_CBLAS=None
  lib_dirs: /usr/lib/x86_64-linux-gnu
cblas_libs: openblas, openblas

Python deps:
       pip: 19.1.1
setuptools: 41.0.1
   sklearn: 0.21.2
     numpy: 1.16.4
     scipy: 1.2.1
    Cython: None
    pandas: None
Converged after 4 iterations
9.647701516568574e-07

Scipy 1.3.0

python single_seed.py 

System:
    python: 3.6.7 (default, Oct 22 2018, 11:32:17)  [GCC 8.2.0]
executable: /home/staley/.virtualenvs/sklearn-bug-scipy-1.3/bin/python
   machine: Linux-4.15.0-47-generic-x86_64-with-Ubuntu-18.04-bionic

BLAS:
    macros: HAVE_CBLAS=None
  lib_dirs: /usr/lib/x86_64-linux-gnu
cblas_libs: openblas, openblas

Python deps:
       pip: 19.1.1
setuptools: 41.0.1
   sklearn: 0.21.2
     numpy: 1.16.4
     scipy: 1.3.0
    Cython: None
    pandas: None
Converged after 18 iterations
0.16538104739325354

Datasets from 1000 different seeds

It could be that there's some oddity of the random data from a single seed, so I set up some short scripts to first generate a static collection of 1000 of the datasets as seen above, then collate the results from both versions of scipy. The snippets are as follows:

Make data:

import numpy as np
size=250
random_datasets = {seed: np.random.RandomState(seed).normal(size=(size,3)) 
                   for seed in range(1000)}
np.savez('random_datasets.npz', data=list(random_datasets.values()), seeds=list(random_datasets.keys()))

Test sklearn:

import scipy
import sklearn
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import ARDRegression

random_datasets = np.load('random_datasets.npz')
random_datasets=dict(zip(random_datasets['seeds'], random_datasets['data']))

def test_ard_regressor(dataset: np.ndarray) -> float:
    X = dataset
    y = X[:,1]
    regressor = ARDRegression(verbose=True)
    regressor.fit(X, y)
    abs_coef_error = np.abs(1 - regressor.coef_[1])
    print(abs_coef_error)
    return abs_coef_error

results = []
for seed, data in random_datasets.items():
    print("Seed:",seed)
    results.append(test_ard_regressor(data))

np.save(f'scipy_{scipy.__version__}_results', np.array(results))

Plot results:

import numpy as np
import matplotlib.pyplot as plt

results_1_2_1 = np.load("./scipy_1.2.1_results.npy")
results_1_3_0 = np.load("./scipy_1.3.0_results.npy")

counts, bin_edges = np.histogram(results_1_2_1)

ax = plt.gca()
ax.hist(results_1_2_1, label="scipy==1.2.1", alpha=0.5, bins=bin_edges)
ax.hist(results_1_3_0, label="scipy==1.3.0", alpha=0.5, bins=bin_edges)
# ax.set_xlim(0, 1.0)
ax.legend()
plt.show()

A little investigating summary statistics of those datasets in notebook gives the following points of comparison:

> np.median(results_1_2_1)
1.1909624002770514e-05
> np.median(results_1_3_0)
0.008368892887510193

>np.percentile(results_1_2_1, 99)
0.03166983391537859
>np.percentile(results_1_3_0, 99)
0.16551247976283737


> results_1_2_1.max()
0.08478086928684647
>results_1_3_0.max()
46606.5545533851 

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugModerateAnything that requires some knowledge of conventions and best practicesSprint

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions