Create a benchmark for LibLinear/LibSVM to quantify past and future improvements to the C code

Following PR #13511, it appears that there is no reference benchmark for SVMs in scikit-learn or in any side project (sklearn-contrib).

This seems quite risky in the long run. Maybe we should create one, especially to quantify the impact of changes to the C code such as those in PR #13511.

I have worked quite a bit on creating reference benchmarks over the past few years, which led to two tools in the pytest ecosystem: [`pytest-cases`](https://smarie.github.io/python-pytest-cases/) and [`pytest-harvest`](https://smarie.github.io/python-pytest-harvest/), with the beginnings of a tutorial [here](https://smarie.github.io/pytest-patterns/examples/data_science_benchmark/) (outdated, I'm afraid). I can therefore certainly try to help with a benchmark framework structure if you find the idea interesting.

However, I do not know of a good set of reference datasets to start with (apart from creating challenging ones "by hand").
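
To make the idea concrete, here is a minimal sketch of what the core of such a benchmark could look like. It is only an illustration under stated assumptions, not a proposed design: the helper names (`make_blobs_by_hand`, `time_fit`, `run_benchmark`) are hypothetical, the dataset is a toy two-blob problem generated "by hand", and only fit time is measured. It uses the standard library alone and accepts any object exposing a `fit(X, y)` method.

```python
import random
import statistics
import time


def make_blobs_by_hand(n_samples, n_features, seed=0):
    """Generate two Gaussian blobs (hypothetical toy dataset, stdlib only)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate classes so both are balanced
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


def time_fit(make_estimator, X, y, repeats=3):
    """Return the median wall-clock time (seconds) of estimator.fit(X, y)."""
    durations = []
    for _ in range(repeats):
        estimator = make_estimator()  # fresh estimator for each run
        start = time.perf_counter()
        estimator.fit(X, y)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def run_benchmark(estimators, dataset_shapes, repeats=3):
    """Time every (estimator, dataset shape) pair and return result rows."""
    rows = []
    for n_samples, n_features in dataset_shapes:
        X, y = make_blobs_by_hand(n_samples, n_features)
        for name, make_estimator in estimators.items():
            rows.append({
                "estimator": name,
                "n_samples": n_samples,
                "n_features": n_features,
                "median_fit_s": time_fit(make_estimator, X, y, repeats),
            })
    return rows
```

With scikit-learn installed, `estimators` could for instance be `{"LinearSVC": LinearSVC, "SVC-rbf": lambda: SVC(kernel="rbf")}` using the classes from `sklearn.svm`. Running the same script before and after a change to the C code would then give comparable rows. A real benchmark would also need accuracy checks and realistic datasets, which this sketch deliberately leaves out.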


Create a benchmark for LibLinear/LibSVM to quantify past and future improvements to the C code #16864
