
Better sample_weight support in Ridge #1190

@mblondel


Currently, only the dense_cholesky solver in Ridge supports sample_weight. To support it consistently in all solvers, one can use the following trick (an extract from my post on the mailing list):

We want to minimize \sum_i mu_i (w^T x_i - y_i)^2 where mu_i is the sample weight. This should be equivalent to \sum_i (sqrt(mu_i) w^T x_i - sqrt(mu_i) y_i)^2. So, we obtain the same result by multiplying each y_i and x_i by sqrt(mu_i).
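The equivalence above can be checked numerically. The following is a minimal sketch, not the Ridge implementation: it assumes no intercept, solves the normal equations in closed form with NumPy, and uses hypothetical helper names (`ridge_closed_form`, `ridge_weighted_direct`, `ridge_weighted_rescaled`). Note that the penalty term alpha * ||w||^2 does not involve the samples, so rescaling the rows leaves it unchanged.

```python
import numpy as np


def ridge_closed_form(X, y, alpha):
    """Solve min_w ||X w - y||^2 + alpha ||w||^2 via the normal equations."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


def ridge_weighted_direct(X, y, alpha, mu):
    """Solve min_w sum_i mu_i (w^T x_i - y_i)^2 + alpha ||w||^2 directly."""
    n_features = X.shape[1]
    XtM = X.T * mu  # same as X.T @ diag(mu), without building diag(mu)
    A = XtM @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, XtM @ y)


def ridge_weighted_rescaled(X, y, alpha, mu):
    """Apply the trick: scale each x_i and y_i by sqrt(mu_i), then solve unweighted."""
    s = np.sqrt(mu)
    return ridge_closed_form(X * s[:, np.newaxis], y * s, alpha)


# Illustrative random data (made up for this sketch).
rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = rng.randn(20)
mu = rng.uniform(0.5, 2.0, size=20)

w_direct = ridge_weighted_direct(X, y, alpha=1.0, mu=mu)
w_rescaled = ridge_weighted_rescaled(X, y, alpha=1.0, mu=mu)
print(np.allclose(w_direct, w_rescaled))
```

Because the rescaling happens before the solver is called, the same preprocessing should work for any solver that minimizes the unweighted objective.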

In the dense case, this is trivial to implement, but in the sparse case there's a bit of work to do, as scipy sparse matrices do not support element-by-element multiplication with a vector (here the vector size is equal to n_samples). One should add an inplace_csr_row_scale utility to sparsefuncs.pyx.
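As a rough illustration of what such a row-scaling utility could do, here is a pure-Python sketch that operates on the CSR arrays directly. The function name `csr_row_scale_sketch` is hypothetical and this is not the proposed Cython code; it assumes the matrix stores floating-point data.

```python
import numpy as np
import scipy.sparse as sp


def csr_row_scale_sketch(X, scale):
    """Multiply row i of the CSR matrix X by scale[i], in place.

    Hypothetical helper for illustration; assumes X.data is floating point.
    """
    if not (sp.issparse(X) and X.format == "csr"):
        raise TypeError("X must be a CSR sparse matrix")
    scale = np.asarray(scale)
    if scale.shape != (X.shape[0],):
        raise ValueError("scale must have one entry per row of X")
    # indptr[i + 1] - indptr[i] is the number of stored entries in row i,
    # so repeating scale[i] that many times aligns it with X.data.
    X.data *= np.repeat(scale, np.diff(X.indptr))
    return X


# Small made-up example: scale the rows by sqrt(mu).
X_dense = np.array([[1.0, 0.0, 2.0],
                    [0.0, 0.0, 3.0],
                    [4.0, 5.0, 0.0]])
mu = np.array([4.0, 9.0, 0.25])

X_sparse = sp.csr_matrix(X_dense)
csr_row_scale_sketch(X_sparse, np.sqrt(mu))
print(X_sparse.toarray())
```

Only the stored values are touched, so the sparsity pattern is preserved and no dense copy is made. A simpler alternative that is not in place is to left-multiply by a sparse diagonal matrix, for example `sp.diags(np.sqrt(mu)) @ X`.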

The test coverage of sample_weight needs to be greatly improved too.
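One property such tests could check, sketched here under assumptions: fitting with integer sample weights should give the same coefficients as fitting on a dataset where each sample is repeated that many times. The checker below takes any callable `fit(X, y, sample_weight)` that returns coefficients; both `check_integer_weights_match_repetition` and the NumPy `reference_fit` solver are hypothetical names used for illustration, not part of the library's test suite.

```python
import numpy as np


def reference_fit(X, y, sample_weight, alpha=1.0):
    """Closed-form weighted ridge without intercept (stand-in for a real solver)."""
    XtM = X.T * sample_weight
    A = XtM @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, XtM @ y)


def check_integer_weights_match_repetition(fit, seed=0):
    """Assert that integer weights behave like repeated samples for `fit`."""
    rng = np.random.RandomState(seed)
    X = rng.randn(15, 4)
    y = rng.randn(15)
    counts = rng.randint(1, 4, size=15)  # integer weights in {1, 2, 3}

    # Fit once with weights on the original data...
    w_weighted = fit(X, y, counts.astype(float))

    # ...and once with unit weights on data where row i appears counts[i] times.
    X_rep = np.repeat(X, counts, axis=0)
    y_rep = np.repeat(y, counts)
    w_repeated = fit(X_rep, y_rep, np.ones(len(y_rep)))

    np.testing.assert_allclose(w_weighted, w_repeated, rtol=1e-7, atol=1e-10)


check_integer_weights_match_repetition(reference_fit)
```

Passing each solver through the same checker would be one way to verify that they all treat sample_weight consistently; iterative solvers would likely need looser tolerances than the ones assumed here.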


Labels: Enhancement, Moderate (anything that requires some knowledge of conventions and best practices)
