
sample weight support for robust regression via weighted percentile algo #10

Merged
pprett merged 5 commits into gbrt-sample-weight from gbrt-sample-weight-weighted-percentile on Sep 17, 2014

Conversation

Owner

@pprett pprett commented Sep 15, 2014

No description provided.

@pprett pprett force-pushed the gbrt-sample-weight-weighted-percentile branch from 33052ba to f07f8ad September 15, 2014 09:32
Owner Author

pprett commented Sep 15, 2014

@arjoly @glouppe @ogrisel here is a branch that also supports sample_weight for robust regression. If you are fine with it, I'll merge it into my sample_weight PR.
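
For readers unfamiliar with the idea, here is a minimal pure-Python sketch of a weighted percentile. It is not the code from this branch: the function name `weighted_percentile`, its signature, and the "lower" tie-breaking rule are assumptions made for illustration. It sorts the values, accumulates their weights, and returns the first value whose cumulative weight reaches the requested share of the total.

```python
from bisect import bisect_left
from itertools import accumulate


def weighted_percentile(values, sample_weight, percentile=50.0):
    """Illustrative lower weighted percentile (hypothetical helper).

    Returns the smallest value whose cumulative weight is at least
    `percentile` percent of the total weight.
    """
    if len(values) != len(sample_weight):
        raise ValueError("values and sample_weight must have the same length")
    if len(values) == 0:
        raise ValueError("values must not be empty")
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must be in [0, 100]")
    if any(w < 0 for w in sample_weight):
        raise ValueError("sample_weight must be non-negative")

    # Sort by value and carry each weight along with its value.
    pairs = sorted(zip(values, sample_weight))
    cumulative = list(accumulate(w for _, w in pairs))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("sample_weight must have a positive sum")

    # First position where the cumulative weight reaches the threshold.
    threshold = percentile / 100.0 * total
    idx = min(bisect_left(cumulative, threshold), len(pairs) - 1)
    return pairs[idx][0]
```

With unit weights and `percentile=50` this returns a (lower) median; increasing the weight of one sample pulls the result toward that sample, and samples with zero weight never move the result past themselves.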

Maybe this could go in utils.stats.

What do you think of working with quantiles instead of percentiles?

Owner Author

@arjoly what exactly do you propose? Renaming percentile to quantile and using fractions instead of 0-100?

I agree that would be nicer -- I did it like this to be consistent with scipy.stats.mstats.scoreatpercentile.

As you propose, I would rename the function.

arjoly commented Sep 15, 2014

Do you have tests for this?

arjoly commented Sep 15, 2014

> @arjoly @glouppe @ogrisel here is a branch that supports sample_weights also for robust regression. If you are fine with it I merge it into my sample_weight PR

It's ok for me if you add support for this feature.

There is currently no test for the robust regression losses with sample weights, right?

ogrisel commented Sep 15, 2014

+1 for adding those features to the sample_weight PR, but this needs to be properly tested.

It would be interesting at some point to evaluate using models that support sample_weight for covariate shift correction, as implemented in http://blog.smola.org/post/4110255196/real-simple-covariate-shift-correction .

A new example on covariate shift correction would be great, although probably not as part of the GB w/ sample_weight PR itself.
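
To make the idea concrete, here is a rough pure-Python sketch of one common way to obtain such weights: fit a classifier that separates training inputs from test inputs and turn its odds into a density ratio. This is an illustration of the general approach under simplifying assumptions (one-dimensional inputs, a hand-rolled logistic regression, hypothetical function names), not a reproduction of the linked post or of anything in this PR.

```python
import math


def fit_logistic_1d(xs, ys, lr=0.1, n_iter=2000):
    """Fit p(y=1 | x) = sigmoid(a * x + b) by batch gradient descent."""
    a = b = 0.0
    n = len(xs)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            grad_a += (p - y) * x
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def covariate_shift_weights(train_x, test_x):
    """Estimate p_test(x) / p_train(x) for every training input.

    Hypothetical helper: label train inputs 0 and test inputs 1, fit a
    classifier, then use odds(x) * n_train / n_test as the weight.
    """
    xs = list(train_x) + list(test_x)
    ys = [0.0] * len(train_x) + [1.0] * len(test_x)
    a, b = fit_logistic_1d(xs, ys)
    # For a logistic model the odds p(test | x) / p(train | x) are exp(a*x + b).
    prior_ratio = len(train_x) / len(test_x)
    return [prior_ratio * math.exp(a * x + b) for x in train_x]
```

The resulting list could then be passed as `sample_weight` when fitting an estimator on the training set, so that training points resembling the test distribution count more.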

Owner Author

pprett commented Sep 17, 2014

Added robust regression tests for Boston housing and some tests for weighted percentile.

Owner Author

pprett commented Sep 17, 2014

@ogrisel a covariate shift example would indeed be great -- does anybody have a nice dataset for this? One could use a synthetic checkerboard dataset and change P(x) (i.e., the probability of drawing an example from each of the checkerboard cells).
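
A possible sketch of such a generator, using only the standard library. The function name `sample_checkerboard`, the unit-square layout, and the labelling rule `(row + col) % 2` are assumptions chosen for illustration; the only point is that P(x) is controlled per cell while the labelling rule stays fixed.

```python
import random


def sample_checkerboard(n_samples, cell_probs, rng):
    """Draw labelled 2-D points from an n x n checkerboard on the unit square.

    cell_probs is an n x n nested list of non-negative weights giving the
    relative probability of drawing a point from each cell. The label of a
    point is the colour of its cell, (row + col) % 2.
    """
    n = len(cell_probs)
    cells = [(row, col) for row in range(n) for col in range(n)]
    weights = [cell_probs[row][col] for row, col in cells]
    X, y = [], []
    for row, col in rng.choices(cells, weights=weights, k=n_samples):
        # Uniform position inside the chosen cell.
        X.append(((col + rng.random()) / n, (row + rng.random()) / n))
        y.append((row + col) % 2)
    return X, y


# Same labelling rule, different P(x): uniform cells for training,
# skewed cells for testing.
rng = random.Random(0)
X_train, y_train = sample_checkerboard(500, [[1, 1], [1, 1]], rng)
X_test, y_test = sample_checkerboard(500, [[4, 1], [1, 1]], rng)
```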

pprett added a commit that referenced this pull request Sep 17, 2014
…tile

sample weight support for robust regression via weighted percentile algo
@pprett pprett merged commit 8c1a95f into gbrt-sample-weight Sep 17, 2014
@pprett pprett deleted the gbrt-sample-weight-weighted-percentile branch September 17, 2014 09:41

ogrisel commented Sep 17, 2014

> @ogrisel a covariate shift example would be indeed great -- has anybody a nice dataset for this?

We could use one of the existing datasets and create an artificial train / test split that introduces a shift. For instance, we could use the Boston dataset and make samples with a higher tax rate (the TAX feature) more likely to land in the test set than in the training set.
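
Something along these lines might work as a starting point. It is a sketch, not tested against the Boston data: `shifted_split`, its `strength` parameter, and the logistic form of the assignment probability are all hypothetical choices. The function takes one feature column (for example the TAX values) and sends each sample to the test set with a probability that grows with the feature value.

```python
import math
import random


def shifted_split(feature_values, rng, strength=2.0):
    """Split sample indices into train/test with a feature-dependent bias.

    Each sample goes to the test set with probability
    sigmoid(strength * (2 * z - 1)), where z is the feature value rescaled
    to [0, 1]. Larger values are therefore over-represented in the test set.
    """
    lo, hi = min(feature_values), max(feature_values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant feature
    train_idx, test_idx = [], []
    for i, value in enumerate(feature_values):
        z = (value - lo) / span
        p_test = 1.0 / (1.0 + math.exp(-strength * (2.0 * z - 1.0)))
        if rng.random() < p_test:
            test_idx.append(i)
        else:
            train_idx.append(i)
    return train_idx, test_idx
```

Setting `strength=0` gives an unbiased 50/50 split, which makes it easy to compare a model with and without the induced shift.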

3 participants