TST Replace boston in histgradboost test_predictor by lucyleeow · Pull Request #16918 · scikit-learn/scikit-learn

lucyleeow · 2020-04-14T13:29:02Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Replace boston dataset with ~~diabetes~~ California housing dataset in sklearn/ensemble/_hist_gradient_boosting/tests/test_predictor.py

Any other comments?

I'm unsure of the best n_bins values and of what exactly this is testing. I noticed that the boston features are more spread out and generally have a longer right tail than those of the diabetes dataset. Also, the R2 values with n_bins=200 and n_bins=256 on diabetes were the same.

The R2 values with California housing are:

train: (bins=200; 0.8233) (bins=256; 0.8340)
test: (bins=200; 0.8112) (bins=256; 0.8094)
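For readers less familiar with the metric being asserted on, here is a minimal sketch of how an R2 score (coefficient of determination) is computed. The helper name `r2` and the toy numbers are illustrative assumptions; the test itself uses `sklearn.metrics.r2_score`.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean of y_true.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy data, not taken from any of the datasets discussed in this PR.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
score = r2(y_true, y_pred)
```

A score of 1.0 means perfect predictions and 0.0 means no better than predicting the mean, which is why thresholds such as 0.30 on the test set read as weak.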

adrinjalali · 2020-04-14T18:00:14Z

sklearn/ensemble/_hist_gradient_boosting/tests/test_predictor.py

+    assert r2_score(y_train, predictor.predict(X_train)) > 0.69
+    assert r2_score(y_test, predictor.predict(X_test)) > 0.30


Aren't these rather low? Same with the other PR you have; maybe using another dataset would be more reasonable?

Yes, that is the problem with the diabetes dataset. I have changed to california housing and it seems to work reasonably well with the original bins.

train: (bins=200; 0.8233) (bins=256; 0.8340)

test: (bins=200; 0.8112) (bins=256; 0.8094)

The downside of using fetch_california_housing is that it requires network access, which means we would need to mark these tests with @pytest.mark.network.

With some parameter tuning on the diabetes dataset I can get these results:

n_bins=50
train: 0.4253613178731953
test: 0.38498296812822475

n_bins=100
train: 0.4298426536827863
test: 0.3991035630532065

Parameters:
min_samples_leaf = 50
max_leaf_nodes = None

@thomasjpfan and @adrinjalali which dataset do you guys suggest to use?

Would make_regression with some tuned parameters not be a good option?

Good point, I'll try this tomorrow.

ogrisel · 2020-04-22T08:19:43Z

+1 for using make_regression to avoid relying on the network for such tests.

lucyleeow · 2020-04-22T10:16:26Z

Thanks @ogrisel and @adrinjalali. I've amended to make_regression.

adrinjalali

thanks @lucyleeow . This LGTM

thomasjpfan

LGTM thank you @lucyleeow !

lucyleeow added 2 commits April 14, 2020 14:59

replace boston

3590891

lint

39c12f0

github-actions bot added the module:ensemble label Apr 14, 2020

adrinjalali reviewed Apr 14, 2020

lucyleeow added 2 commits April 14, 2020 22:40

use cali house

e18c155

tune diabetes

358e1cc

ogrisel mentioned this pull request Apr 22, 2020

Replace boston in ensemble test_forest #16927

Merged

use make regression

72f8797

adrinjalali approved these changes Apr 22, 2020

thomasjpfan approved these changes Apr 23, 2020

thomasjpfan changed the title ~~Replace boston in histgradboost test_predictor~~ TST Replace boston in histgradboost test_predictor Apr 23, 2020

thomasjpfan merged commit 7844d1c into scikit-learn:master Apr 23, 2020

lucyleeow deleted the test_predictor branch April 24, 2020 11:58

gio8tisu pushed a commit to gio8tisu/scikit-learn that referenced this pull request May 15, 2020

TST Replace boston in histgradboost test_predictor (scikit-learn#16918)

14476cc

viclafargue pushed a commit to viclafargue/scikit-learn that referenced this pull request Jun 26, 2020

TST Replace boston in histgradboost test_predictor (scikit-learn#16918)

d87ec80

thomasjpfan merged 5 commits into scikit-learn:master from lucyleeow:test_predictor


adrinjalali Apr 14, 2020

lucyleeow Apr 14, 2020

thomasjpfan Apr 15, 2020

lucyleeow Apr 21, 2020

adrinjalali Apr 21, 2020

adrinjalali Apr 21, 2020

adrinjalali Apr 21, 2020

lucyleeow Apr 21, 2020
