[MRG] Fix fetch_openml when ignore attributes are numeric#12330
[MRG] Fix fetch_openml when ignore attributes are numeric#12330amueller merged 9 commits intoscikit-learn:masterfrom
Conversation
|
If all goes green LGTM. Not sure if it's worth adding a dataset to |
jnothman
left a comment
There was a problem hiding this comment.
Best to test this, to avoid a regression.
|
Btw, should I add a note in the what's new doc? In 0.20 or 0.21? I could append it to the previous entry that was merged by #12246 |
|
It can be a separate note in 0.20.1
|
jnothman
left a comment
There was a problem hiding this comment.
I've not checked that the test falls in master. I've also not checked how big the test data is (is it reasonably small?)
Otherwise LGTM
|
|
||
| @pytest.mark.parametrize('gzip_response', [True, False]) | ||
| def test_fetch_openml_adultcensus(monkeypatch, gzip_response): | ||
| # Check because of the numeric row attribute |
sklearn/datasets/openml.py
Outdated
| data_home) | ||
| arff_data = arff['data'] | ||
| # nominal attributes is a dict mapping from the attribute name to the | ||
| # possible values. Includes also the target column |
There was a problem hiding this comment.
Note however that the target is popped off below
|
I left 10 observations in the dataset, making it to my opinion reasonably small. |
|
regression test fails on master, whole data is 24k, looks good. |
…rn#12330) * modularized data column functionality * small bugfix * removes redundant line breaks * added some documentation on the added fn * added additional comment on advice of Nicholas Hug * added test case * merged master into branch, and added small comments by Joel * added doc item
Fixes #12329
I modularized the function that determines per column whether it is a valid data column (opposed to a target or ignore attribute). This functionality was before (in slightly different manifestations) on 2 different places in the code. This simplifies the main function.
I did not add an additional unit test, as that would require packaging more openml files etc. Personally I think that the fact that the code is more modular and the bug is removed adds to the readability and quality of the code. Let me know if you think otherwise, I can of course add a unit test.