[MRG] pulling data from openml.org rather than original data source#12004
[MRG] pulling data from openml.org rather than original data source#12004rth merged 3 commits intoscikit-learn:masterfrom maxcopeland:mauna-loa-openml
Conversation
| else: | ||
| # aggregate monthly sum to produce average | ||
| ppmv_sums[-1] += float(ppmv) | ||
| ppmv_sums[-1] += float(ppmvs[i]) |
There was a problem hiding this comment.
you're right, this is redundant
| month_float = y + (m - 1) / 12 | ||
| ppmvs = ml_data.target | ||
|
|
||
| for i in range(len(ppmvs)): |
There was a problem hiding this comment.
You might as well just iterate over zip(month_float, ppmvs)
|
Why is this marked WIP? |
|
That is, what work do you intend to do before this is safe to consider merging? |
|
On the openml.org side, the dataset version still needs to be approved and set to "active". While its status is "in_preparation", their admins could in theory reject the dataset and set as "inactive" (due to compliance issues with tasks or workflows, etc) and would break |
|
@janvanrijn what's the chance of https://www.openml.org/d/41187 not being approved? ;) @maxcopeland is there a reason not to say "Fixes #..." in the PR description? Using that wording, rather than "Works on" means that github will automatically close the original issue when this is merged. |
|
@jnothman Sorry about that! Edited my PR comment. I'll use "Fixes #..." in the future. Newbie error :/ |
|
cool, new datasets :) |
|
@maxcopeland Thanks for uploading the dataset. Seems that there are still some formatting issues in the wiki part. Also, maybe we can provide more information in the wiki (e.g., the url you obtain the data) |
|
Thanks @qinhanmin2014, I've updated the wiki here. Let me know if it's acceptable. |
rth
left a comment
There was a problem hiding this comment.
Very nice, the example looks much cleaner using the openml fetcher!
Fixes #11858
What does this implement/fix? Explain your changes.
This pulls data from openml.org rather than the original data source.
Any other comments?
The dataset has spent several days on openml.org but is still "in_preparation". Data can be pulled via
fetch_openml, but gives a warning about the "in_preparation" status of the dataset's current version.plot_gpr_co2is technically functional, but need to get dataset to "active" state.Left to-do: