ENH allows to overwrite read_csv parameter in fetch_openml #25488
glemaitre wants to merge 3 commits into scikit-learn:main from
thomasjpfan left a comment
Thank you for the PR! I am okay with adding this option.
    the default options. Internally, we used the default parameters of
    :func:`pandas.read_csv` except for the following parameters:

    - `header`: set to `None`
Should this be in fetch_openml as part of the public API?
        dtype=dtypes,
        skipinitialspace=True,  # skip spaces after delimiter to follow ARFF specs
    )
    frame = pd.read_csv(gzip_file, **read_csv_kwargs)
Currently, if there is an exception while reading the data, one would need to enter a debugger to find out where the file is and what the read_csv_kwargs are. I think it would be helpful to reraise an exception that outputs the read_csv_kwargs and gzip_file to help with debugging the issue.
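To make the suggestion concrete, here is a minimal sketch of what re-raising with context could look like. The helper name `read_with_context` and its `reader` argument are hypothetical and not part of this PR; `reader` stands in for `pd.read_csv` so the snippet runs without pandas.

```python
def read_with_context(reader, source, **read_kwargs):
    """Call ``reader(source, **read_kwargs)`` and add context on failure.

    Hypothetical helper: ``reader`` stands in for ``pd.read_csv`` and
    ``source`` for the opened gzip file.
    """
    try:
        return reader(source, **read_kwargs)
    except Exception as exc:
        # Chain with ``from exc`` so the original traceback is preserved.
        raise ValueError(
            f"Failed to read {source!r} with "
            f"read_csv_kwargs={read_kwargs!r}: {exc}"
        ) from exc
```

In the code under review this might be called as `read_with_context(pd.read_csv, gzip_file, **read_csv_kwargs)`, assuming those names are in scope at that point.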
I will close this one. Let's keep in mind that it exists if we really need more flexibility and want to tweak the parameter in the future.

OK, so reopening this one. It seems that we will need it if we want to manage ourselves some
Allows overwriting the parameters passed to `read_csv` when reading a dataframe. It is not intended to be used widely, but it could be worth it when things go sideways.
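As a rough illustration of the overwrite behaviour described above, the sketch below merges user-supplied values on top of internal defaults. The function name `merge_read_csv_kwargs` is hypothetical, and the defaults shown are only the ones quoted in this thread (`header` and `skipinitialspace`); the actual implementation in the PR may differ.

```python
def merge_read_csv_kwargs(default_kwargs, overrides=None):
    """Return a new dict where user overrides win over the defaults.

    Hypothetical helper, not the PR's actual code.
    """
    merged = dict(default_kwargs)  # copy so the defaults are not mutated
    merged.update(overrides or {})
    return merged


# Defaults mentioned in the thread; the full set used internally may be larger.
defaults = {"header": None, "skipinitialspace": True}
merged = merge_read_csv_kwargs(defaults, {"skipinitialspace": False})
```

The merged dictionary would then be unpacked into the reader call, as in `pd.read_csv(gzip_file, **read_csv_kwargs)` from the diff above.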