Closed
Description
It looks like data pickled before #548 was stored as np.array, whereas the default is now pandas.DataFrame. When I call run_model_on_task for a task whose cached dataset still holds an np.array instead of a pd.DataFrame, this line is called with (data, dataset_format="array", [some list of attribute names]). As far as I can tell, _convert_array_format assumes that the input data is a pd.DataFrame whenever the requested format is "array", so this line raises an error because an np.array has no attribute columns.
The fix seems easy enough: check whether data is already of the requested type, e.g. start the function with
    def _convert_array_format(data, array_format, attribute_names):
        if array_format == "array" and not scipy.sparse.issparse(data):
            if isinstance(data, np.ndarray):
                return data
            ...
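To make the idea concrete, here is a minimal standalone sketch of the early return. The function name convert_array_format_sketch and the DataFrame conversion via np.asarray are my own assumptions for illustration; the real _convert_array_format does more than this, and I have not copied its body.

```python
import numpy as np
import pandas as pd
import scipy.sparse


def convert_array_format_sketch(data, array_format, attribute_names):
    """Hypothetical stand-in for _convert_array_format (not the library code)."""
    if array_format == "array" and not scipy.sparse.issparse(data):
        # Datasets cached before #548 may already hold an ndarray:
        # return it untouched instead of accessing data.columns.
        if isinstance(data, np.ndarray):
            return data
        # Assumption: anything else here is a DataFrame; convert it.
        if isinstance(data, pd.DataFrame):
            return np.asarray(data, dtype=np.float32)
    # All other cases are out of scope for this sketch.
    return data


# An old cached ndarray passes through unchanged.
cached = np.array([[1.0, 2.0], [3.0, 4.0]])
same = convert_array_format_sketch(cached, "array", ["a", "b"])

# A DataFrame is still converted to an ndarray.
frame = pd.DataFrame(cached, columns=["a", "b"])
converted = convert_array_format_sketch(frame, "array", ["a", "b"])
```

With this guard, an already-converted array never reaches the code path that reads the columns attribute.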
Does this make sense? Shall I set up a PR?
@glemaitre @mfeurer