
Datasets cached pre-#548 can no longer be used for run_model_on_task #646

@PGijsbers


It looks like data pickled before #548 was stored as np.ndarray, whereas the default is now pandas.DataFrame. When I call run_model_on_task on a task for which I still have a cached dataset holding an np.ndarray instead of a pd.DataFrame, this line is called with (data, dataset_format, [some list of attribute names]), where dataset_format == "array". As far as I can tell, _convert_array_format assumes that the input data is a pd.DataFrame whenever the requested format is "array", so this line raises an error because np.ndarray has no attribute columns.

The fix seems easy enough: just check whether data is already of the preferred type, e.g. start the function with:

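import numpy as np
import scipy.sparse

def _convert_array_format(data, array_format, attribute_names):
    if array_format == "array" and not scipy.sparse.issparse(data):
        # Data cached pre-#548 is already an np.ndarray; return it unchanged.
        if isinstance(data, np.ndarray):
            return data
        ...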
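
To make the idea concrete, here is a minimal self-contained sketch of the guard. It is not the actual library code: convert_array_format_sketch is a hypothetical stand-in, the DataFrame branch using to_numpy() is only my assumption about what the real conversion roughly does, and attribute_names is accepted but unused here.

```python
import numpy as np
import pandas as pd
import scipy.sparse


def convert_array_format_sketch(data, array_format, attribute_names):
    """Hypothetical stand-in for _convert_array_format, not the library code."""
    if array_format == "array" and not scipy.sparse.issparse(data):
        # Proposed guard: data cached before #548 is already an ndarray.
        if isinstance(data, np.ndarray):
            return data
        # Assumed DataFrame branch, for illustration only.
        if isinstance(data, pd.DataFrame):
            return data.to_numpy()
    return data


# Stand-ins for an old (ndarray) and a new (DataFrame) cache entry.
old_cache = np.array([[1.0, 2.0], [3.0, 4.0]])
new_cache = pd.DataFrame(old_cache, columns=["a", "b"])

# The failure mode described above: an ndarray has no `columns` attribute.
assert not hasattr(old_cache, "columns")

from_old = convert_array_format_sketch(old_cache, "array", ["a", "b"])
from_new = convert_array_format_sketch(new_cache, "array", ["a", "b"])
```

With the guard in place, both kinds of cache entry end up as an np.ndarray, and the old one is returned untouched rather than copied.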

Does this make sense? Shall I set up a PR?
@glemaitre @mfeurer
