Question: availability of parquet files #1133

In scikit-learn, we are about to add a simple new ARFF parser based on `pandas.read_csv`. In short, it skips the header, reads the dataset, and casts the nominal columns (we don't really care about the `datetime` format). It is 4x to 10x faster and uses about half the memory.
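To make the approach concrete, here is a minimal sketch of the idea, not the actual scikit-learn implementation. The function name `read_arff_sketch` and the toy dataset are made up for illustration, and it assumes a dense ARFF file with unquoted attribute names and unquoted nominal values.

```python
import io

import pandas as pd


def read_arff_sketch(text):
    """Hypothetical helper: parse dense ARFF text with pandas.read_csv."""
    lines = text.splitlines()
    names, nominal = [], {}
    data_start = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        lower = stripped.lower()
        if lower.startswith("@attribute"):
            # "@attribute <name> <type>"; assumes names contain no spaces
            _, name, kind = stripped.split(None, 2)
            names.append(name)
            if kind.startswith("{"):
                # Nominal attribute: remember the declared categories
                nominal[name] = [c.strip() for c in kind.strip("{}").split(",")]
        elif lower.startswith("@data"):
            data_start = i + 1
            break
    if data_start is None:
        raise ValueError("no @data section found")

    # Skip the header and let pandas parse the data section as CSV
    body = "\n".join(lines[data_start:])
    df = pd.read_csv(
        io.StringIO(body),
        header=None,
        names=names,
        na_values="?",  # ARFF marks missing values with "?"
        comment="%",  # ARFF comment lines start with "%"
        skipinitialspace=True,
        dtype={name: str for name in nominal},  # keep nominal values as text
    )

    # Cast the nominal columns to categoricals with the declared categories
    for name, categories in nominal.items():
        df[name] = pd.Categorical(df[name], categories=categories)
    return df


sample = """\
@relation toy
@attribute width numeric
@attribute color {red,green}
@data
1.5,red
?,green
2.0,red
"""

df = read_arff_sketch(sample)
print(df.dtypes)
```

Sparse ARFF data, quoted values, and `date` attributes are deliberately left out of this sketch.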

However, we now wonder whether we should integrate this parser at all, since it could become obsolete. It depends on when the datasets will be made available in Parquet format through the OpenML site. I saw in a previous issue that this could happen soon.

Do you have an estimate (even rough) of the timeline for the feature to land?
