
fetch_openml with mnist_784 uses excessive memory  #19774

@louisabraham

Description

```python
from sklearn.datasets import fetch_openml
fetch_openml(name="mnist_784")
```

This call uses 3 GB of RAM during execution and 1.5 GB once it has returned. Each additional run increases memory usage by another 500 MB.
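For anyone who wants to reproduce these numbers, here is a minimal sketch of one way to measure peak and retained memory around a call, using the standard-library `tracemalloc` module. The helper name `measure_memory_mb` is hypothetical, and a `bytearray` allocation stands in for the real call so the snippet runs without network access. `tracemalloc` only sees allocations made through Python's allocators, so its figures may differ from the process-level RAM usage quoted above.

```python
import tracemalloc


def measure_memory_mb(func):
    """Run func() and return (result, retained_mb, peak_mb).

    Hypothetical helper: sizes are in megabytes (10**6 bytes) as seen by
    tracemalloc, which is not the same as the process's resident memory.
    """
    tracemalloc.start()
    try:
        result = func()
        # current = memory still allocated, peak = maximum during the call
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, current / 1e6, peak / 1e6


# Stand-in workload (assumption): allocate 10 MB instead of downloading MNIST.
_, retained, peak = measure_memory_mb(lambda: bytearray(10_000_000))
print(f"retained: {retained:.1f} MB, peak: {peak:.1f} MB")
```

To apply this to the call from this issue, pass `lambda: fetch_openml(name="mnist_784")` instead of the stand-in. That requires scikit-learn and a network connection or a local cache.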

The whole dataset has 70k samples of dimension 784, so it should take about 500 MB in memory. I don't understand why the function uses so much more than that.
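The 500 MB estimate can be checked with a quick back-of-the-envelope calculation. The sketch below computes the size of a dense 70,000 x 784 array for a few element sizes. Which dtype the returned data actually uses is not stated in this issue, so the candidates listed are assumptions.

```python
# Shape taken from the issue text.
N_SAMPLES = 70_000
N_FEATURES = 784

# Bytes per value for a few candidate dtypes (assumption: the issue does not
# say which dtype the returned array uses).
ITEMSIZE = {"float64": 8, "float32": 4, "uint8": 1}


def dense_size_mb(n_samples, n_features, itemsize):
    """Size of a dense array in megabytes (10**6 bytes)."""
    return n_samples * n_features * itemsize / 1e6


sizes = {
    name: dense_size_mb(N_SAMPLES, N_FEATURES, nbytes)
    for name, nbytes in ITEMSIZE.items()
}
for name, mb in sizes.items():
    print(f"{name}: {mb:.1f} MB")
```

With 8-byte values the array comes to roughly 439 MB, which matches the "about 500 MB" figure. Smaller element types would need proportionally less.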

This has caused numerous people to have memory errors in the past:
