Use parquet metadata to get length #6387

@martindurant

len(df) is often the first thing a user runs when opening a new dataset. The row count is stored in the global metadata file (if it exists) and in the footer of each file in the dataset, so computing it should require no actual data loading. The length could also be found after filtering, if that filtering is applied partition-wise (e.g., a condition on one of the columns inferred from the directory structure).

Obviously, such an enhancement would have no benefit after any filtering (or merge, etc.) that depends on the data itself, and it is only useful insofar as the row count itself is useful.

Labels: dataframe, needs attention, parquet