Open
Labels
dataframe, needs attention, parquet
Description
Calling len(df) is often the first thing a user does when opening a new dataset. The row count is stored in the global metadata file (if it exists) and in the footer of each file in the dataset, so computing it should require no actual data loading. The length could also be found after filtering, if that filtering is applied partition-wise (e.g., a condition on one of the columns inferred from the directory structure).
Obviously, such an enhancement would have no benefit after any filtering (or merge, etc.) that depends on the data itself, and it is only useful insofar as the row count itself is useful.
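To make the idea concrete, here is a rough sketch of a metadata-only row count. It is not how dask does or would implement this; it only illustrates summing the per-file footer counts and skipping whole partitions by directory name. The helper names (`footer_num_rows`, `partition_keys`, `dataset_len`) are made up for illustration, a hive-style `key=value` directory layout is assumed, and the global `_metadata` file is not consulted. The only real library call is `pyarrow.parquet.read_metadata`, which reads just the footer.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def footer_num_rows(path: str) -> int:
    """Row count taken from one file's footer; no column data is read."""
    # Imported lazily; assumes pyarrow is installed.
    import pyarrow.parquet as pq

    return pq.read_metadata(path).num_rows


def partition_keys(path: Path, root: Path) -> Dict[str, str]:
    """Parse hive-style ``key=value`` directory names between root and file."""
    keys: Dict[str, str] = {}
    for part in path.relative_to(root).parts[:-1]:
        if "=" in part:
            key, value = part.split("=", 1)
            keys[key] = value
    return keys


def dataset_len(
    root,
    keep: Optional[Callable[[Dict[str, str]], bool]] = None,
    read_num_rows: Callable[[str], int] = footer_num_rows,
) -> int:
    """Sum footer row counts, optionally keeping only some partitions.

    ``keep`` sees only values inferred from the directory structure, so
    whole files are included or skipped without touching their data.
    """
    root = Path(root)
    total = 0
    for f in sorted(root.rglob("*.parquet")):
        if keep is None or keep(partition_keys(f, root)):
            total += read_num_rows(str(f))
    return total
```

With a hypothetical dataset laid out as `data/year=2020/part.0.parquet`, `dataset_len("data", keep=lambda k: k.get("year") == "2020")` would count only the 2020 partition. Partition values are compared as strings here because they come straight from directory names. A filter on an ordinary data column cannot be answered this way, which is the limitation noted above.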