[Data] Improve appearance of repr(dataset) #59482

@bveeramani

Currently, repr(dataset) looks like this:

>>> import ray
>>> ray.data.read_parquet("example://iris.parquet")
Dataset(
   num_rows=?,
   schema={
      sepal.length: double,
      sepal.width: double,
      petal.length: double,
      petal.width: double,
      variety: string
   }
)
>>> ray.data.read_parquet("example://iris.parquet").materialize()
MaterializedDataset(
   num_blocks=20,
   num_rows=150,
   schema={
      sepal.length: double,
      sepal.width: double,
      petal.length: double,
      petal.width: double,
      variety: string
   }
)

I think we can improve the UX by making this look like the Polars DataFrame repr:

shape: (150, 5)
┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┐
│ sepal.length ┆ sepal.width ┆ petal.length ┆ petal.width ┆ variety   │
│ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---       │
│ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str       │
╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╡
│ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ Setosa    │
│ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ Setosa    │
│ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ Setosa    │
│ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ Setosa    │
│ 5.0          ┆ 3.6         ┆ 1.4          ┆ 0.2         ┆ Setosa    │
│ …            ┆ …           ┆ …            ┆ …           ┆ …         │
│ 6.7          ┆ 3.0         ┆ 5.2          ┆ 2.3         ┆ Virginica │
│ 6.3          ┆ 2.5         ┆ 5.0          ┆ 1.9         ┆ Virginica │
│ 6.5          ┆ 3.0         ┆ 5.2          ┆ 2.0         ┆ Virginica │
│ 6.2          ┆ 3.4         ┆ 5.4          ┆ 2.3         ┆ Virginica │
│ 5.9          ┆ 3.0         ┆ 5.1          ┆ 1.8         ┆ Virginica │
└──────────────┴─────────────┴──────────────┴─────────────┴───────────┘

Requirements

If the dataset isn't materialized, show a repr like this:

┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┐
│ sepal.length ┆ sepal.width ┆ petal.length ┆ petal.width ┆ variety   │
│ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---       │
│ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str       │
└──────────────┴─────────────┴──────────────┴─────────────┴───────────┘

(Dataset isn't materialized)
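
A minimal sketch of how the schema-only table could be rendered, in plain Python with no Ray dependency. The function name `render_schema_table` and the `SHORT_DTYPES` mapping are hypothetical and exist only for illustration; the issue doesn't say how Arrow type names such as `double` should map to short labels such as `f64`, so the mapping below is an assumption. Column widths here come from the header cells alone, so a column can come out narrower than in the mock-up above.

```python
# Hypothetical mapping from Arrow-style type names to short labels.
# This is an assumption for illustration, not an existing Ray API.
SHORT_DTYPES = {
    "double": "f64",
    "float": "f32",
    "int64": "i64",
    "int32": "i32",
    "string": "str",
    "bool": "bool",
}


def render_schema_table(schema):
    """Render a header-only table from (column_name, type_name) pairs."""
    names = [name for name, _ in schema]
    # Fall back to the original type name when no short label is known.
    dtypes = [SHORT_DTYPES.get(type_name, type_name) for _, type_name in schema]
    # Each column is as wide as its widest header cell ("---" is 3 wide).
    widths = [max(len(n), len(d), 3) for n, d in zip(names, dtypes)]

    def border(left, mid, right):
        return left + mid.join("─" * (w + 2) for w in widths) + right

    def row(cells):
        padded = (f" {cell.ljust(w)} " for cell, w in zip(cells, widths))
        return "│" + "┆".join(padded) + "│"

    lines = [
        border("┌", "┬", "┐"),
        row(names),
        row(["---"] * len(names)),
        row(dtypes),
        border("└", "┴", "┘"),
    ]
    return "\n".join(lines) + "\n\n(Dataset isn't materialized)"


print(render_schema_table([("sepal.length", "double"), ("variety", "string")]))
```

Keeping the renderer independent of the dataset object means the same helper could be unit-tested without starting a Ray cluster.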

And if the dataset is materialized, show a repr like this:

┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┐
│ sepal.length ┆ sepal.width ┆ petal.length ┆ petal.width ┆ variety   │
│ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---       │
│ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str       │
╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╡
│ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ Setosa    │
│ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ Setosa    │
│ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ Setosa    │
│ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ Setosa    │
│ 5.0          ┆ 3.6         ┆ 1.4          ┆ 0.2         ┆ Setosa    │
│ …            ┆ …           ┆ …            ┆ …           ┆ …         │
│ 6.7          ┆ 3.0         ┆ 5.2          ┆ 2.3         ┆ Virginica │
│ 6.3          ┆ 2.5         ┆ 5.0          ┆ 1.9         ┆ Virginica │
│ 6.5          ┆ 3.0         ┆ 5.2          ┆ 2.0         ┆ Virginica │
│ 6.2          ┆ 3.4         ┆ 5.4          ┆ 2.3         ┆ Virginica │
│ 5.9          ┆ 3.0         ┆ 5.1          ┆ 1.8         ┆ Virginica │
└──────────────┴─────────────┴──────────────┴─────────────┴───────────┘

(Showing 10 of 150 rows)
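
The head/tail truncation in the materialized case can be sketched as follows. The helper name `select_preview_rows` and its `max_rows` parameter are hypothetical, and the rows are assumed to already be available as plain Python sequences; how they would be fetched from a materialized dataset is left open here. The footer wording for datasets that fit entirely within `max_rows` is also an assumption, since the issue only shows the truncated case.

```python
def select_preview_rows(rows, max_rows=10):
    """Pick the first and last rows to display, with an ellipsis row between.

    Returns (cells, footer), where cells is a list of rows of strings.
    """
    total = len(rows)
    if total <= max_rows:
        # Small dataset: show everything, no ellipsis row needed.
        shown = [[str(value) for value in r] for r in rows]
        return shown, f"(Showing {total} of {total} rows)"

    head = max_rows // 2
    tail = max_rows - head
    num_cols = len(rows[0])

    shown = [[str(value) for value in r] for r in rows[:head]]
    shown.append(["…"] * num_cols)  # marks the omitted middle rows
    # Slice from an explicit index so that tail == 0 yields no rows.
    shown.extend([str(value) for value in r] for r in rows[total - tail:])
    return shown, f"(Showing {max_rows} of {total} rows)"


# Usage with 150 synthetic rows (not the real iris data).
rows = [(i, f"row-{i}") for i in range(150)]
cells, footer = select_preview_rows(rows)
for line in cells:
    print(line)
print(footer)
```

With the default `max_rows=10`, this yields five leading rows, one ellipsis row, and five trailing rows, matching the layout shown above.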

Labels: data (Ray Data-related issues), good-first-issue (Great starter issue for someone just starting to contribute to Ray)
