Closed
Labels
- P1: Issue that should be fixed within a few weeks
- bug: Something that is supposed to be working; but isn't
- community-backlog
- data: Ray Data-related issues
- good-first-issue: Great starter issue for someone just starting to contribute to Ray
- stability
Description
What happened + What you expected to happen
include_paths=True does not add the path column to the schema, so some checks in groupby fail.
For example, ds = ray.data.read_parquet("data/", include_paths=True) gives
In [24]: ds
Out[24]: Dataset(num_rows=?, schema={id: int64, data: double, uuid: string})
without the expected path column.
Then, if we run
ds.groupby("path").count().take_all()
it fails in SortKey.validate_schema(self, schema):
81 for column in self._columns:
82 if column not in schema_names_set:
---> 83 raise ValueError(
84 f"You specified the column '{column}', but there's no such "
85 "column in the dataset. The dataset has columns: "
86 f"{schema.names}"
87 )
ValueError: You specified the column 'path', but there's no such column in the dataset. The dataset has columns: ['id', 'data', 'uuid']
For debugging purposes, either of the following makes it work:
- disable that check
- or use materialize() before the groupby
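The failure mode can be sketched without Ray. The snippet below is a simplified stand-in for the membership check shown in the traceback, not Ray's actual implementation: validate_sort_columns is a hypothetical helper, and materialized_schema is an assumed schema illustrating why the check would pass once the path column is actually present.

```python
from typing import Iterable, List


def validate_sort_columns(columns: Iterable[str], schema_names: List[str]) -> None:
    """Raise ValueError if any requested column is missing from the schema.

    Hypothetical helper that mirrors the check in the traceback above.
    """
    schema_names_set = set(schema_names)
    for column in columns:
        if column not in schema_names_set:
            raise ValueError(
                f"You specified the column '{column}', but there's no such "
                "column in the dataset. The dataset has columns: "
                f"{schema_names}"
            )


# Schema as reported in the issue: the 'path' column is missing.
lazy_schema = ["id", "data", "uuid"]
# Assumed schema once 'path' is present (for illustration only).
materialized_schema = ["id", "data", "uuid", "path"]

try:
    validate_sort_columns(["path"], lazy_schema)
    lazy_check_passed = True
except ValueError as err:
    lazy_check_passed = False
    error_message = str(err)

# With 'path' in the schema, the same check does not raise.
validate_sort_columns(["path"], materialized_schema)
```

The point of the sketch is that the check itself is fine. The reported schema is what is incomplete, which suggests the fix belongs where the schema is inferred rather than in the validation.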
Versions / Dependencies
master
Reproduction script
Any dataset:
import ray
ds = ray.data.read_parquet("data/", include_paths=True)
ds.groupby("path").count()
Issue Severity
Medium: It is a significant difficulty but I can work around it.