streaming dataset with concatenating splits raises an error #4804

@Bing-su

Describe the bug

Loading a dataset with concatenated splits (for example split="train+validation") works in regular mode but raises a ValueError when streaming=True.

Steps to reproduce the bug

from datasets import load_dataset

# no error
repo = "nateraw/ade20k-tiny"
dataset = load_dataset(repo, split="train+validation")

from datasets import load_dataset

# error
repo = "nateraw/ade20k-tiny"
dataset = load_dataset(repo, split="train+validation", streaming=True)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-a6ae02d63899> in <module>()
      3 # error
      4 repo = "nateraw/ade20k-tiny"
----> 5 dataset = load_dataset(repo, split="train+validation", streaming=True)

1 frames
/usr/local/lib/python3.7/dist-packages/datasets/builder.py in as_streaming_dataset(self, split, base_path)
   1030             splits_generator = splits_generators[split]
   1031         else:
-> 1032             raise ValueError(f"Bad split: {split}. Available splits: {list(splits_generators)}")
   1033 
   1034         # Create a dataset for each of the given splits

ValueError: Bad split: train+validation. Available splits: ['validation', 'train']
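
The traceback suggests that the streaming path looks up the whole split string as a single split name, so "train+validation" is not found among ['validation', 'train']. As a possible workaround until this is handled, each split can be streamed on its own and the streams chained by hand. The sketch below is only an illustration: `stream_concatenated_splits` and its `loader` parameter are hypothetical names, not part of `datasets`, and it assumes the split string contains only plain split names joined by "+" (no slicing).

```python
from itertools import chain
from typing import Callable, Iterable, Iterator, Optional


def stream_concatenated_splits(
    repo: str,
    split: str,
    loader: Optional[Callable[..., Iterable[dict]]] = None,
) -> Iterator[dict]:
    """Return an iterator over the examples of each '+'-separated split, in order.

    Hypothetical helper, not a `datasets` API. `loader` defaults to
    `datasets.load_dataset` and exists mainly so the helper can be tried
    without network access.
    """
    if loader is None:
        # Imported lazily so the helper has no hard dependency at import time.
        from datasets import load_dataset

        loader = load_dataset

    names = [name.strip() for name in split.split("+") if name.strip()]
    if not names:
        raise ValueError(f"No split names found in {split!r}")

    # Each split is requested individually, which the streaming path accepts;
    # the generator expression defers each request until the previous stream ends.
    streams = (loader(repo, split=name, streaming=True) for name in names)
    return chain.from_iterable(streams)
```

With the repository from this report, iterating over `stream_concatenated_splits("nateraw/ade20k-tiny", "train+validation")` should yield the train examples followed by the validation examples. Note that the result is a plain Python iterator, not an `IterableDataset`, so dataset methods such as `map` or `shuffle` are not available on it.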

Colab

Expected results

The dataset loads successfully, or an error is raised stating that concatenated splits are not supported in streaming mode.

Actual results

The ValueError shown in the traceback above.

Environment info

  • datasets version: 2.4.0
  • Platform: Windows-10-10.0.22000-SP0 (windows11 x64)
  • Python version: 3.9.13
  • PyArrow version: 8.0.0
  • Pandas version: 1.4.3

Labels: bug
