Chainer/Concater from single datapipe? #648

@NicolasHug

The Concater datapipe takes multiple DPs as input. Is there a class that would take a single datapipe of iterables instead? Something like this:

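from torchdata.datapipes.iter import IterDataPipe


class ConcaterIterable(IterDataPipe):
    def __init__(self, source_datapipe):
        # A single datapipe whose items are themselves iterables.
        self.source_datapipe = source_datapipe

    def __iter__(self):
        # Flatten one level: yield every element of every inner iterable.
        for iterable in self.source_datapipe:
            yield from iterable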

Basically:

itertools.chain == Concater
itertools.chain.from_iterable == ConcaterIterable
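
The analogy can be checked with the standard library alone. This is a small sketch that uses plain lists in place of datapipes; the variable names are illustrative only:

```python
from itertools import chain

a = [1, 2]
b = [3, 4]

# chain takes several iterables as separate arguments,
# like Concater takes several DPs.
from_args = list(chain(a, b))

# chain.from_iterable takes ONE iterable whose items are themselves
# iterables, which is the behaviour ConcaterIterable is meant to provide.
from_single = list(chain.from_iterable([a, b]))

print(from_args == from_single)
```

Both calls produce the same flat sequence; they differ only in how the inputs are handed over.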

Maybe a neat way of implementing this would be to keep a single Concater class, which would fall back to the ConcaterIterable behaviour if it's passed only one DP as input?
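
A rough sketch of what that fallback could look like. `ConcaterSketch` is a hypothetical plain-Python stand-in written for this illustration; it is not the torchdata `Concater` and does not subclass `IterDataPipe`:

```python
from itertools import chain


class ConcaterSketch:
    """Hypothetical stand-in for illustration, not the torchdata Concater."""

    def __init__(self, *datapipes):
        if not datapipes:
            raise ValueError("expected at least one input")
        self.datapipes = datapipes

    def __iter__(self):
        if len(self.datapipes) == 1:
            # Single input: treat each item as an iterable and flatten one level.
            yield from chain.from_iterable(self.datapipes[0])
        else:
            # Several inputs: yield them back to back.
            yield from chain(*self.datapipes)


several = list(ConcaterSketch([1, 2], [3, 4]))
single = list(ConcaterSketch([[1, 2], [3, 4]]))
```

One thing to weigh with this design: a single input gets flattened while two or more do not, so the meaning of an argument depends on how many are passed.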


Details: I need this for my benchmarking on Manifold, where each file is a large pickle archive containing multiple images. My DP builder looks like this:

def make_manifold_dp(root, dataset_size):
    handler = ManifoldPathHandler()
    dp = IoPathFileLister(root=root)
    dp.register_handler(handler)

    dp = dp.shuffle(buffer_size=dataset_size).sharding_filter()

    dp = IoPathFileOpener(dp, mode="rb")
    dp.register_handler(handler)

    dp = PickleLoaderDataPipe(dp)
    dp = ConcaterIterable(dp)  # <-- Needed here!
    return dp
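
For readers without access to the handlers above, the same flow can be sketched with the standard library only. Everything here is an assumption made for illustration: `iter_archive_items` is a hypothetical helper, and the two toy archives stand in for the large pickle files:

```python
import pickle
import tempfile
from pathlib import Path


def iter_archive_items(paths):
    """Open each pickle file and yield its items one by one."""
    for path in paths:
        with open(path, "rb") as f:
            archive = pickle.load(f)  # one archive holds many samples
        yield from archive  # the flattening step ConcaterIterable performs


with tempfile.TemporaryDirectory() as tmp:
    # Two toy archives standing in for the large pickle files.
    archives = {"a.pkl": ["img0", "img1"], "b.pkl": ["img2"]}
    for name, items in archives.items():
        with open(Path(tmp) / name, "wb") as f:
            pickle.dump(items, f)

    paths = sorted(Path(tmp).glob("*.pkl"))
    samples = list(iter_archive_items(paths))
```

Without the `yield from` step, the consumer would receive one whole archive per item instead of individual samples.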
