Device extension should implement async buffer copying for CUDA #245

@paleolimbot

Currently, the device extension is unreasonably slow when copying any array to a device because it synchronizes after every buffer copy. For string/binary types, an additional copy is needed to find the length of the next buffer. There is no technical limitation preventing these copies from occurring in parallel; however, the initial PR for this didn't get there.
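For illustration, here is a rough sketch of the pattern being asked for: enqueue every buffer copy on one stream with `cudaMemcpyAsync` and synchronize once at the end. It is not the extension's actual code. `HostBuffer`, `CopyBuffersAsync`, and `UploadStringArrayExample` are hypothetical names made up for this example, and only CUDA runtime calls are used. It also assumes the source buffers live in host memory, where the data buffer length of a string array can be read directly from the last offset.

```cuda
#include <cuda_runtime.h>

#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical description of one host buffer to upload (not a nanoarrow type).
struct HostBuffer {
  const void* data;
  int64_t size_bytes;
};

// Enqueue all host-to-device copies on one stream, then synchronize once,
// instead of blocking after each individual buffer.
cudaError_t CopyBuffersAsync(const std::vector<HostBuffer>& src,
                             std::vector<void*>* dst, cudaStream_t stream) {
  dst->assign(src.size(), nullptr);
  cudaError_t status = cudaSuccess;

  for (size_t i = 0; i < src.size(); i++) {
    if (src[i].size_bytes == 0) continue;  // nothing to allocate or copy

    status = cudaMalloc(&(*dst)[i], static_cast<size_t>(src[i].size_bytes));
    if (status != cudaSuccess) break;

    status = cudaMemcpyAsync((*dst)[i], src[i].data,
                             static_cast<size_t>(src[i].size_bytes),
                             cudaMemcpyHostToDevice, stream);
    if (status != cudaSuccess) break;
  }

  // The only synchronization point: wait for everything enqueued above.
  cudaError_t sync_status = cudaStreamSynchronize(stream);
  if (status == cudaSuccess) status = sync_status;

  // On failure, release whatever was allocated so the caller sees only nullptrs.
  if (status != cudaSuccess) {
    for (void*& ptr : *dst) {
      cudaFree(ptr);
      ptr = nullptr;
    }
  }
  return status;
}

// Uploads the buffers of a 3-element string array ["abc", "", "defgh"] and
// reads the data buffer back to check it. Returns 0 on success, 1 otherwise.
int UploadStringArrayExample() {
  const int32_t offsets[] = {0, 3, 3, 8};
  const char data[] = "abcdefgh";

  // The source is on the host, so the data length is just the last offset.
  std::vector<HostBuffer> src = {
      {offsets, static_cast<int64_t>(sizeof(offsets))},
      {data, static_cast<int64_t>(offsets[3])},
  };

  cudaStream_t stream;
  if (cudaStreamCreate(&stream) != cudaSuccess) return 1;

  std::vector<void*> dst;
  cudaError_t status = CopyBuffersAsync(src, &dst, stream);

  char check[8] = {0};
  if (status == cudaSuccess) {
    status = cudaMemcpy(check, dst[1], sizeof(check), cudaMemcpyDeviceToHost);
  }

  for (void* ptr : dst) cudaFree(ptr);  // cudaFree(nullptr) is a no-op
  cudaStreamDestroy(stream);

  return (status == cudaSuccess && std::memcmp(check, data, 8) == 0) ? 0 : 1;
}
```

Two caveats on the sketch: `cudaMemcpyAsync` from pageable host memory may not actually overlap with other work, so getting real parallelism probably also requires pinned host buffers (e.g. via `cudaMallocHost` or `cudaHostRegister`); and `cudaMalloc` is not stream-ordered, so allocating everything up front may be preferable to interleaving allocations with copies as done here.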

References: