Hi, all
#23917 introduced nvidia-docker, which provides a convenient way to use the Nvidia GPU accelerator in containers.
To support more hardware accelerators (e.g. AMD GPU, FPGA, QAT) in containers,
we propose a common way to provide this feature through docker plugins.
Abstract
Accelerators can be considered a special kind of resource, like cpu and memory,
though they may be backed by other basic resources (e.g.,
volumes and devices in nvidia-docker).
The implementation of an accelerator should be transparent to containers:
a container should not need to know which devices or runtime libraries are required,
or whether they match.
It is the accelerator driver's responsibility to handle all these things.
An accelerator request is made up of accelerator runtime descriptions, and descriptions only.
A docker image does not need to bundle accelerator runtime libraries,
so it is not bound to any specific device.
Accelerator requests can be expressed as image labels or docker run arguments.
Accelerator support should be extensible through the docker plugin mechanism.
Vendors can build their own accelerator plugins to express:
- which accelerator runtimes (e.g. cuda, opencl, rsa) it supports
- how to allocate/release accelerator resources
- how to prepare the required environment, e.g. devices, libs, etc.
- how to reset/reuse accelerator resources
- how to collect or preprocess status data
- ...
When the docker engine receives a request to run a container with an accelerator, it will:
- select the right accelerator plugin
- allocate accelerator resources from the plugin
- update the container configuration according to information provided by the plugin
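The three steps above can be sketched as follows. This is a minimal illustration of the proposed engine-side flow, not a real Docker API; the driver class, its methods, and the config field names are all assumptions.

```python
# Hypothetical sketch of the engine-side flow: select a plugin, allocate
# resources from it, and fold the result into the container configuration.

class FakeCudaDriver:
    """Stands in for an accelerator driver plugin (illustrative only)."""
    def query(self, runtime):
        # Report whether this driver supports the requested runtime.
        return runtime.startswith("cuda")

    def allocate(self, runtime):
        # Return the resources the container needs for this accelerator.
        return {"devices": ["/dev/nvidia0"],
                "volumes": ["nvidia_driver:/usr/local/nvidia:ro"],
                "env": {"CUDA_VERSION": runtime.split(":")[1]}}

def run_with_accel(drivers, runtime, container_config):
    # 1. select the right accelerator plugin
    driver = next(d for d in drivers if d.query(runtime))
    # 2. allocate accelerator resources from the plugin
    accel = driver.allocate(runtime)
    # 3. update the container configuration with the plugin's answer
    container_config["Devices"] += accel["devices"]
    container_config["Binds"] += accel["volumes"]
    container_config["Env"].update(accel["env"])
    return container_config

config = run_with_accel([FakeCudaDriver()], "cuda:8.0",
                        {"Devices": [], "Binds": [], "Env": {}})
```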
The lifecycle of an accelerator is hard to determine:
some may die with the container while others may survive (being used by other containers),
and accelerator sharing may be needed,
so a docker subcommand may be necessary to maintain accelerator status.
How to request accelerators in a container
Apps may request accelerators through a runtime description.
An app is implemented against a specific runtime like 'cuda:8.0' or 'opencl',
and that runtime description is the accelerator request.
It can be expressed in one of the following ways:
-
docker image label
Runtime requests are determined when the docker image is built,
so users can store this information in the image through a label:
LABEL runtime="gpu0=cuda:8.0;gpu1=opencl:2.2"    # means nothing, just for example
LABEL runtime="fpga0=com.company/fpga/sha256:1.0"
In the first example, the runtime label requests two accelerators:
one with cuda:8.0 and the other with opencl:2.2.
The second example will allocate one fpga accelerator that supports sha256:1.0
from com.company.
-
docker run arguments
Users can also specify accelerators when creating/running a container, like this:
# docker create --accel cuda:8.0 nvidia/digits:4.0
# docker create --accel fpga0=com.company/fpga/compress:1.0 --accel fpga1=com.company/fpga/decomp:1.0 XXXX
or overwrite an accelerator runtime LABEL defined in the image:
# docker create --accel gpu0=cuda:8.0 nvidia/digits:4.0
Accelerators explicitly allocated by --accel options are persistent,
which means they are reserved for the container until the container is removed.
Implicit accelerators defined by an image LABEL are non-persistent,
and are only reserved for the container while it is running.
This distinction prevents wasting resources on unused, stopped containers.
If accelerators should be reserved even while a container is stopped,
they should be declared as persistent using the --accel option.
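Both the LABEL value and the --accel options use the same `name=runtime` syntax, with `;` separating entries in a label. A minimal parser for that syntax might look like the sketch below; it is purely illustrative of the proposed format, and the auto-generated names for bare runtimes (as in `--accel cuda:8.0`) are an assumption.

```python
# Parse the proposed request syntax, e.g. "gpu0=cuda:8.0;gpu1=opencl:2.2".

def parse_accel_requests(spec):
    """Parse 'name=runtime' pairs separated by ';' into a dict.
    A bare runtime (no 'name=') gets an auto-generated name."""
    requests = {}
    for i, item in enumerate(s for s in spec.split(";") if s):
        if "=" in item:
            name, runtime = item.split("=", 1)
        else:
            # Hypothetical naming scheme for anonymous requests.
            name, runtime = "accel%d" % i, item
        requests[name] = runtime
    return requests
```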
How to parse requests into accelerators
Docker engine can query all the accelerator driver plugins with the requested runtime description.
Drivers scan its capacity(all runtimes it supports), and check whether it can provide specific runtimes or not.
Then engine will request driver to allocate accelerators.
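The capacity check could work along these lines. The capacity format (a list of "runtime:version" strings) and the rule that a versionless request matches any version are assumptions made for illustration.

```python
# Sketch of driver-side capacity matching against a requested runtime.

def driver_supports(capacity, requested):
    """Check whether a requested runtime appears in the driver's capacity.
    A request without a version matches any version of that runtime."""
    req_name, _, req_ver = requested.partition(":")
    for cap in capacity:
        cap_name, _, cap_ver = cap.partition(":")
        if cap_name == req_name and (not req_ver or cap_ver == req_ver):
            return True
    return False
```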
Accelerator abstraction in docker
An accelerator in a container has three attributes:
volume (to provide runtime libraries), device, and env (maybe not necessary, but it provides flexibility).
Drivers provide this information about an accelerator to the docker engine,
and the container configuration for volumes, devices and env is updated accordingly.
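One way to model this three-attribute abstraction is a small record type; the class and field names here are illustrative, not part of the proposal.

```python
# Hypothetical model of the volume/device/env triple a driver hands back.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class AccelResources:
    volumes: List[str] = field(default_factory=list)   # runtime libraries
    devices: List[str] = field(default_factory=list)   # device nodes
    env: Dict[str, str] = field(default_factory=dict)  # extra flexibility

res = AccelResources(devices=["/dev/nvidia0"])
```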
Interfaces provided by accelerator (driver) plugin
The accelerator drivers need to support basic accelerator operations, including:
- Query          // is the runtime supported by this driver?
- AllocateAccel  // create a new accelerator; corresponds to docker create
- PrepareAccel   // get the accelerator ready (e.g. program a bitstream into fpga), returning volume, device and env; corresponds to docker start
- ResetAccel     // reset the accelerator; corresponds to docker stop
- ReleaseAccel   // remove the accelerator; corresponds to docker rm
Accelerator drivers should implement all these functions to fulfil accelerator
management across the container lifecycle.
The docker engine will call these functions at the right points to use accelerators
in a container.
These functions are easy to expose through docker plugins.
An accelerator vendor can provide support for its device by implementing
all the accelerator plugin interfaces.
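As a sketch, the interface a vendor would implement could look like this. The method names follow the proposal; the signatures, return types, and the dummy fpga driver are assumptions for illustration.

```python
# Hypothetical rendering of the proposed driver-plugin interface.
import abc

class AccelDriver(abc.ABC):
    @abc.abstractmethod
    def Query(self, runtime):
        """Is this runtime supported by the driver?"""

    @abc.abstractmethod
    def AllocateAccel(self, runtime):
        """Create a new accelerator (docker create); returns an accelerator id."""

    @abc.abstractmethod
    def PrepareAccel(self, accel_id):
        """Get the accelerator ready, e.g. program a bitstream into fpga
        (docker start); returns volume, device and env."""

    @abc.abstractmethod
    def ResetAccel(self, accel_id):
        """Reset the accelerator (docker stop)."""

    @abc.abstractmethod
    def ReleaseAccel(self, accel_id):
        """Remove the accelerator (docker rm)."""

class DummyFpgaDriver(AccelDriver):
    """Minimal stub showing how a vendor might fill in the interface."""
    def Query(self, runtime):
        return runtime.startswith("com.company/fpga/")
    def AllocateAccel(self, runtime):
        return "fpga0"
    def PrepareAccel(self, accel_id):
        return {"volumes": [], "devices": ["/dev/fpga0"], "env": {}}
    def ResetAccel(self, accel_id):
        pass
    def ReleaseAccel(self, accel_id):
        pass
```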
Accelerator management in docker
As mentioned above, we may need an accelerator subcommand to maintain
accelerator status and to create/list/remove accelerators,
just like network and volume (not sure if this is really necessary).
ping @justincormack @flx42 @3XX0 , please take a look, thanks 😃
cc @forever043