[RFC] Intel GPU Runtime Upstreaming #114842

@guangyey

Description

Motivation

As mentioned in [RFC] Intel GPU Upstreaming, the Intel GPU runtime is the cornerstone that other features build on, so it is the first feature that needs to land. This RFC gives an overview of the Intel GPU runtime upstreaming design.

Overall Design

Intel GPU runtime support consists of six components: Device, Stream, Event, Guard, Generator, and Allocator. Each component has a counterpart already upstreamed to PyTorch by other backends, so we will follow the existing PyTorch design and reuse those implementations as much as possible. Device, Stream, Event, Guard, and Generator need no further description here, as their current designs and implementations in PyTorch are relatively straightforward. Because the Allocator is more intricate, its design is covered in a separate RFC for further review: [RFC] Intel GPU Runtime Upstreaming for Allocator.
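To illustrate the role of the Guard component named above, device guards in PyTorch backends follow an RAII pattern: switch to a target device on entry and restore the previous device on exit, even on error. The following is a minimal pure-Python analogue of that pattern; `FakeRuntime`, `set_device`, and `DeviceGuard` are illustrative names, not the real c10 or torch.xpu API.

```python
# Sketch of the RAII-style device guard pattern (c10::DeviceGuard-like).
# All names here are illustrative stand-ins, not the real c10 API.

class FakeRuntime:
    """Stands in for a per-thread device runtime (hypothetical)."""
    def __init__(self, device_count: int = 2):
        self.device_count = device_count
        self.current_device = 0

    def set_device(self, index: int) -> None:
        if not 0 <= index < self.device_count:
            raise ValueError(f"invalid device index {index}")
        self.current_device = index

class DeviceGuard:
    """On enter: remember the current device and switch to the target.
    On exit: restore the original device, even if an exception occurred."""
    def __init__(self, runtime: FakeRuntime, index: int):
        self.runtime = runtime
        self.target = index
        self.previous = None

    def __enter__(self):
        self.previous = self.runtime.current_device
        self.runtime.set_device(self.target)
        return self

    def __exit__(self, exc_type, exc, tb):
        self.runtime.set_device(self.previous)
        return False  # do not swallow exceptions

rt = FakeRuntime(device_count=2)
with DeviceGuard(rt, 1):
    assert rt.current_device == 1  # work inside runs on device 1
assert rt.current_device == 0      # original device restored on exit
```

The same restore-on-exit guarantee is what the C++ guard provides via its destructor.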

For PyTorch c10/aten library design:

Intel GPU runtime support depends on the SYCL programming language and the SYCL runtime library. The SYCL specification is here, and one implementation of the SYCL runtime is provided by the Intel oneAPI Base Toolkit.
From an engineering perspective, we will build the Intel GPU runtime library with the same compiler as PyTorch (GCC by default) and link against the SYCL runtime library. We will follow the current design of PyTorch, aligning with the backends already upstreamed; the design diagram (library dependency graph) is shown below.

[diagram: library dependency graph]

In detail, libc10_xpu.so will implement the majority of the code for:

  • Device
  • Stream
  • Guard
  • Device Allocator

libtorch_xpu.so will implement the majority of the code for:

  • Event
  • Generator
  • Host Allocator for pinned memory
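To give a feel for what the host allocator for pinned memory does, host allocators in PyTorch backends typically cache freed blocks so the expensive pin/unpin operations are amortized across reuses. The following is a simplified sketch of that caching idea, not the actual libtorch_xpu implementation; real pinning (e.g. a SYCL host USM allocation) is replaced here by a plain `bytearray`.

```python
# Simplified sketch of a caching host allocator, in the spirit of the
# pinned-memory host allocators used by PyTorch backends. Pinning is
# simulated; the free list is keyed by block size.

from collections import defaultdict

class CachingHostAllocator:
    def __init__(self):
        self.free_blocks = defaultdict(list)  # size -> cached blocks
        self.pin_calls = 0                    # counts simulated pin operations

    def allocate(self, size: int) -> bytearray:
        cached = self.free_blocks[size]
        if cached:
            return cached.pop()        # reuse a cached block: no new pin
        self.pin_calls += 1            # simulate an expensive pin operation
        return bytearray(size)

    def free(self, block: bytearray) -> None:
        self.free_blocks[len(block)].append(block)  # cache for later reuse

alloc = CachingHostAllocator()
a = alloc.allocate(4096)
alloc.free(a)
b = alloc.allocate(4096)      # served from the cache
assert alloc.pin_calls == 1   # only one pin for two allocations
```

The design trade-off is the usual one for caching allocators: faster repeated host-to-device transfers at the cost of holding pinned memory beyond its first use.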

For PyTorch frontend design:

Some PyTorch C++ frontend binding code will be added to libtorch_python.so, and the PyTorch frontend Python code will be placed in the torch/xpu folder. Eventually, we will have torch.xpu.Stream, torch.xpu.Event, and other torch.xpu.* runtime APIs, such as torch.xpu.device_count and torch.xpu.set_device. Every torch.xpu.* runtime API has a counterpart already existing in PyTorch; there is no torch.xpu.* runtime API specific to XPU.
The architecture diagram of the Intel GPU runtime is illustrated below.

[diagram: Intel GPU runtime architecture]
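Since torch.xpu.Stream and torch.xpu.Event mirror their existing counterparts (e.g. torch.cuda.Stream and torch.cuda.Event), their semantics follow the usual stream/event model: a stream is an in-order queue of work, and an event marks a point in a stream that completes once all preceding work on that stream has run. The following pure-Python sketch illustrates those semantics only; `Stream`, `Event`, `submit`, and `synchronize` are illustrative names, not the proposed torch.xpu API.

```python
# Pure-Python sketch of stream/event ordering semantics, mirroring the
# Stream/Event pair proposed here. Work is queued in order and executed
# lazily on synchronize(); an event "completes" when the stream reaches it.

class Stream:
    """An in-order queue of work items, drained by synchronize()."""
    def __init__(self):
        self.pending = []

    def submit(self, fn):
        self.pending.append(fn)

    def synchronize(self):
        while self.pending:
            self.pending.pop(0)()  # execute queued work in FIFO order

class Event:
    """Marks a point in a stream; completes when the stream reaches it."""
    def __init__(self):
        self.done = False

    def record(self, stream):
        stream.submit(lambda: setattr(self, "done", True))

    def query(self):
        return self.done

results = []
s = Stream()
s.submit(lambda: results.append("kernel-1"))
e = Event()
e.record(s)            # event completes after kernel-1, before kernel-2
s.submit(lambda: results.append("kernel-2"))

assert not e.query()   # nothing has executed yet
s.synchronize()        # drain the stream in submission order
assert e.query() and results == ["kernel-1", "kernel-2"]
```

In the real runtime the queue lives on the device and `synchronize` blocks the host, but the ordering guarantees shown here are the same.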

Additional Context

In conclusion, we will

  • upstream the majority of the Intel GPU runtime (Device, Stream, Event, Guard, Generator, and Allocator) in the c10/aten library;
  • upstream the majority of the Intel GPU runtime (Device, Stream, Event, Generator, and Allocator) in the PyTorch C++ frontend;
  • bind the PyTorch C++ frontend to the PyTorch Python frontend module torch.xpu.

The code changes touch many parts of PyTorch. To keep code reviews clear and manageable, we will split the changes into several PRs. We will upstream the components in the following priority order:

  • Device
  • Stream
  • Event
  • Allocator
  • Guard
  • Generator

cc @frank-wei @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
