[RFC] Intel GPU Runtime Upstreaming #114842
Description
Motivation
As mentioned in [RFC] Intel GPU Upstreaming, the Intel GPU runtime is the cornerstone that all other features build upon, so runtime support is the first feature that needs to land. This RFC gives an overview of the Intel GPU runtime upstreaming design.
Overall Design
For Intel GPU runtime support, six components need to be delivered: Device, Stream, Event, Guard, Generator, and Allocator. Each component has a counterpart already upstreamed to PyTorch by other backends, so we will follow the current design in PyTorch and reuse the existing implementations as much as possible. Device, Stream, Event, Guard, and Generator need no further description here, as their current designs and implementations in PyTorch are relatively straightforward. Due to the intricacy of the Allocator, we will publish a separate RFC for its design for further review: [RFC] Intel GPU Runtime Upstreaming for Allocator.
For PyTorch c10/aten library design:
Intel GPU runtime support depends on the SYCL programming language and the SYCL runtime library. The SYCL specification is published by Khronos, and one implementation of the SYCL runtime is provided by the Intel oneAPI Base Toolkit.
From the engineering perspective, we will build the Intel GPU runtime library with the same compiler as PyTorch (GCC by default) and link against the SYCL runtime library. We will follow the current design of PyTorch, aligning with the backends already upstreamed; the design diagram (library dependency graph) is shown below.
In detail, libc10_xpu.so will implement the majority of the code for:
- Device
- Stream
- Guard
- Allocator

libtorch_xpu.so will implement the majority of the code for:
- Event
- Generator
- Host Allocator (for pinned memory)
For PyTorch frontend design:
Then, the PyTorch C++ frontend code that needs Python bindings will be added to libtorch_python.so, and the PyTorch frontend Python code will be placed in the torch/xpu folder. Eventually, we will have torch.xpu.Stream, torch.xpu.Event, and other torch.xpu.* runtime APIs, such as torch.xpu.device_count and torch.xpu.set_device. Every torch.xpu.* runtime API has a counterpart already existing in PyTorch; there is no torch.xpu.* runtime API specific to XPU.
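As a sketch of how user code would look once these APIs land, the following uses the torch.xpu.* names given in this RFC (mirroring their torch.cuda counterparts). It is guarded so it is a no-op on builds without XPU support; the exact signatures are an assumption based on the CUDA counterparts, not a confirmed API.

```python
# Hypothetical usage of the torch.xpu.* runtime APIs named in the RFC.
import importlib.util


def xpu_is_usable() -> bool:
    """True only on a PyTorch build with Intel GPU (XPU) support."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return hasattr(torch, "xpu") and torch.xpu.is_available()


if xpu_is_usable():
    import torch
    print(torch.xpu.device_count())  # number of visible Intel GPUs
    torch.xpu.set_device(0)          # make device 0 current
    stream = torch.xpu.Stream()      # stream on the current device
    event = torch.xpu.Event()
    event.record(stream)             # record the event on the stream
```

Because every API has an existing counterpart, code written against torch.cuda should port to torch.xpu with little more than a namespace change.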
An architecture diagram of the Intel GPU runtime design is illustrated below.
Additional Context
In conclusion, we will:
- upstream the majority of the Intel GPU runtime (Device, Stream, Event, Guard, Generator, and Allocator) in the c10/aten library;
- upstream the majority of the Intel GPU runtime (Device, Stream, Event, Generator, and Allocator) in the PyTorch C++ frontend;
- bind the PyTorch C++ frontend to the PyTorch Python frontend module torch.xpu.
The code changes involve many parts of PyTorch. To keep reviews clear and manageable, we will split the changes into a few PRs. We will upstream the components in the following priority order:
1. Device
2. Stream
3. Event
4. Allocator
5. Guard
6. Generator
cc @frank-wei @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10