[RFC] Intel GPU Runtime Upstreaming #114842

@guangyey

Description

Motivation

As mentioned in [RFC] Intel GPU Upstreaming, the Intel GPU runtime is the cornerstone that other features build on, so it is the first feature that needs to land. This RFC gives an overview of the Intel GPU runtime upstreaming design.

Overall Design

Intel GPU runtime support consists of six components: Device, Stream, Event, Guard, Generator, and Allocator. Each component has a counterpart already upstreamed to PyTorch by other backends, so we will follow the existing PyTorch design and reuse those implementations as much as possible. Device, Stream, Event, Guard, and Generator need no further description here, as their current designs and implementations in PyTorch are relatively straightforward. Because the Allocator is more intricate, its design is covered in a separate RFC for further review: [RFC] Intel GPU Runtime Upstreaming for Allocator.
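To illustrate the role of the Guard component named above, device guards in PyTorch backends follow an RAII pattern: switch to a target device on entry and restore the previous device on exit, even on error. The following is a minimal pure-Python analogue of that pattern; `FakeRuntime`, `set_device`, and `DeviceGuard` are illustrative names, not the real c10 or torch.xpu API.

```python
# Sketch of the RAII-style device guard pattern (c10::DeviceGuard-like).
# All names here are illustrative stand-ins, not the real c10 API.

class FakeRuntime:
    """Stands in for a per-thread device runtime (hypothetical)."""
    def __init__(self, device_count: int = 2):
        self.device_count = device_count
        self.current_device = 0

    def set_device(self, index: int) -> None:
        if not 0 <= index < self.device_count:
            raise ValueError(f"invalid device index {index}")
        self.current_device = index

class DeviceGuard:
    """On enter: remember the current device and switch to the target.
    On exit: restore the original device, even if an exception occurred."""
    def __init__(self, runtime: FakeRuntime, index: int):
        self.runtime = runtime
        self.target = index
        self.previous = None

    def __enter__(self):
        self.previous = self.runtime.current_device
        self.runtime.set_device(self.target)
        return self

    def __exit__(self, exc_type, exc, tb):
        self.runtime.set_device(self.previous)
        return False  # do not swallow exceptions

rt = FakeRuntime(device_count=2)
with DeviceGuard(rt, 1):
    assert rt.current_device == 1  # work inside runs on device 1
assert rt.current_device == 0      # original device restored on exit
```

The same restore-on-exit guarantee is what the C++ guard provides via its destructor.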

For PyTorch c10/aten library design:

Intel GPU runtime support depends on the SYCL programming language and the SYCL runtime library. The SYCL specification is here, and one implementation of the SYCL runtime is provided by the Intel oneAPI Base Toolkit.
From an engineering perspective, we will build the Intel GPU runtime library with the same compiler as PyTorch (GCC by default) and link against the SYCL runtime library. We will follow the current design of PyTorch, aligning with the backends already upstreamed; the design diagram (library dependency graph) is shown below.

[diagram: library dependency graph]

In detail, libc10_xpu.so will implement the majority of the code for:

  • Device
  • Stream
  • Guard
  • Device Allocator

libtorch_xpu.so will implement the majority of the code for:

  • Event
  • Generator
  • Host Allocator for pinned memory
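To give a feel for what the host allocator for pinned memory does, host allocators in PyTorch backends typically cache freed blocks so the expensive pin/unpin operations are amortized across reuses. The following is a simplified sketch of that caching idea, not the actual libtorch_xpu implementation; real pinning (e.g. a SYCL host USM allocation) is replaced here by a plain `bytearray`.

```python
# Simplified sketch of a caching host allocator, in the spirit of the
# pinned-memory host allocators used by PyTorch backends. Pinning is
# simulated; the free list is keyed by block size.

from collections import defaultdict

class CachingHostAllocator:
    def __init__(self):
        self.free_blocks = defaultdict(list)  # size -> cached blocks
        self.pin_calls = 0                    # counts simulated pin operations

    def allocate(self, size: int) -> bytearray:
        cached = self.free_blocks[size]
        if cached:
            return cached.pop()        # reuse a cached block: no new pin
        self.pin_calls += 1            # simulate an expensive pin operation
        return bytearray(size)

    def free(self, block: bytearray) -> None:
        self.free_blocks[len(block)].append(block)  # cache for later reuse

alloc = CachingHostAllocator()
a = alloc.allocate(4096)
alloc.free(a)
b = alloc.allocate(4096)      # served from the cache
assert alloc.pin_calls == 1   # only one pin for two allocations
```

The design trade-off is the usual one for caching allocators: faster repeated host-to-device transfers at the cost of holding pinned memory beyond its first use.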

For PyTorch frontend design:

Some PyTorch C++ frontend binding code will be added to libtorch_python.so, and the PyTorch frontend Python code will be placed in the torch/xpu folder. Eventually, we will have torch.xpu.Stream, torch.xpu.Event, and other torch.xpu.* runtime APIs, such as torch.xpu.device_count and torch.xpu.set_device. Every torch.xpu.* runtime API has a counterpart already existing in PyTorch; there is no torch.xpu.* runtime API specific to XPU.
The architecture diagram of the Intel GPU runtime is illustrated below.

[diagram: Intel GPU runtime architecture]
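Since torch.xpu.Stream and torch.xpu.Event mirror their existing counterparts (e.g. torch.cuda.Stream and torch.cuda.Event), their semantics follow the usual stream/event model: a stream is an in-order queue of work, and an event marks a point in a stream that completes once all preceding work on that stream has run. The following pure-Python sketch illustrates those semantics only; `Stream`, `Event`, `submit`, and `synchronize` are illustrative names, not the proposed torch.xpu API.

```python
# Pure-Python sketch of stream/event ordering semantics, mirroring the
# Stream/Event pair proposed here. Work is queued in order and executed
# lazily on synchronize(); an event "completes" when the stream reaches it.

class Stream:
    """An in-order queue of work items, drained by synchronize()."""
    def __init__(self):
        self.pending = []

    def submit(self, fn):
        self.pending.append(fn)

    def synchronize(self):
        while self.pending:
            self.pending.pop(0)()  # execute queued work in FIFO order

class Event:
    """Marks a point in a stream; completes when the stream reaches it."""
    def __init__(self):
        self.done = False

    def record(self, stream):
        stream.submit(lambda: setattr(self, "done", True))

    def query(self):
        return self.done

results = []
s = Stream()
s.submit(lambda: results.append("kernel-1"))
e = Event()
e.record(s)            # event completes after kernel-1, before kernel-2
s.submit(lambda: results.append("kernel-2"))

assert not e.query()   # nothing has executed yet
s.synchronize()        # drain the stream in submission order
assert e.query() and results == ["kernel-1", "kernel-2"]
```

In the real runtime the queue lives on the device and `synchronize` blocks the host, but the ordering guarantees shown here are the same.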

Additional Context

In conclusion, we will

  • upstream the majority of the Intel GPU runtime (Device, Stream, Event, Guard, Generator, and Allocator) in the c10/aten library;
  • upstream the majority of the Intel GPU runtime (Device, Stream, Event, Generator, and Allocator) in the PyTorch C++ frontend;
  • bind the PyTorch C++ frontend to the PyTorch Python frontend module torch.xpu.

The code changes touch many parts of PyTorch. To keep code reviews clear and manageable, we will split the changes into several PRs. We will upstream the components in the following priority order:

  • Device
  • Stream
  • Event
  • Allocator
  • Guard
  • Generator

cc @frank-wei @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
