Motivation
XPU is being upstreamed into PyTorch as a new device. To support upstreaming the device and its features, PyTorch CI/CD must run XPU-related tests so that they gate the quality of incoming pull requests.
Design philosophy
For the new CI/CD work, we want to reuse the existing PyTorch CI/CD infrastructure as much as possible and follow the workflows used for other devices. XPU-related builds and tests will be dispatched to self-hosted runners on Intel Developer Cloud (IDC) instances, which have XPU hardware. The key points are:
- Docker-based tests
- A unified entry workflow for XPU tests
- Multi-stage tests:
  - Base Docker image build on AWS instance runners
  - Wheel build with the base Docker image on IDC instance runners
  - Tests sharded and dispatched to IDC instance runners
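To make the sharding idea concrete, the following is a minimal sketch of how a test list could be split so that each runner picks only its own shard. It is an illustration under stated assumptions, not PyTorch's actual sharding logic: the function name `select_shard` and the round-robin strategy are hypothetical, and the real test runner may balance shards differently (for example by historical run time).

```python
from typing import List


def select_shard(tests: List[str], shard: int, num_shards: int) -> List[str]:
    """Return the tests assigned to a 1-based shard (hypothetical round-robin split)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    if not 1 <= shard <= num_shards:
        raise ValueError(f"shard must be in 1..{num_shards}, got {shard}")
    # Sort first so every runner computes the same assignment,
    # regardless of the order in which tests were discovered.
    ordered = sorted(tests)
    # Shard k takes every num_shards-th test, starting at index k - 1.
    return ordered[shard - 1 :: num_shards]


if __name__ == "__main__":
    # Illustrative file names only.
    tests = ["test_ops.py", "test_nn.py", "test_autograd.py", "test_torch.py", "test_xpu.py"]
    for shard in (1, 2):
        print(shard, select_shard(tests, shard, 2))
```

Because the assignment depends only on the sorted test list and the `(shard, num_shards)` pair, each runner can compute its share independently, and together the shards cover every test exactly once.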
Detail
Entrance workflow
XPU-related tests will be triggered either by a PR carrying a specific label or periodically by a scheduled timer. We plan to add a new workflow, .github/workflows/xpu.yml:
name: xpu
on:
  push:
    branches:
      - main
      - release/*
    tags:
      - ciflow/xpu/*
  workflow_dispatch:
  schedule:
    - cron: 29 8 * * * # about 1:29am PDT
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref_name }}-${{ github.ref_type == 'branch' && github.sha }}-${{ github.event_name == 'workflow_dispatch' }}-${{ github.event_name == 'schedule' }}
  cancel-in-progress: true
jobs:
  linux-focal-xpu-py3_8-build:
    name: linux-focal-xpu-py3.8
    uses: ./.github/workflows/_linux-build.yml
    with:
      build-environment: linux-focal-xpu-py3.8
      docker-image-name: pytorch-linux-focal-xpu-n-py3
      sync-tag: xpu-build
      test-matrix: |
        { include: [
          { config: "default", shard: 1, num_shards: 2, runner: "linux.idc.xpu" },
          { config: "default", shard: 2, num_shards: 2, runner: "linux.idc.xpu" },
        ]}
  linux-focal-xpu-py3_8-test:
    name: linux-focal-xpu-py3.8
    uses: ./.github/workflows/_xpu-test.yml
    needs: linux-focal-xpu-py3_8-build
    with:
      build-environment: linux-focal-xpu-py3.8
      docker-image: ${{ needs.linux-focal-xpu-py3_8-build.outputs.docker-image }}
      test-matrix: ${{ needs.linux-focal-xpu-py3_8-build.outputs.test-matrix }}
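
A test matrix of this shape is easy to get subtly wrong, for example by repeating a shard number so that part of the test suite never runs. The sketch below shows one way such a matrix could be sanity-checked before it is committed. It assumes the `include` list has already been parsed into Python dictionaries; the helper name `find_matrix_problems` is hypothetical and is not part of the PyTorch CI tooling.

```python
from collections import defaultdict
from typing import Dict, List


def find_matrix_problems(include: List[Dict]) -> List[str]:
    """Return human-readable problems found in a test matrix's `include` list."""
    # Group the shard numbers seen for each (config, num_shards) pair.
    seen = defaultdict(list)
    for entry in include:
        seen[(entry["config"], entry["num_shards"])].append(entry["shard"])

    problems = []
    for (config, num_shards), shards in sorted(seen.items()):
        # Each shard number 1..num_shards should appear exactly once.
        expected = list(range(1, num_shards + 1))
        if sorted(shards) != expected:
            problems.append(
                f"config {config!r}: expected shards {expected}, got {sorted(shards)}"
            )
    return problems


if __name__ == "__main__":
    matrix = [
        {"config": "default", "shard": 1, "num_shards": 2, "runner": "linux.idc.xpu"},
        {"config": "default", "shard": 2, "num_shards": 2, "runner": "linux.idc.xpu"},
    ]
    print(find_matrix_problems(matrix))
```

An empty result means every shard from 1 to `num_shards` appears exactly once for each config; duplicated or missing shards are reported per config.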
Build & Test
We will add an XPU-specific base image Dockerfile, .ci/docker/ubuntu-xpu/Dockerfile, and an XPU section in the image build script .ci/docker/build.sh, to support building the XPU base image on linux.2xlarge runners.
Unlike other devices, the PyTorch wheel build currently needs to be dispatched to IDC instance runners. We will reuse .github/workflows/_linux-build.yml with an XPU-specific build-environment and add an XPU section to the PyTorch build script .ci/pytorch/build.sh.
For testing, we will add a new XPU test workflow, .github/workflows/_xpu-test.yml, and the necessary GitHub Actions (GHA) such as setup-xpu and teardown-xpu. We will also add an XPU section to the test script .ci/pytorch/test.sh, along with a set of utility scripts for XPU.
Other tests, for example Inductor-related ones, are also under consideration and will be implemented along the same lines in the future.
Open
We noticed that the PyTorch CI/CD infrastructure currently relies on the AWS Docker registry and the S3 service to share Docker images, wheels, and test results among workflows and test jobs. If the wheel build and tests are dispatched to IDC instance runners, how can those runners interact with these services?