Skip to content

[pt-vulkan] Enable Python code blocks in shader templates and upgrade shader template generation#115948

Closed
SS-JIA wants to merge 1 commit intopytorch:mainfrom
SS-JIA:export-D52087084
Closed

[pt-vulkan] Enable Python code blocks in shader templates and upgrade shader template generation#115948
SS-JIA wants to merge 1 commit intopytorch:mainfrom
SS-JIA:export-D52087084

Conversation

@SS-JIA
Copy link
Copy Markdown
Contributor

@SS-JIA SS-JIA commented Dec 15, 2023

Summary:
This change makes two major improvements to PyTorch Vulkan's shader authoring workflow.

Review Guide

There are a lot of changed files because every GLSL shader had to be touched. The majority of changes is changing

#define PRECISION $precision
#define FORMAT $format

to

#define PRECISION ${PRECISION}
#define FORMAT ${FORMAT}

due to changes in how shader templates are processed.

For reviewers, the primary functional changes to review are:

  • gen_vulkan_spv.py
    • Majority of functional changes are in this file, which controls how shader templates are processed.
  • shader_params.yaml
    • controls how shader variants are generated

Python Codeblocks in Shader Templates

From now on, every compute shader (i.e. .glsl) is treated as a shader template. To this effect, the templates/ folder has been removed and there is now a global shader_params.yaml file to describe the shader variants that should be generated for all shader templates.

Taking inspiration from XNNPACK's xngen tool, shader templates can now use Python codeblocks. One example is:

$if not INPLACE:
  layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict writeonly image3D uOutput;
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uInput;
  layout(set = 0, binding = 2) uniform PRECISION sampler3D uOther;
  layout(set = 0, binding = 3) uniform PRECISION restrict Block {
    ivec4 output_sizes;
    ivec4 input_sizes;
    ivec4 other_sizes;
    float alpha;
  }
  uArgs;
$else:
  layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict image3D uOutput;
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uOther;
  layout(set = 0, binding = 2) uniform PRECISION restrict Block {
    ivec4 output_sizes;
    ivec4 other_sizes;
    float alpha;
  }
  uArgs;

Another is:

  // PYTHON CODEBLOCK
  $if not IS_DIV:
    const int c_index = (pos.z % ((uArgs.output_sizes.z + 3) / 4)) * 4;
    if (uArgs.other_sizes.z != 1 && c_index + 3 >= uArgs.output_sizes.z) {
      ivec4 c_ind = ivec4(c_index) + ivec4(0, 1, 2, 3);
      vec4 mask = vec4(lessThan(c_ind, ivec4(uArgs.output_sizes.z)));
      other_texel = other_texel * mask + vec4(1, 1, 1, 1) - mask;
    }

  // PYTHON CODEBLOCK
  $if not INPLACE:
    ivec3 input_pos =
        map_output_pos_to_input_pos(pos, uArgs.output_sizes, uArgs.input_sizes);
    const vec4 in_texel =
        load_texel(input_pos, uArgs.output_sizes, uArgs.input_sizes, uInput);

    imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));
  $else:
    const vec4 in_texel = imageLoad(uOutput, pos);
    imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));

In addition to making it easier and clearer to write shader templates, this enables shaders that were previously unable to be consolidated into a single template to now be represented using a single template, such as non inplace and inplace variants of the same shader.

generate_variant_forall in shader variant YAML configuration

YAML files that describe how shader variants should be generated can now use a generate_variant_forall field to iterate over various settings for a specific parameter for each variant defined. Example:

unary_op:
  parameter_names_with_default_values:
    OPERATOR: exp(X)
    INPLACE: 0
  generate_variant_forall:
    INPLACE:
      - VALUE: 0
        SUFFIX: ""
      - VALUE: 1
        SUFFIX: "inplace"
  shader_variants:
    - NAME: exp
      OPERATOR: exp(X)
    - NAME: sqrt
      OPERATOR: sqrt(X)
    - NAME: log
      OPERATOR: log(X)

Previously, the inplace variants would need to have separate shader_variants entries. If there are multiple variables that need to be iterated across, then all possible combinations will be generated. Would be good to take a look to see how the new YAML configuration works.

Test Plan:
There is no functional change to this diff; we only need to make sure that the generated shaders are still correct. Therefore, we only need to run vulkan_api_test.

# On Mac Laptop
buck run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 -- --gtest_filter="*"

Reviewed By: digantdesai

Differential Revision: D52087084

@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented Dec 15, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/115948

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (5 Unrelated Failures)

As of commit 49cfbb9 with merge base d85314c (image):

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot Bot added ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR module: vulkan release notes: vulkan release notes category labels Dec 15, 2023
@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request was exported from Phabricator. Differential Revision: D52087084

@pytorch-bot pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 15, 2023
@SS-JIA SS-JIA added suppress-api-compatibility-check Suppresses the failures of API backward-compatibility linter (Lint/bc_linter) suppress-bc-linter Suppresses the failures of API backward-compatibility linter (Lint/bc_linter) labels Dec 15, 2023
@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request was exported from Phabricator. Differential Revision: D52087084

@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request was exported from Phabricator. Differential Revision: D52087084

@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request was exported from Phabricator. Differential Revision: D52087084

SS-JIA added a commit to SS-JIA/pytorch that referenced this pull request Dec 16, 2023
… shader template generation (pytorch#115948)

Summary:
Pull Request resolved: pytorch#115948

This change makes two major improvements to PyTorch Vulkan's shader authoring workflow.

## Review Guide

There are a lot of changed files because every GLSL shader had to be touched. The majority of changes is changing

```
#define PRECISION $precision
#define FORMAT $format
```

to

```
#define PRECISION ${PRECISION}
#define FORMAT ${FORMAT}
```

due to changes in how shader templates are processed.

For reviewers, the primary functional changes to review are:

* `gen_vulkan_spv.py`
  * Majority of functional changes are in this file, which controls how shader templates are processed.
* `shader_params.yaml`
  * controls how shader variants are generated

## Python Codeblocks in Shader Templates

From now on, every compute shader (i.e. `.glsl`) is treated as a shader template. To this effect, the `templates/` folder has been removed and there is now a global `shader_params.yaml` file to describe the shader variants that should be generated for all shader templates.

**Taking inspiration from XNNPACK's [`xngen` tool](https://github.com/google/XNNPACK/blob/master/tools/xngen.py), shader templates can now use Python codeblocks**.  One example is:

```
$if not INPLACE:
  layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict writeonly image3D uOutput;
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uInput;
  layout(set = 0, binding = 2) uniform PRECISION sampler3D uOther;
  layout(set = 0, binding = 3) uniform PRECISION restrict Block {
    ivec4 output_sizes;
    ivec4 input_sizes;
    ivec4 other_sizes;
    float alpha;
  }
  uArgs;
$else:
  layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict image3D uOutput;
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uOther;
  layout(set = 0, binding = 2) uniform PRECISION restrict Block {
    ivec4 output_sizes;
    ivec4 other_sizes;
    float alpha;
  }
  uArgs;
```

Another is:

```
  // PYTHON CODEBLOCK
  $if not IS_DIV:
    const int c_index = (pos.z % ((uArgs.output_sizes.z + 3) / 4)) * 4;
    if (uArgs.other_sizes.z != 1 && c_index + 3 >= uArgs.output_sizes.z) {
      ivec4 c_ind = ivec4(c_index) + ivec4(0, 1, 2, 3);
      vec4 mask = vec4(lessThan(c_ind, ivec4(uArgs.output_sizes.z)));
      other_texel = other_texel * mask + vec4(1, 1, 1, 1) - mask;
    }

  // PYTHON CODEBLOCK
  $if not INPLACE:
    ivec3 input_pos =
        map_output_pos_to_input_pos(pos, uArgs.output_sizes, uArgs.input_sizes);
    const vec4 in_texel =
        load_texel(input_pos, uArgs.output_sizes, uArgs.input_sizes, uInput);

    imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));
  $else:
    const vec4 in_texel = imageLoad(uOutput, pos);
    imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));
```

In addition to making it easier and clearer to write shader templates, this enables shaders that were previously unable to be consolidated into a single template to now be represented using a single template, such as non inplace and inplace variants of the same shader.

## `generate_variant_forall` in shader variant YAML configuration

YAML files that describe how shader variants should be generated can now use a `generate_variant_forall` field to iterate over various settings for a specific parameter for each variant defined. Example:

```
unary_op:
  parameter_names_with_default_values:
    OPERATOR: exp(X)
    INPLACE: 0
  generate_variant_forall:
    INPLACE:
      - VALUE: 0
        SUFFIX: ""
      - VALUE: 1
        SUFFIX: "inplace"
  shader_variants:
    - NAME: exp
      OPERATOR: exp(X)
    - NAME: sqrt
      OPERATOR: sqrt(X)
    - NAME: log
      OPERATOR: log(X)
```

Previously, the `inplace` variants would need to have separate `shader_variants` entries. If there are multiple variables that need to be iterated across, then all possible combinations will be generated. Would be good to take a look to see how the new YAML configuration works.

Test Plan:
There is no functional change to this diff; we only need to make sure that the generated shaders are still correct. Therefore, we only need to run `vulkan_api_test`.

```
# On Mac Laptop
buck run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 -- --gtest_filter="*"
```

Reviewed By: digantdesai, manuelcandales

Differential Revision: D52087084

fbshipit-source-id: 9190e1236bc56d929acd74c17543c72457a5d287
@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request was exported from Phabricator. Differential Revision: D52087084

@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request was exported from Phabricator. Differential Revision: D52087084

@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request was exported from Phabricator. Differential Revision: D52087084

@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request was exported from Phabricator. Differential Revision: D52087084

SS-JIA added a commit to SS-JIA/pytorch that referenced this pull request Dec 18, 2023
… shader template generation (pytorch#115948)

Summary:
Pull Request resolved: pytorch#115948

This change makes two major improvements to PyTorch Vulkan's shader authoring workflow.

## Review Guide

There are a lot of changed files because every GLSL shader had to be touched. The majority of changes is changing

```
#define PRECISION $precision
#define FORMAT $format
```

to

```
#define PRECISION ${PRECISION}
#define FORMAT ${FORMAT}
```

due to changes in how shader templates are processed.

For reviewers, the primary functional changes to review are:

* `gen_vulkan_spv.py`
  * Majority of functional changes are in this file, which controls how shader templates are processed.
* `shader_params.yaml`
  * controls how shader variants are generated

## Python Codeblocks in Shader Templates

From now on, every compute shader (i.e. `.glsl`) is treated as a shader template. To this effect, the `templates/` folder has been removed and there is now a global `shader_params.yaml` file to describe the shader variants that should be generated for all shader templates.

**Taking inspiration from XNNPACK's [`xngen` tool](https://github.com/google/XNNPACK/blob/master/tools/xngen.py), shader templates can now use Python codeblocks**.  One example is:

```
$if not INPLACE:
  layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict writeonly image3D uOutput;
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uInput;
  layout(set = 0, binding = 2) uniform PRECISION sampler3D uOther;
  layout(set = 0, binding = 3) uniform PRECISION restrict Block {
    ivec4 output_sizes;
    ivec4 input_sizes;
    ivec4 other_sizes;
    float alpha;
  }
  uArgs;
$else:
  layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict image3D uOutput;
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uOther;
  layout(set = 0, binding = 2) uniform PRECISION restrict Block {
    ivec4 output_sizes;
    ivec4 other_sizes;
    float alpha;
  }
  uArgs;
```

Another is:

```
  // PYTHON CODEBLOCK
  $if not IS_DIV:
    const int c_index = (pos.z % ((uArgs.output_sizes.z + 3) / 4)) * 4;
    if (uArgs.other_sizes.z != 1 && c_index + 3 >= uArgs.output_sizes.z) {
      ivec4 c_ind = ivec4(c_index) + ivec4(0, 1, 2, 3);
      vec4 mask = vec4(lessThan(c_ind, ivec4(uArgs.output_sizes.z)));
      other_texel = other_texel * mask + vec4(1, 1, 1, 1) - mask;
    }

  // PYTHON CODEBLOCK
  $if not INPLACE:
    ivec3 input_pos =
        map_output_pos_to_input_pos(pos, uArgs.output_sizes, uArgs.input_sizes);
    const vec4 in_texel =
        load_texel(input_pos, uArgs.output_sizes, uArgs.input_sizes, uInput);

    imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));
  $else:
    const vec4 in_texel = imageLoad(uOutput, pos);
    imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));
```

In addition to making it easier and clearer to write shader templates, this enables shaders that were previously unable to be consolidated into a single template to now be represented using a single template, such as non inplace and inplace variants of the same shader.

## `generate_variant_forall` in shader variant YAML configuration

YAML files that describe how shader variants should be generated can now use a `generate_variant_forall` field to iterate over various settings for a specific parameter for each variant defined. Example:

```
unary_op:
  parameter_names_with_default_values:
    OPERATOR: exp(X)
    INPLACE: 0
  generate_variant_forall:
    INPLACE:
      - VALUE: 0
        SUFFIX: ""
      - VALUE: 1
        SUFFIX: "inplace"
  shader_variants:
    - NAME: exp
      OPERATOR: exp(X)
    - NAME: sqrt
      OPERATOR: sqrt(X)
    - NAME: log
      OPERATOR: log(X)
```

Previously, the `inplace` variants would need to have separate `shader_variants` entries. If there are multiple variables that need to be iterated across, then all possible combinations will be generated. Would be good to take a look to see how the new YAML configuration works.

Test Plan:
There is no functional change to this diff; we only need to make sure that the generated shaders are still correct. Therefore, we only need to run `vulkan_api_test`.

```
# On Mac Laptop
buck run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 -- --gtest_filter="*"
```

Reviewed By: kimishpatel

Differential Revision: D52087084

fbshipit-source-id: 295b38c43f01fc4f500318b27a5e912414861303
@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request was exported from Phabricator. Differential Revision: D52087084

SS-JIA added a commit to SS-JIA/pytorch that referenced this pull request Dec 19, 2023
… shader template generation (pytorch#115948)

Summary:
Pull Request resolved: pytorch#115948

This change makes two major improvements to PyTorch Vulkan's shader authoring workflow.

## Review Guide

There are a lot of changed files because every GLSL shader had to be touched. The majority of changes is changing

```
#define PRECISION $precision
#define FORMAT $format
```

to

```
#define PRECISION ${PRECISION}
#define FORMAT ${FORMAT}
```

due to changes in how shader templates are processed.

For reviewers, the primary functional changes to review are:

* `gen_vulkan_spv.py`
  * Majority of functional changes are in this file, which controls how shader templates are processed.
* `shader_params.yaml`
  * controls how shader variants are generated

## Python Codeblocks in Shader Templates

From now on, every compute shader (i.e. `.glsl`) is treated as a shader template. To this effect, the `templates/` folder has been removed and there is now a global `shader_params.yaml` file to describe the shader variants that should be generated for all shader templates.

**Taking inspiration from XNNPACK's [`xngen` tool](https://github.com/google/XNNPACK/blob/master/tools/xngen.py), shader templates can now use Python codeblocks**.  One example is:

```
$if not INPLACE:
  layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict writeonly image3D uOutput;
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uInput;
  layout(set = 0, binding = 2) uniform PRECISION sampler3D uOther;
  layout(set = 0, binding = 3) uniform PRECISION restrict Block {
    ivec4 output_sizes;
    ivec4 input_sizes;
    ivec4 other_sizes;
    float alpha;
  }
  uArgs;
$else:
  layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict image3D uOutput;
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uOther;
  layout(set = 0, binding = 2) uniform PRECISION restrict Block {
    ivec4 output_sizes;
    ivec4 other_sizes;
    float alpha;
  }
  uArgs;
```

Another is:

```
  // PYTHON CODEBLOCK
  $if not IS_DIV:
    const int c_index = (pos.z % ((uArgs.output_sizes.z + 3) / 4)) * 4;
    if (uArgs.other_sizes.z != 1 && c_index + 3 >= uArgs.output_sizes.z) {
      ivec4 c_ind = ivec4(c_index) + ivec4(0, 1, 2, 3);
      vec4 mask = vec4(lessThan(c_ind, ivec4(uArgs.output_sizes.z)));
      other_texel = other_texel * mask + vec4(1, 1, 1, 1) - mask;
    }

  // PYTHON CODEBLOCK
  $if not INPLACE:
    ivec3 input_pos =
        map_output_pos_to_input_pos(pos, uArgs.output_sizes, uArgs.input_sizes);
    const vec4 in_texel =
        load_texel(input_pos, uArgs.output_sizes, uArgs.input_sizes, uInput);

    imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));
  $else:
    const vec4 in_texel = imageLoad(uOutput, pos);
    imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));
```

In addition to making it easier and clearer to write shader templates, this enables shaders that were previously unable to be consolidated into a single template to now be represented using a single template, such as non inplace and inplace variants of the same shader.

## `generate_variant_forall` in shader variant YAML configuration

YAML files that describe how shader variants should be generated can now use a `generate_variant_forall` field to iterate over various settings for a specific parameter for each variant defined. Example:

```
unary_op:
  parameter_names_with_default_values:
    OPERATOR: exp(X)
    INPLACE: 0
  generate_variant_forall:
    INPLACE:
      - VALUE: 0
        SUFFIX: ""
      - VALUE: 1
        SUFFIX: "inplace"
  shader_variants:
    - NAME: exp
      OPERATOR: exp(X)
    - NAME: sqrt
      OPERATOR: sqrt(X)
    - NAME: log
      OPERATOR: log(X)
```

Previously, the `inplace` variants would need to have separate `shader_variants` entries. If there are multiple variables that need to be iterated across, then all possible combinations will be generated. Would be good to take a look to see how the new YAML configuration works.

Test Plan:
There is no functional change to this diff; we only need to make sure that the generated shaders are still correct. Therefore, we only need to run `vulkan_api_test`.

```
# On Mac Laptop
buck run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 -- --gtest_filter="*"
```

Reviewed By: kimishpatel

Differential Revision: D52087084

fbshipit-source-id: 8b8e5df4638b1ebc4aae107d6fb1912406ceaeb1
… shader template generation (pytorch#115948)

Summary:
Pull Request resolved: pytorch#115948

This change makes two major improvements to PyTorch Vulkan's shader authoring workflow.

## Review Guide

There are a lot of changed files because every GLSL shader had to be touched. The majority of changes is changing

```
#define PRECISION $precision
#define FORMAT $format
```

to

```
#define PRECISION ${PRECISION}
#define FORMAT ${FORMAT}
```

due to changes in how shader templates are processed.

For reviewers, the primary functional changes to review are:

* `gen_vulkan_spv.py`
  * Majority of functional changes are in this file, which controls how shader templates are processed.
* `shader_params.yaml`
  * controls how shader variants are generated

## Python Codeblocks in Shader Templates

From now on, every compute shader (i.e. `.glsl`) is treated as a shader template. To this effect, the `templates/` folder has been removed and there is now a global `shader_params.yaml` file to describe the shader variants that should be generated for all shader templates.

**Taking inspiration from XNNPACK's [`xngen` tool](https://github.com/google/XNNPACK/blob/master/tools/xngen.py), shader templates can now use Python codeblocks**.  One example is:

```
$if not INPLACE:
  layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict writeonly image3D uOutput;
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uInput;
  layout(set = 0, binding = 2) uniform PRECISION sampler3D uOther;
  layout(set = 0, binding = 3) uniform PRECISION restrict Block {
    ivec4 output_sizes;
    ivec4 input_sizes;
    ivec4 other_sizes;
    float alpha;
  }
  uArgs;
$else:
  layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict image3D uOutput;
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uOther;
  layout(set = 0, binding = 2) uniform PRECISION restrict Block {
    ivec4 output_sizes;
    ivec4 other_sizes;
    float alpha;
  }
  uArgs;
```

Another is:

```
  // PYTHON CODEBLOCK
  $if not IS_DIV:
    const int c_index = (pos.z % ((uArgs.output_sizes.z + 3) / 4)) * 4;
    if (uArgs.other_sizes.z != 1 && c_index + 3 >= uArgs.output_sizes.z) {
      ivec4 c_ind = ivec4(c_index) + ivec4(0, 1, 2, 3);
      vec4 mask = vec4(lessThan(c_ind, ivec4(uArgs.output_sizes.z)));
      other_texel = other_texel * mask + vec4(1, 1, 1, 1) - mask;
    }

  // PYTHON CODEBLOCK
  $if not INPLACE:
    ivec3 input_pos =
        map_output_pos_to_input_pos(pos, uArgs.output_sizes, uArgs.input_sizes);
    const vec4 in_texel =
        load_texel(input_pos, uArgs.output_sizes, uArgs.input_sizes, uInput);

    imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));
  $else:
    const vec4 in_texel = imageLoad(uOutput, pos);
    imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));
```

In addition to making it easier and clearer to write shader templates, this enables shaders that were previously unable to be consolidated into a single template to now be represented using a single template, such as non inplace and inplace variants of the same shader.

## `generate_variant_forall` in shader variant YAML configuration

YAML files that describe how shader variants should be generated can now use a `generate_variant_forall` field to iterate over various settings for a specific parameter for each variant defined. Example:

```
unary_op:
  parameter_names_with_default_values:
    OPERATOR: exp(X)
    INPLACE: 0
  generate_variant_forall:
    INPLACE:
      - VALUE: 0
        SUFFIX: ""
      - VALUE: 1
        SUFFIX: "inplace"
  shader_variants:
    - NAME: exp
      OPERATOR: exp(X)
    - NAME: sqrt
      OPERATOR: sqrt(X)
    - NAME: log
      OPERATOR: log(X)
```

Previously, the `inplace` variants would need to have separate `shader_variants` entries. If there are multiple variables that need to be iterated across, then all possible combinations will be generated. Would be good to take a look to see how the new YAML configuration works.

Test Plan:
There is no functional change to this diff; we only need to make sure that the generated shaders are still correct. Therefore, we only need to run `vulkan_api_test`.

```
# On Mac Laptop
buck run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 -- --gtest_filter="*"
```

Reviewed By: kimishpatel

Differential Revision: D52087084

fbshipit-source-id: 3b5f15e942f765a9e79415f793678bb816dfc368
@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request was exported from Phabricator. Differential Revision: D52087084

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@pytorchbot merge -f 'Landed internally'

(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

@pytorchmergebot
Copy link
Copy Markdown
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023
… shader template generation (pytorch#115948)

Summary:
This change makes two major improvements to PyTorch Vulkan's shader authoring workflow.

## Review Guide

There are a lot of changed files because every GLSL shader had to be touched. The majority of changes is changing

```
#define PRECISION $precision
#define FORMAT $format
```

to

```
#define PRECISION ${PRECISION}
#define FORMAT ${FORMAT}
```

due to changes in how shader templates are processed.

For reviewers, the primary functional changes to review are:

* `gen_vulkan_spv.py`
  * Majority of functional changes are in this file, which controls how shader templates are processed.
* `shader_params.yaml`
  * controls how shader variants are generated

## Python Codeblocks in Shader Templates

From now on, every compute shader (i.e. `.glsl`) is treated as a shader template. To this effect, the `templates/` folder has been removed and there is now a global `shader_params.yaml` file to describe the shader variants that should be generated for all shader templates.

**Taking inspiration from XNNPACK's [`xngen` tool](https://github.com/google/XNNPACK/blob/master/tools/xngen.py), shader templates can now use Python codeblocks**.  One example is:

```
$if not INPLACE:
  layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict writeonly image3D uOutput;
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uInput;
  layout(set = 0, binding = 2) uniform PRECISION sampler3D uOther;
  layout(set = 0, binding = 3) uniform PRECISION restrict Block {
    ivec4 output_sizes;
    ivec4 input_sizes;
    ivec4 other_sizes;
    float alpha;
  }
  uArgs;
$else:
  layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict image3D uOutput;
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uOther;
  layout(set = 0, binding = 2) uniform PRECISION restrict Block {
    ivec4 output_sizes;
    ivec4 other_sizes;
    float alpha;
  }
  uArgs;
```

Another is:

```
  // PYTHON CODEBLOCK
  $if not IS_DIV:
    const int c_index = (pos.z % ((uArgs.output_sizes.z + 3) / 4)) * 4;
    if (uArgs.other_sizes.z != 1 && c_index + 3 >= uArgs.output_sizes.z) {
      ivec4 c_ind = ivec4(c_index) + ivec4(0, 1, 2, 3);
      vec4 mask = vec4(lessThan(c_ind, ivec4(uArgs.output_sizes.z)));
      other_texel = other_texel * mask + vec4(1, 1, 1, 1) - mask;
    }

  // PYTHON CODEBLOCK
  $if not INPLACE:
    ivec3 input_pos =
        map_output_pos_to_input_pos(pos, uArgs.output_sizes, uArgs.input_sizes);
    const vec4 in_texel =
        load_texel(input_pos, uArgs.output_sizes, uArgs.input_sizes, uInput);

    imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));
  $else:
    const vec4 in_texel = imageLoad(uOutput, pos);
    imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));
```

In addition to making it easier and clearer to write shader templates, this enables shaders that were previously unable to be consolidated into a single template to now be represented using a single template, such as non inplace and inplace variants of the same shader.

## `generate_variant_forall` in shader variant YAML configuration

YAML files that describe how shader variants should be generated can now use a `generate_variant_forall` field to iterate over various settings for a specific parameter for each variant defined. Example:

```
unary_op:
  parameter_names_with_default_values:
    OPERATOR: exp(X)
    INPLACE: 0
  generate_variant_forall:
    INPLACE:
      - VALUE: 0
        SUFFIX: ""
      - VALUE: 1
        SUFFIX: "inplace"
  shader_variants:
    - NAME: exp
      OPERATOR: exp(X)
    - NAME: sqrt
      OPERATOR: sqrt(X)
    - NAME: log
      OPERATOR: log(X)
```

Previously, the `inplace` variants would need to have separate `shader_variants` entries. If there are multiple variables that need to be iterated across, then all possible combinations will be generated. Would be good to take a look to see how the new YAML configuration works.

Test Plan:
There is no functional change to this diff; we only need to make sure that the generated shaders are still correct. Therefore, we only need to run `vulkan_api_test`.

```
# On Mac Laptop
buck run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 -- --gtest_filter="*"
```

Reviewed By: digantdesai

Differential Revision: D52087084

Pull Request resolved: pytorch#115948
Approved by: https://github.com/manuelcandales
SS-JIA added a commit to pytorch/executorch that referenced this pull request Mar 12, 2024
…hapes

## Context

pytorch/pytorch#121598 introduces the ability to support dynamic shapes through tensor metadata updates.

The idea is fairly simple. Instead of shaders accepting a UBO with size data for all arguments:

```
layout(set = 0, binding = 2) uniform PRECISION restrict Block {
  ivec4 output_sizes;
  ivec4 other_sizes;
  float alpha;
}
```

Shaders will accept separate UBOs for each piece of tensor metadata:

```
layout(set = 0, binding = 3) uniform PRECISION restrict OutSizes {
  ivec4 data;
}
out_sizes;

layout(set = 0, binding = 4) uniform PRECISION restrict InSizes {
  ivec4 data;
}
in_sizes;

layout(set = 0, binding = 5) uniform PRECISION restrict OtherSizes {
  ivec4 data;
}
other_sizes;

layout(set = 0, binding = 6) uniform PRECISION restrict Alpha {
  float data;
}
alpha;
```

Each UBO will be owned and maintained by the corresponding `vTensor` instance. To support a graph input resize, every tensor in the graph only needs to update their metadata UBOs via the `tensor.virtual_resize(new_sizes)` call. Shader dispatches in subsequent command buffer submissions will then see the updated metadata and execute as if the tensor were the updated sizes.

This changeset introduces a new shader library for the Vulkan graph runtime that enables dynamic shapes through this technique in favor of relying on the shader library from PyTorch Vulkan.


## Considerations

Technically, the UBO update technique can be applied to the shaders from PyTorch Vulkan as well. If that's the case, why introduce a new shader library for the graph runtime?

The primary motivation is code quality.

First, having `vTensor` supply UBOs for their own metadata greatly reduces the need to have operator specifc ad-hoc `Params` structs to organize arguments to write into a `api::UniformParamsBuffer`.

Constructing an `ExecuteNode` for binary operators is now

```
  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      api::shader_registry().get_shader_info(kernel_name.str()),
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      {t_out.gpu_sizes_ubo(),
       t_in1.gpu_sizes_ubo(),
       t_in2.gpu_sizes_ubo(),
       graph.create_params_buffer(alpha_val)}))
```

instead of

```
ArithmeticParams block{
      get_size_as_ivec4(t_out),
      get_size_as_ivec4(t_in1),
      get_size_as_ivec4(t_in2),
      alpha_val,
  };
  api::UniformParamsBuffer params(graph.context(), block);

  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      shader,
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      std::move(params)));
```

Another consideration is that pytorch/pytorch#115948 which was landed fairly recently enables much more expressive shader templates through the use of Python code blocks in the GLSL template. This enables shader templates that can easily express variants for different data types, packing structures, etc. Introducing a new shader library provides the opportunity to rewrite the shaders in PyTorch Vulkan in a more generic and extensible way.

Differential Revision: [D54754545](https://our.internmc.facebook.com/intern/diff/D54754545/)

[ghstack-poisoned]
SS-JIA added a commit to pytorch/executorch that referenced this pull request Mar 12, 2024
…rary that enables dynamic shapes"

## Context

pytorch/pytorch#121598 introduces the ability to support dynamic shapes through tensor metadata updates.

The idea is fairly simple. Instead of shaders accepting a UBO with size data for all arguments:

```
layout(set = 0, binding = 2) uniform PRECISION restrict Block {
  ivec4 output_sizes;
  ivec4 other_sizes;
  float alpha;
}
```

Shaders will accept separate UBOs for each piece of tensor metadata:

```
layout(set = 0, binding = 3) uniform PRECISION restrict OutSizes {
  ivec4 data;
}
out_sizes;

layout(set = 0, binding = 4) uniform PRECISION restrict InSizes {
  ivec4 data;
}
in_sizes;

layout(set = 0, binding = 5) uniform PRECISION restrict OtherSizes {
  ivec4 data;
}
other_sizes;

layout(set = 0, binding = 6) uniform PRECISION restrict Alpha {
  float data;
}
alpha;
```

Each UBO will be owned and maintained by the corresponding `vTensor` instance. To support a graph input resize, every tensor in the graph only needs to update their metadata UBOs via the `tensor.virtual_resize(new_sizes)` call. Shader dispatches in subsequent command buffer submissions will then see the updated metadata and execute as if the tensor were the updated sizes.

This changeset introduces a new shader library for the Vulkan graph runtime that enables dynamic shapes through this technique in favor of relying on the shader library from PyTorch Vulkan.


## Considerations

Technically, the UBO update technique can be applied to the shaders from PyTorch Vulkan as well. If that's the case, why introduce a new shader library for the graph runtime?

The primary motivation is code quality.

First, having `vTensor` supply UBOs for their own metadata greatly reduces the need to have operator specifc ad-hoc `Params` structs to organize arguments to write into a `api::UniformParamsBuffer`.

Constructing an `ExecuteNode` for binary operators is now

```
  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      api::shader_registry().get_shader_info(kernel_name.str()),
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      {t_out.gpu_sizes_ubo(),
       t_in1.gpu_sizes_ubo(),
       t_in2.gpu_sizes_ubo(),
       graph.create_params_buffer(alpha_val)}))
```

instead of

```
ArithmeticParams block{
      get_size_as_ivec4(t_out),
      get_size_as_ivec4(t_in1),
      get_size_as_ivec4(t_in2),
      alpha_val,
  };
  api::UniformParamsBuffer params(graph.context(), block);

  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      shader,
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      std::move(params)));
```

Another consideration is that pytorch/pytorch#115948 which was landed fairly recently enables much more expressive shader templates through the use of Python code blocks in the GLSL template. This enables shader templates that can easily express variants for different data types, packing structures, etc. Introducing a new shader library provides the opportunity to rewrite the shaders in PyTorch Vulkan in a more generic and extensible way.

Differential Revision: [D54754545](https://our.internmc.facebook.com/intern/diff/D54754545/)

[ghstack-poisoned]
SS-JIA added a commit to pytorch/executorch that referenced this pull request Mar 12, 2024
…s dynamic shapes"

## Context

pytorch/pytorch#121598 introduces the ability to support dynamic shapes through tensor metadata updates.

The idea is fairly simple. Instead of shaders accepting a UBO with size data for all arguments:

```
layout(set = 0, binding = 2) uniform PRECISION restrict Block {
  ivec4 output_sizes;
  ivec4 other_sizes;
  float alpha;
}
```

Shaders will accept separate UBOs for each piece of tensor metadata:

```
layout(set = 0, binding = 3) uniform PRECISION restrict OutSizes {
  ivec4 data;
}
out_sizes;

layout(set = 0, binding = 4) uniform PRECISION restrict InSizes {
  ivec4 data;
}
in_sizes;

layout(set = 0, binding = 5) uniform PRECISION restrict OtherSizes {
  ivec4 data;
}
other_sizes;

layout(set = 0, binding = 6) uniform PRECISION restrict Alpha {
  float data;
}
alpha;
```

Each UBO will be owned and maintained by the corresponding `vTensor` instance. To support a graph input resize, every tensor in the graph only needs to update their metadata UBOs via the `tensor.virtual_resize(new_sizes)` call. Shader dispatches in subsequent command buffer submissions will then see the updated metadata and execute as if the tensor were the updated sizes.

This changeset introduces a new shader library for the Vulkan graph runtime that enables dynamic shapes through this technique in favor of relying on the shader library from PyTorch Vulkan.


## Considerations

Technically, the UBO update technique can be applied to the shaders from PyTorch Vulkan as well. If that's the case, why introduce a new shader library for the graph runtime?

The primary motivation is code quality.

First, having `vTensor` supply UBOs for their own metadata greatly reduces the need to have operator specifc ad-hoc `Params` structs to organize arguments to write into a `api::UniformParamsBuffer`.

Constructing an `ExecuteNode` for binary operators is now

```
  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      api::shader_registry().get_shader_info(kernel_name.str()),
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      {t_out.gpu_sizes_ubo(),
       t_in1.gpu_sizes_ubo(),
       t_in2.gpu_sizes_ubo(),
       graph.create_params_buffer(alpha_val)}))
```

instead of

```
ArithmeticParams block{
      get_size_as_ivec4(t_out),
      get_size_as_ivec4(t_in1),
      get_size_as_ivec4(t_in2),
      alpha_val,
  };
  api::UniformParamsBuffer params(graph.context(), block);

  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      shader,
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      std::move(params)));
```

Another consideration is that pytorch/pytorch#115948 which was landed fairly recently enables much more expressive shader templates through the use of Python code blocks in the GLSL template. This enables shader templates that can easily express variants for different data types, packing structures, etc. Introducing a new shader library provides the opportunity to rewrite the shaders in PyTorch Vulkan in a more generic and extensible way.

Differential Revision: [D54754545](https://our.internmc.facebook.com/intern/diff/D54754545/)

[ghstack-poisoned]
SS-JIA added a commit to pytorch/executorch that referenced this pull request Mar 13, 2024
…rary that enables dynamic shapes"

## Context

pytorch/pytorch#121598 introduces the ability to support dynamic shapes through tensor metadata updates.

The idea is fairly simple. Instead of shaders accepting a UBO with size data for all arguments:

```
layout(set = 0, binding = 2) uniform PRECISION restrict Block {
  ivec4 output_sizes;
  ivec4 other_sizes;
  float alpha;
}
```

Shaders will accept separate UBOs for each piece of tensor metadata:

```
layout(set = 0, binding = 3) uniform PRECISION restrict OutSizes {
  ivec4 data;
}
out_sizes;

layout(set = 0, binding = 4) uniform PRECISION restrict InSizes {
  ivec4 data;
}
in_sizes;

layout(set = 0, binding = 5) uniform PRECISION restrict OtherSizes {
  ivec4 data;
}
other_sizes;

layout(set = 0, binding = 6) uniform PRECISION restrict Alpha {
  float data;
}
alpha;
```

Each UBO will be owned and maintained by the corresponding `vTensor` instance. To support a graph input resize, every tensor in the graph only needs to update their metadata UBOs via the `tensor.virtual_resize(new_sizes)` call. Shader dispatches in subsequent command buffer submissions will then see the updated metadata and execute as if the tensor were the updated sizes.

This changeset introduces a new shader library for the Vulkan graph runtime that enables dynamic shapes through this technique in favor of relying on the shader library from PyTorch Vulkan.


## Considerations

Technically, the UBO update technique can be applied to the shaders from PyTorch Vulkan as well. If that's the case, why introduce a new shader library for the graph runtime?

The primary motivation is code quality.

First, having `vTensor` supply UBOs for their own metadata greatly reduces the need to have operator specifc ad-hoc `Params` structs to organize arguments to write into a `api::UniformParamsBuffer`.

Constructing an `ExecuteNode` for binary operators is now

```
  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      api::shader_registry().get_shader_info(kernel_name.str()),
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      {t_out.gpu_sizes_ubo(),
       t_in1.gpu_sizes_ubo(),
       t_in2.gpu_sizes_ubo(),
       graph.create_params_buffer(alpha_val)}))
```

instead of

```
ArithmeticParams block{
      get_size_as_ivec4(t_out),
      get_size_as_ivec4(t_in1),
      get_size_as_ivec4(t_in2),
      alpha_val,
  };
  api::UniformParamsBuffer params(graph.context(), block);

  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      shader,
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      std::move(params)));
```

Another consideration is that pytorch/pytorch#115948 which was landed fairly recently enables much more expressive shader templates through the use of Python code blocks in the GLSL template. This enables shader templates that can easily express variants for different data types, packing structures, etc. Introducing a new shader library provides the opportunity to rewrite the shaders in PyTorch Vulkan in a more generic and extensible way.

Differential Revision: [D54754545](https://our.internmc.facebook.com/intern/diff/D54754545/)

[ghstack-poisoned]
SS-JIA added a commit to pytorch/executorch that referenced this pull request Mar 13, 2024
…s dynamic shapes"

## Context

pytorch/pytorch#121598 introduces the ability to support dynamic shapes through tensor metadata updates.

The idea is fairly simple. Instead of shaders accepting a UBO with size data for all arguments:

```
layout(set = 0, binding = 2) uniform PRECISION restrict Block {
  ivec4 output_sizes;
  ivec4 other_sizes;
  float alpha;
}
```

Shaders will accept separate UBOs for each piece of tensor metadata:

```
layout(set = 0, binding = 3) uniform PRECISION restrict OutSizes {
  ivec4 data;
}
out_sizes;

layout(set = 0, binding = 4) uniform PRECISION restrict InSizes {
  ivec4 data;
}
in_sizes;

layout(set = 0, binding = 5) uniform PRECISION restrict OtherSizes {
  ivec4 data;
}
other_sizes;

layout(set = 0, binding = 6) uniform PRECISION restrict Alpha {
  float data;
}
alpha;
```

Each UBO will be owned and maintained by the corresponding `vTensor` instance. To support a graph input resize, every tensor in the graph only needs to update their metadata UBOs via the `tensor.virtual_resize(new_sizes)` call. Shader dispatches in subsequent command buffer submissions will then see the updated metadata and execute as if the tensor were the updated sizes.

This changeset introduces a new shader library for the Vulkan graph runtime that enables dynamic shapes through this technique in favor of relying on the shader library from PyTorch Vulkan.


## Considerations

Technically, the UBO update technique can be applied to the shaders from PyTorch Vulkan as well. If that's the case, why introduce a new shader library for the graph runtime?

The primary motivation is code quality.

First, having `vTensor` supply UBOs for their own metadata greatly reduces the need to have operator specifc ad-hoc `Params` structs to organize arguments to write into a `api::UniformParamsBuffer`.

Constructing an `ExecuteNode` for binary operators is now

```
  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      api::shader_registry().get_shader_info(kernel_name.str()),
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      {t_out.gpu_sizes_ubo(),
       t_in1.gpu_sizes_ubo(),
       t_in2.gpu_sizes_ubo(),
       graph.create_params_buffer(alpha_val)}))
```

instead of

```
ArithmeticParams block{
      get_size_as_ivec4(t_out),
      get_size_as_ivec4(t_in1),
      get_size_as_ivec4(t_in2),
      alpha_val,
  };
  api::UniformParamsBuffer params(graph.context(), block);

  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      shader,
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      std::move(params)));
```

Another consideration is that pytorch/pytorch#115948 which was landed fairly recently enables much more expressive shader templates through the use of Python code blocks in the GLSL template. This enables shader templates that can easily express variants for different data types, packing structures, etc. Introducing a new shader library provides the opportunity to rewrite the shaders in PyTorch Vulkan in a more generic and extensible way.

Differential Revision: [D54754545](https://our.internmc.facebook.com/intern/diff/D54754545/)

[ghstack-poisoned]
SS-JIA added a commit to pytorch/executorch that referenced this pull request Mar 13, 2024
…hapes

Pull Request resolved: #2366

## Context

pytorch/pytorch#121598 introduces the ability to support dynamic shapes through tensor metadata updates.

The idea is fairly simple. Instead of shaders accepting a UBO with size data for all arguments:

```
layout(set = 0, binding = 2) uniform PRECISION restrict Block {
  ivec4 output_sizes;
  ivec4 other_sizes;
  float alpha;
}
```

Shaders will accept separate UBOs for each piece of tensor metadata:

```
layout(set = 0, binding = 3) uniform PRECISION restrict OutSizes {
  ivec4 data;
}
out_sizes;

layout(set = 0, binding = 4) uniform PRECISION restrict InSizes {
  ivec4 data;
}
in_sizes;

layout(set = 0, binding = 5) uniform PRECISION restrict OtherSizes {
  ivec4 data;
}
other_sizes;

layout(set = 0, binding = 6) uniform PRECISION restrict Alpha {
  float data;
}
alpha;
```

Each UBO will be owned and maintained by the corresponding `vTensor` instance. To support a graph input resize, every tensor in the graph only needs to update their metadata UBOs via the `tensor.virtual_resize(new_sizes)` call. Shader dispatches in subsequent command buffer submissions will then see the updated metadata and execute as if the tensor were the updated sizes.

This changeset introduces a new shader library for the Vulkan graph runtime that enables dynamic shapes through this technique in favor of relying on the shader library from PyTorch Vulkan.


## Considerations

Technically, the UBO update technique can be applied to the shaders from PyTorch Vulkan as well. If that's the case, why introduce a new shader library for the graph runtime?

The primary motivation is code quality.

First, having `vTensor` supply UBOs for their own metadata greatly reduces the need to have operator specifc ad-hoc `Params` structs to organize arguments to write into a `api::UniformParamsBuffer`.

Constructing an `ExecuteNode` for binary operators is now

```
  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      api::shader_registry().get_shader_info(kernel_name.str()),
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      {t_out.gpu_sizes_ubo(),
       t_in1.gpu_sizes_ubo(),
       t_in2.gpu_sizes_ubo(),
       graph.create_params_buffer(alpha_val)}))
```

instead of

```
ArithmeticParams block{
      get_size_as_ivec4(t_out),
      get_size_as_ivec4(t_in1),
      get_size_as_ivec4(t_in2),
      alpha_val,
  };
  api::UniformParamsBuffer params(graph.context(), block);

  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      shader,
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      std::move(params)));
```

Another consideration is that pytorch/pytorch#115948 which was landed fairly recently enables much more expressive shader templates through the use of Python code blocks in the GLSL template. This enables shader templates that can easily express variants for different data types, packing structures, etc. Introducing a new shader library provides the opportunity to rewrite the shaders in PyTorch Vulkan in a more generic and extensible way.
ghstack-source-id: 218421178
@exported-using-ghexport

Differential Revision: [D54754545](https://our.internmc.facebook.com/intern/diff/D54754545/)
SS-JIA added a commit to pytorch/executorch that referenced this pull request Mar 13, 2024
…rary that enables dynamic shapes"

## Context

pytorch/pytorch#121598 introduces the ability to support dynamic shapes through tensor metadata updates.

The idea is fairly simple. Instead of shaders accepting a UBO with size data for all arguments:

```
layout(set = 0, binding = 2) uniform PRECISION restrict Block {
  ivec4 output_sizes;
  ivec4 other_sizes;
  float alpha;
}
```

Shaders will accept separate UBOs for each piece of tensor metadata:

```
layout(set = 0, binding = 3) uniform PRECISION restrict OutSizes {
  ivec4 data;
}
out_sizes;

layout(set = 0, binding = 4) uniform PRECISION restrict InSizes {
  ivec4 data;
}
in_sizes;

layout(set = 0, binding = 5) uniform PRECISION restrict OtherSizes {
  ivec4 data;
}
other_sizes;

layout(set = 0, binding = 6) uniform PRECISION restrict Alpha {
  float data;
}
alpha;
```

Each UBO will be owned and maintained by the corresponding `vTensor` instance. To support a graph input resize, every tensor in the graph only needs to update their metadata UBOs via the `tensor.virtual_resize(new_sizes)` call. Shader dispatches in subsequent command buffer submissions will then see the updated metadata and execute as if the tensor were the updated sizes.

This changeset introduces a new shader library for the Vulkan graph runtime that enables dynamic shapes through this technique in favor of relying on the shader library from PyTorch Vulkan.


## Considerations

Technically, the UBO update technique can be applied to the shaders from PyTorch Vulkan as well. If that's the case, why introduce a new shader library for the graph runtime?

The primary motivation is code quality.

First, having `vTensor` supply UBOs for their own metadata greatly reduces the need to have operator specifc ad-hoc `Params` structs to organize arguments to write into a `api::UniformParamsBuffer`.

Constructing an `ExecuteNode` for binary operators is now

```
  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      api::shader_registry().get_shader_info(kernel_name.str()),
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      {t_out.gpu_sizes_ubo(),
       t_in1.gpu_sizes_ubo(),
       t_in2.gpu_sizes_ubo(),
       graph.create_params_buffer(alpha_val)}))
```

instead of

```
ArithmeticParams block{
      get_size_as_ivec4(t_out),
      get_size_as_ivec4(t_in1),
      get_size_as_ivec4(t_in2),
      alpha_val,
  };
  api::UniformParamsBuffer params(graph.context(), block);

  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      shader,
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      std::move(params)));
```

Another consideration is that pytorch/pytorch#115948 which was landed fairly recently enables much more expressive shader templates through the use of Python code blocks in the GLSL template. This enables shader templates that can easily express variants for different data types, packing structures, etc. Introducing a new shader library provides the opportunity to rewrite the shaders in PyTorch Vulkan in a more generic and extensible way.

Differential Revision: [D54754545](https://our.internmc.facebook.com/intern/diff/D54754545/)

[ghstack-poisoned]
SS-JIA added a commit to pytorch/executorch that referenced this pull request Mar 13, 2024
…s dynamic shapes"

## Context

pytorch/pytorch#121598 introduces the ability to support dynamic shapes through tensor metadata updates.

The idea is fairly simple. Instead of shaders accepting a UBO with size data for all arguments:

```
layout(set = 0, binding = 2) uniform PRECISION restrict Block {
  ivec4 output_sizes;
  ivec4 other_sizes;
  float alpha;
}
```

Shaders will accept separate UBOs for each piece of tensor metadata:

```
layout(set = 0, binding = 3) uniform PRECISION restrict OutSizes {
  ivec4 data;
}
out_sizes;

layout(set = 0, binding = 4) uniform PRECISION restrict InSizes {
  ivec4 data;
}
in_sizes;

layout(set = 0, binding = 5) uniform PRECISION restrict OtherSizes {
  ivec4 data;
}
other_sizes;

layout(set = 0, binding = 6) uniform PRECISION restrict Alpha {
  float data;
}
alpha;
```

Each UBO will be owned and maintained by the corresponding `vTensor` instance. To support a graph input resize, every tensor in the graph only needs to update their metadata UBOs via the `tensor.virtual_resize(new_sizes)` call. Shader dispatches in subsequent command buffer submissions will then see the updated metadata and execute as if the tensor were the updated sizes.

This changeset introduces a new shader library for the Vulkan graph runtime that enables dynamic shapes through this technique in favor of relying on the shader library from PyTorch Vulkan.


## Considerations

Technically, the UBO update technique can be applied to the shaders from PyTorch Vulkan as well. If that's the case, why introduce a new shader library for the graph runtime?

The primary motivation is code quality.

First, having `vTensor` supply UBOs for their own metadata greatly reduces the need to have operator specifc ad-hoc `Params` structs to organize arguments to write into a `api::UniformParamsBuffer`.

Constructing an `ExecuteNode` for binary operators is now

```
  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      api::shader_registry().get_shader_info(kernel_name.str()),
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      {t_out.gpu_sizes_ubo(),
       t_in1.gpu_sizes_ubo(),
       t_in2.gpu_sizes_ubo(),
       graph.create_params_buffer(alpha_val)}))
```

instead of

```
ArithmeticParams block{
      get_size_as_ivec4(t_out),
      get_size_as_ivec4(t_in1),
      get_size_as_ivec4(t_in2),
      alpha_val,
  };
  api::UniformParamsBuffer params(graph.context(), block);

  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      shader,
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      std::move(params)));
```

Another consideration is that pytorch/pytorch#115948 which was landed fairly recently enables much more expressive shader templates through the use of Python code blocks in the GLSL template. This enables shader templates that can easily express variants for different data types, packing structures, etc. Introducing a new shader library provides the opportunity to rewrite the shaders in PyTorch Vulkan in a more generic and extensible way.

Differential Revision: [D54754545](https://our.internmc.facebook.com/intern/diff/D54754545/)

[ghstack-poisoned]
facebook-github-bot pushed a commit to pytorch/executorch that referenced this pull request Mar 13, 2024
…2366)

Summary:
Pull Request resolved: #2366

## Context

pytorch/pytorch#121598 introduces the ability to support dynamic shapes through tensor metadata updates.

The idea is fairly simple. Instead of shaders accepting a UBO with size data for all arguments:

```
layout(set = 0, binding = 2) uniform PRECISION restrict Block {
  ivec4 output_sizes;
  ivec4 other_sizes;
  float alpha;
}
```

Shaders will accept separate UBOs for each piece of tensor metadata:

```
layout(set = 0, binding = 3) uniform PRECISION restrict OutSizes {
  ivec4 data;
}
out_sizes;

layout(set = 0, binding = 4) uniform PRECISION restrict InSizes {
  ivec4 data;
}
in_sizes;

layout(set = 0, binding = 5) uniform PRECISION restrict OtherSizes {
  ivec4 data;
}
other_sizes;

layout(set = 0, binding = 6) uniform PRECISION restrict Alpha {
  float data;
}
alpha;
```

Each UBO will be owned and maintained by the corresponding `vTensor` instance. To support a graph input resize, every tensor in the graph only needs to update their metadata UBOs via the `tensor.virtual_resize(new_sizes)` call. Shader dispatches in subsequent command buffer submissions will then see the updated metadata and execute as if the tensor were the updated sizes.

This changeset introduces a new shader library for the Vulkan graph runtime that enables dynamic shapes through this technique in favor of relying on the shader library from PyTorch Vulkan.

## Considerations

Technically, the UBO update technique can be applied to the shaders from PyTorch Vulkan as well. If that's the case, why introduce a new shader library for the graph runtime?

The primary motivation is code quality.

First, having `vTensor` supply UBOs for their own metadata greatly reduces the need to have operator specifc ad-hoc `Params` structs to organize arguments to write into a `api::UniformParamsBuffer`.

Constructing an `ExecuteNode` for binary operators is now

```
  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      api::shader_registry().get_shader_info(kernel_name.str()),
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      {t_out.gpu_sizes_ubo(),
       t_in1.gpu_sizes_ubo(),
       t_in2.gpu_sizes_ubo(),
       graph.create_params_buffer(alpha_val)}))
```

instead of

```
ArithmeticParams block{
      get_size_as_ivec4(t_out),
      get_size_as_ivec4(t_in1),
      get_size_as_ivec4(t_in2),
      alpha_val,
  };
  api::UniformParamsBuffer params(graph.context(), block);

  graph.execute_nodes().emplace_back(new ExecuteNode(
      graph,
      shader,
      global_size,
      local_size,
      {{out, api::MemoryAccessType::WRITE},
       {{arg1, arg2}, api::MemoryAccessType::READ}},
      std::move(params)));
```

Another consideration is that pytorch/pytorch#115948 which was landed fairly recently enables much more expressive shader templates through the use of Python code blocks in the GLSL template. This enables shader templates that can easily express variants for different data types, packing structures, etc. Introducing a new shader library provides the opportunity to rewrite the shaders in PyTorch Vulkan in a more generic and extensible way.
ghstack-source-id: 218429132
exported-using-ghexport

bypass-github-export-checks
bypass-github-pytorch-ci-checks
bypass-github-executorch-ci-checks

Reviewed By: jorgep31415

Differential Revision: D54754545

fbshipit-source-id: 7e2074699b61f8358358775a8b790d34dcb99ee6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/trunk Trigger trunk jobs on your pull request fb-exported Merged module: vulkan release notes: vulkan release notes category suppress-api-compatibility-check Suppresses the failures of API backward-compatibility linter (Lint/bc_linter) suppress-bc-linter Suppresses the failures of API backward-compatibility linter (Lint/bc_linter)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants