[pt-vulkan] Enable Python code blocks in shader templates and upgrade shader template generation by SS-JIA · Pull Request #115948 · pytorch/pytorch

SS-JIA · 2023-12-15T21:23:56Z

Summary:
This change makes two major improvements to PyTorch Vulkan's shader authoring workflow.

Review Guide

Many files changed because every GLSL shader had to be touched. Most of the changes replace

#define PRECISION $precision
#define FORMAT $format

to

#define PRECISION ${PRECISION}
#define FORMAT ${FORMAT}

due to changes in how shader templates are processed.
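As a rough illustration of the new `${NAME}` placeholder syntax, the sketch below substitutes parameters into a template line. It is not the code in gen_vulkan_spv.py; the function name `substitute_params` and the values `highp` and `rgba32f` are assumptions made for the example.

```python
import re

# Matches placeholders of the form ${NAME}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def substitute_params(line: str, params: dict) -> str:
    """Replace every ${NAME} in `line` with str(params[NAME]).

    Hypothetical helper: raises KeyError if a placeholder has no value.
    """
    return PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), line)


# Example parameter values (assumed, not taken from the PR).
params = {"PRECISION": "highp", "FORMAT": "rgba32f"}

for template_line in ("#define PRECISION ${PRECISION}", "#define FORMAT ${FORMAT}"):
    print(substitute_params(template_line, params))
```

Looking names up in a dictionary keeps the sketch simple; a real generator might instead evaluate the placeholder as a Python expression.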

For reviewers, the primary functional changes to review are:

gen_vulkan_spv.py
- Majority of functional changes are in this file, which controls how shader templates are processed.
shader_params.yaml
- controls how shader variants are generated

Python Codeblocks in Shader Templates

From now on, every compute shader (i.e. .glsl) is treated as a shader template. To this effect, the templates/ folder has been removed and there is now a global shader_params.yaml file to describe the shader variants that should be generated for all shader templates.

Taking inspiration from XNNPACK's xngen tool, shader templates can now use Python codeblocks. One example is:

$if not INPLACE:
  layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict writeonly image3D uOutput;
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uInput;
  layout(set = 0, binding = 2) uniform PRECISION sampler3D uOther;
  layout(set = 0, binding = 3) uniform PRECISION restrict Block {
    ivec4 output_sizes;
    ivec4 input_sizes;
    ivec4 other_sizes;
    float alpha;
  }
  uArgs;
$else:
  layout(set = 0, binding = 0, FORMAT) uniform PRECISION restrict image3D uOutput;
  layout(set = 0, binding = 1) uniform PRECISION sampler3D uOther;
  layout(set = 0, binding = 2) uniform PRECISION restrict Block {
    ivec4 output_sizes;
    ivec4 other_sizes;
    float alpha;
  }
  uArgs;

Another is:

  // PYTHON CODEBLOCK
  $if not IS_DIV:
    const int c_index = (pos.z % ((uArgs.output_sizes.z + 3) / 4)) * 4;
    if (uArgs.other_sizes.z != 1 && c_index + 3 >= uArgs.output_sizes.z) {
      ivec4 c_ind = ivec4(c_index) + ivec4(0, 1, 2, 3);
      vec4 mask = vec4(lessThan(c_ind, ivec4(uArgs.output_sizes.z)));
      other_texel = other_texel * mask + vec4(1, 1, 1, 1) - mask;
    }

  // PYTHON CODEBLOCK
  $if not INPLACE:
    ivec3 input_pos =
        map_output_pos_to_input_pos(pos, uArgs.output_sizes, uArgs.input_sizes);
    const vec4 in_texel =
        load_texel(input_pos, uArgs.output_sizes, uArgs.input_sizes, uInput);

    imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));
  $else:
    const vec4 in_texel = imageLoad(uOutput, pos);
    imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));

Beyond making shader templates easier and clearer to write, this allows shaders that previously could not be consolidated, such as the non-inplace and inplace variants of the same shader, to be expressed as a single template.
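To make the `$if` / `$else:` mechanism concrete, here is a minimal sketch of how such a template could be rendered. It is an illustration only, not the implementation in gen_vulkan_spv.py or xngen: the `render` function is hypothetical, it supports only `$if` and `$else:`, and it assumes block bodies are indented two spaces per nesting level.

```python
def render(template: str, params: dict) -> str:
    """Render a template containing $if / $else: blocks (illustrative only)."""
    out = []
    stack = []  # one frame per open block: [directive_indent, active, branch_taken]
    for line in template.splitlines():
        stripped = line.strip()
        if not stripped:
            # Keep blank lines only when every enclosing block is active.
            if all(frame[1] for frame in stack):
                out.append("")
            continue
        indent = len(line) - len(line.lstrip())
        is_else = stripped == "$else:"
        # A line at or left of a directive's indent closes that block,
        # except an $else: that continues it at the same indent.
        while stack and (indent < stack[-1][0] or (indent == stack[-1][0] and not is_else)):
            stack.pop()
        if stripped.startswith("$if "):
            parent_active = all(frame[1] for frame in stack)
            # eval is acceptable for trusted templates only.
            cond = parent_active and bool(eval(stripped[4:].rstrip(":"), {}, dict(params)))
            stack.append([indent, cond, cond])
        elif is_else:
            if not stack:
                raise ValueError("$else: without a matching $if")
            frame = stack[-1]
            parent_active = all(f[1] for f in stack[:-1])
            frame[1] = parent_active and not frame[2]
            frame[2] = True
        elif all(frame[1] for frame in stack):
            # Drop the indentation added by the enclosing blocks.
            out.append(line[min(indent, 2 * len(stack)):])
    return "\n".join(out)


TEMPLATE = """\
$if not INPLACE:
  const vec4 in_texel = load_texel(input_pos, uArgs.output_sizes, uArgs.input_sizes, uInput);
$else:
  const vec4 in_texel = imageLoad(uOutput, pos);
imageStore(uOutput, pos, OP(in_texel, other_texel, uArgs.alpha));
"""

print(render(TEMPLATE, {"INPLACE": 0}))
print(render(TEMPLATE, {"INPLACE": 1}))
```

Rendering the same template with `INPLACE` set to 0 and then to 1 yields the two shader bodies that previously would have required separate files.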

`generate_variant_forall` in shader variant YAML configuration

YAML files that describe how shader variants should be generated can now use a generate_variant_forall field. For every variant defined, it iterates over a list of values for a given parameter and generates one shader per value. Example:

unary_op:
  parameter_names_with_default_values:
    OPERATOR: exp(X)
    INPLACE: 0
  generate_variant_forall:
    INPLACE:
      - VALUE: 0
        SUFFIX: ""
      - VALUE: 1
        SUFFIX: "inplace"
  shader_variants:
    - NAME: exp
      OPERATOR: exp(X)
    - NAME: sqrt
      OPERATOR: sqrt(X)
    - NAME: log
      OPERATOR: log(X)

Previously, the inplace variants needed separate shader_variants entries. If multiple parameters are iterated over, all possible combinations are generated. It is worth taking a look at how the new YAML configuration works.
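The expansion described above can be sketched as a Cartesian product over the generate_variant_forall entries. This is an assumption-laden illustration rather than the logic in gen_vulkan_spv.py: the function `expand_variants` is hypothetical, the configuration is written as a Python dict to avoid a YAML dependency, and joining the variant name and non-empty suffixes with an underscore is a guessed naming convention.

```python
import itertools

# The unary_op entry from the example above, written as a Python dict.
config = {
    "parameter_names_with_default_values": {"OPERATOR": "exp(X)", "INPLACE": 0},
    "generate_variant_forall": {
        "INPLACE": [
            {"VALUE": 0, "SUFFIX": ""},
            {"VALUE": 1, "SUFFIX": "inplace"},
        ],
    },
    "shader_variants": [
        {"NAME": "exp", "OPERATOR": "exp(X)"},
        {"NAME": "sqrt", "OPERATOR": "sqrt(X)"},
        {"NAME": "log", "OPERATOR": "log(X)"},
    ],
}


def expand_variants(config: dict) -> list:
    """Return one parameter dict per (variant, forall combination) pair."""
    defaults = config["parameter_names_with_default_values"]
    forall = config.get("generate_variant_forall", {})
    names = list(forall)
    expanded = []
    for variant in config["shader_variants"]:
        # product() over all iterated parameters yields every combination;
        # with no iterated parameters it yields a single empty combination.
        for combo in itertools.product(*(forall[name] for name in names)):
            params = {**defaults, **variant}
            suffixes = []
            for name, choice in zip(names, combo):
                params[name] = choice["VALUE"]
                if choice["SUFFIX"]:
                    suffixes.append(choice["SUFFIX"])
            # Assumed naming scheme: NAME plus non-empty suffixes, joined by "_".
            params["NAME"] = "_".join([variant["NAME"], *suffixes])
            expanded.append(params)
    return expanded


for params in expand_variants(config):
    print(params["NAME"], params)
```

With the configuration above, three shader_variants entries and two INPLACE values give six generated variants; adding a second iterated parameter with two values would double that.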

Test Plan:
There is no functional change in this diff; we only need to make sure that the generated shaders are still correct, so running vulkan_api_test is sufficient.

# On Mac Laptop
buck run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 -- --gtest_filter="*"

Reviewed By: digantdesai

Differential Revision: D52087084

pytorch-bot · 2023-12-15T21:24:00Z

✅ You can merge normally! (5 Unrelated Failures)

As of commit 49cfbb9 with merge base d85314c:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

periodic / linux-focal-rocm5.7-py3.8 / test (distributed, 1, 2, linux.rocm.gpu) (gh)
distributed/test_functional_api.py::TestNCCLCollectivesWithWorldSize4::test_tracing_with_fakepg

UNSTABLE - The following jobs failed, likely due to flakiness present on trunk, and have been marked as unstable:

pull / linux-focal-py3.11-clang10 / test (dynamo, 4, 7, linux.2xlarge, unstable) (gh)
test_unary_ufuncs.py::TestUnaryUfuncsCPU::test_reference_numerics_large_mvlgamma_mvlgamma_p_1_cpu_bfloat16
pull / linux-focal-py3.11-clang10 / test (dynamo, 5, 7, linux.2xlarge, unstable) (gh)
test_unary_ufuncs.py::TestUnaryUfuncsCPU::test_reference_numerics_large__refs_special_multigammaln_mvlgamma_p_5_cpu_float32
pull / linux-focal-py3.8-clang10 / test (dynamo, 1, 7, linux.2xlarge, unstable) (gh)
test_unary_ufuncs.py::TestUnaryUfuncsCPU::test_reference_numerics_large_mvlgamma_mvlgamma_p_1_cpu_bfloat16
pull / linux-focal-py3.8-clang10 / test (dynamo, 2, 7, linux.2xlarge, unstable) (gh)
test_unary_ufuncs.py::TestUnaryUfuncsCPU::test_reference_numerics_large__refs_special_multigammaln_mvlgamma_p_5_cpu_float32

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2023-12-15T21:24:38Z