🚀 Feature
The goal is to build a "magic scheduler": given an algorithm containing a reduction op, it automatically applies a TensorExpression schedule that matches the performance of PyTorch's ATen TensorIterator.
My plan and heuristic are shown in this (NVIDIA Internal) document: https://docs.google.com/document/d/15b8JSnLYu9PIGwEltPXeKOoX5XR_EE_RMjPkRtQ8EHo/
Work is happening on this branch:
https://github.com/csarofeen/pytorch/tree/20_6_11_devel_redsched
Evaluation is happening with this code base:
https://github.com/kevinstephano/codegen_perf
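For reference, the ATen TensorIterator baseline can be measured directly in eager mode. The sketch below is a minimal, hypothetical timing harness and is not taken from the codegen_perf benchmarks; shapes and iteration counts are illustrative.

```python
# Hypothetical sketch: time an eager-mode reduction, which runs through
# ATen's TensorIterator and serves as the performance target for the
# generated schedule. Shapes and iteration counts are illustrative.
import torch

def time_reduction(x, dim, iters=100):
    # Warm up so CUDA context creation and caching don't skew the measurement.
    for _ in range(10):
        x.sum(dim=dim)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        x.sum(dim=dim)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

x = torch.randn(16384, 1024, device="cuda")
print("inner reduction:", time_reduction(x, dim=1), "ms")
print("outer reduction:", time_reduction(x, dim=0), "ms")
```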
Plan
Part 1: Basic assumptions: 2D input, scheduling only a reduction (see the sketch after this list)
Part 2: Start addressing those assumptions
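A hypothetical illustration of the Part 1 scope: a TorchScript function whose body is nothing but a reduction over one dimension of a 2D tensor. Whether this particular graph reaches the new scheduler depends on the fuser configuration in the development branch; the point here is only the shape of the problem being scheduled. Shapes are illustrative.

```python
# Hypothetical sketch of the Part 1 problem shape: a 2D input reduced along a
# single dimension, with no surrounding pointwise ops to schedule. The
# heuristic has to cover both layouts below, since reducing the contiguous
# dimension vs. the outer dimension calls for different parallelization.
import torch

@torch.jit.script
def reduce_inner(x: torch.Tensor) -> torch.Tensor:
    return x.sum(dim=1)  # reduce the fast/contiguous dimension

@torch.jit.script
def reduce_outer(x: torch.Tensor) -> torch.Tensor:
    return x.sum(dim=0)  # reduce the slow/outer dimension

x = torch.randn(4096, 4096, device="cuda")
reduce_inner(x)
reduce_outer(x)
```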