Adding a Reduction Heuristic Scheduler to match the PyTorch ATen TensorIterator #116

@kevinstephano

Description

🚀 Feature

The goal is to make a "magic scheduler" that takes an algorithm containing a reduction op and applies a TensorExpression schedule to match the performance of PyTorch's ATen TensorIterator.

My plan and heuristic are shown in this (NVIDIA Internal) document: https://docs.google.com/document/d/15b8JSnLYu9PIGwEltPXeKOoX5XR_EE_RMjPkRtQ8EHo/

Work is happening on this branch:
https://github.com/csarofeen/pytorch/tree/20_6_11_devel_redsched

Evaluation is happening with this code base:
https://github.com/kevinstephano/codegen_perf

Plan

Part 1: Basic assumptions — 2D tensors, scheduling only a reduction

  • Get simple schedule up and running in a test
  • Reverse engineer the ATen heuristic
  • Implement ATen's schedule
    • Get a scheduler file and function stubbed out
    • Add code to calculate BIDx, BIDy, TIDx, and TIDy splits
    • Fix scheduling that needs to split based on remaining size
    • Modify my codegen_perf code to compare against ATen
    • Fix errors in handling differences for fastest-dimension reductions
    • Add cross-block reductions
    • Fix reductions not on the fastest dimension
    • Fix outer-dimension reductions with Vectorize; performance is currently off
    • Add FP16 support; write a test first to capture the behavior
  • Perf test schedule
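To make the BIDx/BIDy/TIDx/TIDy split step concrete, here is a minimal Python sketch of a TensorIterator-style launch heuristic for a 2D inner-dimension reduction. The function name, constants, and the exact split policy are illustrative assumptions for this issue, not ATen's actual implementation.

```python
def next_pow2(n):
    """Smallest power of two >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def reduction_launch_params(num_rows, red_size, max_threads=512, warp_size=32):
    """Hypothetical TIDx/TIDy/BIDx/BIDy split for reducing the fast (inner) dim.

    Illustrative only: real heuristics also weigh occupancy, vectorization,
    and cross-block reduction cost.
    """
    # Put threads along the reduction dimension first, up to one warp,
    # so consecutive threads read consecutive elements (coalesced loads).
    tid_x = min(next_pow2(red_size), warp_size)
    # Spend the remaining threads of the block on independent rows.
    tid_y = max_threads // tid_x
    # One y-block per chunk of rows.
    bid_y = (num_rows + tid_y - 1) // tid_y
    # Large reductions also split across blocks in x, which then requires
    # a cross-block reduction pass to combine partial results.
    bid_x = min((red_size + tid_x - 1) // tid_x, 65535)
    return tid_x, tid_y, bid_x, bid_y
```

For example, reducing 4096 elements in each of 1024 rows yields a 32x16 thread block, with the 4096-wide reduction split across 128 x-blocks whose partial results must be combined cross-block, matching the "Add cross-block reductions" item above.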

Part 2: Start addressing assumptions

  • Generalize to more than 2D tensors
  • Make the schedule usable in the presence of other fusion ops. Should scheduling proceed from the bottom up?
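One common way to generalize beyond 2D (an assumption on my part, not a decision recorded in this issue) is to collapse contiguous dimensions into the two groups the 2D heuristic already understands: iteration dims become "rows" and reduction dims become a single reduction extent. A sketch, restricted to the easy contiguous case where the reduced axes are innermost:

```python
from math import prod

def collapse_to_2d(shape, reduce_axes):
    """Merge contiguous dims so an N-D inner reduction looks 2D.

    Illustrative sketch: assumes a contiguous tensor whose reduced axes
    are the innermost ones, so both groups stay contiguous in memory.
    General strides/axis orders need per-dim coalescing instead.
    """
    ndim = len(shape)
    assert set(reduce_axes) == set(range(ndim - len(reduce_axes), ndim)), \
        "sketch assumes innermost reduction axes"
    num_rows = prod(shape[: ndim - len(reduce_axes)])
    red_size = prod(shape[ndim - len(reduce_axes):])
    return num_rows, red_size
```

With this, an (8, 16, 32) tensor reduced over its last axis is scheduled as a 128x32 2D reduction, and the existing 2D heuristic applies unchanged.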
