
Commit cdc2d28

ezyang authored and facebook-github-bot committed
Structured kernel definitions (#45277)
Summary: Pull Request resolved: #45277

Implements structured kernels as per pytorch/rfcs#9 and ports upsample_nearest1d to use the framework. The general structure of this diff:

- Define a new syntax for specifying structured kernels in `native_functions.yaml`. You put `structured: True` on the `out` function (that's what you implement) and `structured_delegate: foo.out` on the functional/inplace variants to define them in terms of the `out` function. There's a bunch of new consistency checking to see if you've done this right, though the error messages are of varying quality. This is most of what's going on in tools.codegen.model.
- NativeFunctionGroup turns into StructuredNativeFunctions. Previously I thought that maybe we would use this grouping mechanism for both structured and unstructured kernels, but it turned out that Jiakai needed to make his own grouping structure. So now I've specialized it for structured kernels, which also means I get to add a bunch of invariants, like requiring structured kernels to have both a functional and an out variant. This is the lower bundle of changes in tools.codegen.model.
- When you make an out kernel structured, this induces us to generate a new meta function signature for you to write shape checking and output allocation code. The signatures of these are defined by `tools.codegen.api.meta` and generated into `MetaFunctions.h`. Coverage here is very bare bones and will be driven by the actual operators we port as we go.
- The meaty part of code generation is what we do when we have some grouped StructuredNativeFunctions. We continue to generate a wrapper per function type, but they are a bit different, as they call your meta functions and make reference to the actual implementations in out.
- Then there's a port of `upsample_nearest1d`; it is easiest to review by just looking at what the final code looks like.

Missing pieces:

- Stride calculation in TensorMeta
- Sufficient sanity checking for inplace/out variants
- Enough rope to make TensorIterator work

This PR improves instruction counts on `upsample_nearest1d` because it eliminates an extra redispatch. Testing `at::upsample_nearest1d(x, {10});`:

- Functional: before 1314105, after 1150705
- Out: before 915705, after 838405

These numbers may be jittered by up to +-16400 (which is the difference when I tested against an unaffected operator, `at::upsample_linear1d`), though that may also be because unrelated changes affected all operators globally.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D24253555

Test Plan: Imported from OSS

Reviewed By: smessmer

Pulled By: ezyang

fbshipit-source-id: 4ef58dd911991060f13576864c8171f9cc614456
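The delegation pattern described above (meta function does shape checking and output description; the out kernel only computes; the functional variant allocates from the meta and delegates) can be sketched in Python. This is an illustrative model only, with dicts standing in for tensors; none of these names are the actual generated code.

```python
# Sketch of the structured-kernel wrapper pattern. Shape checking lives in
# one place (the meta function); the functional variant is defined entirely
# in terms of the out variant, eliminating an extra redispatch.

def upsample_nearest1d_meta(input_sizes, output_size):
    # Meta function: validate inputs and describe the output tensor.
    assert len(output_size) == 1, "expected output_size of length 1"
    nbatch, channels, _input_width = input_sizes
    return (nbatch, channels, output_size[0])  # stand-in for TensorMeta

def upsample_nearest1d_out(out, inp, output_size):
    # Out variant: caller supplies storage; meta derives the expected shape.
    meta = upsample_nearest1d_meta(inp["sizes"], output_size)
    assert tuple(out["sizes"]) == meta, "out tensor has wrong shape"
    out["computed"] = True  # stand-in for the actual kernel launch
    return out

def upsample_nearest1d(inp, output_size):
    # Functional variant: allocate from the meta, then delegate to out.
    meta = upsample_nearest1d_meta(inp["sizes"], output_size)
    out = {"sizes": list(meta), "computed": False}
    return upsample_nearest1d_out(out, inp, output_size)

x = {"sizes": [2, 3, 5], "computed": False}
y = upsample_nearest1d(x, [10])
print(y["sizes"])  # [2, 3, 10]
```

The point of the split is that the functional and out variants share one shape-checking path, rather than each re-validating (and redispatching) on its own.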
1 parent d7e8384 commit cdc2d28

10 files changed

Lines changed: 412 additions & 144 deletions

File tree

BUILD.bazel

Lines changed: 1 addition & 0 deletions
@@ -136,6 +136,7 @@ genrule(
     "aten/src/ATen/Functions.h",
     "aten/src/ATen/Functions.cpp",
     "aten/src/ATen/NativeFunctions.h",
+    "aten/src/ATen/MetaFunctions.h",
     "aten/src/ATen/core/TensorBody.h",
     "aten/src/ATen/core/TensorMethods.cpp",
     "aten/src/ATen/core/ATenOpList.cpp",

aten/src/ATen/TensorMeta.h

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+#pragma once
+
+#include <ATen/ATen.h> // TODO: improve
+// #include <ATen/NativeFunctions.h>
+
+namespace at {
+
+struct TensorMeta {
+  DimVector sizes;
+  // TODO: DimVector strides;
+  TensorOptions options;
+
+  TensorMeta(IntArrayRef _sizes, TensorOptions _options)
+    : sizes(_sizes), options(_options) {}
+};
+
+inline Tensor tensor_from_meta(const TensorMeta& meta) {
+  // TODO: eliminate indirection
+  return at::empty(meta.sizes, meta.options);
+}
+
+// Analogous to self.new_empty(sizes)
+inline TensorMeta new_meta(const Tensor& self, IntArrayRef sizes) {
+  return TensorMeta(sizes, self.options());
+}
+
+} // namespace at
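The new header separates "describe the output" from "materialize the output". A Python analogue of that split (illustrative only; a dataclass and a dict stand in for the C++ struct and `at::empty`):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TensorMeta:
    # Carries sizes plus allocation options; no storage yet.
    sizes: List[int]
    options: Dict[str, str] = field(default_factory=dict)

def new_meta(self_sizes, self_options, sizes):
    # Analogous to self.new_empty(sizes): inherit options, take new sizes.
    return TensorMeta(list(sizes), dict(self_options))

def tensor_from_meta(meta):
    # Materialization step: a flat zero buffer stands in for at::empty.
    n = 1
    for s in meta.sizes:
        n *= s
    return {"sizes": meta.sizes, "data": [0.0] * n, **meta.options}

m = new_meta([2, 3, 5], {"dtype": "float"}, [2, 3, 10])
t = tensor_from_meta(m)
print(len(t["data"]))  # 60
```

Keeping the meta object allocation-free is what lets the same shape logic eventually back meta tensors and shape inference, not just eager allocation.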
Lines changed: 46 additions & 80 deletions
@@ -1,47 +1,12 @@
 #include <ATen/ATen.h>
 #include <ATen/NativeFunctions.h>
 #include <ATen/native/UpSample.h>
+#include <ATen/MetaFunctions.h>
 
 namespace at {
-namespace native {
-namespace {
-
-static void upsample_nearest1d_out_cpu_template(
-    Tensor& output,
-    const Tensor& input,
-    IntArrayRef output_size,
-    c10::optional<double> scales) {
-  TORCH_CHECK(
-      output_size.size() == 1,
-      "It is expected output_size equals to 1, but got size ",
-      output_size.size());
-
-  int64_t output_width = output_size[0];
-
-  int64_t nbatch = input.size(0);
-  int64_t channels = input.size(1);
-  int64_t input_width = input.size(2);
-
-  upsample_1d_shape_check(
-      input,
-      Tensor(),
-      nbatch,
-      channels,
-      input_width,
-      output_width);
+namespace meta {
 
-  output.resize_({nbatch, channels, output_width});
-
-  AT_ASSERT(input_width > 0 && output_width > 0);
-  upsample_nearest1d_kernel(kCPU, output, input, scales);
-}
-
-static void upsample_nearest1d_backward_out_cpu_template(
-    Tensor& grad_input,
-    const Tensor& grad_output,
-    IntArrayRef output_size,
-    IntArrayRef input_size,
-    c10::optional<double> scales) {
+static std::array<int64_t, 3> upsample_nearest1d_common_check(IntArrayRef input_size, IntArrayRef output_size) {
   TORCH_CHECK(
       output_size.size() == 1,
      "It is expected output_size equals to 1, but got size ",
@@ -58,36 +23,50 @@ static void upsample_nearest1d_backward_out_cpu_template(
   int64_t channels = input_size[1];
   int64_t input_width = input_size[2];
 
-  upsample_1d_shape_check(
-      Tensor(),
-      grad_output,
-      nbatch,
-      channels,
+  TORCH_CHECK(
+      input_width > 0 && output_width > 0,
+      "Input and output sizes should be greater than 0, but got input (W: ",
       input_width,
-      output_width);
+      ") and output (W: ",
+      output_width,
+      ")");
 
-  grad_input.resize_({nbatch, channels, input_width});
-  grad_input.zero_();
+  return {nbatch, channels, output_width};
+}
 
-  upsample_nearest1d_backward_kernel(kCPU, grad_input, grad_output, scales);
+TensorMeta upsample_nearest1d(const Tensor& input, IntArrayRef output_size, c10::optional<double> scales) {
+  auto full_output_size = upsample_nearest1d_common_check(input.sizes(), output_size);
+
+  // Allow for empty batch size but not other dimensions
+  TORCH_CHECK(
+      (input.size(1) != 0 && input.size(2) != 0) && input.dim() == 3,
+      "Non-empty 3D data tensor expected but got a tensor with sizes ",
+      input.sizes());
+
+  return new_meta(input, full_output_size);
 }
-} // namespace
 
-Tensor& upsample_nearest1d_out_cpu(
-    Tensor& output,
-    const Tensor& input,
-    IntArrayRef output_size,
-    c10::optional<double> scales) {
-  upsample_nearest1d_out_cpu_template(output, input, output_size, scales);
-  return output;
+TensorMeta upsample_nearest1d_backward(const Tensor& grad_output, IntArrayRef output_size, IntArrayRef input_size, c10::optional<double> scales) {
+  auto full_output_size = upsample_nearest1d_common_check(input_size, output_size);
+
+  check_dim_size(grad_output, 3, 0, full_output_size[0]);
+  check_dim_size(grad_output, 3, 1, full_output_size[1]);
+  check_dim_size(grad_output, 3, 2, full_output_size[2]);
+
+  return new_meta(grad_output, input_size);
 }
 
-Tensor upsample_nearest1d_cpu(
+} // namespace meta
+
+
+namespace native {
+
+Tensor& upsample_nearest1d_out_cpu(
+    Tensor& output,
     const Tensor& input,
     IntArrayRef output_size,
     c10::optional<double> scales) {
-  auto output = at::empty({0}, input.options());
-  upsample_nearest1d_out_cpu_template(output, input, output_size, scales);
+  upsample_nearest1d_kernel(kCPU, output, input, scales);
   return output;
 }
 
@@ -97,51 +76,38 @@ Tensor& upsample_nearest1d_backward_out_cpu(
     IntArrayRef output_size,
     IntArrayRef input_size,
     c10::optional<double> scales) {
-  upsample_nearest1d_backward_out_cpu_template(
-      grad_input, grad_output, output_size, input_size, scales);
-  return grad_input;
-}
-
-Tensor upsample_nearest1d_backward_cpu(
-    const Tensor& grad_output,
-    IntArrayRef output_size,
-    IntArrayRef input_size,
-    c10::optional<double> scales) {
-  auto grad_input = at::zeros(input_size, grad_output.options());
-  upsample_nearest1d_backward_out_cpu_template(
-      grad_input, grad_output, output_size, input_size, scales);
+  grad_input.zero_();
+  upsample_nearest1d_backward_kernel(kCPU, grad_input, grad_output, scales);
   return grad_input;
 }
 
 using at::native::upsample::compute_output_size;
 using at::native::upsample::get_scale_value;
 
-Tensor upsample_nearest1d_cpu(
+// vec variants
+
+Tensor upsample_nearest1d(
     const Tensor& input,
     c10::optional<IntArrayRef> output_size,
     c10::optional<ArrayRef<double>> scale_factors) {
-  auto output = at::empty({0}, input.options());
   auto osize = compute_output_size(input.sizes(), output_size, scale_factors);
   auto scale_w = get_scale_value(scale_factors, 0);
-  upsample_nearest1d_out_cpu_template(output, input, osize, scale_w);
-  return output;
+  return at::upsample_nearest1d(input, osize, scale_w);
 }
 
-Tensor upsample_nearest1d_backward_cpu(
+Tensor upsample_nearest1d_backward(
     const Tensor& grad_output,
     c10::optional<IntArrayRef> output_size,
     IntArrayRef input_size,
     c10::optional<ArrayRef<double>> scale_factors) {
   auto osize = compute_output_size(input_size, output_size, scale_factors);
   auto scale_w = get_scale_value(scale_factors, 0);
-  auto grad_input = at::zeros(input_size, grad_output.options());
-  upsample_nearest1d_backward_out_cpu_template(
-      grad_input, grad_output, osize, input_size, scale_w);
+  return at::upsample_nearest1d_backward(grad_output, osize, input_size, scale_w);
 }
 
 DEFINE_DISPATCH(upsample_nearest1d_kernel);
 DEFINE_DISPATCH(upsample_nearest1d_backward_kernel);
 
 } // namespace native
+
 } // namespace at
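The shape logic factored into `upsample_nearest1d_common_check` above is small enough to restate directly; a hedged Python rendering of the same checks, for reference:

```python
def upsample_nearest1d_common_check(input_size, output_size):
    # Mirrors the C++ checks above: output_size must have exactly one
    # element, widths must be positive, and the result is the full
    # [nbatch, channels, output_width] output shape.
    if len(output_size) != 1:
        raise ValueError(
            f"It is expected output_size equals to 1, but got size {len(output_size)}")
    output_width = output_size[0]
    nbatch, channels, input_width = input_size[0], input_size[1], input_size[2]
    if not (input_width > 0 and output_width > 0):
        raise ValueError(
            "Input and output sizes should be greater than 0, but got "
            f"input (W: {input_width}) and output (W: {output_width})")
    return [nbatch, channels, output_width]

print(upsample_nearest1d_common_check([2, 3, 5], [10]))  # [2, 3, 10]
```

Because both the forward and backward meta functions call this helper, the shape validation that used to be duplicated across the two `*_template` functions now exists in one place.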

aten/src/ATen/native/native_functions.yaml

Lines changed: 6 additions & 10 deletions
@@ -8253,15 +8253,13 @@
   use_c10_dispatcher: full
   python_module: nn
   dispatch:
-    CPU: upsample_nearest1d_cpu
-    CUDA: upsample_nearest1d_cuda
+    DefaultBackend: upsample_nearest1d
 
 - func: upsample_nearest1d_backward.vec(Tensor grad_output, int[]? output_size, int[] input_size, float[]? scale_factors) -> Tensor
   use_c10_dispatcher: full
   python_module: nn
   dispatch:
-    CPU: upsample_nearest1d_backward_cpu
-    CUDA: upsample_nearest1d_backward_cuda
+    DefaultBackend: upsample_nearest1d_backward
 
 - func: upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> Tensor
   use_c10_dispatcher: full
@@ -8401,29 +8399,27 @@
 
 - func: upsample_nearest1d.out(Tensor self, int[1] output_size, float? scales=None, *, Tensor(a!) out) -> Tensor(a!)
   python_module: nn
+  structured: True
   dispatch:
     CPU: upsample_nearest1d_out_cpu
     CUDA: upsample_nearest1d_out_cuda
 
 - func: upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> Tensor
   use_c10_dispatcher: full
   python_module: nn
-  dispatch:
-    CPU: upsample_nearest1d_cpu
-    CUDA: upsample_nearest1d_cuda
+  structured_delegate: upsample_nearest1d.out
 
 - func: upsample_nearest1d_backward.grad_input(Tensor grad_output, int[1] output_size, int[3] input_size, float? scales=None, *, Tensor(a!) grad_input) -> Tensor(a!)
   python_module: nn
+  structured: True
   dispatch:
     CPU: upsample_nearest1d_backward_out_cpu
     CUDA: upsample_nearest1d_backward_out_cuda
 
 - func: upsample_nearest1d_backward(Tensor grad_output, int[1] output_size, int[3] input_size, float? scales=None) -> Tensor
   use_c10_dispatcher: full
   python_module: nn
-  dispatch:
-    CPU: upsample_nearest1d_backward_cpu
-    CUDA: upsample_nearest1d_backward_cuda
+  structured_delegate: upsample_nearest1d_backward.grad_input
 
 - func: upsample_nearest2d.out(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None, *, Tensor(a!) out) -> Tensor(a!)
   python_module: nn
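The commit summary mentions new consistency checking in tools.codegen.model for these paired entries. The invariant can be pictured roughly like this; the dict shapes and error messages are illustrative, not the actual checker:

```python
def check_structured_group(functional, out):
    # A structured group pairs an out= entry (structured: True, owns the
    # dispatch table) with a functional entry that delegates to it.
    assert out.get("structured") is True, \
        "structured_delegate must point at an entry marked structured: True"
    assert functional.get("structured_delegate") == out["func_name"], \
        "structured_delegate must name its out= variant"
    assert "dispatch" not in functional, \
        "a structured delegate gets its kernels from the out variant"

out_entry = {"func_name": "upsample_nearest1d.out", "structured": True,
             "dispatch": {"CPU": "upsample_nearest1d_out_cpu",
                          "CUDA": "upsample_nearest1d_out_cuda"}}
fn_entry = {"func_name": "upsample_nearest1d",
            "structured_delegate": "upsample_nearest1d.out"}
check_structured_group(fn_entry, out_entry)
print("ok")
```

This is why only the `.out` entries above keep per-backend dispatch lines: the functional variants no longer name any kernel of their own.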
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+#pragma once
+
+// ${generated_comment}
+
+#include <ATen/ATen.h> // TODO: improve
+#include <ATen/TensorMeta.h>
+
+namespace at {
+namespace meta {
+
+${declarations}
+
+} // namespace meta
+} // namespace at

aten/src/ATen/templates/RegisterDispatchKey.cpp

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@
 #include <c10/core/Allocator.h>
 #include <ATen/DeviceGuard.h>
 #include <ATen/NativeFunctions.h>
+#include <ATen/MetaFunctions.h>
 #include <ATen/NamedTensorUtils.h>
 #include <ATen/Utils.h>
 #include <ATen/WrapDimUtils.h>

tools/codegen/api/meta.py

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+from tools.codegen.model import *
+from tools.codegen.api.types import MetaArgument
+
+import tools.codegen.api.cpp as cpp
+import tools.codegen.api.dispatcher as dispatcher
+
+from typing import Sequence
+import itertools
+
+# Follows dispatcher calling convention, but:
+#   - Mutable arguments not allowed.  Meta functions are always
+#     written in functional form.  Look at FunctionSchema.signature()
+#   - No tensor returns; instead we return a TensorMeta describing
+#     the tensor in question
+
+def name(f: FunctionSchema) -> str:
+    assert f.name.overload_name == ""
+    return str(f.name.name)
+
+def argument_type(a: Argument) -> str:
+    assert not a.is_write
+    return dispatcher.argumenttype_type(a.type, mutable=False)
+
+def returntype_type(t: Type) -> str:
+    r = cpp.valuetype_type(t)
+    if r is not None:
+        return r
+
+    if isinstance(t, BaseType):
+        if t.name == BaseTy.Tensor:
+            return 'TensorMeta'
+    elif isinstance(t, ListType):
+        raise NotImplementedError("list returns not supported yet")
+
+    raise AssertionError(f"unrecognized return type {t}")
+
+def return_type(r: Return) -> str:
+    assert not r.is_write
+    return returntype_type(r.type)
+
+def returns_type(rs: Sequence[Return]) -> str:
+    if len(rs) == 0:
+        return 'void'
+    elif len(rs) == 1:
+        return return_type(rs[0])
+    else:
+        args = ','.join(map(return_type, rs))
+        return f'std::tuple<{args}>'
+
+def argument(a: Argument) -> MetaArgument:
+    return MetaArgument(
+        type=argument_type(a),
+        name=a.name,
+        argument=a,
+    )
+
+def arguments(func: FunctionSchema) -> Sequence[MetaArgument]:
+    assert not func.out_arguments
+    return list(map(argument, itertools.chain(func.arguments, func.kwarg_only_arguments)))
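The return-type mapping in `returns_type` above is the core of the meta calling convention: Tensor returns become `TensorMeta`, everything else keeps its C++ value type. A simplified, self-contained stand-in (the real code dispatches on the codegen model's Type classes rather than strings):

```python
def returns_type(rs):
    # Simplified model of the mapping above: each return is given as a C++
    # type string, and "Tensor" is rewritten to "TensorMeta". Zero returns
    # map to void; multiple returns are packed into a std::tuple.
    def one(r):
        return 'TensorMeta' if r == 'Tensor' else r
    if len(rs) == 0:
        return 'void'
    elif len(rs) == 1:
        return one(rs[0])
    else:
        return f"std::tuple<{','.join(map(one, rs))}>"

print(returns_type(['Tensor']))             # TensorMeta
print(returns_type(['Tensor', 'int64_t']))  # std::tuple<TensorMeta,int64_t>
print(returns_type([]))                     # void
```

The same rewrite never applies to arguments: meta functions take the dispatcher argument types unchanged (minus mutability), since only outputs need to be described rather than materialized.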
