Commit fc81e9b

Update on "Unconditionally register schema even for manual registration."
The general concept is that I want a centralized location where you can find all of the registrations for a library. I cannot do this if I don't codegen all of the schemas in one spot--right now, most schemas get generated, but not manually registered ones. Let us assume that manual registration has to do with the actual implementations; nothing strange is going on with the schema definition itself. Make it so.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: [D20929258](https://our.internmc.facebook.com/intern/diff/D20929258)

[ghstack-poisoned]
2 parents 6bca398 + 66b8305 commit fc81e9b

113 files changed

Lines changed: 2127 additions & 1034 deletions

.jenkins/caffe2/test.sh

Lines changed: 4 additions & 0 deletions
@@ -7,6 +7,10 @@ if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
   echo 'Skipping tests'
   exit 0
 fi
+if [[ "${BUILD_ENVIRONMENT}" == *-rocm* ]]; then
+  # temporary to locate some kernel issues on the CI nodes
+  export HSAKMT_DEBUG_LEVEL=4
+fi
 
 # Find where cpp tests and Caffe2 itself are installed
 if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then

.jenkins/pytorch/common.sh

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,8 @@ if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]] && [[ "${BUILD_ENVIRONMENT}" =~ py((2|
   shopt -s expand_aliases
   export PYTORCH_TEST_WITH_ROCM=1
   alias python="$PYTHON"
+  # temporary to locate some kernel issues on the CI nodes
+  export HSAKMT_DEBUG_LEVEL=4
 fi
 
 # This token is used by a parser on Jenkins logs for determining

BUILD.bazel

Lines changed: 1 addition & 0 deletions
@@ -1724,6 +1724,7 @@ cc_library(
         "//third_party/miniz-2.0.8:miniz",
         "@com_google_protobuf//:protobuf",
         "@eigen",
+        "@fbgemm//:fbgemm_src_headers",
         "@foxi",
         "@gloo",
         "@onnx",

CONTRIBUTING.md

Lines changed: 36 additions & 0 deletions
@@ -31,6 +31,7 @@
     + [Why LD_PRELOAD in the build function?](#why-ld-preload-in-the-build-function-)
     + [Why no leak detection?](#why-no-leak-detection-)
 - [Caffe2 notes](#caffe2-notes)
+- [CI failure tips](#ci-failure-tips)
 
 ## Contributing to PyTorch
 
@@ -938,3 +939,38 @@ are Caffe2/PyTorch specific. Here they are:
 - `mypy*`, `requirements.txt`, `setup.py`, `test`, `tools` are
   PyTorch-specific. Don't put Caffe2 code in them without extra
   coordination.
+
+## CI failure tips
+
+Once you submit a PR or push a new commit to a branch that is in
+an active PR, CI jobs will be run automatically. Some of these may
+fail and you will need to find out why, by looking at the logs.
+
+Fairly often, a CI failure might be unrelated to your changes. In this
+case, you can usually ignore the failure.
+
+Some failures might be related to specific hardware or environment
+configurations. In this case, if the job is run by CircleCI, you can
+ssh into the job's session to perform manual debugging using the
+following steps:
+
+1. In the CircleCI page for the failed job, make sure you are logged in
+   and then click the `Rerun` actions dropdown button on the top right.
+   Click `Rerun Job with SSH`.
+
+2. When the job reruns, a new step will be added in the `STEPS` tab
+   labelled `Set up SSH`. Inside that tab will be an ssh command that
+   you can execute in a shell.
+
+3. Once you are connected through ssh, you may need to enter a docker
+   container. Run `docker ps` to check if there are any docker
+   containers running. Note that your CI job might be in the process
+   of initiating a docker container, which means it will not show up
+   yet. It is best to wait until the CI job reaches a step where it is
+   building pytorch or running pytorch tests. If the job does have a
+   docker container, run `docker exec -it IMAGE_ID /bin/bash` to
+   connect to it.
+
+4. Now you can find the pytorch working directory, which could be
+   `~/workspace` or `~/project`, and run commands locally to debug
+   the failure.

WORKSPACE

Lines changed: 2 additions & 2 deletions
@@ -79,7 +79,7 @@ new_local_repository(
 
 new_local_repository(
     name = "fbgemm",
-    build_file = "//third_party:fbgemm.BUILD",
+    build_file = "//third_party:fbgemm/BUILD.bazel",
     path = "third_party/fbgemm",
 )
 
@@ -103,7 +103,7 @@ new_local_repository(
 
 new_local_repository(
     name = "asmjit",
-    build_file = "//third_party:asmjit.BUILD",
+    build_file = "//third_party:fbgemm/third_party/asmjit.BUILD",
     path = "third_party/fbgemm/third_party/asmjit",
 )
 

aten/src/ATen/Declarations.cwrap

Lines changed: 1 addition & 1 deletion
@@ -1044,7 +1044,7 @@
     - THTensor* q
     - THIndexTensor* J
     - long num_samples
-    - Generator generator
+    - c10::optional<Generator> generator
 ]]
 [[
   name: _th_copy_ignoring_overlaps_

aten/src/ATen/Utils.h

Lines changed: 7 additions & 6 deletions
@@ -102,10 +102,11 @@ inline int64_t prod_intlist(ArrayRef<int64_t> list) {
  * the backend generator type (CPU/CUDAGeneratorImpl etc.)
  */
 template <typename T>
-static inline T * check_generator(Generator gen) {
-  TORCH_CHECK(gen.defined(), "Generator with undefined implementation is not allowed");
-  TORCH_CHECK(T::device_type() == gen.device().type(), "Expected a '", T::device_type(), "' device type for generator but found '", gen.device().type(), "'");
-  return gen.get<T>();
+static inline T * check_generator(c10::optional<Generator> gen) {
+  TORCH_CHECK(gen.has_value(), "Expected Generator but received nullopt");
+  TORCH_CHECK(gen->defined(), "Generator with undefined implementation is not allowed");
+  TORCH_CHECK(T::device_type() == gen->device().type(), "Expected a '", T::device_type(), "' device type for generator but found '", gen->device().type(), "'");
+  return gen->get<T>();
 }
 
 /**
@@ -115,8 +116,8 @@ static inline T * check_generator(Generator gen) {
  * the backend generator type (CPU/CUDAGeneratorImpl etc.)
  */
 template <typename T>
-static inline T* get_generator_or_default(const Generator& gen, const Generator& default_gen) {
-  return gen.defined() ? check_generator<T>(gen) : check_generator<T>(default_gen);
+static inline T* get_generator_or_default(const c10::optional<Generator>& gen, const Generator& default_gen) {
+  return gen.has_value() && gen->defined() ? check_generator<T>(gen) : check_generator<T>(default_gen);
 }
 
 } // at
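
The Utils.h hunks above move `check_generator` and `get_generator_or_default` from a plain `Generator` to an optional one. The block below is a minimal, self-contained sketch of that control flow, not the ATen code. It assumes `std::optional` (C++17) as a stand-in for `c10::optional`. `DeviceType`, `GeneratorImpl`, and `Generator` are simplified hypothetical types, exceptions stand in for `TORCH_CHECK`, and the expected device is passed as an argument instead of being read from a template parameter.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// Hypothetical, simplified stand-ins for the ATen types (assumptions).
enum class DeviceType { CPU, CUDA };

struct GeneratorImpl {
  DeviceType device;
  unsigned long long seed;
};

struct Generator {
  GeneratorImpl* impl = nullptr;  // nullptr models an undefined generator
  bool defined() const { return impl != nullptr; }
};

// Same order of checks as the diff: empty optional, then undefined handle,
// then device mismatch. Throws instead of using TORCH_CHECK.
inline GeneratorImpl* check_generator(const std::optional<Generator>& gen,
                                      DeviceType expected) {
  if (!gen.has_value()) {
    throw std::invalid_argument("Expected Generator but received nullopt");
  }
  if (!gen->defined()) {
    throw std::invalid_argument(
        "Generator with undefined implementation is not allowed");
  }
  if (gen->impl->device != expected) {
    throw std::invalid_argument("Unexpected device type for generator");
  }
  return gen->impl;
}

// Falls back to default_gen when the caller passed nullopt or an
// undefined generator; otherwise validates the caller's generator.
inline GeneratorImpl* get_generator_or_default(
    const std::optional<Generator>& gen,
    const Generator& default_gen,
    DeviceType expected) {
  return (gen.has_value() && gen->defined())
      ? check_generator(gen, expected)
      : check_generator(default_gen, expected);
}

// Small usage example: nullopt selects the default generator.
inline void run_example() {
  GeneratorImpl default_impl{DeviceType::CPU, 42};
  Generator default_gen{&default_impl};
  assert(get_generator_or_default(std::nullopt, default_gen,
                                  DeviceType::CPU) == &default_impl);
}
```

In this sketch, both `std::nullopt` and an undefined `Generator{}` fall back to the default, which matches the `gen.has_value() && gen->defined()` condition in the diff.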

aten/src/ATen/core/Generator.h

Lines changed: 0 additions & 3 deletions
@@ -61,9 +61,6 @@ struct CAFFE2_API Generator {
     }
   }
 
-  // TODO(pbelevich): delete this after replace Generator generator = nullptr with c10::optional<at::Generator> = c10::nullopt
-  Generator(std::nullptr_t gen_impl) {}
-
   bool operator==(const Generator& rhs) const {
     return this->impl_ == rhs.impl_;
   }

aten/src/ATen/core/dispatch/DispatchKeyExtractor.h

Lines changed: 5 additions & 0 deletions
@@ -72,6 +72,11 @@ namespace detail {
           ts = ts | gen.key_set();
         }
       }
+      void operator()(c10::optional<at::Generator> gen) {
+        if (gen.has_value() && gen->defined()) {
+          ts = ts | gen->key_set();
+        }
+      }
       template <typename T>
       void operator()(const T& x) {
         // do nothing
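
The DispatchKeyExtractor.h hunk above adds an overload for an optional generator next to a catch-all template that does nothing. The sketch below suggests why a dedicated overload is needed: without it, overload resolution would send an optional generator to the catch-all, and its keys would be silently ignored. This is an illustration only. `KeySet`, `Generator`, and `KeyCollector` are hypothetical simplified types, `std::optional` (C++17) stands in for `c10::optional`, and arguments are taken by const reference where the diff takes them by value.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a dispatch key set: one bit per key (assumption).
using KeySet = std::uint64_t;

// Hypothetical simplified generator carrying its own key set.
struct Generator {
  KeySet keys = 0;
  bool is_defined = false;
  bool defined() const { return is_defined; }
  KeySet key_set() const { return keys; }
};

// Visitor that accumulates key sets from the arguments it is applied to.
struct KeyCollector {
  KeySet ts = 0;

  // A defined generator contributes its keys.
  void operator()(const Generator& gen) {
    if (gen.defined()) {
      ts = ts | gen.key_set();
    }
  }

  // An optional generator contributes only when it holds a defined value.
  void operator()(const std::optional<Generator>& gen) {
    if (gen.has_value() && gen->defined()) {
      ts = ts | gen->key_set();
    }
  }

  // Catch-all: every other argument type contributes nothing.
  template <typename T>
  void operator()(const T&) {}
};

// Small usage example: an empty optional leaves the accumulated set unchanged.
inline void run_example() {
  KeyCollector collector;
  collector(std::optional<Generator>{});
  assert(collector.ts == 0);
}
```

A non-template overload is preferred over a template when both match equally well, so the `std::optional<Generator>` overload wins over the catch-all for optional arguments.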

aten/src/ATen/core/dispatch/Dispatcher.cpp

Lines changed: 0 additions & 9 deletions
@@ -286,13 +286,4 @@ void Dispatcher::setManuallyBoxedKernelFor_(const OperatorHandle& op, KernelFunc
   op.operatorIterator_->op.setManuallyBoxedKernel_(func);
 }
 
-bool Dispatcher::isValid(const OperatorHandle& op) const {
-  for (auto iter = operators_.begin(); iter != operators_.end(); ++iter) {
-    if (iter == op.operatorIterator_) {
-      return true;
-    }
-  }
-  return false;
-}
-
 }
