Fix segmentation fault in grad_fn by vishwakftw · Pull Request #9292 · pytorch/pytorch

vishwakftw · 2018-07-10T00:48:34Z

Fixes #8774.

zou3519 · 2018-07-10T00:58:21Z

Should grad_fn be callable at all?

vishwakftw · 2018-07-10T02:09:59Z

@zou3519 grad_fn represents a form of parameterized gradient function. It probably should not be publicly accessible, but I don't know the design decisions behind it. There seems to be a detailed writeup in torch/csrc/autograd/function.h.

apaszke

That's still an incomplete fix, because it might be called with the wrong number of arguments. It would be better to use the information we stash in the forward pass to validate the inputs.

vishwakftw · 2018-07-10T14:35:29Z

@apaszke Is my fix correct? Some tests are failing, and I don't know how to debug them.

yf225 · 2018-07-10T17:05:54Z

@pytorchbot retest this please

ssnl · 2018-07-11T16:17:00Z

error looks legit

17:19:32 ======================================================================
17:19:32 ERROR: test_once_differentiable (__main__.TestAutograd)
17:19:32 ----------------------------------------------------------------------
17:19:32 Traceback (most recent call last):
17:19:32   File "test_autograd.py", line 131, in test_once_differentiable
17:19:32     x, y = self._function_test(MyFunction)
17:19:32   File "test_autograd.py", line 79, in _function_test
17:19:32     result.sum().backward(go, create_graph=True)
17:19:32   File "/opt/python/2.7/lib/python2.7/site-packages/torch/tensor.py", line 93, in backward
17:19:32     torch.autograd.backward(self, gradient, retain_graph, create_graph)
17:19:32   File "/opt/python/2.7/lib/python2.7/site-packages/torch/autograd/__init__.py", line 90, in backward
17:19:32     allow_unreachable=True)  # allow_unreachable flag
17:19:32   File "/opt/python/2.7/lib/python2.7/site-packages/torch/autograd/function.py", line 76, in apply
17:19:32     return self._forward_cls.backward(self, *args)
17:19:32   File "/opt/python/2.7/lib/python2.7/site-packages/torch/autograd/function.py", line 223, in wrapper
17:19:32     return err_fn(*[fake_requires_grad(v) for v in outputs])
17:19:32 RuntimeError: /var/lib/jenkins/workspace/torch/csrc/autograd/function.h:120: operator(): Assertion `num_inputs() == inputs.size()` failed: expected 0 arguments, got 3 instead
17:19:32 
17:19:32 ----------------------------------------------------------------------

apaszke · 2018-07-11T16:39:02Z

@vishwakftw can you try to see which Function fails the assertion? It might be the case that there's a bug elsewhere.

torch/csrc/autograd/functions/basic_ops.h


  virtual variable_list apply(const variable_list& inputs) override;

+  virtual variable_list operator()(const variable_list& inputs) {


apaszke

Oh, I didn't notice you added the check at every single function call! Maybe just put it in THPCppFunction_call so that it only applies to Python functions that cause the segfault. The rest of the system should be correct "by design".
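A minimal sketch of that suggestion, assuming hypothetical stand-in types (`Node`, `SumNode`, `call_from_python`) rather than the real autograd classes. The internal call operator stays unchecked, and only the entry point reachable from user code validates the argument count:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for the autograd types; not the real PyTorch classes.
using variable_list = std::vector<double>;

struct Node {
  explicit Node(std::size_t num_inputs) : num_inputs_(num_inputs) {}
  virtual ~Node() = default;

  std::size_t num_inputs() const { return num_inputs_; }

  // Internal call path: the engine is trusted to pass the right count,
  // so no runtime check is performed here.
  variable_list operator()(const variable_list& inputs) { return apply(inputs); }

 protected:
  virtual variable_list apply(const variable_list& inputs) = 0;

 private:
  std::size_t num_inputs_;
};

// Toy node that expects exactly two inputs and returns their sum.
struct SumNode : Node {
  SumNode() : Node(2) {}

 protected:
  variable_list apply(const variable_list& inputs) override {
    // Debug-only invariant; indexing below relies on it.
    assert(inputs.size() == 2);
    return {inputs[0] + inputs[1]};
  }
};

// Boundary for calls that originate from user code: validate first,
// then forward to the unchecked internal path.
variable_list call_from_python(Node& fn, const variable_list& inputs) {
  if (inputs.size() != fn.num_inputs()) {
    throw std::invalid_argument(
        "expected " + std::to_string(fn.num_inputs()) + " arguments, got " +
        std::to_string(inputs.size()) + " instead");
  }
  return fn(inputs);
}
```

With this split, a call with the wrong number of arguments raises an exception at the boundary instead of reaching code that indexes past the end of the input list.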

torch/csrc/autograd/functions/basic_ops.h


+  variable_list operator()(const variable_list& inputs) {
+    return apply(inputs);
+  }


vishwakftw · 2018-07-11T22:06:14Z

After building locally, adding the check to THPCppFunction_call fails for the DelayedError function call.

vishwakftw · 2018-07-11T23:25:48Z

Same test error as earlier. @apaszke

torch/csrc/autograd/python_cpp_function.cpp

  int num_inputs = PyTuple_GET_SIZE(args);
+  int num_inputs_required = ((THPCppFunction*)self)->cdata->num_inputs();
+  std::string self_name = ((THPCppFunction*)self)->cdata->name();
+  if ((self_name.find("Error") == std::string::npos) && (num_inputs != num_inputs_required)) {
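The condition in this revision skips the count check for any function whose name contains "Error". A small sketch of that predicate follows; the helpers `should_validate` and `rejects_call` and the name `AddBackward` are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the condition in the diff: validate the
// argument count unless the function's name contains "Error".
bool should_validate(const std::string& self_name) {
  // find() returns std::string::npos when the substring is absent.
  return self_name.find("Error") == std::string::npos;
}

// True when a call should be rejected for passing the wrong number of inputs.
bool rejects_call(const std::string& self_name, std::size_t num_inputs,
                  std::size_t num_inputs_required) {
  return should_validate(self_name) && num_inputs != num_inputs_required;
}

// Usage: DelayedError is exempt from the check; an illustrative
// "AddBackward" node is not.
void usage_example() {
  assert(!rejects_call("DelayedError", 3, 0));
  assert(rejects_call("AddBackward", 3, 2));
}
```

Because this is a substring match, any function whose name happens to contain "Error" is exempt, not only the two classes the check was meant to skip.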


torch/csrc/autograd/functions/basic_ops.h

+  DelayedError(std::string msg, int num_inputs)
+    : msg(std::move(msg)) {
+      for (int i = 0; i < num_inputs; i++)
+        input_metadata_.emplace_back();
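A self-contained sketch of what this constructor appears to achieve, assuming simplified stand-ins (`InputMetadata`, `Function`) for the real autograd types. Creating one metadata entry per expected input makes `num_inputs()` report the intended count, so a count check against it can pass for this node:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins; the real autograd classes carry far more state.
struct InputMetadata {};
using variable_list = std::vector<double>;

struct Function {
  virtual ~Function() = default;

  // The input count is derived from the recorded metadata.
  std::size_t num_inputs() const { return input_metadata_.size(); }

  virtual variable_list apply(const variable_list& inputs) = 0;

 protected:
  std::vector<InputMetadata> input_metadata_;
};

struct DelayedError : Function {
  DelayedError(std::string msg, int num_inputs) : msg(std::move(msg)) {
    // One default-constructed entry per expected input.
    for (int i = 0; i < num_inputs; i++) {
      input_metadata_.emplace_back();
    }
  }

  // Simplification: this sketch throws as soon as it is applied; the real
  // node's behavior is not reproduced here.
  variable_list apply(const variable_list&) override {
    throw std::runtime_error(msg);
  }

  std::string msg;
};

// Usage: a node built for three inputs reports three inputs.
// The message text is a placeholder.
void usage_example() {
  DelayedError err("placeholder message", 3);
  assert(err.num_inputs() == 3);
}
```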


torch/csrc/autograd/functions/basic_ops.h


  std::string msg;
+
+  uint32_t input_nr;


facebook-github-bot

@apaszke has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary: Fixes #8774.
Reviewed By: soumith
Differential Revision: D8836478
Pulled By: apaszke
fbshipit-source-id: f113bf47fe493be9f095a5a5490caf08dbb44e38

ssnl · 2018-07-15T14:26:30Z

I know that this has been merged... But can we have a test for this?

vishwakftw · 2018-07-15T14:27:41Z

How do I do it? Should I send in a PR?

ssnl · 2018-07-15T14:28:59Z

That would be great! I can open an issue to track this if you are busy.

vishwakftw · 2018-07-15T14:30:25Z

I think I should be able to do it; doesn't seem like a lot of effort. :)

vishwakftw · 2018-07-15T14:59:10Z

PR in at #9457

* upstream/master: (24 commits)
  Implement tensor weak references (pytorch#9363)
  Nuke TestCollectEnv (pytorch#9459)
  Add test case for segmentation fault fix in grad_fn (pytorch#9457)
  Add peephole optimization for type_as operators. (pytorch#9316)
  Fix out-of-range error for test_neg (pytorch#9431)
  add depthwise conv support for mkldnn (pytorch#8782)
  Refactor `_log_sum_exp` (pytorch#9173)
  Add ModuleDict and ParameterDict containers (pytorch#8463)
  Introduce SupervisedPtr, delete THAllocator and THCDeviceAllocator (pytorch#9358)
  Introducing IsInf (pytorch#9169)
  add device to CUDAEvent (pytorch#9415)
  Make localScalar error message more intuitive (pytorch#9443)
  Only accept continguous tensors in TopK for cuda (pytorch#9441)
  Add support for .norm() pytorch onnx export and ReduceL1/ReduceL2 caffe2 operators (pytorch#9299)
  Only view() rhs of index_put if we need to (pytorch#9424)
  Add BatchBucketizeOp in caffe2 (pytorch#9385)
  Implementation of Wngrad optimizer caffe2 python wrapper and unit test on least square regression (pytorch#9001)
  Implementation and operator test for Wngrad optimizer (pytorch#8999)
  Fix segmentation fault in grad_fn (pytorch#9292)
  update docs (pytorch#9423)
  ...

Fix segmentation fault in grad_fn

4751a2c

vishwakftw requested review from apaszke, colesbury, ezyang, gchanan, soumith and zdevito as code owners July 10, 2018 00:48

soumith approved these changes Jul 10, 2018

Fix one more source of segmentation fault

5ddc3cb

Revert "Fix one more source of segmentation fault"

7fa71cb

This reverts commit 5ddc3cb. Reverting because this is redundant

apaszke suggested changes Jul 10, 2018

Assert if correct number of inputs have been passed

315035f

vishwakftw force-pushed the grad_fn-fix branch from 302aa29 to 315035f Compare July 10, 2018 14:49

Fix bug in DelayedError

9e5c36a

vishwakftw commented Jul 11, 2018

Make operator() virtual for Function

f26f344

apaszke reviewed Jul 11, 2018

Move error check to THPCppFunction_call

2486b41

apaszke approved these changes Jul 11, 2018

Exclude check for Error and DelayedError

3b81072

apaszke suggested changes Jul 12, 2018

Add number of inputs to constructor of DelayedError, modify Function …

d00cd68

…to reflect this change

Remove unnecessary code

de06ce6

apaszke suggested changes Jul 13, 2018

Update basic_ops.h

33a56d7

apaszke approved these changes Jul 13, 2018

facebook-github-bot reviewed Jul 13, 2018

facebook-github-bot pushed a commit that referenced this pull request Jul 13, 2018

Fix segmentation fault in grad_fn (#9292)

86eeeab

vishwakftw closed this Jul 13, 2018

vishwakftw deleted the grad_fn-fix branch July 13, 2018 21:48

ezyang added the open source label Jun 24, 2019

