
Add tag to define differentiable/non-differentiable variables #2723

Merged
wschin merged 16 commits into onnx:master from wschin:label-diff on Jun 2, 2020

Conversation

wschin (Collaborator) commented Apr 14, 2020

We are adding a tag that explicitly describes whether an input or output is differentiable. This tag, if specified, may be automatically added to the operator's signature in Operators.md.

There are two example uses of this new tag: one in the Split spec and the other in the Reshape spec. In Split, both the input and the outputs are differentiable. In Reshape, the second input (the shape tensor) is not differentiable.
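Once this lands, the tag should also be visible from the Python bindings. Below is a minimal sketch, assuming the differentiationCategory attribute that gen_doc.py reads in this PR is exposed on schema formal parameters:

```python
# Sketch: inspect the new differentiability tags from Python.
# Assumes onnx.defs exposes the differentiationCategory attribute that
# gen_doc.py reads in this PR (Unknown / Differentiable / NonDifferentiable).
from onnx import defs

schema = defs.get_schema('Reshape')
for formal_parameter in schema.inputs:
    print(formal_parameter.name, formal_parameter.differentiationCategory)
# Expected: 'data' is Differentiable and 'shape' is NonDifferentiable.
```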

This PR uses the new tag to define differentiability for 5 operators. Their differentiability is explained below.

Reshape:
This operator has been discussed in #2794.

Shape:
Output is not a differentiable function of the input, because a shape is a vector of discrete (integer) values.
Thus, the Jacobian matrix needed for the backward pass doesn't exist.

Size:
Output is not a differentiable function of the input, because a size is a discrete scalar.
Thus, the Jacobian matrix needed for the backward pass doesn't exist.
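A minimal numpy illustration of both points (the toy array is ours, not from the PR): the outputs of Shape and Size depend only on the input's dimensions, never on its values, so no value perturbation can be reflected in the output.

```python
import numpy as np

x = np.random.randn(2, 3)
eps = 1e-6

# Shape and Size produce discrete, integer-valued outputs that are
# insensitive to the input's values, so there is no Jacobian for a
# backward operator to propagate.
assert np.shape(x + eps) == np.shape(x)  # (2, 3) either way
assert np.size(x + eps) == np.size(x)    # 6 either way
```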

Concat:
It's straightforward to derive the Jacobian matrix in this case. Consider a concatenation of two one-element vectors, [x] and [y]. With axis = 0, the output is z = [x, y], and dL/dz = [dL/dx, dL/dy]. The Jacobian matrix of z with respect to (x, y),

[[dz_1/dx  dz_1/dy]
 [dz_2/dx  dz_2/dy]]

is just the identity matrix, since z_1 = x and z_2 = y. The same idea extends to the concatenation of any tensors; the backward operator simply copies the right slice of its input (dL/dz) into each of its outputs (dL/dx and dL/dy).
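A minimal numpy sketch of this (variable names are ours): because the Jacobian is the identity, the backward of Concat is pure slicing of dL/dz.

```python
import numpy as np

x, y = np.array([1.5]), np.array([-2.0])
z = np.concatenate([x, y], axis=0)    # forward: z = [x, y]

dL_dz = np.array([0.1, 0.2])          # incoming gradient dL/dz
dL_dx, dL_dy = dL_dz[:1], dL_dz[1:]   # identity Jacobian => just slice
assert np.allclose(dL_dx, [0.1]) and np.allclose(dL_dy, [0.2])
```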

Split:
The computation of Split's Jacobian is very similar to Concat's; it just inversely maps output elements back to their original input locations. Assume an input w = [x, y, z] is split into [x], [y], and [z]. The corresponding backward operator maps its 3 inputs, [dL/dx], [dL/dy], and [dL/dz], into a single output dL/dw = [dL/dx, dL/dy, dL/dz].
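And the mirror-image numpy sketch for Split (again a toy example of ours): the backward operator is just a Concat of the incoming gradients.

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0])          # w = [x, y, z]
parts = np.split(w, 3)                 # forward: [x], [y], [z]

grads = [np.array([0.1]), np.array([0.2]), np.array([0.3])]
dL_dw = np.concatenate(grads)          # backward of Split is a Concat
assert np.allclose(dL_dw, [0.1, 0.2, 0.3])
```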

@wschin wschin requested review from a team as code owners April 14, 2020 18:16
@wschin wschin added the topic: training label Apr 14, 2020
@wschin wschin changed the title from "[Draft] Add tag to define differentiable/non-differentiable variables" to "Add tag to define differentiable/non-differentiable variables" Apr 14, 2020
Comment thread on onnx/defs/gen_doc.py:

```python
if differentiable == formal_parameter.differentiationCategory:
    tags.append('differentiable')
elif non_differentiable == formal_parameter.differentiationCategory:
    tags.append('non-differentiable')
```
Member

How about the "unknown" case?

Member

I'm wondering whether a separate "training" op doc should be generated; that would make the docs easier to understand, given that ONNX may support more scenarios other than DNNs. Thoughts?

wschin (Collaborator, Author) commented Apr 16, 2020

"Unknown" would be an empty string. Would it be better to use an explicit "differentiability unknown" tag?

On the second comment, I don't feel splitting this tag out of Operators.md is a good idea. To understand why an input/output is differentiable and how to compute its gradient, the reader needs to read that operator's entire document.
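For reference, a small sketch (ours, not part of the PR) of how the gen_doc.py logic above could spell out all three cases; the enum values below are placeholders, not the real constants:

```python
# Placeholder enum values standing in for gen_doc.py's differentiable /
# non_differentiable constants; only the three-way branching matters here.
DIFFERENTIABLE, NON_DIFFERENTIABLE, UNKNOWN = 1, 2, 0

def differentiability_tag(category):
    if category == DIFFERENTIABLE:
        return 'differentiable'
    if category == NON_DIFFERENTIABLE:
        return 'non-differentiable'
    return None  # unknown: emit no tag; a one-line note in Operators.md covers it

assert differentiability_tag(UNKNOWN) is None
```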

Contributor

I don't believe the training algorithms used for deep learning can be applied to many traditional ML models. Hence, we probably don't need to add differentiability tags to ops in ONNX-ML.

Member

@wschin I'm OK with "differentiability unknown" or "undefined", making the 3 cases very clear (differentiable, non-differentiable, undefined), if we don't want to put the training ops into a separate doc.

Member

Or maybe one line of "statement" at the beginning of Operators.md: inputs that don't specify "differentiability" have it as "undefined"?

wschin (Collaborator, Author)

@linkerzhang, training ops cannot be put in another MD, because this newly added attribute will be added to ALL existing operators.

Because each operator has its own differentiability, I am not sure how to create a one-line statement at the beginning of Operators.md. Do you mean creating one line for each operator?

Member

Yes, I mean one line at the beginning of Operators.md, to save clarifying it in each operator :).

wschin (Collaborator, Author)

@linkerzhang, I think I get your point. Is 1f1fc67 what we want?

@wschin wschin requested a review from ebarsoum May 12, 2020 17:38
Comment thread on onnx/defs/tensor/defs.cc:

```cpp
OpSchema::Single,
true,
1,
OpSchema::Differentiable)
```
wschin (Collaborator, Author)

How do we prove a claim that an input/output is differentiable? Some options:

  1. PyTorch/TensorFlow already treat the operator as differentiable.
  2. Add the math of the backward pass to the operator's document and have reviewers check the math.
  3. Implement a backward pass using an existing auto-diff library (see the sketch below).
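For option 3, a hedged sketch using PyTorch's torch.autograd.gradcheck (PyTorch is our stand-in here, not something this PR depends on). It numerically checks the Concat and Split Jacobians discussed in the PR description:

```python
import torch

# Double precision and requires_grad are required by gradcheck.
x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)

# Concat: numerical and analytical Jacobians must agree.
assert torch.autograd.gradcheck(lambda t: torch.cat([t, t], dim=0), (x,))

# Split: gradcheck also accepts functions returning tuples of tensors.
assert torch.autograd.gradcheck(lambda t: torch.split(t, 2, dim=0), (x,))
```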

linkerzhang (Member) left a comment

thank you!

@wschin wschin merged commit 925b365 into onnx:master Jun 2, 2020
jcwchen pushed a commit to jcwchen/onnx that referenced this pull request Sep 23, 2020
Add tag to define differentiable/non-differentiable variables (onnx#2723)

* Draft

* Polish code and fix bugs

* One line to explain how we specify undefined differentiability

* Work on entire defs.cc in tensor folder

* Revert some operators' changes to cut PR's size

* Clean unused changes

Co-authored-by: Ke Zhang <kezhan@microsoft.com>