Add tag to define differentiable/non-differentiable variables #2723

wschin merged 16 commits into onnx:master
Conversation
```python
# Doc generation: map each formal parameter's differentiation category
# to a human-readable tag in the operator's signature.
if differentiable == formal_parameter.differentiationCategory:
    tags.append('differentiable')
elif non_differentiable == formal_parameter.differentiationCategory:
    tags.append('non-differentiable')
```
I'm wondering whether a separate "training" op doc should be generated; that would make the docs easier to understand, given that ONNX may support more scenarios than DNN. Thoughts?
Unknown would be an empty string. Would it be better to use "differentiability unknown"?
For the second comment, I don't feel splitting this tag from Operator.md is a good idea. To understand why an input/output is differentiable and how to compute its gradient, the reader must read that operator's entire document.
I don't believe that the training algorithms used in deep learning can be applied to many traditional ML models. Hence, we probably don't need to add differentiability tags to ops in ONNX-ML.
@wschin I'm OK with "differentiability - unknown" or "undefined" to make the 3 cases very clear ("differentiable", "non-differentiable", "undefined"), if we don't want to put the training ops into a separate doc.
Or maybe add one line of "statement" at the beginning of operators.md: inputs without a specified "differentiability" have it as "undefined"?
@linkerzhang, training ops cannot be put in another MD because this newly added attribute will be added to ALL existing operators.
Because each operator has its own differentiability, I am not sure how to create a one-line statement at the beginning of operator.md. Do you mean creating one line for each operator?
Yes, I mean one line at the beginning of operator.md, to save clarifying it in each operator :).
@linkerzhang, I think I get your point. Is 1f1fc67 what we want?
```cpp
// Tail of an Input(...) declaration in the op schema; the final
// OpSchema::Differentiable argument marks this input as differentiable.
OpSchema::Single,
true,
1,
OpSchema::Differentiable)
```
How do we prove that an input is differentiable? A few options:

- PyTorch/TensorFlow treats it as differentiable.
- Add the math of the backward pass to the operator's document and ask reviewers to review the math.
- Implement a backward pass using an existing auto-diff library (see the sketch below).
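Along the lines of the third option, here is a minimal sketch (assuming PyTorch is available; the function name `concat_fn` is illustrative) that uses `torch.autograd.gradcheck` to verify that concatenation is differentiable:

```python
import torch

# gradcheck numerically verifies differentiability: it compares analytical
# gradients against finite-difference estimates and fails on a mismatch.
def concat_fn(x, y):
    return torch.cat((x, y), dim=0)

x = torch.randn(3, dtype=torch.double, requires_grad=True)
y = torch.randn(2, dtype=torch.double, requires_grad=True)
assert torch.autograd.gradcheck(concat_fn, (x, y))
```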
Add tag to define differentiable/non-differentiable variables (#2723)

* Draft
* Polish code and fix bugs
* One line to explain how we specify undefined differentiability
* Work on entire def.cc in tensor folder
* Revert some operators' changes to cut PR's size
* Clean unused changes

Co-authored-by: Ke Zhang <kezhan@microsoft.com>
We are adding a tag to explicitly describe whether an input is differentiable or not. This tag, if specified, may be automatically added to the operator's signature in Operator.md.
There are two example uses of this new tag, one for defining the `Split` spec and the other for defining the `Reshape` spec. You can see that in `Split`, both its input and outputs are differentiable. For `Reshape`, its 2nd input is not differentiable. This PR uses the new tag to define differentiability for 5 operators. Their differentiability is explained below.
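For illustration, the generated input list in Operator.md could then render the tags like this (a hypothetical sketch of the output, based on the doc-generation snippet above, using `Reshape` as the example):

```
Inputs

data (differentiable) : T
shape (non-differentiable) : tensor(int64)
```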
Reshape:
This operator has been discussed in #2794.
Shape:
The output is not a differentiable function of the input because a shape is a vector of discrete values. Thus, the Jacobian matrix needed for the backward pass doesn't exist.
Size:
The output is not a differentiable function of the input because a size is a discrete scalar. Thus, the Jacobian matrix needed for the backward pass doesn't exist.
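To make the intuition concrete, here is a tiny sketch (plain NumPy; names are illustrative) showing that the outputs of `Shape` and `Size` are unaffected by any perturbation of the input values, so they provide no gradient signal:

```python
import numpy as np

x = np.random.randn(2, 3)
eps = 1e-6 * np.random.randn(2, 3)

# Shape and Size depend only on the tensor's structure, not its values:
# perturbing x leaves both outputs unchanged.
assert (x + eps).shape == x.shape
assert (x + eps).size == x.size
```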
Concat:
It's straightforward to derive the Jacobian matrix in this case. Let's consider a concatenation of two one-element vectors, [x] and [y]. If axis = 0, the corresponding output is z = [x, y]. Because dL/dz = [dL/dx, dL/dy], the Jacobian matrix

```
[[dx/dx dx/dy]
 [dy/dx dy/dy]]
```

is just an identity matrix. The same idea can be extended to the concatenation of any tensors; the backward operator just copies the right parts of its input (dL/dz) to its outputs (dL/dx and dL/dy).
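As a sketch of that copy operation (plain NumPy; `concat_backward` is a hypothetical helper, not an ONNX API), the backward of a Concat along axis 0 just slices the upstream gradient back into the shapes of the inputs:

```python
import numpy as np

def concat_backward(dL_dz, input_lengths):
    """Slice dL/dz back into per-input gradients (axis 0)."""
    grads, offset = [], 0
    for n in input_lengths:
        grads.append(dL_dz[offset:offset + n])
        offset += n
    return grads

dL_dz = np.array([0.1, 0.2])         # gradient w.r.t. z = [x, y]
dL_dx, dL_dy = concat_backward(dL_dz, [1, 1])
print(dL_dx, dL_dy)                  # [0.1] [0.2]
```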
Split:
The computation of Split's Jacobian is very similar to Concat's; it just inversely maps output elements back to their original input locations. Assume that an input w = [x, y, z] is split into [x], [y], [z]. The corresponding backward operator maps the 3 inputs [dL/dx], [dL/dy], [dL/dz] into [dL/dx, dL/dy, dL/dz].
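Correspondingly, a sketch of Split's backward (again plain NumPy; `split_backward` is a hypothetical helper) just concatenates the pieces' gradients back into the shape of the original input:

```python
import numpy as np

def split_backward(piece_grads):
    """Concatenate the gradients of the split pieces (axis 0)."""
    return np.concatenate(piece_grads, axis=0)

dL_dw = split_backward([np.array([0.1]), np.array([0.2]), np.array([0.3])])
print(dL_dw)  # [0.1 0.2 0.3]
```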