sparse gradcheck: reparametrize some tests to remove masked=True #98490

nikitaved wants to merge 51 commits into gh/nikitaved/35/base from
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/98490

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures as of commit e1bcfa2. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Consider the function:

```python
>>> a = torch.tensor([[0, 1], [2, 3]], dtype=torch.float64).to_sparse().requires_grad_(True)
>>> torch.autograd.gradcheck(lambda x: torch.mm(x, x).to_dense(masked_grad=False), (a,), masked=False)
True
>>> torch.autograd.gradcheck(lambda x: torch.mm(x, x).to_dense(masked_grad=True), (a,), masked=True)
<snip>
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[0.0000, 0.0000, 0.0000, 0.0000],
        [2.0000, 3.0000, 0.0000, 2.0000],
        [1.0000, 0.0000, 3.0000, 1.0000],
        [0.0000, 1.0000, 2.0000, 6.0000]], dtype=torch.float64)
analytical:tensor([[0., 1., 2., 0.],
        [2., 3., 0., 2.],
        [1., 0., 3., 1.],
        [0., 1., 2., 6.]], dtype=torch.float64)
```

The statement holds when the input sparse tensor is a full tensor.

Next, consider:

```python
>>> a = torch.tensor([[0, 1], [2, 3]], dtype=torch.float64).to_sparse().requires_grad_(True)
>>> torch.autograd.gradcheck(lambda x: torch.sparse.mm(x, x).to_dense(masked_grad=True), (a,), masked=True)
True
```

but with `masked=False`:

```python
>>> torch.autograd.gradcheck(lambda x: torch.sparse.mm(x, x).to_dense(masked_grad=False), (a,), masked=False)
<snip>
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[0.0000, 1.0000, 2.0000, 0.0000],
        [2.0000, 3.0000, 0.0000, 2.0000],
        [1.0000, 0.0000, 3.0000, 1.0000],
        [0.0000, 1.0000, 2.0000, 6.0000]], dtype=torch.float64)
analytical:tensor([[0., 0., 0., 0.],
        [2., 3., 0., 2.],
        [1., 0., 3., 1.],
        [0., 1., 2., 6.]], dtype=torch.float64)
```

unless the input sparse tensor is full:

```python
>>> a = torch.tensor([[10, 1], [2, 3]], dtype=torch.float64).to_sparse().requires_grad_(True)
>>> torch.autograd.gradcheck(lambda x: torch.sparse.mm(x, x).to_dense(masked_grad=False), (a,), masked=False)
True
```

Based on the above, I cannot confirm that …
Most of the sparse functions that work with sparse tensors assume that sparse is an optimization, so a green check with `masked=False` will imply success with `masked=True`. Functions that assume the sparse semantics and do not explicitly ignore grads outside of the sparse pattern can be re-parametrized with `torch.sparse_mask` so that the gradcheck succeeds when `masked=False`. Hence, we can remove `masked=True` altogether.

cc @alexsamardzic @pearu @cpuhrsch @amjames @bhosmer @ezyang @albanD @zou3519 @gqchen @soulitzer @Lezcano @Varal7
@pearu, it can be once the gradients are properly mapped to the manifold with `sparse_mask`. Since …
I agree. This is what masked tensor support should handle, that is, …

Our aim is to deprecate/eliminate … Until then, removing the usage of …

In general, we should deprecate a feature before removing the corresponding tests. In this PR, tests are removed, but deprecating the feature is not immediately possible because masked tensor support is not ready yet.
I am in no rush to merge. It serves as a proof of concept: …
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as …
Stack from ghstack (oldest at bottom):
cc @alexsamardzic @pearu @cpuhrsch @amjames @bhosmer @ezyang @albanD @zou3519 @gqchen @soulitzer @lezcano @Varal7