Skip to content

[jit] Add dictionary as output of tracer#31860

Merged
wanchaol merged 12 commits intogh/wanchaol/75/basefrom
gh/wanchaol/75/head
Apr 16, 2020
Merged

[jit] Add dictionary as output of tracer#31860
wanchaol merged 12 commits intogh/wanchaol/75/basefrom
gh/wanchaol/75/head

Conversation

@wanchaol
Copy link
Copy Markdown
Collaborator

@wanchaol wanchaol commented Jan 4, 2020

Stack from ghstack:

Summary:

This add dictionary as a supported output of tracer.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: D20638438

@wanchaol wanchaol requested a review from apaszke as a code owner January 4, 2020 18:52
wanchaol added a commit that referenced this pull request Jan 4, 2020
Summary:

This add dictionary as a supported output of tracer.

Current issue: StringType is not handled in getOutput, so it will error
out

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: b62934e
Pull Request resolved: #31860
@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Jan 4, 2020
@kostmo
Copy link
Copy Markdown
Member

kostmo commented Jan 4, 2020

💊 Build failures summary and remediations

As of commit 89ad086 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

See how this bot performed.

This comment has been revised 39 times.

@wanchaol wanchaol changed the title [WIP] Add dictionary as output of tracer [jit] Add dictionary as output of tracer Mar 3, 2020
wanchaol added a commit that referenced this pull request Mar 3, 2020
Summary:

This add dictionary as a supported output of tracer.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: eade920
Pull Request resolved: #31860
wanchaol added a commit that referenced this pull request Mar 5, 2020
Summary:

This add dictionary as a supported output of tracer.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: c866a16
Pull Request resolved: #31860
@bddppq bddppq force-pushed the gh/wanchaol/75/head branch from 0cf4e6f to 9938cc8 Compare March 24, 2020 10:49
@bddppq bddppq force-pushed the gh/wanchaol/75/head branch from 9938cc8 to 9f81e8d Compare March 24, 2020 10:57
@wanchaol wanchaol requested review from suo and removed request for ebetica, ezyang, pietern, soumith and yf225 March 24, 2020 22:27
@bddppq
Copy link
Copy Markdown
Contributor

bddppq commented Apr 3, 2020

The surprising behavior demonstrated in #31860 (review) is actually not caused by the limitation of static dict, but rather the size Tensor has been casted to Python int in the range(x.size(0)) expression. With other container types tracing has similar behavior:

import torch

def foo_list(x):
  rv = []
  for i in range(x.size(0)):
    rv.append(x[i])
  return rv

traced = torch.jit.trace(foo_list, (torch.randn(3, 1)))
print(traced(torch.randn(3, 1)))
print(traced(torch.randn(5, 1)))

def foo_tuple(x):
    return tuple(x[i] for i in range(x.size(0)))

traced = torch.jit.trace(foo_tuple, (torch.randn(3, 1)))
print(traced(torch.randn(3, 1)))
print(traced(torch.randn(5, 1)))
 TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of P
ython values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  for i in range(x.size(0)):
[tensor([-0.0323]), tensor([-2.0990]), tensor([0.3493])]
[tensor([-0.3254]), tensor([1.0486]), tensor([-0.7183])]
TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of
Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return tuple(x[i] for i in range(x.size(0)))
(tensor([0.5241]), tensor([-0.1070]), tensor([-2.0607]))
(tensor([-1.0013]), tensor([1.5692]), tensor([-0.1939]))

@bddppq
Copy link
Copy Markdown
Contributor

bddppq commented Apr 3, 2020

So if we are going to add a flag to prevent this silently wrong behavior, maybe it should be guarding the casting (but it will be a breaking change, right now the tracer only gives warning).

@wanchaol
Copy link
Copy Markdown
Collaborator Author

wanchaol commented Apr 3, 2020

So if we are going to add a flag to prevent this silently wrong behavior, maybe it should be guarding the casting (but it will be a breaking change, right now the tracer only gives warning).

@bddppq you are right, the case only happens in dynamic control flow, and casting currently gives a warning to the user that might get ignored by user who don't read the doc and understand the internals of tracer. The general issue might not only happening for tensor casting though, i.e. user could check len(list) or len(dict) inside the tracer in the control flow and do whatever they want, but a changed input might be wrong there too.

Had a discussion with @jamesr66a and other folks today. The short term plan is to guard the risk behavior under a strict flag, and throw warnings/errors to user. Right now tensor casting will throw warning, others not. To ensure not BC breaking with existing user code too much, we should probably:

  1. throw warning in tracer when discovering list (input/output) + dict(input), these are thing already supported but no warning yet.
  2. for dict output, we will throw error and explicitly guide the user to use strict=False flag.

Summary:

This add dictionary as a supported output of tracer.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D20638438](https://our.internmc.facebook.com/intern/diff/D20638438)

[ghstack-poisoned]
Summary:

This add dictionary as a supported output of tracer.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D20638438](https://our.internmc.facebook.com/intern/diff/D20638438)

[ghstack-poisoned]
Summary:

This add dictionary as a supported output of tracer.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D20638438](https://our.internmc.facebook.com/intern/diff/D20638438)

[ghstack-poisoned]
@wanchaol wanchaol requested a review from jamesr66a April 8, 2020 22:57
Summary:

This add dictionary as a supported output of tracer.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D20638438](https://our.internmc.facebook.com/intern/diff/D20638438)

[ghstack-poisoned]
wanchaol added a commit that referenced this pull request Apr 9, 2020
Summary:

This add dictionary as a supported output of tracer. it will throw error
to user by default if they want to use it, and only allow this when
strict=False when call `torch.jit.trace`

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: a574e9f
Pull Request resolved: #31860
Copy link
Copy Markdown
Collaborator

@jamesr66a jamesr66a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I had a question about how we're doing ordering for Tensor-keyed dictionaries. I want to make sure that's safe and stable across compilation/machines etc

Comment thread torch/csrc/jit/frontend/tracer.cpp Outdated
TypePtr key_type = dict.keyType();
TypePtr value_type = dict.valueType();

if (!(key_type->cast<StringType>() || key_type->cast<TensorType>()) ||
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kinda hard to read in the !(A || B) || !C form. Can we do something like:

bool key_type_valid = key_type->cast<StringType>() || key_type->cast<TensorType>();
bool value_type_valid = value_type->cast<TensorType>();

if (!key_type_valid || !value_type_valid) {
  <...>

<< "or dict[Tensor, Tensor] can be a dictionary output of a traced function";
throw std::runtime_error(os.str());
}
const auto order = iterationOrder(dict);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an issue with your PR, but I looked at the declaration of iterationOrder and I'm not sure what it does :p. It could use some comments or a rename to indicate its function

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterationOrder will return a pair of ivalues that ordered by CompareKeys, in the case that key is Tensor, it's ordered by TensorImpl pointer address.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you (or someone) add that as a comment?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did this in #36609

std::iota(perm.begin(), perm.end(), 0);
std::sort(perm.begin(), perm.end(), [&](int64_t i, int64_t j) {
return keys[i]->unique() < keys[j]->unique();
});
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you comment on the stability of these unique IDs? Doesn't Value::unique() potentially depend on the order of compilation of values?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is quite stable, by the time we generate the output, the tracer already record the values of the dict keys, and the tracer record the keys based on the order of encountering tensors, so unique() should depend on the order of that, and tracing the same program will always generate the same sequence of unique ids.

Summary:

This add dictionary as a supported output of tracer.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D20638438](https://our.internmc.facebook.com/intern/diff/D20638438)

[ghstack-poisoned]
wanchaol added a commit that referenced this pull request Apr 10, 2020
Summary:

This add dictionary as a supported output of tracer. it will throw error
to user by default if they want to use it, and only allow this when
strict=False when call `torch.jit.trace`

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 3081ce8
Pull Request resolved: #31860
Summary:

This add dictionary as a supported output of tracer.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D20638438](https://our.internmc.facebook.com/intern/diff/D20638438)

[ghstack-poisoned]
wanchaol added a commit that referenced this pull request Apr 13, 2020
Summary:

This add dictionary as a supported output of tracer. it will throw error
to user by default if they want to use it, and only allow this when
strict=False when call `torch.jit.trace`

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 29a3010
Pull Request resolved: #31860
@wanchaol wanchaol requested a review from jamesr66a April 13, 2020 17:05
Summary:

This add dictionary as a supported output of tracer.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D20638438](https://our.internmc.facebook.com/intern/diff/D20638438)

[ghstack-poisoned]
wanchaol added a commit that referenced this pull request Apr 13, 2020
Summary:

This add dictionary as a supported output of tracer. it will throw error
to user by default if they want to use it, and only allow this when
strict=False when call `torch.jit.trace`

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 026b826
Pull Request resolved: #31860
Summary:

This add dictionary as a supported output of tracer.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D20638438](https://our.internmc.facebook.com/intern/diff/D20638438)

[ghstack-poisoned]
@wanchaol wanchaol merged commit 4d2da2e into gh/wanchaol/75/base Apr 16, 2020
wanchaol added a commit that referenced this pull request Apr 16, 2020
Summary:

This add dictionary as a supported output of tracer. it will throw error
to user by default if they want to use it, and only allow this when
strict=False when call `torch.jit.trace`

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 83af02c
Pull Request resolved: #31860
@wanchaol
Copy link
Copy Markdown
Collaborator Author

something weird happens with ghstack on this PR, replace this with #36696

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

oncall: jit Add this issue/PR to JIT oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants