Skip to content

Dict is a reference type#20669

Closed
smessmer wants to merge 6 commits intomasterfrom
export-D15404911
Closed

Dict is a reference type#20669
smessmer wants to merge 6 commits intomasterfrom
export-D15404911

Conversation

@smessmer
Copy link
Contributor

@smessmer smessmer commented May 18, 2019

Stack:
    :white_circle:  #20871 Use c10::List  💛
    :white_circle:  #20870 List  💛
    :black_circle:  #20669 Dict is a reference type  💚

Before, Dict was a value type, i.e. copying it did a deep copy.
Unfortunately, this doesn't work well with storing and passing Dicts around in IValues because IValues are reference types.
This diff changes Dict to be a reference type.

Differential Revision: D15404911

Differential Revision: D15404911
Differential Version: 82207842
@pytorchbot pytorchbot added the module: internals Related to internal abstractions in c10 and ATen label May 18, 2019
@smessmer smessmer requested review from dzhulgakov, li-roy and zdevito May 20, 2019 21:22
@dzhulgakov
Copy link
Collaborator

So now the Dict behavior is different from List or Tuple (both of which while inheriting from intrusive_ptr_target, are actually Impl's not the reference-like wrappers). They should behave the same.

But then there's the question of how they should behave - as regular C++ types (so you'd pass const Dict& and intrusive_ptr<Dict>) or like magical wrapper like you're proposing here and how at::Tensor is implemented.

I personally don't like the idea of magical wrappers proliferation as it's non-C++ and is pretty confusing. Tensor is kind of an exception, but in other cases, would it be so bad to explicitly write wrapper? (or we could define explicit DictPtr)

@smessmer
Copy link
Contributor Author

IValue stores Dicts as some kind of GenericDict (i.e. Dict<IValue, IValue>) but we want to pass concrete types to the kernel (i.e. Dict<Key, Value>). If Dict was a value type, we would need to copy or move move the contents from Dict<IValue, IValue> to the Dict<Key, Value>. Hopefully avoid the copy and do move, but that invalidates the Dict<IValue, IValue>. This is actually how it works currently (i.e. before this PR).

Unfortunately, this breaks with IValue because there might be multiple IValues holding a pointer to the same Dict<IValue, IValue>, so we cannot invalidate it and have to copy the contents from Dict<IValue, IValue> into Dict<Key, Value>. This is bad. The only way out I see is to make Dict<IValue, IValue> and Dict<Key, Value> share the underlying data and making Dict a reference type.

I'm open to alternative ideas.

@dzhulgakov
Copy link
Collaborator

Thanks for the detailed description, it wasn't clear from the PR description originally!

In that case indeed, whatever C++ type we pass into the kernel has to be "reference semantics" in C++. In that case - how about we just rename Dict in your PR into DictRef? It'd make it explicit to user that it's a reference and thus behaves differently from const vector&. I'm just worried that Dict x = y; is too different from regular C++ semantics. (note that Tensor is not a C++ class so it's ok to follow Python, but for dict, it's easy to be confused with unordered_map).

@smessmer
Copy link
Contributor Author

smessmer commented May 22, 2019

Dict<Key,Value> can also be returned from kernels. ...Ref types in my mind have the connotation of being non-owning, like ArrayRef. Returning an ArrayRef from a function would be deadly for example. DictPtr might be better, but that then somehow makes me think about nullability, which also isn't true for Dict. After doing DictPtr<Key, Value> var;, I expect to have to point it to some Dict and initialize the pointer before I can use it. Opinions?

@smessmer
Copy link
Contributor Author

smessmer commented May 22, 2019

@dzhulgakov proposed to call it DictPtr and remove the default constructor, adding a make_dict function to create dicts. This sounds reasonable to me. It's a bit more tedious to use, but it's much clearer what happens and harder to get wrong for users. I'll do that.

Differential Revision: D15404911
Differential Version: 82764811
@pytorchbot pytorchbot added oncall: jit Add this issue/PR to JIT oncall triage queue module: cpp Related to C++ API module: pybind Related to our Python bindings / interactions with other Python libraries labels May 22, 2019
@smessmer
Copy link
Contributor Author

changes done. Please re-review.

Differential Revision: D15404911
Differential Version: 82766851
Copy link
Collaborator

@dzhulgakov dzhulgakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

Copy link
Contributor

@zdevito zdevito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation seems good.

But this will give us at::ArrayRef<...> and c10::DictPtr. What is the distinction between a Ref and a Ptr?

@smessmer
Copy link
Contributor Author

@zdevito ArrayRef is non-owning, DictPtr is owning similar to shared_ptr or Tensor.

smessmer added 3 commits May 22, 2019 15:32
Differential Revision: D15404911
Differential Version: 82786237
Differential Revision: D15404911
Differential Version: 82910251
Differential Revision: D15404911
Differential Version: 82910671
This was referenced May 23, 2019
zdevito pushed a commit to zdevito/ATen that referenced this pull request May 23, 2019
Summary:
Pull Request resolved: pytorch/pytorch#20669

Before, Dict was a value type, i.e. copying it did a deep copy.
Unfortunately, this doesn't work well with storing and passing Dicts around in IValues because IValues are reference types.
This diff changes Dict to be a reference type.

Reviewed By: dzhulgakov

Differential Revision: D15404911

fbshipit-source-id: dc990d3eb7cae044b74dd0253f8b704dde6a6c86
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in d5b7138.

@ezyang ezyang deleted the export-D15404911 branch May 30, 2019 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged module: cpp Related to C++ API module: internals Related to internal abstractions in c10 and ATen module: pybind Related to our Python bindings / interactions with other Python libraries oncall: jit Add this issue/PR to JIT oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants