Introducing the StableIValue representation of list :D #165953
janeyx99 wants to merge 11 commits into gh/janeyx99/317/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165953
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 Unrelated Failures) As of commit c01ee49 with merge base d7e2d0a.
FLAKY - The following job failed but was likely due to flakiness present on trunk:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
swolchok left a comment:
I had a lot of confusion around the ownership model before I went back and reviewed how ownership of tensors works and confirmed that (I think) StableIValue owns lists to the same extent that it owns tensors -- it takes care of its own cleanup. I think it's probably OK but might be something to think about.
```cpp
    StableListHandle* new_handle);

AOTI_TORCH_EXPORT AOTITorchError
torch_list_size(StableListHandle list_handle, size_t* size);
```
I'm following the convention where if it's blah->api(), I name it torch_blah_api, so this one stays.
Hmm, on one hand I understand the desire to keep the error_t foo(inp, *out) semantic, but on the other hand, the only way it could fail is if list_handle is nullptr, isn't it? Should we indeed, for the sake of speed, change the semantic a bit to size_t torch_list_get_size(StableListHandle)?
I'm leaning towards no for now, because it would mean guaranteeing in the long term that the function never fails, and that's not worth the small speed bump to me.
Some important notes:
a) Just like IValues steal the ownership of ArrayRefs and any std::vectors in order to convert the inner elements into IValues, we do the same thing with StableIValue. This O(N) traversal is ineluctable.
b) As a result, since StableIValues are owning and our contract is that to<T>(StableIValue) transfers ownership, you cannot ever convert from a StableIValue to a non-owning HeaderOnlyArrayRef<V>.
We handle memory similar to AtenTensorHandle, but we have a StableListHandle!
```cpp
AOTI_TORCH_EXPORT AOTITorchError torch_list_at(
    StableListHandle list_handle,
    size_t index,
    StableIValue* element);
```
Same as above: if at::List::operator[] doesn't do bounds checks, this API shouldn't do them either.
```cpp
    [[maybe_unused]] uint64_t extension_build_version,
    [[maybe_unused]] bool is_internal) {
```
Just curious, what is the purpose of those?
These are scaffolding so that, in the unlikely case that the memory layout of a struct passed as a StableIValue changes between two releases, we can appropriately modify from and to (and from_ivalue + to_ivalue, which call into these) to handle it. See https://github.com/pytorch/pytorch/pull/165284/files#diff-e260bc7cf1dcf33d3ce5fa484f918553929e57bc8ae76d398d4b1d884fe8654eR420-R437 for an example use.
(I'll have better developer documentation to come; there are also tests coming that ensure the memory layout of the relevant structs doesn't change.)
```cpp
TORCH_ERROR_CODE_CHECK(torch_list_size(list_handle, &size));
std::vector<T> result;
result.reserve(size);
for (size_t i = 0; i < size; i++) {
```
Why can't we use irange in the stable ABI?
```diff
- for (size_t i = 0; i < size; i++) {
+ for (const auto i : c10::irange(size)) {
```
We'd have to move irange to headeronly first, but otherwise we could.
```cpp
}

AOTI_TORCH_EXPORT AOTITorchError
torch_list_size(StableListHandle list_handle, size_t* size) {
```
Do you want to check first that list_handle and size are not nullptr?
Ehhh, these are meant to be used mostly internally, so I'm not convinced we need to add checks.
OK, then why do you want to wrap this code with AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE? Semantically this code will never throw, will it?
FWIW, it's harmless if size() gets inlined, and so I am weakly in favor of it because it makes everything visually uniform (lack of AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE is known-bad). https://godbolt.org/z/zbqbhez87
I see your point; I have just learned that dereferencing a bad pointer will lead to a crash/UB but not throw an exception. Regardless, the way I'm thinking of adding these APIs is to be defensive against future code modifications. I would feel more comfortable removing the macro once we have a test to ensure that all shim APIs never throw exceptions.
I indeed will land the PR as is, but this is a good consideration. Whatever policy we adopt I would want to apply across all APIs --> if we check for nullptr here, we should check in the Tensor shims too.
```cpp
AOTI_TORCH_EXPORT AOTITorchError
torch_list_push_back(StableListHandle list_handle, StableIValue element);
```
Is it permissible to be wrong about the reserved size?
I think reserving size is just a performance optimization; it doesn't actually populate the list. So after reserving, the size is still 0.
```cpp
for (const auto& elem : val) {
  TORCH_ERROR_CODE_CHECK(torch_list_push_back(new_list_handle, from(elem)));
}
```
You're going to leak memory for the list and for previously-pushed elements if this TORCH_ERROR_CODE_CHECK ever fails, courtesy of StableIValue not having a destructor.
```cpp
result.reserve(size);
for (size_t i = 0; i < size; i++) {
  StableIValue element;
  TORCH_ERROR_CODE_CHECK(torch_list_get_item(list_handle, i, &element));
```
```cpp
        torch_list_push_back(new_list_handle, from(elem)));
  }
  return from(new_list_handle);
} catch (const std::runtime_error& e) {
```
Don't you need to catch all flavors of exception? Why is it OK to leak on things that aren't runtime_error?
The only thing that TORCH_ERROR_CODE_CHECK can throw is std::runtime_error.
Fair enough, nothing should get thrown through the ABI boundary.
```cpp
  // clean up memory if an error was thrown
  TORCH_ERROR_CODE_CHECK(torch_delete_list(new_list_handle));
}
throw e;
```
Drop the e, just throw;. It's cleaner.
Wait, does that do the same thing? Does that rethrow the error?
Generally, rethrowing can also do stuff like preserve backtraces.
throw e; can cause object slicing, while throw; preserves the type of the original exception?
```cpp
} catch (const std::runtime_error& e) {
  // clean up memory if an exception is thrown, and rethrow
  TORCH_ERROR_CODE_CHECK(torch_delete_list(list_handle));
  throw e;
```
|
LGTM. I think @malfet has a couple unaddressed comments?
|
Starting merge as part of PR stack under #167126
ghstack-source-id: 36f837f
Pull Request resolved: pytorch/pytorch#165953
Approved by: https://github.com/malfet
ghstack dependencies: pytorch#164991, pytorch#165152, pytorch#165153
Pull Request resolved: pytorch#167126
Approved by: https://github.com/Skylion007
ghstack dependencies: pytorch#164991, pytorch#165152, pytorch#165153, pytorch#165953