Skip to content

Introducing the StableIValue representation of list :D#165953

Closed
janeyx99 wants to merge 11 commits intogh/janeyx99/317/basefrom
gh/janeyx99/317/head
Closed

Introducing the StableIValue representation of list :D#165953
janeyx99 wants to merge 11 commits intogh/janeyx99/317/basefrom
gh/janeyx99/317/head

Conversation

@janeyx99
Copy link
Copy Markdown
Contributor

@janeyx99 janeyx99 commented Oct 21, 2025

Some important notes:
a) Just like IValues steal the ownership of ArrayRefs and any std::vectors in order to convert the inner elements into IValues, we do the same thing with StableIValue. This O(N) traverse is ineluctable.
b) As a result, since StableIValues are owning and our contract is that to(StableIValue) transfers ownership, you cannot ever convert from StableIValue to a nonowning HeaderOnlyArrayRef.

We handle memory similar to AtenTensorHandle, but we have a StableListHandle!

Stack from ghstack (oldest at bottom):

@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented Oct 21, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165953

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit c01ee49 with merge base d7e2d0a (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

janeyx99 added a commit that referenced this pull request Oct 21, 2025
ghstack-source-id: 9b72dab
Pull Request resolved: #165953
janeyx99 added a commit that referenced this pull request Oct 24, 2025
ghstack-source-id: 55d2891
Pull Request resolved: #165953
janeyx99 added a commit that referenced this pull request Oct 29, 2025
ghstack-source-id: 6b5192b
Pull Request resolved: #165953
janeyx99 added a commit that referenced this pull request Nov 4, 2025
@janeyx99 janeyx99 changed the title [draft] look away this commit only has a sample test Introducing the StableIValue representation of list :D Nov 4, 2025
janeyx99 added a commit that referenced this pull request Nov 4, 2025
@janeyx99 janeyx99 marked this pull request as ready for review November 4, 2025 23:45
janeyx99 added a commit that referenced this pull request Nov 4, 2025
@janeyx99 janeyx99 requested review from albanD and swolchok November 4, 2025 23:48
@janeyx99 janeyx99 added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 5, 2025
Copy link
Copy Markdown
Contributor

@swolchok swolchok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a lot of confusion around the ownership model before I went back and reviewed how ownership of tensors works and confirmed that (I think) StableIValue owns lists to the same extent that it owns tensors -- it takes care of its own cleanup. I think it's probably OK but might be something to think about.

Comment thread test/cpp_extensions/libtorch_agnostic_extension/libtorch_agnostic/csrc/kernel.cpp Outdated
Comment thread torch/csrc/stable/c/shim.h Outdated
Comment thread torch/csrc/shim_common.cpp Outdated
Comment thread torch/csrc/stable/c/shim.h Outdated
Comment thread torch/csrc/stable/c/shim.h Outdated
StableListHandle* new_handle);

AOTI_TORCH_EXPORT AOTITorchError
torch_list_size(StableListHandle list_handle, size_t* size);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

torch_list_get_size?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm following the semantic where if it's blah->api(), I'm naming it torch_blah_api, so this one stays.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, on one hand I understand the desire to keep error_t foo(inp, *out) semantic, but on the other hand, the only it it could fail if list_handle is nullptr, isn't it? Should we indeed for the sake of speed change semantic a bit to size_t torch_list_get_size(StableListHandle)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards no for now, because it would mean we should guarantee in the long term that the function should never fail, and that's not worth the small speed bump to me.

Comment thread torch/csrc/stable/c/shim.h Outdated
Comment thread torch/csrc/stable/stableivalue_conversions.h Outdated
Some important notes:
a) Just like IValues steal the ownership of ArrayRefs and any std::vectors in order to convert the inner elements into IValues, we do the same thing with StableIValue. This O(N) traverse is ineluctable.
b) As a result, since StableIValues are owning and our contract is that to<T>(StableIValue) transfers ownership, you cannot ever convert from StableIValue to a nonowning HeaderOnlyArrayRef<V>.

We handle memory similar to AtenTensorHandle, but we have a StableListHandle!




[ghstack-poisoned]
janeyx99 added a commit that referenced this pull request Nov 5, 2025
Some important notes:
a) Just like IValues steal the ownership of ArrayRefs and any std::vectors in order to convert the inner elements into IValues, we do the same thing with StableIValue. This O(N) traverse is ineluctable.
b) As a result, since StableIValues are owning and our contract is that to<T>(StableIValue) transfers ownership, you cannot ever convert from StableIValue to a nonowning HeaderOnlyArrayRef<V>.

We handle memory similar to AtenTensorHandle, but we have a StableListHandle!




[ghstack-poisoned]
janeyx99 added a commit that referenced this pull request Nov 5, 2025
StableListHandle* new_handle);

AOTI_TORCH_EXPORT AOTITorchError
torch_list_size(StableListHandle list_handle, size_t* size);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, on one hand I understand the desire to keep error_t foo(inp, *out) semantic, but on the other hand, the only it it could fail if list_handle is nullptr, isn't it? Should we indeed for the sake of speed change semantic a bit to size_t torch_list_get_size(StableListHandle)?

Comment thread torch/csrc/stable/c/shim.h Outdated
Comment on lines +51 to +54
AOTI_TORCH_EXPORT AOTITorchError torch_list_at(
StableListHandle list_handle,
size_t index,
StableIValue* element);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: if at::List::operator[] doesn't do boundary checks, this API shouldn't do them either

Comment on lines +202 to +203
[[maybe_unused]] uint64_t extension_build_version,
[[maybe_unused]] bool is_internal) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, what are the purpose of those?

Copy link
Copy Markdown
Contributor

@mikaylagawarecki mikaylagawarecki Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are scaffolding so that in the unlikely case that the memory layout of a struct passed as a stableivalue changes between two releases, we can appropriately modify from and to (and from_ivalue + to_ivalue that call into these) to handle this. See https://github.com/pytorch/pytorch/pull/165284/files#diff-e260bc7cf1dcf33d3ce5fa484f918553929e57bc8ae76d398d4b1d884fe8654eR420-R437 for an example use

(I'll have better developer documentation to come, there are also tests that ensure the memory layout of the relevant structs don't change coming)

TORCH_ERROR_CODE_CHECK(torch_list_size(list_handle, &size));
std::vector<T> result;
result.reserve(size);
for (size_t i = 0; i < size; i++) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we use irange for stable ABI?

Suggested change
for (size_t i = 0; i < size; i++) {
for (const auto i : c10::irange(size)) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd have to move irange to headeronly first, but otherwise we can

}

AOTI_TORCH_EXPORT AOTITorchError
torch_list_size(StableListHandle list_handle, size_t* size) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to check first that list_handle and size are not nullptrs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ehhhh these are meant to be used mostly internal so I'm not convinced we need to add checks

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, then why do you want to wrap this code with AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE? Semantically this code will never throw, would it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, it's harmless if size() gets inlined and so I am weakly in favor of it because it makes everything visually uniform (lack of AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE is known-bad). https://godbolt.org/z/zbqbhez87

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point; I have just learned that dereferencing a bad pointer will lead to crash/UB but not throw an Exception. Regardless, the way I'm thinking of adding these APIs is to be defensive against future code modifications. I would feel more comfortable removing the macro once we have a test to ensure that all shim APIs never throw Exceptions.

I indeed will land the PR as is but this is a good consideration. Whatever policy we adopt I would want to apply across all APIs --> if we check for nullptr here we should check in the Tensor shims too.

Comment thread test/cpp_extensions/libtorch_agnostic_extension/libtorch_agnostic/csrc/kernel.cpp Outdated
Comment thread test/cpp_extensions/libtorch_agnostic_extension/libtorch_agnostic/csrc/kernel.cpp Outdated
Comment thread test/cpp_extensions/libtorch_agnostic_extension/libtorch_agnostic/csrc/kernel.cpp Outdated
Comment thread torch/csrc/stable/c/shim.h Outdated
Comment on lines +56 to +57
AOTI_TORCH_EXPORT AOTITorchError
torch_list_push_back(StableListHandle list_handle, StableIValue element);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it permissible to be wrong about the reserved size?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think reserving size is just a performance optimization, it doesn't actually populate the tensor. So after reserving the size is still 0.

Comment thread torch/csrc/stable/c/shim.h
Comment thread torch/csrc/stable/stableivalue_conversions.h Outdated
Some important notes:
a) Just like IValues steal the ownership of ArrayRefs and any std::vectors in order to convert the inner elements into IValues, we do the same thing with StableIValue. This O(N) traverse is ineluctable.
b) As a result, since StableIValues are owning and our contract is that to<T>(StableIValue) transfers ownership, you cannot ever convert from StableIValue to a nonowning HeaderOnlyArrayRef<V>.

We handle memory similar to AtenTensorHandle, but we have a StableListHandle!




[ghstack-poisoned]
Copy link
Copy Markdown
Contributor

@swolchok swolchok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking pretty good

Comment on lines +207 to +209
for (const auto& elem : val) {
TORCH_ERROR_CODE_CHECK(torch_list_push_back(new_list_handle, from(elem)));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're going to leak memory for the list and for previously-pushed elements if this TORCH_ERROR_CODE_CHECK ever fails, courtesy of StableIValue not having a destructor

result.reserve(size);
for (size_t i = 0; i < size; i++) {
StableIValue element;
TORCH_ERROR_CODE_CHECK(torch_list_get_item(list_handle, i, &element));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto leaky

Some important notes:
a) Just like IValues steal the ownership of ArrayRefs and any std::vectors in order to convert the inner elements into IValues, we do the same thing with StableIValue. This O(N) traverse is ineluctable.
b) As a result, since StableIValues are owning and our contract is that to<T>(StableIValue) transfers ownership, you cannot ever convert from StableIValue to a nonowning HeaderOnlyArrayRef<V>.

We handle memory similar to AtenTensorHandle, but we have a StableListHandle!




[ghstack-poisoned]
torch_list_push_back(new_list_handle, from(elem)));
}
return from(new_list_handle);
} catch (const std::runtime_error& e) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't you need to catch all flavors of exception? why is it ok to leak on things that aren't runtime_error?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing that TORCH_ERROR_CODE_CHECK can throw is std::runtime_error

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough, nothing should get thrown through ABI boundary

// clean up memory if an error was thrown
TORCH_ERROR_CODE_CHECK(torch_delete_list(new_list_handle));
}
throw e;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drop the e, just throw;. it's cleaner.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait does that do the same thing? Does that rethrow the error?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh it issss

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally rethrowing can also do stuff like cause backtraces to be preserved.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw e; can cause object slicing, while throw preserve type of original exception?

Comment on lines +410 to +413
} catch (const std::runtime_error& e) {
// clean up memory if an exception is thrown, and rethrow
TORCH_ERROR_CODE_CHECK(torch_delete_list(list_handle));
throw e;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Some important notes:
a) Just like IValues steal the ownership of ArrayRefs and any std::vectors in order to convert the inner elements into IValues, we do the same thing with StableIValue. This O(N) traverse is ineluctable.
b) As a result, since StableIValues are owning and our contract is that to<T>(StableIValue) transfers ownership, you cannot ever convert from StableIValue to a nonowning HeaderOnlyArrayRef<V>.

We handle memory similar to AtenTensorHandle, but we have a StableListHandle!




[ghstack-poisoned]
@swolchok
Copy link
Copy Markdown
Contributor

swolchok commented Nov 7, 2025

LGTM. I think @malfet has a couple unaddressed comments?

}

AOTI_TORCH_EXPORT AOTITorchError
torch_list_size(StableListHandle list_handle, size_t* size) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, then why do you want to wrap this code with AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE? Semantically this code will never throw, would it?

@pytorchmergebot
Copy link
Copy Markdown
Collaborator

Starting merge as part of PR stack under #167126

pytorchmergebot pushed a commit that referenced this pull request Nov 7, 2025
Khanaksahu pushed a commit to Khanaksahu/pytorch that referenced this pull request Nov 17, 2025
Silv3S pushed a commit to Silv3S/pytorch that referenced this pull request Nov 18, 2025
Some important notes:
a) Just like IValues steal the ownership of ArrayRefs and any std::vectors in order to convert the inner elements into IValues, we do the same thing with StableIValue. This O(N) traverse is ineluctable.
b) As a result, since StableIValues are owning and our contract is that to<T>(StableIValue) transfers ownership, you cannot ever convert from StableIValue to a nonowning HeaderOnlyArrayRef<V>.

We handle memory similar to AtenTensorHandle, but we have a StableListHandle!

Pull Request resolved: pytorch#165953
Approved by: https://github.com/malfet
ghstack dependencies: pytorch#164991, pytorch#165152, pytorch#165153
Silv3S pushed a commit to Silv3S/pytorch that referenced this pull request Nov 18, 2025
@github-actions github-actions Bot deleted the gh/janeyx99/317/head branch December 8, 2025 02:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants