Skip to content

[schema_upgrader] add C++ upgrader for json based upgrading#156761

Closed
ydwu4 wants to merge 10 commits intogh/ydwu4/267/basefrom
gh/ydwu4/267/head
Closed

[schema_upgrader] add C++ upgrader for json based upgrading#156761
ydwu4 wants to merge 10 commits intogh/ydwu4/267/basefrom
gh/ydwu4/267/head

Conversation

@ydwu4
Copy link
Contributor

@ydwu4 ydwu4 commented Jun 24, 2025

Stack from ghstack (oldest at bottom):

Differential Revision: D77459912

@pytorch-bot
Copy link

pytorch-bot bot commented Jun 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156761

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit f466c82 with merge base 3ee75b7 (image):

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ydwu4 added a commit that referenced this pull request Jun 24, 2025
@ydwu4 ydwu4 added topic: not user facing topic category ciflow/trunk Trigger trunk jobs on your pull request labels Jun 24, 2025
ydwu4 added a commit that referenced this pull request Jun 25, 2025
@ydwu4 ydwu4 requested review from angelayi and zhxchen17 June 25, 2025 17:16

// NOTE: The following version_0 and version_1 upgraders are for testing
// purposes only. They demonstrate the upgrader system functionality and are
// used in test/export/test_upgrader.py.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should these be put inside of a testing folder? you could follow what I did to test AOTI custom ops and just added a cpp file in the testing directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I put the test upgraders in a seperate file, add a python binding for it and the python test can dynamically register/deregister it. I don't want to add a legit C++ test, the build system is complicated lol

return keypath < other.keypath;
}

static void registerUpgrader(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come this function isn't in the .h? it seems like this function is the most user facing so we should add proper docs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I put it into the interface now. The initial idea is all regsiteration can be just done inside this file but it doesn't hurt to add some flexibility.

}

static void registerUpgrader(
int version,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add docstring that version is the version to upgrade from?


static void registerUpgrader(
int version,
const std::vector<std::string>& keypath,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we just passed in a string keypath and split it with .? ex. instead of passing in {"graph_module", "graph", "nodes"} we can just pass in "graph_module.graph.nodes"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a separate overload for this purpose. My initial thought is that we don't want to hardcode some deliminator into the key path, which can be troublesome somtimes (e.g. the nn_module_stack deliminator).

int current_version = current_artifact["schema_version"]["major"];

// Iteratively apply upgraders until no more are available
while (true) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also stop if current_version == the current highest schema version? Maybe that's a little hard to find, since it's in python...

just wondering what if there aren't any more upgraders, but current_version hasn't reached the version consumable by the deserializer yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we could query the schema version and pass it to the upgrader function.

}

// Validate that we reached the target version if requested
if (validate_target && current_version != target_version) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't we always want to validate this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sometimes, it could be easier to use not provide a target_version e.g. in testing lol.

}

nlohmann::json upgrade(const nlohmann::json& artifact) {
return upgradeImpl(artifact, std::numeric_limits<int>::max(), false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what case would we want to use max? I feel like we would just always want the existing schema version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

easier for testing. Could change to always upgrade to a target version though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People might not want to always upgrade to the latest version e.g. for debug reasons maybe? It doesn't hurt to provide some flexibility i feel

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm but are you using this anywhere? all your test cases are specifying the version


// Query the current Python schema version as target
py::module_ schema_module =
py::module_::import("torch._export.serde.schema");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could add the schema version to generated_serialization_types.h so that we can query the schema version directly in the upgrade cpp function rather than here. cc @zhxchen17

can leave as a TODO for now

Copy link
Contributor Author

@ydwu4 ydwu4 Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now it's inside the exported program class, we would need to deserialize ExportedProgram first in order to access the field but we cannot because we need to first do upgrade. Either we could have a top-level class that wraps up ExportedProgram and the version and make sure the schema version can be accessed all the time or we store it somewhere else.

edit: oh, you mean we expose the schema_version in generated_serialization_types.h as a constant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edit: oh, you mean we expose the schema_version in generated_serialization_types.h as a constant?

yup

ydwu4 added a commit that referenced this pull request Jun 27, 2025
@@ -0,0 +1,89 @@
#include <torch/csrc/export/example_upgraders.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still wish this file was in test/ but I guess if the build system is too annoying to figure out it's ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, it's indeed annoying

}

nlohmann::json upgrade(const nlohmann::json& artifact) {
return upgradeImpl(artifact, std::numeric_limits<int>::max(), false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm but are you using this anywhere? all your test cases are specifying the version

Copy link
Contributor

@angelayi angelayi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

letsgoo

you might want to import to internal before landing, just to make sure none of the cpp changes will break

@ydwu4
Copy link
Contributor Author

ydwu4 commented Jun 27, 2025

@ydwu4 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ydwu4
Copy link
Contributor Author

ydwu4 commented Jun 27, 2025

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@ydwu4
Copy link
Contributor Author

ydwu4 commented Jun 28, 2025

@pytorchbot revert -m "break linter test, which doesn't show up in the pr" -c nosignal

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Jun 28, 2025
…156761)"

This reverts commit 61712e6.

Reverted #156761 on behalf of https://github.com/ydwu4 due to break linter test, which doesn't show up in the pr ([comment](#156761 (comment)))
@pytorchmergebot
Copy link
Collaborator

@ydwu4 your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Jun 28, 2025
ydwu4 added a commit that referenced this pull request Jun 28, 2025
@ydwu4
Copy link
Contributor Author

ydwu4 commented Jun 28, 2025

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@github-actions github-actions bot deleted the gh/ydwu4/267/head branch July 29, 2025 02:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-no-td Do not run TD on this PR ciflow/trunk Trigger trunk jobs on your pull request Merged Reverted topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants