Use ctypes to serialize raw content for tensors. by muchulee8 · Pull Request #108287 · pytorch/pytorch

muchulee8 · 2023-08-30T21:48:35Z

Summary:
The current storage implementation deadlocks if the tensor is too large. Use ctypes to do the serialization instead.

Test Plan:
python benchmarks/dynamo/huggingface.py --bfloat16 --accuracy --inference --device cuda --export-aot-inductor --only MT5ForConditionalGeneration

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @aakhundov

pytorch-bot · 2023-08-30T21:48:38Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108287

✅ No Failures

As of commit d93c77d with merge base b535ed2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

desertfire

Approving to unblock, but I'm not sure what the underlying root cause is. cc @ezyang @eellison

torch/_inductor/codecache.py

desertfire · 2023-08-30T23:56:48Z

test/inductor/test_aot_inductor.py

        actual = AOTInductorModelRunner.run(model, example_inputs, expected)
        self.assertTrue(same(actual, expected))

+    @requires_cpp_extension()


I recently changed this file. You will need to remove this line.

desertfire · 2023-08-31T00:00:45Z

cc @Chillee as this week's oncall. The last nightly run timed out because of this issue: https://github.com/pytorch/pytorch/actions/runs/6021619982/job/16335510610. I would prefer to merge this PR as a forward fix.

malfet

It does not feel like a deadlock, but rather that bytes() will create a copy of a tensor, while ctypes creates an alias to the same region. Also, wouldn't it be better if untyped_storage were serializable on its own?
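
The copy-versus-alias distinction raised above can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: a `bytearray` stands in for a tensor's storage, and the helper names `alias_buffer` and `raw_bytes_via_pointer` are hypothetical. With a real tensor, the address and size would presumably come from the storage's `data_ptr()` and `nbytes()`.

```python
import ctypes


def alias_buffer(buf: bytearray) -> ctypes.Array:
    """Return a ctypes array that views buf's memory without copying it."""
    array_type = ctypes.c_ubyte * len(buf)
    return array_type.from_buffer(buf)


def raw_bytes_via_pointer(address: int, nbytes: int) -> bytes:
    """Read nbytes starting at address through a ctypes pointer cast."""
    array_ptr = ctypes.cast(address, ctypes.POINTER(ctypes.c_ubyte * nbytes))
    # The pointer cast only aliases the memory; bytes() makes the single
    # copy needed to produce an immutable serialized blob.
    return bytes(array_ptr.contents)


# Hypothetical stand-in for a tensor's raw storage.
storage = bytearray(b"\x01\x02\x03\x04")

copied = bytes(storage)       # independent copy of the data
view = alias_buffer(storage)  # alias to the same memory region

storage[0] = 0xFF             # mutate the original buffer

# The copy keeps the old value (copied[0] is 1); the alias sees the
# write (view[0] is 255).
serialized = raw_bytes_via_pointer(ctypes.addressof(view), len(storage))
```

Reading through a raw address is only safe while the owning object is alive and its size is known, which is why the sketch keeps `storage` and `view` in scope while `serialized` is built.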

muchulee8 · 2023-08-31T04:23:30Z

@pytorchbot merge

pytorchmergebot · 2023-08-31T04:27:16Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

desertfire · 2023-08-31T14:15:31Z

@pytorchbot revert -m "Internal test failure from #107718. Revert this one first and then revert 107718." -c nosignal

pytorchmergebot · 2023-08-31T14:16:58Z

@pytorchbot successfully started a revert job.

pytorchmergebot · 2023-08-31T14:17:09Z

@muchulee8 your PR has been successfully reverted.

This reverts commit 43f28be. Reverted #108287 on behalf of https://github.com/desertfire due to Internal test failure from #107718. Revert this one first and then revert 107718.

desertfire · 2023-08-31T14:22:47Z

@malfet @huydhn, what is the right way to revert the base PR (#107718) from both fbcode and GitHub? There is an auto-generated diff, D48866377, for the fbcode revert, but should I just revert the GitHub PR and wait for it to sync? Is there a faster way?

malfet · 2023-08-31T14:24:03Z

@desertfire unland the internal diff; that should auto-revert the PR. Doing it now...

muchulee8 requested a review from desertfire August 30, 2023 21:48

github-actions bot added module: inductor ciflow/inductor labels Aug 30, 2023

desertfire approved these changes Aug 30, 2023

muchulee8 requested a review from Chillee August 31, 2023 01:04

muchulee8 requested review from a team, albanD, jbschlosser and mikaylagawarecki as code owners August 31, 2023 01:08

github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Aug 31, 2023

pytorch-bot bot added the release notes: releng release notes category label Aug 31, 2023

muchulee8 added 2 commits August 30, 2023 18:10

Add comments

d93c77d

muchulee8 force-pushed the mlee8/aot_large_weight branch from a415f3e to d93c77d Compare August 31, 2023 01:11

muchulee8 removed request for a team, albanD, jbschlosser and mikaylagawarecki August 31, 2023 01:11

malfet approved these changes Aug 31, 2023

malfet reviewed Aug 31, 2023

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 31, 2023

pytorchmergebot added the merging label Aug 31, 2023

pytorchmergebot added Merged and removed merging labels Aug 31, 2023

pytorchmergebot closed this in 43f28be Aug 31, 2023

pytorchmergebot added the Reverted label Aug 31, 2023

github-actions bot deleted the mlee8/aot_large_weight branch March 1, 2025 02:10

muchulee8 wants to merge 2 commits into main from mlee8/aot_large_weight
