[DCP][HF][ez] Change where sharded tensors are saved #158069
ankitageorge wants to merge 1 commit into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158069
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 1 Unrelated Failure as of commit 66c81b9 with merge base 7d4228d.
NEW FAILURES - The following jobs have failed:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D78108144
Force-pushed d23ae54 to f920f12, then f920f12 to b6878f3.
Force-pushed b6878f3 to 9abbc47.
Force-pushed 9abbc47 to b7e21a4.
Force-pushed b7e21a4 to 26eccbf.
Force-pushed 26eccbf to 66c81b9.
@pytorchmergebot merge -i
Merge started. Your change will be merged while ignoring the following 4 checks: pull / linux-jammy-py3.9-gcc11-no-ops / build, pull / linux-jammy-cuda12.8-py3.10-gcc11-sm89 / build, pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu, unstable), Lint / lintrunner-clang / linux-job. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -c nosignal -m "Didn't remove reference to `consolidated_output_path` in test_hf_safetensor_e2e.py"
@pytorchbot successfully started a revert job. Check the current status here. |
This reverts commit 627ba41. Reverted #158069 on behalf of https://github.com/jithunnair-amd due to: Didn't remove reference to `consolidated_output_path` in test_hf_safetensor_e2e.py; CUDA runs do not surface the issue because safetensors is not installed and the test silently passes (see comment on #158069).
@ankitageorge your PR has been successfully reverted. |
#158069 removed the consolidated output path argument without updating the test. Reported by a user in #156705 (comment). This adds back the logic from the original PR #158069 and fixes the test. Pull Request resolved: #158685 Approved by: https://github.com/teja-rao
Summary: Previously, sharded tensors were saved to the same directory as full tensors. This doesn't make sense: on load(), you would be reading from a directory that contains both, with no way to distinguish them, so they should be kept in separate folders.
Test Plan:
ensure existing tests pass
Rollback Plan:
Differential Revision: D78108144
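
The directory split described in the summary can be sketched as follows. This is a minimal illustration, not the actual torch.distributed.checkpoint implementation: the subfolder names "sharded" and "consolidated", the file-name patterns, and both helper functions are assumptions chosen for the example.

```python
import os

# Hypothetical checkpoint layout illustrating the fix: per-rank sharded
# files and the consolidated full-tensor file go into separate subfolders,
# so a loader can tell them apart instead of scanning one mixed directory.
# Folder and file names here are assumptions, not DCP's real layout.

def sharded_save_path(checkpoint_dir: str, rank: int) -> str:
    """Return the path for one rank's sharded tensor file."""
    subdir = os.path.join(checkpoint_dir, "sharded")
    os.makedirs(subdir, exist_ok=True)
    return os.path.join(subdir, f"shard-{rank:05d}.safetensors")

def consolidated_save_path(checkpoint_dir: str) -> str:
    """Return the path for the consolidated (full-tensor) file."""
    subdir = os.path.join(checkpoint_dir, "consolidated")
    os.makedirs(subdir, exist_ok=True)
    return os.path.join(subdir, "model.safetensors")
```

With this split, load() can enumerate only the subfolder matching the kind of checkpoint it expects, which is the distinguishability the summary calls for.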
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta