[1/4][DataPipe] Properly cleanup unclosed files within generator function#89973
[1/4][DataPipe] Properly cleanup unclosed files within generator function#89973ejguan wants to merge 3 commits intogh/ejguan/45/basefrom
Conversation
…tion [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89973
Note: Links to docs will display an error until the docs builds have been completed. ⏳ No Failures, 1 PendingAs of commit d0931c8: This comment was automatically generated by Dr. CI and updates every 15 minutes. |
NivekT
left a comment
There was a problem hiding this comment.
LGTM! It'd be good if we have a unit test to check that things are closed, but not the most essential.
…erator function" There is a case that `file.close` is never called because when generator function has never reached to the end. A simple example would be `zip` two datepipes with different length. The longer DataPipe would never reach the end of generator and then it will be cleaned up by `gc`. So, the line of `file.close` is not executed. (This is the reason that Vitaly has to create this [hack](https://github.com/pytorch/pytorch/blob/4451eb24e6287dff62ff8a7ec0eda6a6998807b0/torch/utils/data/datapipes/iter/combining.py#L573-L583) to retrieve all remaining data to make sure generator function is fully executed) However, this hack introduces another problem where an infinite datapipe would make `zip` never end as it would try to deplete the infinite iterator. See: meta-pytorch/data#865 So, in this PR, I am adding a `try-finally` clause to make sure the `file.close` is always executed during the destruction of `generator` object. Then, we don't need the hack within `zip` any more. [ghstack-poisoned]
…erator function" There is a case that `file.close` is never called because when generator function has never reached to the end. A simple example would be `zip` two datepipes with different length. The longer DataPipe would never reach the end of generator and then it will be cleaned up by `gc`. So, the line of `file.close` is not executed. (This is the reason that Vitaly has to create this [hack](https://github.com/pytorch/pytorch/blob/4451eb24e6287dff62ff8a7ec0eda6a6998807b0/torch/utils/data/datapipes/iter/combining.py#L573-L583) to retrieve all remaining data to make sure generator function is fully executed) However, this hack introduces another problem where an infinite datapipe would make `zip` never end as it would try to deplete the infinite iterator. See: meta-pytorch/data#865 So, in this PR, I am adding a `try-finally` clause to make sure the `file.close` is always executed during the destruction of `generator` object. Then, we don't need the hack within `zip` any more. [ghstack-poisoned]
|
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged) |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Summary: There is a case that `file.close` is never called because when generator function has never reached to the end. A simple example would be `zip` two datepipes with different length. The longer DataPipe would never reach the end of generator and then it will be cleaned up by `gc`. So, the line of `file.close` is not executed. (This is the reason that Vitaly has to create this [hack](https://github.com/pytorch/pytorch/blob/4451eb24e6287dff62ff8a7ec0eda6a6998807b0/torch/utils/data/datapipes/iter/combining.py#L573-L583) to retrieve all remaining data to make sure generator function is fully executed) However, this hack introduces another problem where an infinite datapipe would make `zip` never end as it would try to deplete the infinite iterator. See: #865 So, in this PR, I am adding a `try-finally` clause to make sure the `file.close` is always executed during the destruction of `generator` object. Then, we don't need the hack within `zip` any more. - pytorch/pytorch#89973 - pytorch/vision#6997 - pytorch/pytorch#89974 Pull Request resolved: #910 Reviewed By: wenleix, NivekT Differential Revision: D41633909 Pulled By: ejguan fbshipit-source-id: 5613e131dc8b2962c22d2bc7e3a4bb3039440c48
…tion (pytorch#89973) There is a case that `file.close` is never called because when generator function has never reached to the end. A simple example would be `zip` two datepipes with different length. The longer DataPipe would never reach the end of generator and then it will be cleaned up by `gc`. So, the line of `file.close` is not executed. (This is the reason that Vitaly has to create this [hack](https://github.com/pytorch/pytorch/blob/4451eb24e6287dff62ff8a7ec0eda6a6998807b0/torch/utils/data/datapipes/iter/combining.py#L573-L583) to retrieve all remaining data to make sure generator function is fully executed) However, this hack introduces another problem where an infinite datapipe would make `zip` never end as it would try to deplete the infinite iterator. See: meta-pytorch/data#865 So, in this PR, I am adding a `try-finally` clause to make sure the `file.close` is always executed during the destruction of `generator` object. Then, we don't need the hack within `zip` any more. Differential Revision: [D41699470](https://our.internmc.facebook.com/intern/diff/D41699470) Pull Request resolved: pytorch#89973 Approved by: https://github.com/NivekT
|
This is the proper fix for meta-pytorch/data#436 that we have been trying to resolve. |
Stack from ghstack:
There is a case that
file.closeis never called because when generator function has never reached to the end. A simple example would beziptwo datepipes with different length. The longer DataPipe would never reach the end of generator and then it will be cleaned up bygc. So, the line offile.closeis not executed. (This is the reason that Vitaly has to create this hack to retrieve all remaining data to make sure generator function is fully executed)However, this hack introduces another problem where an infinite datapipe would make
zipnever end as it would try to deplete the infinite iterator. See: meta-pytorch/data#865So, in this PR, I am adding a
try-finallyclause to make sure thefile.closeis always executed during the destruction ofgeneratorobject. Then, we don't need the hack withinzipany more.Differential Revision: D41699470