[WIP] Adding a test that creates an arbitrarily large .tar file and read from it via HTTP#40
NivekT wants to merge 6 commits into gh/NivekT/7/base
…om it via HTTP [ghstack-poisoned]
…ing, minor change to test case" Fixes #42. I plan to add more tests in #40 that will test online readers in combination with the various archive readers. Differential Revision: [D31515974](https://our.internmc.facebook.com/intern/diff/D31515974) [ghstack-poisoned]
…change to test case (#51) Summary: Pull Request resolved: #51 Fixes #42. I plan to add more tests in #40 that will test online readers in combination with the various archive readers. Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D31515974 Pulled By: NivekT fbshipit-source-id: 065261aeac5863971d94c4949ed0e6b5df201fa7
… file and read from it via HTTP"
The goal here is to create an arbitrarily large .tar file and read from it via HTTP.
Currently, it works if we first cache the file locally and then read from it. However, an error is raised when we attempt to read directly from the `HTTPReader`, because its stream does not support `seek`, which `tarfile.open` relies on (the traceback shows it calling `fileobj.tell()`):
```
Traceback (most recent call last):
  File "/Users/ktse/data/test/test_stream.py", line 66, in <module>
    for fname, stream in tar_dp:
  File "/Users/ktse/data/torchdata/datapipes/iter/util/tararchivereader.py", line 62, in __iter__
    raise e
  File "/Users/ktse/data/torchdata/datapipes/iter/util/tararchivereader.py", line 48, in __iter__
    tar = tarfile.open(fileobj=cast(Optional[IO[bytes]], data_stream), mode=self.mode)
  File "/Users/ktse/miniconda3/envs/pytorch/lib/python3.9/tarfile.py", line 1609, in open
    saved_pos = fileobj.tell()
io.UnsupportedOperation: seek
```
[ghstack-poisoned]
```python
httpd.serve_forever()
while True:
    if self.stop_server:  # TODO: This is not closing
        httpd.server_close()
```
You are not leaving the `while` loop after `server_close()`.
I don't think `self.stop_server` is being set to `True` either. Perhaps because it is on a different thread?
You should use a `threading.Event` to terminate the loop.
ejguan left a comment:
A few comments below. Also, please prefix all helper methods in the `TestCase` with an underscore.
`test/test_stream.py` (outdated)
```python
self.temp_dir_path = self.temp_dir.name
self.port = 8006
self.stop_server = False
self.server_thread = threading.Thread(
    target=self.running_server
)  # TestStream.start_test_server(self.temp_dir_path, self.port)
```
This is not going to work. You need to pass these variables to the thread during construction.
It seems to work for now since `running_server` has access to `self`. But I will keep an eye out for any bugs related to this.
But I think it would be better to refactor it and pass those in as arguments, as you suggested.
`test/test_stream.py` (outdated)
```python
def tearDown(self) -> None:
    print("Tear down is running...")
    self.stop_server = True
```
Trigger the threading event here.
```python
class TestStream(expecttest.TestCase):
    def setUp(self) -> None:
```
Another thing I want to mention: if `setUp` and `tearDown` are going to be shared by all test methods in the future, they should be converted to `setUpClass` and `tearDownClass`.
That is, if you don't want these setup methods to run before and after every single test.
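If one server is shared by every test, a class-level fixture could look like the sketch below. It assumes the standard library's `http.server` and uses `serve_forever()` plus `shutdown()` (which, when called from another thread, makes `serve_forever()` return and waits for it) rather than a hand-rolled event loop. The class name, file names, and the sequential `"r|*"` read are illustrative assumptions, not the PR's code.

```python
import functools
import http.server
import os
import tarfile
import tempfile
import threading
import unittest
import urllib.request


class TestStreamOverHTTP(unittest.TestCase):
    """Hypothetical test case: one HTTP server for the whole class, not one per test."""

    @classmethod
    def setUpClass(cls) -> None:
        cls._temp_dir = tempfile.TemporaryDirectory()
        root = cls._temp_dir.name
        # Build a small archive to serve (a real test could make it arbitrarily large).
        member_path = os.path.join(root, "payload.txt")
        with open(member_path, "wb") as f:
            f.write(b"payload")
        with tarfile.open(os.path.join(root, "archive.tar"), "w") as tar:
            tar.add(member_path, arcname="payload.txt")

        handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=root)
        cls._httpd = http.server.HTTPServer(("127.0.0.1", 0), handler)  # port 0: OS picks
        cls._port = cls._httpd.server_address[1]
        cls._server_thread = threading.Thread(target=cls._httpd.serve_forever, daemon=True)
        cls._server_thread.start()

    @classmethod
    def tearDownClass(cls) -> None:
        cls._httpd.shutdown()      # asks serve_forever() to return and waits for it
        cls._httpd.server_close()  # releases the listening socket
        cls._server_thread.join(timeout=5)
        cls._temp_dir.cleanup()

    def test_read_tar_over_http(self) -> None:
        url = f"http://127.0.0.1:{self._port}/archive.tar"
        with urllib.request.urlopen(url, timeout=5) as resp:
            # The HTTP response body cannot seek, so read it in sequential mode.
            with tarfile.open(fileobj=resp, mode="r|*") as tar:
                names = [member.name for member in tar]
        self.assertEqual(names, ["payload.txt"])
```

Run it with `python -m unittest`; the server then starts and stops once for the class instead of once per test method.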
No longer needed, since we have other benchmarks and remote testing.