I'm looking at using Streams for concurrency, but am confused by how they operate. This issue describes some simple experiments and asks for help. If there is a better place to ask this sort of question please let me know. I would be happy to move it.
Looking at the examples and documentation, I believe that code placed within a stream context manager executes concurrently with code on other streams. I tried to test this recently and was unable to get a speedup.
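For reference, here is a minimal sketch of the pattern I understood from the docs (the helper name run_on_streams is mine, not from the examples). It guards the cupy import so the sketch can be read or run even without a GPU:

```python
# Sketch of my understanding: kernels launched inside a stream's
# context manager are queued on that stream and should be able to
# overlap with work queued on other streams.
try:
    import cupy
except ImportError:
    cupy = None  # no GPU / cupy available; skip the demo


def run_on_streams(n=4, size=1024):
    if cupy is None:
        return None
    streams = [cupy.cuda.stream.Stream() for _ in range(n)]
    results = []
    for s in streams:
        with s:  # work issued here goes to stream s
            results.append(cupy.random.random(size).sum())
    for s in streams:
        s.synchronize()  # wait for all streams to finish
    return [float(r) for r in results]
```

My expectation was that the loop body above would run concurrently across the n streams, which is what I tried to measure below.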
map_reduce.py example
First I tried running and modifying the map_reduce.py example in examples/stream. I ran it as-is, and also with the following diff, which makes every map task share a single stream.
diff --git a/examples/stream/map_reduce.py b/examples/stream/map_reduce.py
index 365b67c..de59c7c 100644
--- a/examples/stream/map_reduce.py
+++ b/examples/stream/map_reduce.py
@@ -11,8 +11,9 @@ zs = []
map_streams = []
stop_events = []
reduce_stream = cupy.cuda.stream.Stream()
-for i in range(n):
-    map_streams.append(cupy.cuda.stream.Stream())
+# for i in range(n):
+#     map_streams.append(cupy.cuda.stream.Stream())
+map_streams = [cupy.cuda.stream.Stream()] * n  # use one stream
start_time = time.time()
I found that there was no difference in performance.
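To be explicit about what the diff does: `[stream] * n` relies on Python list repetition producing n references to the same object, so every "map" task is issued on one shared stream. A small stand-in class (my own, just for illustration) shows the aliasing:

```python
class Stream:
    """Stand-in for cupy.cuda.stream.Stream, only to show list aliasing."""
    pass

n = 4
one_stream = [Stream()] * n                   # n references to a single stream
many_streams = [Stream() for _ in range(n)]   # n distinct streams
```

So the diff really does serialize all map work onto one stream, which is why the unchanged runtime surprised me.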
With threads
I was also surprised that simply calling code within a stream context manager would let cupy run it concurrently. I thought it might be necessary to issue these calls from separate CPU threads to get many concurrent calls down to the GPU.
I tried a few approaches in this notebook and was unable to get a speedup: https://gist.github.com/mrocklin/d3b70cea6a555ae2387556e4f0808ac1
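Roughly, the threaded approaches in the notebook have this shape, with the cupy kernel launch replaced here by a placeholder function so the sketch runs anywhere:

```python
from concurrent.futures import ThreadPoolExecutor


def launch(i):
    # Placeholder for the real body, which would be something like:
    #     with streams[i]:
    #         ... cupy calls ...
    return i * i


# Each thread issues its own calls down to the GPU.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(launch, range(4)))
```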
I suspect that I am doing something incorrectly here. Perhaps I have not configured my environment correctly? I apologize for my ignorance here. Any help would be very welcome.