I'm looking at using Streams for concurrency, but am confused by how they operate. This issue describes some simple experiments and asks for help. If there is a better place to ask this sort of question please let me know. I would be happy to move it.
Looking at the examples and documentation, I believe that code placed within a stream context manager executes concurrently with code on other streams. I tried to test this recently and was unable to get a speedup.
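For reference, here is a minimal sketch of the pattern I understood from the docs (the helper name run_on_streams is mine, not from the examples). It guards the cupy import so the sketch can be read or run even without a GPU:

```python
# Sketch of my understanding: kernels launched inside a stream's
# context manager are queued on that stream and should be able to
# overlap with work queued on other streams.
try:
    import cupy
except ImportError:
    cupy = None  # no GPU / cupy available; skip the demo


def run_on_streams(n=4, size=1024):
    if cupy is None:
        return None
    streams = [cupy.cuda.stream.Stream() for _ in range(n)]
    results = []
    for s in streams:
        with s:  # work issued here goes to stream s
            results.append(cupy.random.random(size).sum())
    for s in streams:
        s.synchronize()  # wait for all streams to finish
    return [float(r) for r in results]
```

My expectation was that the loop body above would run concurrently across the n streams, which is what I tried to measure below.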
map_reduce.py example
First I tried running and modifying the map_reduce.py example in examples/stream. I ran it as-is, and also with the following diff, which makes every map task share a single stream.
diff --git a/examples/stream/map_reduce.py b/examples/stream/map_reduce.py
index 365b67c..de59c7c 100644
--- a/examples/stream/map_reduce.py
+++ b/examples/stream/map_reduce.py
@@ -11,8 +11,9 @@ zs = []
map_streams = []
stop_events = []
reduce_stream = cupy.cuda.stream.Stream()
-for i in range(n):
-    map_streams.append(cupy.cuda.stream.Stream())
+# for i in range(n):
+#     map_streams.append(cupy.cuda.stream.Stream())
+map_streams = [cupy.cuda.stream.Stream()] * n  # use one stream
start_time = time.time()
I found that there was no difference in performance.
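To be explicit about what the diff does: `[stream] * n` relies on Python list repetition producing n references to the same object, so every "map" task is issued on one shared stream. A small stand-in class (my own, just for illustration) shows the aliasing:

```python
class Stream:
    """Stand-in for cupy.cuda.stream.Stream, only to show list aliasing."""
    pass

n = 4
one_stream = [Stream()] * n                   # n references to a single stream
many_streams = [Stream() for _ in range(n)]   # n distinct streams
```

So the diff really does serialize all map work onto one stream, which is why the unchanged runtime surprised me.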
With threads
I was also surprised that simply calling code within a stream context manager would let cupy run it concurrently. I thought it might be necessary to issue these calls from separate CPU threads to get many concurrent calls down to the GPU.
I tried a few approaches in this notebook and was unable to get a speedup: https://gist.github.com/mrocklin/d3b70cea6a555ae2387556e4f0808ac1
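Roughly, the threaded approaches in the notebook have this shape, with the cupy kernel launch replaced here by a placeholder function so the sketch runs anywhere:

```python
from concurrent.futures import ThreadPoolExecutor


def launch(i):
    # Placeholder for the real body, which would be something like:
    #     with streams[i]:
    #         ... cupy calls ...
    return i * i


# Each thread issues its own calls down to the GPU.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(launch, range(4)))
```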
I suspect that I am doing something incorrectly here. Perhaps I have not configured my environment correctly? I apologize for my ignorance here. Any help would be very welcome.