Add a hard-coded DGX configuration (#46)
Conversation
Currently benchmarking with:

```python
from dask_cuda import DGX
from dask.distributed import Client, wait
import cupy, dask.array as da, dask

cluster = DGX()
client = Client(cluster)

rs = da.random.RandomState(RandomState=cupy.random.RandomState)
x = rs.normal(10, 1, size=(50000, 50000)).persist()
y = (x + x.T).sum().compute()
```

Script:

```python
from dask_cuda import DGX
from dask.distributed import Client, wait
import cupy, dask.array as da, dask

with DGX() as cluster:
    with Client(cluster) as client:
        rs = da.random.RandomState(RandomState=cupy.random.RandomState)
        x = rs.normal(10, 1, size=(50000, 50000)).persist()
        wait(x)
        y = (x + x.T).sum().compute()
```
@Akshay-Venkatesh @rjzamora if you all have a chance, here is a system that, I think, should set up a DGX properly. I'm still not able to use the …
dask_cuda/dgx.py (outdated):

```python
scheduler = {
    "cls": Scheduler,
    "options": {
        "interface": "ib0",
```
It might make sense to use a different interface for the scheduler in the case that none of your workers are running on gpu-0 or gpu-1
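A variant along those lines might look like the sketch below. The interface name here is purely an assumption for illustration (check `ip addr` on the machine for real names); the point is only that the scheduler need not share `ib0` with the first worker pair:

```python
# Hypothetical variant of the scheduler spec: give the scheduler its own
# network interface so it does not contend with the workers pinned to ib0.
scheduler_options = {
    "interface": "enp1s0f0",  # assumed name of a non-InfiniBand (Ethernet) NIC
}
```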
@mrocklin, I am not able to run the benchmark you shared. My env: … Is there anything I am getting wrong here?
That environment looks fine to me. Let me push a couple of changes to make it easy to get a dashboard up, and then let's jump on a screenshare to see what's happening.
In offline conversation @Akshay-Venkatesh mentioned …

Then I said:

```python
In [1]: from dask_cuda.local_cuda_cluster import cuda_visible_devices

In [2]: for i in range(8):
   ...:     print(i, cuda_visible_devices(i, range(8)))
   ...:
0 0,1,2,3,4,5,6,7
1 1,2,3,4,5,6,7,0
2 2,3,4,5,6,7,0,1
3 3,4,5,6,7,0,1,2
4 4,5,6,7,0,1,2,3
5 5,6,7,0,1,2,3,4
6 6,7,0,1,2,3,4,5
7 7,0,1,2,3,4,5,6
```

Then @Akshay-Venkatesh …

Then me again …

And here we are.
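The rotation shown above can be sketched as a small helper. This is a hypothetical reimplementation for illustration, not the actual `dask_cuda` source:

```python
def cuda_visible_devices(i, devices):
    """Build a CUDA_VISIBLE_DEVICES string that places device ``i`` first,
    followed by the remaining devices in rotated (round-robin) order."""
    devices = list(devices)
    start = devices.index(i)
    rotated = devices[start:] + devices[:start]
    return ",".join(str(d) for d in rotated)

# Each worker sees all 8 GPUs, but with its own GPU listed first.
print(cuda_visible_devices(2, range(8)))  # -> 2,3,4,5,6,7,0,1
```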
Ah, so if you mean "can we ask that CuPy use a particular GPU", then yes, CuPy has an API for that. cuDF may have an API for that (actually I don't think that it does), but it would be different. Same with TensorFlow and PyTorch. Unfortunately, the only consistent, cross-library way to have user code prefer a particular GPU today is to use …
I can restrict the visible devices to just one GPU though, maybe that will help?
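For reference, the two approaches being contrasted here can be sketched as follows. The environment-variable route is the only cross-library one; the CuPy call shown in the comment is real CuPy API, but equivalents for other libraries differ or may not exist:

```python
import os

# Cross-library: restrict the whole process to one physical GPU.  This must be
# set before any CUDA library initializes its context, and it then applies to
# CuPy, cuDF, PyTorch, etc. alike.
os.environ["CUDA_VISIBLE_DEVICES"] = "3"

# Per-library: CuPy can also switch devices at runtime, e.g.
#   import cupy
#   cupy.cuda.runtime.setDevice(0)   # device 0 of the now-visible set
# but each library would need its own, different call.
```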
@mrocklin I'm pasting what I added here (`git diff benchmarks/recv-into-client.py` on prm-dgx-33):

```diff
diff --git a/benchmarks/recv-into-client.py b/benchmarks/recv-into-client.py
index bba9eb8..4ef1f15 100644
--- a/benchmarks/recv-into-client.py
+++ b/benchmarks/recv-into-client.py
@@ -134,9 +134,15 @@ async def main(args=None):
         import cupy as xp

     if args.server:
+        if args.object_type == 'cupy':
+            xp.cuda.runtime.setDevice(0)
+            print(xp.cuda.runtime.getDevice())
         await connect(args.server, args.port, args.n_bytes, args.n_iter,
                       args.recv, xp, args.verbose, args.inc)
     else:
+        if args.object_type == 'cupy':
+            xp.cuda.runtime.setDevice(1)
+            print(xp.cuda.runtime.getDevice())
         await serve(args.port, args.n_bytes, args.n_iter,
                     args.recv, xp, args.verbose, args.inc)
```
That may work for this test, but I assume you'd like to be able to use all GPUs for most workloads. Am I misinterpreting this?
I tried the following spec, which allows only one visible device per process, and didn't see any improvement:

```python
spec = {
    i: {
        "cls": Nanny,
        "options": {
            "env": {
                "CUDA_VISIBLE_DEVICES": str(i),  # <<<----- this is the major change
                # "UCX_NET_DEVICES": "mlx5_%d:1" % (i // 2)
            },
            "interface": "ib%d" % (i // 2),
            "protocol": "ucx",
            "ncores": 1,
        },
    }
    for i in range(8)
}
```
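The per-worker options in that spec encode the DGX-1 topology, where each pair of GPUs shares one InfiniBand NIC (GPUs 2k and 2k+1 map to `ib<k>`). A minimal sketch of just that mapping, with a hypothetical helper name:

```python
def worker_options(i):
    """Options for worker i on an 8-GPU DGX-1: pin one GPU via the
    environment and pick the InfiniBand interface shared by its GPU pair."""
    return {
        "env": {"CUDA_VISIBLE_DEVICES": str(i)},
        "interface": "ib%d" % (i // 2),  # GPUs 0-1 -> ib0, 2-3 -> ib1, ...
        "protocol": "ucx",
        "ncores": 1,
    }

print(worker_options(5)["interface"])  # -> ib2
```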
Right, so the challenge with the `xp.cuda.runtime.setDevice` approach is that we probably can't do that for all possible libraries that the user might want to use. Unfortunately there isn't any standard API for CUDA itself in Python.
Yes, but I'm comfortable with each dask worker having access to only one of them. I have one dask worker per GPU. Regardless, it didn't seem to solve my immediate problem.
Are you seeing these errors or something else? If yes, do you mind trying …
Setting …

@Akshay-Venkatesh would it make sense for UCX to respect the preference implied by `CUDA_VISIBLE_DEVICES`, where the first value is used by default? This seems to be the strongest convention on this topic that I've seen so far (though I'm fairly new).
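Under that convention, the preferred device is simply the first entry of the variable. A sketch of the parsing, in Python for illustration even though UCX itself is C, with a hypothetical helper name:

```python
import os

def default_device_from_env():
    """Return the device index implied by CUDA_VISIBLE_DEVICES: the first
    listed device, or 0 when the variable is unset or empty."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    if not visible:
        return 0  # no restriction: fall back to physical device 0
    return int(visible.split(",")[0])

os.environ["CUDA_VISIBLE_DEVICES"] = "3,4,5,6,7,0,1,2"
print(default_device_from_env())  # -> 3
```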
I think it's possible. It needs some changes in UCX, but I see good reason to do it now. I'll push a PR to address this.
On Thu, May 9, 2019, 7:51 PM Matthew Rocklin wrote:

> Right, so the challenge with the `xp.cuda.runtime.setDevice` approach is that we probably can't do that for all possible libraries that the user might want to use. Unfortunately there isn't any standard API for CUDA itself in Python.
>
> @Akshay-Venkatesh would it make sense for UCX to respect the preference implied by `CUDA_VISIBLE_DEVICES`, where the first value is used by default? This seems to be the strongest convention on this topic that I've seen so far (though I'm fairly new).
Thanks @Akshay-Venkatesh. It sounds like many of the issues are consistent with that problem. Hopefully that helps to resolve things. If there is a development branch that I can build from (one that also includes your other changes), please let me know. I'd be keen to try it out.
I haven't completed the PR yet but I'll definitely post here when the PR is complete. |
This repository will start depending on the dask/distributed master branch. Is it ok to place this in the ci/cpu/build.sh and ci/gpu/build.sh files? cc @rlratzel, who I think has been thinking about cross-project testing.
Or maybe @raydouglass knows the answer to this? |
Is the dependency needed just for running tests, or does the library itself require it? If it's only needed for tests, then I believe it's a fairly common pattern to add a …

If the library itself needs it, then we'd obviously want to ensure it's installed with the library's conda package (meta.yml/requirements.txt). I'm not familiar with dask/distributed, but is that an option if the library depends on it?
The library itself now depends on it. We'll have to release …
@rlratzel, given the information above, what is the best approach here?
So, yes, it will depend on the most recent version, which is not currently released. It seems odd to put a dev version in a requirements.txt/meta.yml file, though I can if you prefer. If I don't get a follow-up here soonish, I'm going to just add a pip install line in the CI script.
I think this would be necessary though, right? Otherwise the conda packages won't install the right dependencies.

When is …

How do we handle other packages whose master branch we depend on, like cudf?

A few weeks probably (there is a lot of churn in the upcoming release), but before RAPIDS 0.8. Currently this repository won't work with the master branch of distributed; this PR will fix those issues.
Any further suggestions, @raydouglass? I think it's ok to merge something so that master works with master, but I'm open to other suggestions.
The only suggestion I have is to wait, which isn't ideal. So maybe make the pip install changes. Please open an issue to remove those changes after distributed is released; that way we'll have a record and won't forget to do it before the v0.8 release.
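The interim CI change under discussion would presumably look something like the line below in the build scripts. This is a sketch: the exact repository and branch are assumptions (the PR itself notes it currently points at a fork), and the line should be removed once the next distributed release is out:

```shell
# Temporary: install distributed from git master until the next release is on
# conda/PyPI; tracked by a follow-up issue so it is removed before RAPIDS 0.8.
pip install git+https://github.com/dask/distributed.git@master
```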
rerun tests |
This didn't suffice. The conda build tests fail. How would you all like this handled?
The pip install has to be in …
Ah crap. It looks like this was targeting branch 0.7 when I merged. My apologies. What do I need to do to correct this on the ops end? I can easily move the commit over to 0.8, but I'm concerned that I might have messed with one of your internal systems.
If it was up to me, I would probably just remove the commit and force-push, or add a revert commit.
@raydouglass @mike-wendt, I need you to jump in here to resolve this.
This currently depends on github.com/mrocklin/distributed@dev