Install Dask + Distributed from main #546

Merged
rapids-bot[bot] merged 2 commits into rapidsai:branch-0.19 from jakirkham:fix_pip_install
Mar 10, 2021

@jakirkham
Member

Dask and Distributed recently dropped the `master` branch and switched to `main`, so update the install steps to use `main` instead.
@jakirkham jakirkham requested a review from a team as a code owner March 8, 2021 20:36
@github-actions github-actions bot added the gpuCI gpuCI issue label Mar 8, 2021
@jakirkham jakirkham added 3 - Ready for Review Ready for review by team non-breaking Non-breaking change bug Something isn't working labels Mar 8, 2021
@jakirkham
Member Author

@gpucibot merge

@jakirkham jakirkham added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Mar 8, 2021
@jakirkham
Member Author

Seems like we are getting some explicit comms test failures. @rjzamora would you be able to take a look? 🙂

@pentschev
Member

The issues seem to be GPU-related in CI:

22:30:30 [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE:
22:30:31 Coverage.py warning: --include is ignored because --source is set (include-ignored)
22:30:31 Coverage.py warning: --include is ignored because --source is set (include-ignored)
22:30:31 Unable to start CUDA Context
22:30:31 Traceback (most recent call last):
22:30:31   File "/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 237, in initialize
22:30:31     self.cuInit(0)
22:30:31   File "/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 300, in safe_cuda_api_call
22:30:31     self._check_error(fname, retcode)
22:30:31   File "/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 335, in _check_error
22:30:31     raise CudaAPIError(retcode, msg)
22:30:31 numba.cuda.cudadrv.driver.CudaAPIError: [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE

Rerunning to see if this unblocks.

@pentschev
Member

rerun tests

@jakirkham
Member Author

jakirkham commented Mar 9, 2021

Seeing this in the log

10:58:40   File "/opt/conda/envs/rapids/lib/python3.7/site-packages/ucp/core.py", line 628, in recv
10:58:40     ret = await comm.tag_recv(self._ep, buffer, nbytes, tag, name=log)
10:58:40 ucp.exceptions.UCXMsgTruncated: <[Recv #112] ep: 0x7f0e25cde0d8, tag: 0xfa85496d273cdec2, nbytes: 260, type: <class 'numpy.ndarray'>>: length mismatch: 16 (got) != 260 (expected)

Also seeing this

11:05:40   File "/var/lib/jenkins/workspace/rapidsai/gpuci/dask-cuda/prb/dask-cuda-gpu-test/CUDA/10.1/GPU_LABEL/gpu-t4||gpu/OS/ubuntu16.04/PYTHON/3.7/dask_cuda/explicit_comms/dataframe/shuffle.py", line 196, in local_shuffle
11:05:40     out_parts[i] = None
11:05:40 TypeError: 'tuple' object does not support item assignment
11:05:40 FAILED
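
For reference, the `TypeError` above is the generic Python error raised when code assigns to an index of a tuple. The snippet below is only a minimal sketch of that failure mode, not the actual `local_shuffle` code; the contents of `out_parts` are made up for illustration, and converting to a list is just one possible workaround, assumed here rather than taken from this PR.

```python
# Hypothetical stand-in for a container that arrives as a tuple
# where the calling code expected a mutable list.
out_parts = ("part-0", "part-1", "part-2")

try:
    # Tuples are immutable, so item assignment raises TypeError.
    out_parts[0] = None
except TypeError as exc:
    message = str(exc)

# One possible workaround (an assumption, not the fix used upstream):
# copy into a list before mutating.
out_parts = list(out_parts)
out_parts[0] = None
```

Whether the right fix is to convert on the receiving side or to keep passing a list from the caller depends on the upstream change, which this sketch does not try to settle.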

@jakirkham
Member Author

Wondering if that last part is related to PR ( dask/distributed#4531 )

cc @madsbk (in case you have any thoughts here 🙂)

@pentschev
Member

Ah sorry @jakirkham, I missed those errors, but I'm seeing them now on the latest CI run as well. It does seem like a potential issue coming from dask/distributed#4531, so it would be good to have @madsbk look into it.

To unblock CI, what do you think about xfailing those tests @jakirkham ?

@jakirkham jakirkham requested a review from a team as a code owner March 9, 2021 20:02
@github-actions github-actions bot added the python python code needed label Mar 9, 2021
@jakirkham
Member Author

Sure, marked as xfail. Raised an issue ( #549 ) to track this and included it in the xfail message.
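
For readers unfamiliar with the pattern, marking a test as an expected failure with a tracking note can be sketched roughly as below. This is an illustration only: the test name, its body, and the wording of the `reason` string are hypothetical and are not copied from the commits in this PR.

```python
import pytest


# Hypothetical test; the real tests live in dask-cuda's explicit-comms suite.
@pytest.mark.xfail(reason="explicit-comms failure tracked in issue #549")
def test_example_explicit_comms():
    # Placeholder body that mimics the failing behaviour seen in CI.
    parts = ("a", "b")
    parts[0] = None  # raises TypeError, which pytest reports as XFAIL
```

With this marker pytest reports the test as `xfail` rather than as a failure, so CI is unblocked while the issue referenced in the reason stays visible in the test report.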

Member

@pentschev pentschev left a comment

LGTM, thanks @jakirkham !

@pentschev
Member

@jakirkham seems like `def test_dask_use_explicit_comms():` should also be xfailed. Could you do that as well?

@jakirkham
Member Author

Good catch! Thanks Peter 😄

Sorry had missed that earlier. Should be addressed now 🙂

@rapids-bot rapids-bot bot merged commit 46c24e6 into rapidsai:branch-0.19 Mar 10, 2021
@jakirkham jakirkham deleted the fix_pip_install branch March 10, 2021 00:11