Skip to content

Fix multithread bug with CUDA driver API#916

Merged
kmaehashi merged 2 commits intocupy:masterfrom
okuta:fix-muthread-bug
Feb 20, 2018
Merged

Fix multithread bug with CUDA driver API#916
kmaehashi merged 2 commits intocupy:masterfrom
okuta:fix-muthread-bug

Conversation

@okuta
Copy link
Copy Markdown
Member

@okuta okuta commented Jan 28, 2018

FIx #72

@okuta okuta added cat:bug Bugs to-be-backported Pull-requests to be backported to stable branch labels Jan 28, 2018
@okuta
Copy link
Copy Markdown
Member Author

okuta commented Jan 28, 2018

Python2 has a thread local storage bug.
Can I skip this test case?

23:57:27 ___________________ TestAllocator.test_reuse_between_thread ____________________
23:57:27 
23:57:27 self = <cupy_tests.cuda_tests.test_memory.TestAllocator testMethod=test_reuse_between_thread>
23:57:27 
23:57:27     def test_reuse_between_thread(self):
23:57:27         def job(self):
23:57:27             cupy.arange(16)
23:57:27             self._error = False
23:57:27     
23:57:27         # Run in main thread.
23:57:27         self._error = True
23:57:27         job(self)
23:57:27         self.assertFalse(self._error)
23:57:27     
23:57:27         # Run in sub thread.
23:57:27         self._error = True
23:57:27         with cupy.cuda.Device(0):
23:57:27             t = threading.Thread(target=job, args=(self,))
23:57:27             t.daemon = True
23:57:27             t.start()
23:57:27             t.join()
23:57:27 >       self.assertFalse(self._error)
23:57:27 E       AssertionError: True is not false
23:57:27 
23:57:27 job        = <function job at 0x7f0bdce38b90>
23:57:27 self       = <cupy_tests.cuda_tests.test_memory.TestAllocator testMethod=test_reuse_between_thread>
23:57:27 t          = <Thread(Thread-2, stopped daemon 139690277222144)>
23:57:27 
23:57:27 tests/cupy_tests/cuda_tests/test_memory.py:504: AssertionError
23:57:27 ----------------------------- Captured stderr call -----------------------------
23:57:27 Exception in thread Thread-2:
23:57:27 Traceback (most recent call last):
23:57:27   File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
23:57:27     self.run()
23:57:27   File "/usr/lib/python2.7/threading.py", line 763, in run
23:57:27     self.__target(*self.__args, **self.__kwargs)
23:57:27   File "/work/cupy/tests/cupy_tests/cuda_tests/test_memory.py", line 489, in job
23:57:27     cupy.arange(16)
23:57:27   File "cupy/creation/ranges.py", line 57, in arange
23:57:27     _arange_ufunc(typ(start), typ(step), ret, dtype=dtype)
23:57:27   File "cupy/core/elementwise.pxi", line 827, in cupy.core.core.ufunc.__call__
23:57:27     kern.linear_launch(indexer.size, inout_args)
23:57:27   File "cupy/cuda/function.pyx", line 160, in cupy.cuda.function.Function.linear_launch
23:57:27     _launch(self.ptr,
23:57:27   File "cupy/cuda/function.pyx", line 129, in cupy.cuda.function._launch
23:57:27     driver.launchKernel(
23:57:27   File "cupy/cuda/driver.pyx", line 184, in cupy.cuda.driver.launchKernel
23:57:27     check_status(status)
23:57:27   File "cupy/cuda/driver.pyx", line 72, in cupy.cuda.driver.check_status
23:57:27     raise CUDADriverError(status)
23:57:27 CUDADriverError: CUDA_ERROR_INVALID_CONTEXT: invalid device context

@kmaehashi
Copy link
Copy Markdown
Member

Thanks for the fix, LGTM for the code! I wasn't aware that calling runtime API once enables contexts for all devices.

Python2 has a thread local storage bug.

Does that mean we no longer support multi-thread app on Python 2.7 to use CuPy?

@okuta
Copy link
Copy Markdown
Member Author

okuta commented Feb 2, 2018

That is rare case. CuPy works correctly in almost case.

@kmaehashi
Copy link
Copy Markdown
Member

Hmm, I think this is not a rare case, especially when user is running an inference task, which tends to reuse the memory block of the same size. Actually that's why I found issue #72 in production.

However, I think this PR is acceptable if the workaround (e.g., call cupy.cuda.runtime.free(0) in Python 2.7 after launching a thread) is documented somewhere. If it is OK, let's ignore the test on Python 2.7. (I can make a PR for documents)

@kmaehashi
Copy link
Copy Markdown
Member

That is rare case. CuPy works correctly in almost case.

OK, I understand that this issue only occur in very rare case. I tested 100 times on Python 2.7 and could not reproduce this.

@kmaehashi
Copy link
Copy Markdown
Member

I think it's OK to skip the test on Python 2.7. Could you please add decorator?
https://github.com/cupy/cupy/blob/v4.0.0b3/tests/cupy_tests/math_tests/test_matmul.py#L31-L32

Also, please resolve the conflicts.

@okuta
Copy link
Copy Markdown
Member Author

okuta commented Feb 14, 2018

I fixed.

@kmaehashi
Copy link
Copy Markdown
Member

Jenkins, test this please.

@kmaehashi kmaehashi merged commit 11f8694 into cupy:master Feb 20, 2018
@kmaehashi
Copy link
Copy Markdown
Member

LGTM!

@kmaehashi kmaehashi added this to the v4.0.0rc1 milestone Feb 20, 2018
kmaehashi added a commit to kmaehashi/cupy that referenced this pull request Feb 20, 2018
Fix multithread bug with CUDA driver API
@okuta okuta deleted the fix-muthread-bug branch June 3, 2018 16:09
@kmaehashi kmaehashi mentioned this pull request Jun 22, 2018
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cat:bug Bugs to-be-backported Pull-requests to be backported to stable branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants