Make some basic THC operations thread-safe#519
Conversation
Switching the device, setting the stream, and switching BLAS handles is now thread-safe. Some other operations, like reserveStreams, are still not thread-safe.
apaszke
left a comment
There was a problem hiding this comment.
In general looks good to me 👍
|
|
||
| THCState* state = (THCState*)malloc(sizeof(THCState)); | ||
| memset(state, 0, sizeof(THCState)); | ||
| THCState* state = THCState_alloc(); |
There was a problem hiding this comment.
THCState_new would be more consistent with existing APIs.
There was a problem hiding this comment.
It's alloc because there's a separate initialization function. The _new functions initialize their data structure.
|
|
||
| /* Allocator using cudaMallocHost. */ | ||
| THAllocator* cudaHostAllocator; | ||
| THCDeviceAllocator cudaDeviceAllocator; |
There was a problem hiding this comment.
Why is the device allocator a member, while the host allocator is referenced via a pointer? Wouldn't it be more consistent to have both of them point to static structs with function pointers (like in TH)?
There was a problem hiding this comment.
THCDeviceAllocator can't be a pointer to a static struct because it has a state variable. Each THCState has a different THCDeviceAllocator.state.
| state->rngState = (THCRNGState*)malloc(sizeof(THCRNGState)); | ||
| THCRandom_init(state, numDevices, device); | ||
|
|
||
| state->cudaHostAllocator = (THAllocator*)malloc(sizeof(THAllocator)); |
There was a problem hiding this comment.
This can be a static struct.
There was a problem hiding this comment.
Then you would be calling THCAllocator_init() on a static (global) variable. All of THC is per-THCState
There was a problem hiding this comment.
(you could make it a static struct that lives in THCAllocator.c, but this is all unrelated changes to THC. The short answer is that this is how it was before this PR)
| #endif | ||
| } | ||
|
|
||
| void* THCThreadLocal_get(THCThreadLocal local) |
There was a problem hiding this comment.
This is an internal class, and we're using C++, so can we just use the templates to determine the held type. I understand that we might save on one or two malloc/free calls, but I'm not sure if it's worth complicating the code. Let's just store an int* pointing to a single value and return it a properly typed pointer/reference from get. If we really want to avoid the malloc, we can just specialize this class for small types (and hide all intptr_t casts inside this class).
There was a problem hiding this comment.
We're not using C++ here or in THCGeneral.c. Adding templates doesn't seem like it'll reduce code complication, especially since you can't really specialize the POSIX or Windows thread-local API. They both take void*, not arbitrary types.
| } THCState; | ||
| typedef struct THCState THCState; | ||
|
|
||
| THC_API THCState* THCState_alloc(); |
| #ifdef _WIN32 | ||
| typedef DWORD THCThreadLocal; | ||
| #else | ||
| #include <pthread.h> |
There was a problem hiding this comment.
It might be better to add something like this to CMakeLists.txt if using pthread.
Switching the device, setting the stream, and switching BLAS handles is
now thread-safe. Some other operations, like reserveStreams, are still
not thread-safe.