🐛 Bug
Not sure if this is actually a bug, but I discovered it while debugging #37790. Callbacks added to the autograd engine with queue_callback (from C++) or Variable._execution_engine.queue_callback (from Python) do not appear to respect the current CUDA stream; they execute on the default stream instead. As a result, any CUDA kernels launched in the callback are not placed on the non-default stream.
I don't have much context, but it looks like support for non-default streams was added in #8354. That support does not appear to extend to the callbacks.
To Reproduce
Patch in the test added to test_cuda in this PR: #37858. The test enqueues a callback into the autograd engine that checks whether the current stream equals the default stream. Inside backward() in that test, the current stream is NOT the default stream, as expected. Inside the callback, however, it is the default stream.
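The test from #37858 is not reproduced here. Below is a rough standalone sketch of the same check. It assumes a CUDA build of PyTorch. StreamProbe and check_callback_stream are hypothetical names, not code from the PR. The sketch records the current stream inside backward() and inside a callback queued from there. It then reports whether each recorded stream is the default stream:

```python
import torch
from torch.autograd import Variable

# Hypothetical storage for the streams seen during the backward pass.
OBSERVED = {}


class StreamProbe(torch.autograd.Function):
    """Hypothetical identity Function that records which CUDA stream is current."""

    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad):
        # Stream that is current inside backward() itself.
        OBSERVED["backward"] = torch.cuda.current_stream()

        def callback():
            # Stream that is current when the engine runs the queued callback.
            OBSERVED["callback"] = torch.cuda.current_stream()

        # Callbacks can only be queued while a backward pass is running.
        Variable._execution_engine.queue_callback(callback)
        return grad


def check_callback_stream():
    """Return whether each phase ran on the default stream, or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    OBSERVED.clear()
    side = torch.cuda.Stream()
    # Run forward and backward with a non-default stream current.
    with torch.cuda.stream(side):
        x = torch.randn(4, device="cuda", requires_grad=True)
        StreamProbe.apply(x).sum().backward()
    torch.cuda.synchronize()
    default = torch.cuda.default_stream()
    return {
        "backward_on_default": OBSERVED["backward"] == default,
        "callback_on_default": OBSERVED["callback"] == default,
    }


if __name__ == "__main__":
    print(check_callback_stream())
```

The sketch only reports what the engine does on a given build. The behavior described in this issue corresponds to backward_on_default being False while callback_on_default is True.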
Expected behavior
The non-default stream should be respected. I'm not sure whether autograd callbacks should support this natively or whether it is the user's responsibility to ensure the callbacks execute on the non-default CUDA stream.
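Suppose this does turn out to be the user's responsibility. One possible user-side workaround (a sketch, not an official PyTorch API) is to capture the current stream when the callback is queued. The callback then re-enters that stream with torch.cuda.stream. The helpers stream_preserving_callback, Probe and run_demo below are hypothetical names:

```python
import torch
from torch.autograd import Variable

SEEN = {}


def stream_preserving_callback(fn):
    """Hypothetical helper: wrap fn so it runs on the stream current at queue time."""
    if not torch.cuda.is_available():
        return fn
    stream = torch.cuda.current_stream()

    def wrapped():
        # Re-enter the captured stream so kernels launched by fn are enqueued on it.
        with torch.cuda.stream(stream):
            fn()

    return wrapped


class Probe(torch.autograd.Function):
    """Hypothetical identity Function that queues a stream-preserving callback."""

    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad):
        expected = torch.cuda.current_stream()

        def callback():
            # True if the callback sees the same stream that backward() used.
            SEEN["match"] = torch.cuda.current_stream() == expected

        Variable._execution_engine.queue_callback(stream_preserving_callback(callback))
        return grad


def run_demo():
    """Return True if the wrapped callback ran on backward's stream, None without CUDA."""
    if not torch.cuda.is_available():
        return None
    side = torch.cuda.Stream()
    with torch.cuda.stream(side):
        x = torch.randn(4, device="cuda", requires_grad=True)
        Probe.apply(x).sum().backward()
    return SEEN["match"]
```

The design choice here is that stream selection happens at queue time, inside backward(). At that point the engine has already made the op's forward stream current.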
Environment
Latest master
cc @ezyang @ssnl @albanD @zou3519 @gqchen @ngimel