autograd engine callbacks don't respect non-default cuda streams #37860

@rohan-varma

🐛 Bug

Not sure if this is actually a bug, but I discovered this while debugging #37790. It seems that callbacks added to the autograd engine with queue_callback (from C++) or Variable._execution_engine.queue_callback (Python) don't respect the current CUDA stream and instead execute on the default stream. As a result, any CUDA kernels launched in the callback are placed on the default stream rather than the non-default one.

I don't have much context, but it looks like support for non-default streams was added in #8354; that support doesn't appear to have been extended to the callbacks.

To Reproduce

Patch the test in test_cuda in this PR: #37858. The test enqueues a callback into the autograd engine that checks whether the current stream is equal to the default stream. In backward() in that test, the current stream is NOT the default stream, as expected. In the callback, however, the current stream is the default stream.
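
For readers without the patched test at hand, here is a rough standalone sketch of the same check. It is not the test from #37858: `CheckStreamInCallback`, `observed`, and `run_repro` are names made up for illustration, and the sketch assumes a CUDA-capable build. It records whether the current stream equals the default stream both inside backward() and inside a callback queued from backward().

```python
import torch
from torch.autograd import Variable

# Hypothetical scratch dict: maps "backward"/"callback" to the stream seen there.
observed = {}


class CheckStreamInCallback(torch.autograd.Function):
    """Identity op whose backward queues an engine callback (illustrative only)."""

    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        # Stream that is current while the backward function runs.
        observed["backward"] = torch.cuda.current_stream()

        def callback():
            # Stream that is current when the engine runs the callback.
            observed["callback"] = torch.cuda.current_stream()

        # queue_callback must be called while the engine is executing backward.
        Variable._execution_engine.queue_callback(callback)
        return grad_output


def run_repro():
    """Return {"backward": bool, "callback": bool}; True means default stream."""
    observed.clear()
    side_stream = torch.cuda.Stream()
    with torch.cuda.stream(side_stream):
        x = torch.ones(4, device="cuda", requires_grad=True)
        CheckStreamInCallback.apply(x).sum().backward()
    torch.cuda.synchronize()
    default = torch.cuda.default_stream()
    return {name: stream == default for name, stream in observed.items()}


if __name__ == "__main__" and torch.cuda.is_available():
    print(run_repro())
```

If the behaviour described in this issue is present, the `"callback"` entry would report the default stream while the `"backward"` entry would not. The result may differ on builds where the engine handles this differently.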

Expected behavior

The non-default stream should be respected. Not sure if the autograd callbacks should support this natively or if it is the user's responsibility to ensure the callbacks are executed on the non-default CUDA stream.
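
If this does turn out to be the user's responsibility, one possible user-side approach is to capture the stream when the callback is created and re-enter it when the callback runs. The sketch below is only an illustration of that idea: `bind_to_current_stream` is a hypothetical helper, not an existing PyTorch API, and it assumes CUDA is available when it is called.

```python
import functools

import torch


def bind_to_current_stream(fn):
    """Wrap fn so it runs under the CUDA stream that is current right now.

    Hypothetical helper for illustration; not part of torch.
    """
    # Captured at wrap time, e.g. inside backward() on the side stream.
    stream = torch.cuda.current_stream()

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Make the captured stream current for the duration of the call.
        with torch.cuda.stream(stream):
            return fn(*args, **kwargs)

    return wrapper
```

Inside backward(), the callback would then be queued as `Variable._execution_engine.queue_callback(bind_to_current_stream(callback))`. This only changes which stream is current inside the callback; any synchronization with work on other streams is still up to the caller.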

Environment

Latest master

cc @ezyang @ssnl @albanD @zou3519 @gqchen @ngimel

    Labels

    module: autograd (Related to torch.autograd, and the autograd engine in general)
    module: cuda (Related to torch.cuda, and CUDA support in general)
    triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
