Contexts not being traced/hooked #140

@cesar-avalos3

Hello, we are having trouble getting NVBit to recognize more than one context. This is related to the multi-stream issue I opened a while ago (#137).
I'm running this example https://github.com/cesar-avalos3/simple_multi_gpu/tree/single_device_multiple_contexts and using mem_trace from the latest 1.7.3 release.
This simple program creates two contexts, each of which launches a kernel, one after the other. With 1.7.3 we only see the first kernel being launched: if we print the cbid in nvbit_at_cuda_event, only one kernel launch shows up. Moreover, the program never terminates; it just hangs at the end. With the 1.5.5 release of mem_trace, both kernels are traced and the program exits successfully.
(The following applies only to the 1.7.3 mem_trace.) Because nvbit_tool_init is executed only once (for the first context), the state for the second context never gets initialized. If we add logic to initialize the additional contexts inside nvbit_at_ctx_init (against the very explicit warnings not to allocate anything there), the program no longer hangs at the end and we start seeing the recv func running for the second context, but we still don't see any kernel launches associated with the second context.
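To make the per-context bookkeeping concrete, here is a minimal sketch in plain C++ of the pattern we are after: state keyed by context handle and created the first time a context is seen, instead of once for the whole tool. It deliberately uses no NVBit or CUDA headers. `ContextHandle`, `CtxState`, `CtxRegistry`, and `on_kernel_launch` are hypothetical names made up for illustration; this is not how mem_trace or NVBit actually implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_map>

// Stand-in for an opaque context handle (assumption: plays the role of CUcontext).
using ContextHandle = const void*;

// Hypothetical per-context tool state. A real tool would keep its
// channel, receiver thread, and similar resources here.
struct CtxState {
    int id = 0;            // order in which this context was first seen
    int kernels_seen = 0;  // launches attributed to this context
};

// Registry that initializes state lazily, once per context.
class CtxRegistry {
public:
    // Return the state for ctx, creating it on first use.
    CtxState& get_or_init(ContextHandle ctx) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = states_.find(ctx);
        if (it == states_.end()) {
            auto state = std::make_unique<CtxState>();
            state->id = static_cast<int>(states_.size());
            it = states_.emplace(ctx, std::move(state)).first;
        }
        return *it->second;
    }

    // Number of contexts that have been initialized so far.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mu_);
        return states_.size();
    }

private:
    mutable std::mutex mu_;
    std::unordered_map<ContextHandle, std::unique_ptr<CtxState>> states_;
};

// Hypothetical launch callback: attributes the launch to its context,
// initializing that context's state if it has not been seen before.
void on_kernel_launch(CtxRegistry& reg, ContextHandle ctx) {
    reg.get_or_init(ctx).kernels_seen++;
}
```

With this shape, calling `on_kernel_launch` with two distinct handles leaves two entries in the registry, each with its own launch count, so a second context cannot end up without state just because a one-time init already ran for the first one.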
Is there a way to trigger another nvbit_tool_init (though the name suggests it should run only once for the tool), or to get whatever calls it to register the second (and any additional) context? Or am I missing another initialization step?
Thanks!
